Practice Problem: Cleaning House Listings 2

Screen Link:

My Code:

import csv

def clean_num_rooms_col():
    bad_chars = ["__","(",")"]
    with open('listings.csv') as f:
        reader = csv.reader(f)
        rows = list(reader)
        listings = rows[1:]
        for listing in listings:
            num_rooms = listing[2]
            for char in bad_chars:
                num_rooms = num_rooms.replace(char,"")
            listing[2] = num_rooms
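        # write_csv is not defined in this post; presumably the exercise's helper that writes listings_clean.csv
        write_csv(rows)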
        
clean_num_rooms_col()
with open('listings_clean.csv') as f:
    reader = csv.reader(f)
    rows = list(reader)
print(rows)

What I expected to happen: Above is my original solution to the problem. It threw an error message saying that the output in ‘listings_clean.csv’ wasn’t as expected, although the dataset that was returned looked correct and all of the incorrect characters had been removed from the column.

What actually happened: I’ve now solved the problem by referring to the provided answer, which keeps the valid digits rather than removing the bad characters (as my solution does). Just to check my understanding: is my solution incorrect because the list I’ve provided may be missing some bad characters? Is that why we need to iterate through the characters and keep the digits instead of leaning on the str.replace() method?

Hi @colleen.mccaskell

For scenarios like these, the question is which approach is easier to implement and more efficient. If you are, for example, 100% positive that the only character you need to remove is “_”, then you can indeed just check for that particular character in the string. However, this becomes cumbersome fast as the list of “bad chars” gets longer. Often it is easiest to think about what the characters you want to keep have in common. Instead of writing

for char in row[2]:
    if char in '0123456789':
        clean += char

as in the original DQ solution, I would just check for digits:

for char in row[2]:
    if char.isdigit():
        clean += char

So your approach is not wrong, but it might not be the most efficient way to handle things if the problem you are trying to solve gets more complicated.
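To make the difference concrete, here is a minimal sketch comparing the two approaches. The function names and the sample values are made up for illustration and are not taken from listings.csv or the DQ solution. Note also that `str.isdigit()` accepts some non-ASCII digit characters, so it is slightly broader than checking against '0123456789'.

```python
def strip_bad_chars(value, bad_chars=("_", "(", ")")):
    """Remove-list approach: delete every character we listed as bad."""
    for char in bad_chars:
        value = value.replace(char, "")
    return value


def keep_digits(value):
    """Keep-list approach: keep only the characters that are digits."""
    return "".join(char for char in value if char.isdigit())


# Hypothetical sample values (assumption: not from the real dataset).
samples = ["3_", "(4)", "2 rooms", "#5"]
for raw in samples:
    print(repr(raw), "->", repr(strip_bad_chars(raw)), "vs", repr(keep_digits(raw)))
```

Both functions agree on the first two samples, but only `keep_digits` cleans the last two, because " rooms" and "#" were never added to the list of bad characters.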

Side note: I am not sure about the purpose of your last chunk of code…

Best
htw

Thank you, that confirms my assumptions that my approach was less efficient because it relies on me inputting all of the potential bad characters rather than dealing with the digits directly.

The last chunk of code relates to a final, optional step to test the solution and print some of the rows from the .csv file to ensure that the ‘available’ column was correct (my code is a bit different from the DQ solution, though).

@colleen.mccaskell

Glad I could help.

I just realized that you're opening 'listings_clean.csv' and not the original input. All good. And it's good practice to use Python's with open syntax when handling file input 👍
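As a small illustration of why the with statement is handy, the sketch below writes a tiny stand-in file and reads it back; the file is closed automatically when each with block ends. The temporary path and the row contents are assumptions made up for this example, not the real exercise data.

```python
import csv
import os
import tempfile

# Create a small stand-in CSV so the example is self-contained
# (hypothetical contents, not the actual listings_clean.csv).
path = os.path.join(tempfile.mkdtemp(), "listings_clean.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["id", "city", "num_rooms"], ["1", "Berlin", "3"]])

# Read it back; no explicit f.close() is needed.
with open(path, newline="") as f:
    rows = list(csv.reader(f))

print(f.closed)  # the file object is closed once the with block exits
print(rows)
```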

Best
htw