Cleaning and Preparing Data in Python Practice Problems - Cleaning House Listings 3

Screen Link:

My Code:

import csv

def write_csv(rows):
    f = open('listings_clean.csv', mode='w')
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
    f.close()

import random
def clean_id_col():
    f = open('listings.csv')
    read = csv.reader(f)
    listings = list(read)
    listings_wo_header = listings[1:]
    id_set = set()               # Create a set because set does not allow duplicates
    for row in listings_wo_header:
        random_num = random.sample(range(1000,9999), 1)
        
        # If 'id' is blank, we set it as a random number, then we will add it to the set. If it is not blank, we will just add the 'id' number to the set. 
        if row[0] == '':                    
            row[0] = str(random_num[0])
            id_set.add(str(row[0]))
            if row[0] in id_set:                                  # This step to make sure there is no duplicates
                row[0] = str(random_num[0])       # if there is, we will generate another random number
        elif row[0] != '':
            id_set.add(str(row[0]))
            
    write_csv(listings)
    
clean_id_col()
f = open('listings_clean.csv')
reader = csv.reader(f)
rows = list(reader)
for i in range(30):
    print(rows[i]) 

I used a random number generator, even though no earlier lesson covered it. I did not fully understand the answer given by the assignment, so I came up with my own, and I think it works better for this problem.

The code above was marked as correct, but I would love to hear people’s thoughts on it. Do you find this easier to understand? What code would you replace or add?

You can use random.randint(1000, 9999) instead of random.sample(range(1000,9999), 1). Since randint returns a single integer rather than a one-item list, you’ll need to change random_num[0] to random_num.
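To make the difference concrete, here is a small sketch comparing the two calls. One detail worth keeping in mind: range(1000, 9999) stops at 9998, while random.randint(1000, 9999) includes both endpoints, so the two are not exactly equivalent.

```python
import random

# random.sample returns a list, even when asked for a single item
sampled = random.sample(range(1000, 9999), 1)
print(type(sampled), sampled[0])

# random.randint returns a plain int; both endpoints are inclusive
number = random.randint(1000, 9999)
print(type(number), number)
```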

In. . .

        if row[0] == '':                    
            row[0] = str(random_num[0])
            id_set.add(str(row[0]))
            if row[0] in id_set:                                  # This step to make sure there is no duplicates
                row[0] = str(random_num[0])       # if there is, we will generate another random number
        elif row[0] != '':
            id_set.add(str(row[0]))

. . . the. . .

if row[0] in id_set:                                  # This step to make sure there is no duplicates
    row[0] = str(random_num[0])       # if there is, we will generate another random number

. . . part doesn’t do what you think it does. The second line doesn’t fetch a new random number; it reuses the one generated at the start of the current loop iteration. And because row[0] was added to id_set on the line just before the check, the condition is always true whenever we get into the first if. You can remove these two lines and your solution will work just the same.
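A stripped-down sketch of those lines, outside the loop, may help show why the check never does anything useful. The variable new_id is only introduced here for illustration.

```python
import random

id_set = set()
random_num = random.sample(range(1000, 9999), 1)

new_id = str(random_num[0])
id_set.add(new_id)

# The id was added one line earlier, so this membership test cannot be False
print(new_id in id_set)  # True

# Reading random_num[0] again does not draw a new number
print(str(random_num[0]) == new_id)  # True
```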

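This was marked as correct because the random numbers happened not to collide with any of the existing ids. If you run it enough times, a generated id will eventually match an existing one (or another generated one), and the output will contain a duplicate.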
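One way to get an actual retry is to collect all existing ids first and then keep drawing until the new id is unused. The following is only a sketch, not the assignment's official answer: the function name fill_missing_ids, its low and high parameters, and the sample rows are made up for illustration, and it assumes the id is in column 0 and the header row has already been removed.

```python
import random

def fill_missing_ids(rows, low=1000, high=9999):
    """Fill blank ids in column 0 with random ids that are not used elsewhere.

    `rows` is a list of lists without the header row; it is modified in place.
    """
    # Collect every id that already exists, including ids in later rows
    used_ids = {row[0] for row in rows if row[0] != ''}
    for row in rows:
        if row[0] == '':
            new_id = str(random.randint(low, high))
            # Draw again until the id is not already taken
            while new_id in used_ids:
                new_id = str(random.randint(low, high))
            row[0] = new_id
            used_ids.add(new_id)
    return rows

# Made-up sample data with a deliberately tiny id range to force collisions
sample_rows = [['1001', 'a'], ['', 'b'], ['', 'c'], ['1002', 'd']]
fill_missing_ids(sample_rows, low=1000, high=1005)
print([row[0] for row in sample_rows])
```

Note that the while loop would never finish if every id in the range were already taken, so the range needs to be comfortably larger than the number of rows.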

Feel free to ask about what you don’t understand.

Thank you @Bruno for your input, I will try to think of some other ways to solve this problem.
