Cleaning and Preparing Data in Python Practice Problems - Slide 7

The function I wrote below should assign a unique 4-digit id to each row that is missing one. It first builds every 4-digit permutation of the digits 0 to 9, then collects the ids already present in the dataset. For each row without an id, it loops through the permutations and, on finding one that is not already in the dataset, assigns that permutation to the row as its id. However, when I submit my answer, some rows end up with duplicate ids, for example: "The column id has duplicated values: Example rows 1 and 10 with value 9876."

I can’t understand how this is happening. Can someone please help me out?

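import csv
import itertools

def clean_id_col():
    # Read the whole file into a list of rows (the header is row 0)
    with open('listings.csv') as f:
        listings = list(csv.reader(f))

    # Every ordered choice of 4 distinct digits, as tuples of ints
    permutation = list(itertools.permutations([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 4))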

    string_permutation = []

    for perm in permutation:
        number = ''
        for integer in perm:
            number = number + str(integer)
        string_permutation.append(number)

    unique_ids = []

    for row in listings:
        ids = row[0]
        if ids:
            unique_ids.append(str(ids))
        
    for row in listings[1:]:
        ids = row[0]
        if not ids:
            for number in string_permutation:
                if number not in unique_ids:
                    row[0] = number

    write_csv(listings)

You loop through every number in string_permutation above. For a single row with a missing id, this is what happens:

  • ids is empty
    • First iteration of for loop
      • number not in unique_ids
      • row[0] set as number
    • Second iteration of for loop
      • number not in unique_ids
      • row[0] set as number
    • Third iteration of for loop
      • number not in unique_ids
      • row[0] set as number

And so on.

For that one row, row[0] keeps getting overwritten with number for every number in string_permutation that is not in unique_ids. The loop never stops at the first match, so the row ends up holding the last unused permutation.
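To see the effect on a small scale, here is a minimal sketch that mimics the inner loop from the question. The sample rows and the helper name fill_without_break are made up for illustration and are not part of the exercise.

```python
import itertools

# All 4-digit strings with no repeated digit, in the order itertools yields them
candidates = [''.join(map(str, p)) for p in itertools.permutations(range(10), 4)]

def fill_without_break(rows, unique_ids):
    """Hypothetical helper that mimics the inner loop from the question."""
    for row in rows:
        if not row[0]:
            for number in candidates:
                if number not in unique_ids:
                    row[0] = number  # overwritten for every unused candidate
    return rows

# Made-up sample rows: two already have an id, two are missing one
rows = [['0123', 'a'], ['', 'b'], ['4567', 'c'], ['', 'd']]
existing = [r[0] for r in rows if r[0]]
fill_without_break(rows, existing)
print(rows[1][0], rows[3][0])  # 9876 9876
```

The last permutation generated is 9876, which is why that value shows up in the error message.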

Do you notice the problems (yes, plural) here?

Hey Doctor!

Thank you for your fast reply; it was very informative. Yes, I noticed the problems :smile:

I modified the code as below: when an unused id is found, I assign it to the row, append it to the unique_ids list as well, and then stop the loop:

if number not in unique_ids:
    row[0] = number
    unique_ids.append(number)
    break
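For reference, here is one way the whole fill step could look with that fix applied. This is only a sketch, not the exercise's official solution: the function name fill_missing_ids is made up, it uses a set instead of a list for the membership check, and it swaps itertools.permutations for itertools.product so that ids with repeated digits (such as 0000) are also available.

```python
import itertools

def fill_missing_ids(rows):
    """Assign an unused 4-digit id to each row whose first column is empty."""
    # A set makes the "already used?" check fast
    used = {row[0] for row in rows if row[0]}
    # product, unlike permutations, also yields ids with repeated digits
    candidates = (''.join(d) for d in itertools.product('0123456789', repeat=4))
    for row in rows:
        if not row[0]:
            # Take the first candidate not yet used, then stop searching
            new_id = next(c for c in candidates if c not in used)
            row[0] = new_id
            used.add(new_id)  # so the next empty row cannot reuse it
    return rows

# Made-up sample data with a header row
sample = [['id', 'name'], ['0000', 'a'], ['', 'b'], ['0001', 'c'], ['', 'd']]
fill_missing_ids(sample)
print([row[0] for row in sample[1:]])  # ['0000', '0002', '0001', '0003']
```

The header row is left alone because its first column is not empty, which matches how the original code treats it.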