Dictionary and Frequency Tables Mission

Screen Link: https://app.dataquest.io/m/314/dictionaries-and-frequency-tables/13/filtering-for-the-intervals

    from csv import reader

    opened_file = open('AppleStore.csv')
    read_file = reader(opened_file)
    apps_data = list(read_file)

    rating_count_tot = []

    for row in apps_data[1:]:
        rating = float(row[5])
        rating_count_tot.append(rating)
        
    max_rating = max(rating_count_tot)
    min_rating = min(rating_count_tot)

    rating_buckets = {'0 - 100,000': 0, '100,000 - 1,000,000': 0, '1,000,000 +': 0}

    for rating in rating_count_tot:
        if rating <= 100000:
            rating_buckets['0 - 100,000'] += 1
        if 10000 < rating <= 1000000:
            rating_buckets['100,000 - 1,000,000'] += 1
        if rating > 1000000:
            rating_buckets['1,000,000 +'] += 1          

    values = rating_buckets.values()
    total = sum(values)

    print(len(rating_count_tot))
    print(total)

What I expected to happen:

For these two print statements to output the same value:

    print(len(rating_count_tot))
    print(total)

What actually happened:

    7197
    7995

Essentially, I was checking that the number of ratings in my list equals the sum of the counts across the buckets of my frequency table. That way I could be sure my code was correct and that every rating was accounted for.
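
The sanity check described above can be packaged as a small helper. This is a minimal sketch, not code from the mission: `check_counts` is a hypothetical name, and it assumes every value is supposed to land in exactly one bucket. Under that assumption, a total that is too high points to overlapping buckets and a total that is too low points to gaps between them.

```python
def check_counts(values, frequency_table):
    """Hypothetical helper: verify bucket counts add up to len(values).

    Assumes each value belongs to exactly one bucket.
    """
    total = sum(frequency_table.values())
    if total != len(values):
        raise ValueError(
            f"{len(values)} values but {total} counted; "
            "buckets overlap (too many) or leave gaps (too few)"
        )
    return total


# Usage with tiny hand-made tables
ok_total = check_counts([1, 2, 3], {'low': 2, 'high': 1})
print(ok_total)

try:
    check_counts([1, 2, 3], {'low': 2, 'high': 2})  # one value counted twice
except ValueError as err:
    print(err)
```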

Any idea why the number is coming out differently?

Thanks!

Hi @alvand.hajizadeh. It looks like you have an overlap here due to a missing zero:

    if rating <= 100000:                         # 100,000
        rating_buckets['0 - 100,000'] += 1
    if 10000 < rating <= 1000000:                # 10,000
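
To see the double counting on a small scale, here is the same set of conditions run on a hypothetical four-value sample (the numbers are made up for illustration, not read from AppleStore.csv).

```python
# Hypothetical rating counts chosen by hand (not from AppleStore.csv)
sample = [500, 50000, 250000, 2000000]

buckets = {'0 - 100,000': 0, '100,000 - 1,000,000': 0, '1,000,000 +': 0}
for rating in sample:
    if rating <= 100000:
        buckets['0 - 100,000'] += 1
    if 10000 < rating <= 1000000:  # lower bound missing a zero, as in the question
        buckets['100,000 - 1,000,000'] += 1
    if rating > 1000000:
        buckets['1,000,000 +'] += 1

total = sum(buckets.values())
print(len(sample), total)  # 4 5
print(buckets)
```

Here 50000 satisfies both the first and the second condition, so it is counted twice. With the lower bound written as 100000 it would only match the first.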

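Because the second condition starts at 10,000 instead of 100,000, every rating count greater than 10,000 and up to 100,000 is counted in both the first and the second bucket. That double counting is where the extra 798 in your total comes from (7995 - 7197). Once you fix this, you should get an accurate count. :slight_smile: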
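
Besides adding the missing zero, another option is an `if`/`elif`/`else` chain, which makes the buckets mutually exclusive by construction: each branch is only reached when the previous ones failed, so every value lands in exactly one bucket. This is a sketch of that alternative, not the mission's reference solution; `bucket_rating_counts` and the sample list are hypothetical.

```python
def bucket_rating_counts(rating_counts):
    """Hypothetical helper: count each rating into exactly one bucket."""
    buckets = {'0 - 100,000': 0, '100,000 - 1,000,000': 0, '1,000,000 +': 0}
    for rating in rating_counts:
        if rating <= 100000:
            buckets['0 - 100,000'] += 1
        elif rating <= 1000000:  # only reached when rating > 100000
            buckets['100,000 - 1,000,000'] += 1
        else:  # rating > 1000000
            buckets['1,000,000 +'] += 1
    return buckets


sample = [500, 50000, 250000, 2000000]  # made-up values for illustration
result = bucket_rating_counts(sample)
print(result)
print(len(sample) == sum(result.values()))  # True
```

With this structure only one boundary per bucket has to be typed, so a typo shifts a boundary instead of creating an overlap, and the length-versus-sum check always passes.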


Ah, I see. Thank you so much!