Guided Project App Profile Recommendation

Screen Link:
https://app.dataquest.io/c/112/m/350/guided-project%3A-profitable-app-profiles-for-the-app-store-and-google-play-markets/10/most-common-apps-by-genre-part-two

My Code:

def freq_table(dataset,index):
    c_percentage={}
    c_count={}
    
    for each_row in dataset:
        col=each_row[index]
        if col in c_count:
            c_count[col]+=1
        else:
            c_count[col]=1
            

    total=0
    for each_key in c_count:
        total+=c_count[each_key]
    print(total)
    for each_key in c_count:
        percentage=(c_count[each_key]/total)*100
        c_percentage[each_key]=percentage
    
    return c_percentage


Solution code:
def freq_table(dataset, index):
    table = {}
    total = 0

    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage

    return table_percentages

I am having difficulty understanding how total is calculated in the solution code. I feel it should be the sum of the values in the table dictionary instead.

The sum of the values you store in your c_count dictionary is equal to the total number of rows in your dataset.

That follows from how a frequency table (dictionary) is built.

Say you have N rows and M categories. For each of the M categories, you record how many times it appears in the dataset. Once you have iterated over all the rows, you have the frequency of every category, and because each row is counted exactly once, those frequencies add up to N.

idx Category
1 A
2 A
3 B
4 A
5 C
6 B
7 A
8 C
9 C

The table above has 9 rows. Your frequency table would be:

{A: 4, B: 2, C: 3}

You can see that the sum of the values in your frequency table (4 + 2 + 3 = 9) is equal to the number of rows.
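
If you want to check this yourself, here is a minimal sketch that builds the frequency table from the nine example rows. The `rows` list and the variable names are made up for illustration; the counting is the same dictionary pattern as in your code.

```python
# Hypothetical data: the nine example rows as [idx, category] pairs.
rows = [[1, "A"], [2, "A"], [3, "B"], [4, "A"], [5, "C"],
        [6, "B"], [7, "A"], [8, "C"], [9, "C"]]

counts = {}
for row in rows:
    category = row[1]
    # Start at 0 for a new category, then add 1 for this row.
    counts[category] = counts.get(category, 0) + 1

print(counts)                # {'A': 4, 'B': 2, 'C': 3}
print(sum(counts.values()))  # 9
print(len(rows))             # 9
```

Every row adds exactly 1 to exactly one key, which is why the two numbers always match.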

In your code, you add up those frequency values after the counting loop.

In the solution code, total is incremented once per row inside the loop, so it counts the rows directly.

Both give the same total, and therefore the same percentages.
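
To see the two approaches side by side, here is a rough sketch with both ways of computing total. The function names and the `sample` list are my own, not from the course, and I shortened the counting with `dict.get`, but the logic for total matches your code and the solution code respectively.

```python
def freq_table_sum_counts(dataset, index):
    # Approach from the question: count first, then add up the counts.
    counts = {}
    for row in dataset:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    total = sum(counts.values())
    return {key: (count / total) * 100 for key, count in counts.items()}


def freq_table_count_rows(dataset, index):
    # Approach from the solution: increment total once per row.
    counts = {}
    total = 0
    for row in dataset:
        total += 1
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    return {key: (count / total) * 100 for key, count in counts.items()}


# Hypothetical one-column dataset with the same categories as the example.
sample = [["A"], ["A"], ["B"], ["A"], ["C"], ["B"], ["A"], ["C"], ["C"]]

same = freq_table_sum_counts(sample, 0) == freq_table_count_rows(sample, 0)
print(same)  # True
```

The only practical difference is that the solution code gets total in the same loop that does the counting, so it needs one pass less over the dictionary.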

Thank you! I understand it now.