Most popular apps by genre on the app store not sorting from highest to lowest average

My Code:

genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final: 
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    
    average_n_ratings = total / len_genre
    print(genre, ':', average_n_ratings)

I expected the averages to be printed from highest to lowest, but mine come out unordered, and I can’t figure out how to get them ordered the way the solution has them.

Attached is a copy of my code and the output.


Hi @kylemoorman1,

The freq_table function doesn’t contain any code to sort its results, so it returns them unsorted.

If you have also created the display_table function described in the instructions, that function sorts the results before printing them.

To write your own sorting step, add each dictionary entry to a list as a (value, key) tuple and then sort the list.

If you can go ahead with these hints, please do. If you need more clarification, please let me know.


Hi,

I had very little success with this advice. The display_table function is written to work with a percentage function. In fact, I’m a little confused about how my freq_table is producing the right numbers as it is.

genres_ios = freq_table(ios_final, -5)
for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final: 
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    average_n_ratings = total / len_genre
    print(genre, ':', average_n_ratings)

I don’t know how I could write a “display_table” function from the code above. It isn’t in table form, and I can’t figure out how to arrange it as such.
This ^^ is the code I’m currently using, and it matches what the solutions have. The freq_table being referenced is the code below:

def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset: 
        total += 1
        value = row[index]
        
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    

    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])  

My thinking is that I need to rewrite my freq_table function to somehow accommodate the average calculation and the dictionary I need to populate. I’ve tried several ways of rewriting freq_table to make this work, but I’ve exhausted every option I can think of. Could I please get help connecting the dots a bit more? I’m very stuck here.

I’d also like to add that up until this point my code has been identical to the solution; it’s just the OUTPUT that is not the same. How is it that the solution’s output is sorted and mine is not? The freq_table which my code refers to has a sorting function within it, so I’m really more confused than ever.

Also, here’s a link to my file so you can reference my work so far: file:///C:/Users/kmoorman/Downloads/Basics%20(1).html

Thanks,

Kyle


You can sort the items in a list. What this code does is put each key and value in a tuple and append the tuple to a list. sorted() compares tuples by their first element, so if you want to sort by value instead of alphabetically by key, put the value first in the tuple.

You can also sort a dictionary by its values by passing a key function to sorted().
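
A minimal sketch of sorting a dictionary by value, since the screenshots from this reply are not available. The dictionary name and its numbers are made up for illustration; only sorted(), dict.items() and the key argument are standard Python.

```python
# Hypothetical average rating counts per genre, for illustration only.
averages = {'Games': 22788.7, 'Navigation': 86090.3, 'Medical': 612.0}

# dict.items() yields (key, value) pairs; the key function picks the value,
# so the pairs are ordered by value rather than by genre name.
by_value = sorted(averages.items(), key=lambda pair: pair[1], reverse=True)

for genre, average in by_value:
    print(genre, ':', average)
```

This avoids swapping the tuple positions, at the cost of introducing a lambda, which may not have been covered yet at this point in the course.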


Hi monorienaghogho,

My issue is that I can’t populate a dictionary or list which I can then sort in the way you’ve described:

genres_ios = freq_table(ios_final, -5)
for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final: 
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    average_n_ratings = total / len_genre
    print(genre, ':', average_n_ratings)

I need help turning this code ^^ into either a dictionary or a list. I’m very stuck and I’ve spent hours trying to figure it out. Could you please look through the link I sent and propose code which would generate a list or dictionary?

genres_ios = freq_table(ios_final, -5)

a_dict = {}
for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre and genre_app in a_dict:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    a_dict[genre_app] = avg_n_ratings
    
    
print(a_dict)

Most recently, I tried to create a dictionary, but the output either doesn’t populate or gives me an error. Could you please help with this specifically? I feel that my question is not being answered directly, so maybe I’m being unclear.

The problem:
While the data with the code from the solutions is all correct, the issue is that it’s not sorted. My thinking is to take the code and populate a dictionary or list from the above code. Once I do, I know how to write code to sort it.

If you could please help with code to generate a list or dictionary from the code I’ve written which already generates the correct data, this is really what I’m looking for help and clarity with.

Thanks so much,

Kyle


Here we are finding the average rating, so we need the sum of the ratings divided by the total.
To find the sum of the ratings you might need to update this line to n_ratings += float(app[5]),
and then find the average as n_ratings / total.

To create a dictionary, you are on the right path: keep the genre as the key and the updated average as the value. The problem is that a dictionary is not easy to sort directly.

So in this case, it is better to create a list and sort it using sorted(a_list).

To do that, we first need to create a tuple containing the average rating and the genre name.

a_tuple=(avg_rating, genre) will create this tuple.
Now we can append this tuple to an empty list

a_list.append(a_tuple)

Once the loop is over, you will have a list of average values against genre names.

Now you can sort it using sorted(a_list)

Hope this helps. Let me know if you need any other assistance.
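
To illustrate just the tuple-and-list step described above, here is a small sketch. The (genre, average) pairs in computed are invented stand-ins for the values the loop would produce; a_tuple and a_list follow the names used in this reply.

```python
# Made-up (genre, average) pairs standing in for values computed in the loop.
computed = [('Music', 57326.5), ('Weather', 52279.9), ('Reference', 74942.1)]

a_list = []
for genre, avg_rating in computed:
    a_tuple = (avg_rating, genre)   # average first, so it drives the sort order
    a_list.append(a_tuple)

# sorted() returns a new list and leaves a_list unchanged;
# reverse=True puts the highest average first.
a_sorted = sorted(a_list, reverse=True)
print(a_sorted)
```

Because sorted() returns a new list, its result has to be assigned to a name (or displayed directly) for the ordering to be visible.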


@kylemoorman1 kindly upload the notebook.

Please use the @ so that I notice you have replied or you reply to this message directly.

I wasn’t notified when you replied. I saw this message by chance.


@jithins123

Here is the code I have based off of your suggestion:

genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
   total = 0
   len_genre = 0
   for app in ios_final: 
       genre_app = app[-5]
       
       if genre_app == genre:
           n_ratings = float(app[5])
           total += n_ratings
           average_n_ratings = n_ratings / total
           a_tuple = (average_n_ratings, genre)
           a_list.append(a_tuple)
   
sorted(a_list, reverse = True)

Here are the first few lines of what is being output:

(2974676.0, 'Social Networking'),
(2161558.0, 'Photo & Video'),
(2130805.0, 'Games'),
(2018150.0, 'Social Networking'),
(1927675.5, 'Games'),
(1605715.0, 'Games'),
(1469939.6666666667, 'Social Networking'),
(1410399.0, 'Games'),
(1269541.2, 'Games'),
(1242731.5, 'Photo & Video'),
(1190321.25, 'Social Networking'),
(1171126.8333333333, 'Games'),
(1126879.0, 'Music'),
(1100572.5714285714, 'Games'),
(1046635.875, 'Games'),
(1019115.6, 'Social Networking'),
(1002721.0, 'Music'),
(998402.1111111111, 'Games'),
(985920.0, 'Reference'),

From here, I decided to edit this code, because what we’re really trying to find is the average number of user ratings per app in each genre, and that has to be accounted for with len_genre. Please correct me if I’m wrong, but I believe the final formula should be average_n_ratings = total / len_genre, no? My thinking is that the loop iterates over the data and adds every rating count belonging to the genre to total, adds 1 to len_genre every time it finds an app of that genre, and then computes average_n_ratings within the loop. I then incorporated the code you suggested within the loop. Finally, I sorted from highest value to lowest value:

genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final: 
        genre_app = app[-5]
        
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
            average_n_ratings = total / len_genre
            a_tuple = (average_n_ratings, genre)
            a_list.append(a_tuple)
    
sorted(a_list, reverse = True)

The issue now is that I’m seeing duplicate genres. Why is this the case? Social Networking shows up multiple times in my output, as do other genres. What’s the problem with my code that’s causing this to happen? Thanks again.


Hi @jithins123,

I have some promising news! Please see code below:

genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
   total = 0
   len_genre = 0
   for app in ios_final:
       genre_app = app[-5]
       if genre_app == genre:            
           n_ratings = float(app[5])
           total += n_ratings
           len_genre += 1
   avg_n_ratings = total / len_genre
a_tuple = (avg_n_ratings, genre)
a_list.append(a_tuple)
sorted(a_list, reverse = True)

The only problem now is the output. Here are the first several lines of the sorted list:

(86090.33333333333, 'Navigation'),
(86090.33333333333, 'Navigation'),
(74942.11111111111, 'Reference'),
(74942.11111111111, 'Reference'),
(71548.34905660378, 'Social Networking'),
(71548.34905660378, 'Social Networking'),
(57326.530303030304, 'Music'),
(57326.530303030304, 'Music'),
(52279.892857142855, 'Weather'),
(52279.892857142855, 'Weather'),
(39758.5, 'Book'),
(39758.5, 'Book'),
(33333.92307692308, 'Food & Drink'),
(33333.92307692308, 'Food & Drink'),
(31467.944444444445, 'Finance'),
(31467.944444444445, 'Finance'),
(28441.54375, 'Photo & Video'),
(28441.54375, 'Photo & Video'),
(28243.8, 'Travel'),
(28243.8, 'Travel'),
(26919.690476190477, 'Shopping'),
(26919.690476190477, 'Shopping'),
(23298.015384615384, 'Health & Fitness'),
(23298.015384615384, 'Health & Fitness'),
(23008.898550724636, 'Sports'),
(23008.898550724636, 'Sports'),
(22788.6696905016, 'Games'),
(22788.6696905016, 'Games'),
(21248.023255813954, 'News'),
(21248.023255813954, 'News'),
(21028.410714285714, 'Productivity'),
(21028.410714285714, 'Productivity'),
(18684.456790123455, 'Utilities'),
(18684.456790123455, 'Utilities'),
(16485.764705882353, 'Lifestyle'),
(16485.764705882353, 'Lifestyle'),
(16485.764705882353, 'Lifestyle'),
(16485.764705882353, 'Lifestyle'),
(16485.764705882353, 'Lifestyle'),
(14029.830708661417, 'Entertainment'),
(14029.830708661417, 'Entertainment'),
(7491.117647058823, 'Business'),
(7491.117647058823, 'Business'),
(7003.983050847458, 'Education'),
(7003.983050847458, 'Education'),
(4004.0, 'Catalogs'),
(4004.0, 'Catalogs'),
(612.0, 'Medical'),
(612.0, 'Medical')

Notice that everything is printed twice? How do I get past this? There are also recurring genre entries after the last line of output I’ve included here. So the output is partially correct but not all the way there: I need to get rid of the duplicates and the extra, unnecessary data. Once I do, I’ll have a sorted list without duplicates!

Also, I looked at the solutions more closely, and I made a mistake. What they’ve printed is not actually in order! I mistakenly thought it was. For whatever reason, though, the output they get and the output I get using the exact same code are not the same. That’s rather strange. Any thoughts on why this might be?

I also have one other question I’ve been having a hard time figuring out with this code. The first line is genres_ios = freq_table(ios_final, -5). The freq_table function when called should generate percentages no? Here’s that code for reference:

def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset: 
        total += 1
        value = row[index]
        
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    

    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    
    return table_percentages

How does it work that the new code isn’t finding percentages in addition to the new code we’re writing? Is the code being overwritten when we introduce the new loop to find the averages for each genre? How is freq_table functioning within this code being written to find the averages?

Thanks so much!


Hi @kylemoorman1

Good to hear that you have made so much progress since the last post. I can see that you are almost there. And sorry about the average calculation: I got a bit confused when reading the variable names. You were right about the logic.

Regarding the duplicates, look closely at the for loops and the indentation to see which lines of code belong to which loop. I’m sure you will be able to solve that with this hint.
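
As a rough illustration of why indentation matters here, the sketch below appends at two different levels of a nested loop. The tiny apps list and the names inside and outside are invented for the example and are not taken from the notebook.

```python
# Tiny made-up dataset: (genre, rating_count) pairs, for illustration only.
apps = [('Games', 10.0), ('Games', 30.0), ('Music', 8.0)]
genres = ['Games', 'Music']

inside, outside = [], []
for genre in genres:
    total = 0
    count = 0
    for name, n_ratings in apps:
        if name == genre:
            total += n_ratings
            count += 1
            # Appending here records a running average after every matching app.
            inside.append((total / count, genre))
    # Appending here, one level out, records a single final average per genre.
    outside.append((total / count, genre))

print(len(inside), len(outside))
```

A list that is created outside the notebook cell also keeps growing each time the cell is re-run, which is another way repeated entries can appear.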

Regarding the frequency table, the code looks fine. It would be great if you could share your notebook file in .ipynb format. That would make it easier to check whether anything went wrong in the previous steps.
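
On the question of how freq_table fits in: a for loop over a dictionary yields only its keys, so the averaging loop uses the genre names and never reads the percentages. A small sketch, with made-up percentages shaped like the dictionary freq_table returns:

```python
# Made-up percentages shaped like the dictionary freq_table returns.
genres_ios = {'Games': 58.2, 'Entertainment': 7.9, 'Education': 3.7}

seen = []
for genre in genres_ios:      # iterating a dict yields its keys only
    seen.append(genre)

print(seen)                   # the percentage values were never read
```

So nothing is overwritten; freq_table is effectively being used only as a way to get the list of unique genres.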


Hi again,

I was able to sort this out. Thanks so much for your help! I’ll go ahead and post the code below for the community:

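genres_ios = freq_table(ios_final, -5)

a_list = []  # reset on every run so re-running the cell does not add duplicates
for genre in genres_ios:
    total = 0
    len_genre = 0

    for app in ios_final:
        genre_app = app[-5]

        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1

    # One (average, genre) tuple per genre, added after the inner loop finishes
    avg_n_ratings = total / len_genre
    a_tuple = (avg_n_ratings, genre)
    a_list.append(a_tuple)

# Sort once, after every genre has been added
genre_sorted = sorted(a_list, reverse = True)

print(genre_sorted)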

def explore_genre_sorted(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row[1],':',row[0])
        
    
    if rows_and_columns:
        print('\n')
        print('Number of Columns is:', len(dataset[0]))
        print('\n')
        print('Number of rows is:', len(dataset))

I adapted some code from display_table and was satisfied to get the following output:

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


Number of Columns is: 2


Number of rows is: 23

I decided to print the number of rows (I guess the number of columns is not really necessary) so that I could verify all genres were accounted for when executing explore_genre_sorted. I would be thrilled if this ended up being added to the solutions notebook. Thanks again so much for your patience and your help, @jithins123. I’m truly excited to have finally achieved a satisfactory answer.
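
For anyone reading later, the same result can probably be reached in a single pass over the data. This is only a sketch under assumptions: average_by_genre, its parameters, and the three sample rows are hypothetical, and the sample rows do not use the real App Store column layout.

```python
def average_by_genre(dataset, genre_index, ratings_index):
    """Return (average, genre) tuples sorted from highest to lowest average."""
    totals = {}   # genre -> [sum of rating counts, number of apps]
    for row in dataset:
        genre = row[genre_index]
        entry = totals.setdefault(genre, [0.0, 0])
        entry[0] += float(row[ratings_index])
        entry[1] += 1
    averages = [(total / count, genre) for genre, (total, count) in totals.items()]
    return sorted(averages, reverse=True)

# Made-up rows: [name, rating count, genre]; not the real App Store columns.
sample = [
    ['A', '100', 'Games'],
    ['B', '300', 'Games'],
    ['C', '50', 'Music'],
]
result = average_by_genre(sample, genre_index=2, ratings_index=1)
for average, genre in result:
    print(genre, ':', average)
```

With the data from this thread the call would presumably be average_by_genre(ios_final, -5, 5), which reads each row once instead of once per genre.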


Hi @kylemoorman1,

Good to know that you were finally able to sort it out by yourself. I know the feeling when you struggle a bit and finally get it done; it is very satisfying.