Question regarding tuples and sorting

Hey guys, quick question regarding the behavior of sorted() with tuples. While polishing my Guided Project for the Python Fundamentals course (on app profitability), I stumbled on a result that seemed weird to me. I created a function that builds a list of tuples with the number of ratings for apps of a certain genre and prints the first few to the screen. I was really confused: when I just printed the list, it already came out sorted (i.e. the apps with the highest number of ratings were printed first), but when I used sorted(list, reverse = True), the output did not appear to be sorted. Does anyone understand why this is? The code for my function is below.

def explore_genre(dataset: list, gen_ind: int, name_ind: int, use_ind: int, top: int, genre: str):
    genre_count = []
    
    for app in dataset:
        if app[gen_ind] == genre:
            genre_count.append((app[use_ind], app[name_ind]))
    
    for genre in genre_count[:top]:
        print(genre[1], ":", genre[0]) 
To add on, here is what happens when I try sorted()

And when I don’t

It seems really counterintuitive that the list comes out sorted without sorted() but unsorted with it, especially since something similar actually worked for the average ratings above:

Created a function to return average ratings list of tuples

Printed it and it is in fact sorted

So why does it work here and not in the other case? Any help appreciated! Thanks

Hey, Petrucci.

It’s impossible to be sure without knowing everything that you did on your end (this is one of the reasons why it is a good idea to share your notebook when asking questions about guided projects), but looking at the results it looks like it is sorting lexicographically, meaning it is using the same kind of order you find in the dictionary.

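This means that, for instance, the string "5" is greater than the string "1000000000000", because:

  • "5" starts with 5;
  • "1000000000000" starts with 1; and
  • the character 5 is lexicographically greater than the character 1, so the comparison is decided right there.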
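
A quick way to check this yourself in a Python session (the values are just the ones from the example above):

```python
# Strings are compared character by character, so the first characters decide here.
as_strings = "5" > "1000000000000"   # compares "5" with "1"

# Integers are compared by numeric value.
as_ints = 5 > 1000000000000

print(as_strings)  # True
print(as_ints)     # False
```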

It is using lexicographic (instead of numeric) order because the values are probably strings. If you modify your function to append the int version of the string, it should work as you expect:

def explore_genre(dataset, gen_ind, name_ind, use_ind, top, genre):
    genre_count = []
    
    for app in dataset:
        if app[gen_ind] == genre:
            genre_count.append((int(app[use_ind]), app[name_ind]))
    
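    # Uncomment to sort by number of ratings, highest first:
    # genre_count = sorted(genre_count, reverse=True)
    for entry in genre_count[:top]:  # renamed so it does not shadow the genre parameter
        print(entry[1], ":", entry[0])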

You get correct results without sorting because the original dataset is already sorted by the number of ratings.
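
To see the difference on a small scale, here is a minimal sketch with made-up rows (the app names and counts are hypothetical, not taken from the course dataset). Tuples are compared element by element, so the first element of each tuple decides the order unless there is a tie.

```python
# Hypothetical (rating_count, app_name) tuples, with the counts still stored as strings.
rows = [("3582", "App A"), ("5", "App B"), ("1000000", "App C")]

# The string counts are compared first, character by character.
by_string = sorted(rows, reverse=True)

# Converting the count to int first gives numeric ordering.
by_number = sorted(((int(count), name) for count, name in rows), reverse=True)

print(by_string)  # [('5', 'App B'), ('3582', 'App A'), ('1000000', 'App C')]
print(by_number)  # [(1000000, 'App C'), (3582, 'App A'), (5, 'App B')]
```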

I hope this clears it up.

Thank you! I can totally see this may be the problem. I don’t know if I completely understand (e.g. why does 5 come before 3582?), but it was definitely a mistake not to convert the value to an int.

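Strings are compared one character at a time. The same way that a comes before b, c or z, the character 3 comes before 5, so "3582" (which starts with 3) comes before "5" in ascending order.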

When you reverse the order, 5 comes before 3582.
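
Here is a small sketch of both orders on a few string values. The last line is just one possible alternative, not something from your notebook: if you would rather leave the stored values as strings, passing key=int to sorted() makes the comparison numeric.

```python
values = ["3582", "5", "3"]

# Lexicographic order: "3" < "3582" < "5".
ascending = sorted(values)

# Reversed lexicographic order puts "5" first.
descending = sorted(values, reverse=True)

# key=int compares the numeric values while keeping the original strings.
numeric_descending = sorted(values, key=int, reverse=True)

print(ascending)           # ['3', '3582', '5']
print(descending)          # ['5', '3582', '3']
print(numeric_descending)  # ['3582', '5', '3']
```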