Sorting a mixed list in order to find the median

https://app.dataquest.io/c/56/m/306/the-weighted-mean-and-the-median/4/the-median-for-open-ended-distributions

In this screen we are asked to find the median of a list that contains both int and str values in no particular order. The hint suggests sorting the values before determining the median, but it does not suggest a strategy for doing so. Calling sorted() on the list raises a TypeError when it tries to compare str < int.
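For anyone who wants to see the failure first, here is a minimal sketch. The `mixed` list is a made-up sample in the style of the exercise data, not the exercise list itself:

```python
# Hypothetical sample: integers plus one open-ended label.
mixed = [45, 22, 7, '5 books or lower', 32]

try:
    sorted(mixed)
    error_message = None
except TypeError as exc:
    # Python 3 refuses to order an int against a str.
    error_message = str(exc)

print(error_message)
```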

Here is some code I came up with instead of sorting it by hand:

distribution1 = [23, 24, 22, '20 years or lower,', 23, 42, 35]
distribution2 = [55, 38, 123, 40, 71]
distribution3 = [45, 22, 7, '5 books or lower', 32, 65, '100 books or more']

lower_idx = distribution1.index('20 years or lower,')
dist1_start = distribution1.pop(lower_idx)
distribution1 = sorted(distribution1)
distribution1.insert(0, dist1_start)

distribution2 = sorted(distribution2)

lower_idx = distribution3.index('5 books or lower')
distr3_start = distribution3.pop(lower_idx)
upper_idx = distribution3.index('100 books or more')
distr3_end = distribution3.pop(upper_idx)
distribution3 = sorted(distribution3)
distribution3.insert(0, distr3_start)
distribution3.append(distr3_end)

print(distribution1)
print(distribution2)
print(distribution3)

While I don’t believe this to be the most elegant solution, I did really enjoy using list.pop() to remove the “unwanted” str values while also storing them in variables so that they could easily be reinserted into the list once it was sorted.
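The same pop-and-reinsert idea can be wrapped in a small helper so it is not repeated per list. This is only a sketch: `sort_with_bounds` and its `lower`/`upper` parameters are names I made up, and it assumes you pass in the exact label strings.

```python
def sort_with_bounds(values, lower=None, upper=None):
    """Sort the numbers and put the open-ended labels back at the ends.

    `lower` and `upper` are the exact label strings, or None if absent.
    """
    numbers = list(values)       # copy so the caller's list is untouched
    if lower is not None:
        numbers.remove(lower)    # raises ValueError if the label is missing
    if upper is not None:
        numbers.remove(upper)
    numbers.sort()
    result = ([lower] if lower is not None else []) + numbers
    if upper is not None:
        result.append(upper)
    return result


distribution1 = [23, 24, 22, '20 years or lower,', 23, 42, 35]
distribution3 = [45, 22, 7, '5 books or lower', 32, 65, '100 books or more']

sorted1 = sort_with_bounds(distribution1, lower='20 years or lower,')
sorted3 = sort_with_bounds(distribution3,
                           lower='5 books or lower',
                           upper='100 books or more')
print(sorted1)
print(sorted3)
```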


Cool, I’ve played around with automating the whole process:

import re
distribution3 = [45, 22, 7, '5 books or lower', 32, 65, '100 books or more']

# 1. Extract all strings from original list, add them to a string list.
#    Iterate backwards so that pop() does not shift the indices still to visit:
string_list = []
for i in range(len(distribution3) - 1, -1, -1):
    if isinstance(distribution3[i], str):
        string_list.append(distribution3.pop(i))

# 2. Extract integers from string list and add them to numbers list:
num_list = [] 
for num in string_list:
    num_list.append(int(re.findall(r'\d+', num)[0]))

# 3. Add the integers back to the original list 
#    and sort that list:
distribution3 = sorted(distribution3 + num_list)

# 4. If in our original list we'll find an integer from numbers list,
#    then we'll replace it with the string, from our strings list:
for idx,ele in enumerate(distribution3):
    for index, item in enumerate(num_list):
        if item == ele:
            distribution3[idx] = string_list[index]
distribution3    

OUT:

['5 books or lower', 7, 22, 32, 45, 65, '100 books or more']

That is quite a few lines. I thought about making it shorter, but readability drops:

# 1:
string_list = [distribution3.pop(i) for i in range(len(distribution3) - 1, -1, -1) if isinstance(distribution3[i], str)]
# 2:
num_list = [int(re.findall(r'\d+', num)[0]) for num in string_list]

When I saw that, I didn’t even bother to do list comprehensions on step 4.
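For what it's worth, step 4 can be shortened without nested loops by looking the labels up in a dictionary. This is a rough sketch, not a drop-in replacement: `label_for`, `merged` and `result` are names I made up, and the three input lists are written out by hand as they would look after steps 1 and 2. Like the nested loop, it would also relabel a real data value that happens to equal one of the extracted numbers.

```python
# State after steps 1 and 2, written out by hand for this sketch.
distribution3 = [45, 22, 7, 32, 65]
string_list = ['5 books or lower', '100 books or more']
num_list = [5, 100]

# Step 3: sort the real values together with the extracted numbers.
merged = sorted(distribution3 + num_list)

# Step 4: map each extracted number back to its label; other values pass through.
label_for = dict(zip(num_list, string_list))
result = [label_for.get(value, value) for value in merged]
print(result)
```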


Very nice use of regex here to make it more general! Agreed that readability drops but I still like the first list comprehension you did in #1.

This can be done with a dictionary and sorting.

import re

def mix_sorted(iter_o):
    # Map each item to its numeric sort key; sort the original list so duplicates survive.
    keys = {i: (i if isinstance(i, int) else int(re.match(r'\d+', i).group())) for i in iter_o}
    return sorted(iter_o, key=lambda x: keys[x])

Or, if you drop the regular expression and rely on each string starting with its number, as in your data:

def mix_sorted(iter_o):
    # Same idea, but the key for a string is its first space-separated word.
    keys = {i: (i if isinstance(i, int) else int(i.split(" ")[0])) for i in iter_o}
    return sorted(iter_o, key=lambda x: keys[x])

Now that’s clever, thanks for sharing @moriturus7! I obviously need to practice using dictionaries more…they are amazing little data structures.
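To close the loop on the original question: once a list is ordered, the median of an odd-length list is just its middle element. The sketch below swaps the dictionary for a plain key function. `sort_key` and `median_of_odd` are made-up names, and it assumes every label starts with its number and that the list has an odd length, as all three lists in this thread do.

```python
def sort_key(item):
    """Return the number an item should be ordered by."""
    if isinstance(item, str):
        # Assumption: the label starts with its number, e.g. '5 books or lower'.
        return int(item.split()[0])
    return item


def median_of_odd(values):
    """Return the middle element of an odd-length mixed list."""
    ordered = sorted(values, key=sort_key)
    if len(ordered) % 2 == 0:
        raise ValueError("this sketch only handles odd-length lists")
    return ordered[len(ordered) // 2]


distribution1 = [23, 24, 22, '20 years or lower,', 23, 42, 35]
distribution2 = [55, 38, 123, 40, 71]
distribution3 = [45, 22, 7, '5 books or lower', 32, 65, '100 books or more']

median1 = median_of_odd(distribution1)
median2 = median_of_odd(distribution2)
median3 = median_of_odd(distribution3)
print(median1, median2, median3)
```

If the middle element turned out to be one of the labels, this would return the label itself rather than a number.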