# Sorting a mixed list in order to find the median

https://app.dataquest.io/c/56/m/306/the-weighted-mean-and-the-median/4/the-median-for-open-ended-distributions

In this screen we are asked to find the median of a list that contains both `int` and `str` values in no particular order. The Hint suggests sorting the values before determining the median but does not suggest a good strategy for doing so. Calling `sorted()` on the list raises a `TypeError` when it tries to compare a `str` with an `int` using `<`.
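For illustration, here is a minimal sketch of that failure. The list `mixed` is a made-up example, not one of the lists from the screen, and the exact wording of the error message may vary between Python versions.

```python
# Hypothetical list mixing int and str values, for illustration only.
mixed = [23, '20 years or lower', 22]

try:
    sorted(mixed)
    error_message = None
except TypeError as exc:
    # Python 3 refuses to order a str against an int.
    error_message = str(exc)

print(error_message)
```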

Here is some code I came up with instead of sorting the lists by hand:

``````
distribution1 = [23, 24, 22, '20 years or lower,', 23, 42, 35]
distribution2 = [55, 38, 123, 40, 71]
distribution3 = [45, 22, 7, '5 books or lower', 32, 65, '100 books or more']

lower_idx = distribution1.index('20 years or lower,')
dist1_start = distribution1.pop(lower_idx)
distribution1 = sorted(distribution1)
distribution1.insert(0, dist1_start)

distribution2 = sorted(distribution2)

lower_idx = distribution3.index('5 books or lower')
distr3_start = distribution3.pop(lower_idx)
upper_idx = distribution3.index('100 books or more')
distr3_end = distribution3.pop(upper_idx)
distribution3 = sorted(distribution3)
distribution3.insert(0, distr3_start)
distribution3.append(distr3_end)

print(distribution1)
print(distribution2)
print(distribution3)
``````

While I don’t believe this to be the most elegant solution, I did really enjoy using `list.pop()` to remove the “unwanted” `str` values while also storing each one in a variable so that it could easily be reinserted into the list once it was sorted.
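Once the lists are sorted, the median is simply the value in the middle position. A minimal sketch, assuming an odd number of values as in all three lists here (the helper name `middle_value` and the `sorted1` to `sorted3` names are hypothetical, used only for this example):

```python
def middle_value(sorted_values):
    """Return the middle item of an already-sorted list of odd length."""
    if len(sorted_values) % 2 == 0:
        raise ValueError("expected an odd number of values")
    return sorted_values[len(sorted_values) // 2]

# The three lists as they look after the sorting code above has run.
sorted1 = ['20 years or lower,', 22, 23, 23, 24, 35, 42]
sorted2 = [38, 40, 55, 71, 123]
sorted3 = ['5 books or lower', 7, 22, 32, 45, 65, '100 books or more']

median_1 = middle_value(sorted1)
median_2 = middle_value(sorted2)
median_3 = middle_value(sorted3)
print(median_1, median_2, median_3)  # 23 55 32
```

This works because the open-ended values sit at the ends of each list, so their exact size never affects which value lands in the middle.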


Cool, I’ve played around with automating the whole process:

``````
import re

distribution3 = [45, 22, 7, '5 books or lower', 32, 65, '100 books or more']

# 1. Extract all strings from original list, add them to a string list.
#    Walk the indices backwards so that pop() never shifts an unvisited item:
string_list = []
for i in reversed(range(len(distribution3))):
    if isinstance(distribution3[i], str):
        string_list.append(distribution3.pop(i))

# 2. Extract integers from string list and add them to numbers list
#    (re.findall returns a list, so take its first match):
num_list = []
for num in string_list:
    num_list.append(int(re.findall(r'\d+', num)[0]))

# 3. Add the integers back to the original list
#    and sort that list:
distribution3 = sorted(distribution3 + num_list)

# 4. If in our original list we'll find an integer from numbers list,
#    then we'll replace it with the string, from our strings list:
for idx, ele in enumerate(distribution3):
    for index, item in enumerate(num_list):
        if item == ele:
            distribution3[idx] = string_list[index]
print(distribution3)
``````

OUT:

``````
['5 books or lower', 7, 22, 32, 45, 65, '100 books or more']
``````

I thought about making a few of the lines above shorter, but readability drops:

``````
# 1:
string_list = [distribution3.pop(i) for i in reversed(range(len(distribution3))) if isinstance(distribution3[i], str)]
# 2:
num_list = [int(re.findall(r'\d+', num)[0]) for num in string_list]
``````

When I saw that, I didn’t even bother to do list comprehensions on step 4.


Very nice use of regex here to make it more general! Agreed that readability drops but I still like the first list comprehension you did in `#1`.

This can be done with a dictionary and sorting.

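``````
import re

def mix_sorted(iter_o):
    # Map each item to its sort value: ints as-is, strings by their leading number
    sort_keys = {i: (i if isinstance(i, int) else int(re.match(r'\d+', i).group())) for i in iter_o}
    # Sort the original items (not the dict) so that duplicate values are kept
    return sorted(iter_o, key=lambda x: sort_keys[x])
``````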

Or, if you want to avoid regular expressions and handle just your situation:

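``````
def mix_sorted(iter_o):
    # Map each item to its sort value: ints as-is, strings by their first word
    sort_keys = {i: (i if isinstance(i, int) else int(i.split(" ")[0])) for i in iter_o}
    # Sort the original items (not the dict) so that duplicate values are kept
    return sorted(iter_o, key=lambda x: sort_keys[x])
``````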

Now that’s clever, thanks for sharing @moriturus7! I obviously need to practice using dictionaries more…they are amazing little data structures.
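One thing to watch with duplicates such as the two 23s in `distribution1`: a dictionary keeps only one key per distinct value, so the list itself, not the dictionary, has to be what gets sorted. A plain key function passed to `sorted()` sidesteps the question. The sketch below is one possible variant, not the method from the replies above; the name `numeric_key` is hypothetical, and it assumes every string contains at least one number.

```python
import re

def numeric_key(item):
    """Sort key: ints as-is, strings by the first run of digits they contain."""
    if isinstance(item, int):
        return item
    match = re.search(r'\d+', item)
    if match is None:
        raise ValueError(f"no number found in {item!r}")
    return int(match.group())

distribution1 = [23, 24, 22, '20 years or lower,', 23, 42, 35]
sorted_distribution1 = sorted(distribution1, key=numeric_key)
print(sorted_distribution1)
# ['20 years or lower,', 22, 23, 23, 24, 35, 42]
```

Both 23s survive because `sorted()` works on the list directly and only consults the key function for ordering.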