7. Exploring Price by Brand: how to sort dictionary inside loop?

I just finished the exercise. I was told to practice aggregation (Exploring Price by Brand), so I decided to find the top 10 brands (those with the most offers) and then compute the mean price for each of them. I stored the results in a dictionary, but it has no useful order (I’d like the brand with the highest mean price to come first). I got that result in the end, but I had to convert the dictionary to a list, because the dictionary was unsorted.

Because the dictionary is unsorted, I wonder how I can put some code inside the loop so that it adds each key/value pair to the dictionary in the order described above?

In my version the printed output comes from a list. It would be cool to get it from a dictionary.

mean_price_for_top_10_cars_raw = {}

for b in autos['brand'].value_counts().index[:10]: # index[:10] -> we need just top 10 
    top_brands = autos[autos["brand"] == b]
    mean_price = top_brands['price'].mean()
    mean_price_for_top_10_cars_raw[b] = mean_price
    
for key in mean_price_for_top_10_cars_raw:
    print(key, ':', mean_price_for_top_10_cars_raw[key]) # print of unsorted dict. mean_price_for_top_10_cars_raw 
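
One way to avoid sorting the dictionary afterwards is to let pandas sort the mean prices before they go into a dictionary. This is only a minimal sketch: the tiny `autos` DataFrame is made up, and `top_names` and `mean_price_sorted` are hypothetical names that do not appear in the original post.

```python
import pandas as pd

# Made-up stand-in for the project's `autos` DataFrame.
autos = pd.DataFrame({
    "brand": ["bmw", "bmw", "opel", "opel", "opel", "fiat"],
    "price": [9000, 7000, 3000, 2000, 4000, 1500],
})

# Most frequent brands (top 2 here; the post uses the top 10).
top_names = autos["brand"].value_counts().index[:2]

# Mean price per brand, sorted from most to least expensive.
mean_prices = (
    autos[autos["brand"].isin(top_names)]
    .groupby("brand")["price"]
    .mean()
    .sort_values(ascending=False)
)

# On Python 3.7+ the resulting dict keeps the order of the Series.
mean_price_sorted = mean_prices.to_dict()
print(mean_price_sorted)
```

Whether the dictionary keeps that order depends on the Python version, as discussed further down in this thread.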

My whole code below:


for b in autos['brand'].value_counts().index[:10]: # index[:10] -> we need just top 10 
    top_brands = autos[autos["brand"] == b]
    mean_price = top_brands['price'].mean()
    mean_price_for_top_10_cars_raw[b] = mean_price
    
for key in mean_price_for_top_10_cars_raw:
    print(key, ':', mean_price_for_top_10_cars_raw[key]) # print of unsorted dict. mean_price_for_top_10_cars_raw 

# The dictionary gives unsorted, untidy output. Since we won't do any further analysis with this object, we convert it into a list and reshape it.
tuple_list = (sorted(((value, key) 
                      for (key,value) in mean_price_for_top_10_cars_raw.items())
                     , reverse=True)
             )# sorted dictionary saved as tuple's list

list_from_tuple = [list(x) for x in tuple_list] # convert tuple to list of lists

print('Mean price for top 10 most often offered cars is:', '\n')

for index in list_from_tuple:
    index_0 = index[1]
    index_1 = index[0]
    print(index_0, ":", int(index_1), "$") # printing nice sorted output from tuple

output:
image

I also wonder how I can enumerate my output like this (in my answer: list version):

  1. brand price
  2. brand price
  3. brand price
  4. etc.
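
A numbered listing like this can be sketched with `enumerate(..., start=1)`. The prices below are made up, and `mean_prices`, `ranked` and `lines` are hypothetical names chosen only for this illustration.

```python
# Made-up mean prices; in the post they come from mean_price_for_top_10_cars_raw.
mean_prices = {"opel": 2975.2, "bmw": 8332.8, "audi": 9336.7}

# Sort (brand, price) pairs by price, highest first.
ranked = sorted(mean_prices.items(), key=lambda item: item[1], reverse=True)

# enumerate(..., start=1) supplies the running number for each row.
lines = [
    f"{rank}. {brand} {int(price)} $"
    for rank, (brand, price) in enumerate(ranked, start=1)
]
print("\n".join(lines))
```

This works on the list of tuples directly, so no conversion to a list of lists is needed.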

Hi @drill_n_bass,

In Python 3.7+, dictionaries preserve the order of insertion.
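
A small sketch of what that means in practice, using made-up brand prices (`prices` and `by_price` are hypothetical names): if the loop visits the keys in sorted order, the new dictionary iterates in that order too.

```python
# Keys come back in the order they were inserted (guaranteed from Python 3.7).
prices = {}
prices["opel"] = 2975
prices["audi"] = 9336
prices["bmw"] = 8332
print(list(prices))

# Inserting inside a loop that walks the keys in sorted order
# yields a dictionary that iterates from highest to lowest price.
by_price = {}
for brand in sorted(prices, key=prices.get, reverse=True):
    by_price[brand] = prices[brand]
print(list(by_price))
```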

If you want to sort a dictionary by value, you can use the following approach:

import operator
unsorted_dict = {'a': 10, 'b': 28, 'c': 63, 'd': 1, 'e': 14}
sorted_by_value = sorted(unsorted_dict.items(), key=operator.itemgetter(1))
sorted_by_value

The result sorted_by_value will be a list of tuples:

[('d', 1), ('a', 10), ('e', 14), ('b', 28), ('c', 63)]

If you want to sort your dictionary by value in descending order (and it seems that you do :slightly_smiling_face:), just add the argument reverse=True to the sorted() function above.
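
For illustration, here is the same example dictionary sorted in descending order (`sorted_desc` is a name made up for this sketch):

```python
import operator

unsorted_dict = {'a': 10, 'b': 28, 'c': 63, 'd': 1, 'e': 14}

# reverse=True puts the largest value first.
sorted_desc = sorted(unsorted_dict.items(), key=operator.itemgetter(1), reverse=True)
print(sorted_desc)
```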

Now, if you want to convert your list of tuples back to a dictionary and, hence, have a dictionary sorted by value, you can define the following function:

def tuple_list_to_dict(tuple_list): 
    return dict(tuple_list) 
 
sorted_dict = tuple_list_to_dict(sorted_by_value)
sorted_dict

Output:

{'d': 1, 'a': 10, 'e': 14, 'b': 28, 'c': 63}

For some reason this code that you mentioned doesn’t work.

I tried two ways:

mean_odometr_km = {}

for b in autos['brand'].value_counts().index:
    top_brands = autos[autos["brand"] == b]
    mean_odo_km = top_brands['odometer_km'].mean().astype(int)
    mean_odometr_km[b] = mean_odo_km

import operator
sorted_by_value = sorted(mean_odometr_km.items(), key=operator.itemgetter(1))
sorted_by_value



def tuple_list_to_dict(tuple_list): 
    return dict(tuple_list) 

sorted_dict = tuple_list_to_dict(sorted_by_value)
sorted_dict

print(sort_dict)
print(type(sort_dict))

output:
image

I tried also this:

mean_odometr_km = {}

for b in autos['brand'].value_counts().index:
    top_brands = autos[autos["brand"] == b]
    mean_odo_km = top_brands['odometer_km'].mean().astype(int)
    mean_odometr_km[b] = mean_odo_km


def sort_dict(dictionary): # sorting dictionary

    import operator

    dictionary = sorted(dictionary.items(), key=operator.itemgetter(1), reverse=True) # reverse: the highest value on top
    return dict(dictionary) #  The result sorted_by_value will be a list of tuples !!!

#     def tuple_list_to_dict(dictionary): # we chained this def. with this one above
#         return dict(dictionary) 

mean_odometr_km_sorted = sort_dict(mean_odometr_km)
print(mean_odometr_km_sorted)

…but the output isn’t sorted :frowning: :

Hi @drill_n_bass,

There is a typo in your first block of code: not sort_dict, but sorted_dict. Just fix it, and everything will work correctly:

print(sorted_dict)
print(type(sorted_dict))

Otherwise, of course, sort_dict() is a function, so Python gave you the right answer :slightly_smiling_face:
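
To make the difference between the two names concrete, here is a minimal sketch, assuming a helper called `sort_dict` has already been defined in the notebook (the body below is a simplified, hypothetical version of it):

```python
def sort_dict(dictionary):
    # Hypothetical helper: returns a new dict ordered by value.
    return dict(sorted(dictionary.items(), key=lambda item: item[1]))

sorted_dict = sort_dict({"b": 2, "a": 1})

print(type(sort_dict))    # the function object itself
print(type(sorted_dict))  # the dictionary the function returned
```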

True, there was a typo. But after fixing that version (my first block), both now give me the same unsorted output:

updated code:

mean_odometr_km = {}

for b in autos['brand'].value_counts().index:
    top_brands = autos[autos["brand"] == b]
    mean_odo_km = top_brands['odometer_km'].mean().astype(int)
    mean_odometr_km[b] = mean_odo_km

    
import operator
sorted_by_value = sorted(mean_odometr_km.items(), key=operator.itemgetter(1))
sorted_by_value



def tuple_list_to_dict(tuple_list): 
    return dict(tuple_list) 

sorted_dict = tuple_list_to_dict(sorted_by_value)
sorted_dict

print(sorted_dict)
print(type(sorted_dict)) 
    

Uhm… That’s really surprising, because it should indeed work :thinking: Can you please re-run the code block above and show the exact output of the mean_odometr_km dictionary? Just insert print(mean_odometr_km) right after the for-loop and before the rest of the code in the code block above. I’d like to test this dictionary on my computer and try to understand the issue.

By the way, I think we can simplify the second part of the code above, because it’s too wordy. We probably don’t need that function, which performs only one simple action. So let’s write it directly, like this:

import operator
sorted_by_value = sorted(mean_odometr_km.items(), key=operator.itemgetter(1))

sorted_dict = dict(sorted_by_value)

print(sorted_dict)
print(type(sorted_dict)) 

But right before this part, don’t forget to insert print(mean_odometr_km) and then please show the output of this line in a code format (not as a screenshot). I’m already becoming curious about this puzzle :slightly_smiling_face:

@drill_n_bass,

Look what I tried:

  1. The first code block is just a copy of the solution notebook for this project (of course, I took only the relevant code from there, not everything).
import pandas as pd
import numpy as np

autos = pd.read_csv('autos.csv', encoding='Latin-1')
autos.columns = ['date_crawled', 'name', 'seller', 'offer_type', 'price', 'ab_test',
       'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',
       'odometer', 'registration_month', 'fuel_type', 'brand',
       'unrepaired_damage', 'ad_created', 'num_photos', 'postal_code',
       'last_seen']
autos["odometer"] = (autos["odometer"]
                             .str.replace("km","")
                             .str.replace(",","")
                             .astype(int)
                             )
autos.rename({"odometer": "odometer_km"}, axis=1, inplace=True)
  2. The second code block creates an unsorted dictionary mean_odometr_km and prints it out. Here I only replaced astype(int) with int() to obtain integers.
mean_odometr_km = {}
for b in autos['brand'].value_counts().index:
    top_brands = autos[autos["brand"] == b]
    mean_odo_km = top_brands['odometer_km'].mean()
    mean_odometr_km[b] = int(mean_odo_km)
    
print(mean_odometr_km) 

Output:

{'volkswagen': 128955, 'opel': 129298, 'bmw': 132521, 'mercedes_benz': 130886, 'audi': 129643, 'ford': 124131, 'renault': 128223, 'peugeot': 127352, 'fiat': 117037, 'seat': 122061, 'skoda': 110947, 'mazda': 125132, 'nissan': 118978, 'smart': 100756, 'citroen': 119764, 'toyota': 115988, 'sonstige_autos': 87188, 'hyundai': 106782, 'volvo': 138632, 'mini': 89375, 'mitsubishi': 126293, 'honda': 123709, 'kia': 112640, 'alfa_romeo': 131109, 'porsche': 97363, 'suzuki': 109334, 'chevrolet': 99522, 'chrysler': 133149, 'dacia': 84728, 'daihatsu': 114843, 'jeep': 126409, 'subaru': 124449, 'land_rover': 118333, 'saab': 143750, 'daewoo': 121708, 'trabant': 59358, 'jaguar': 121298, 'rover': 136449, 'lancia': 123157, 'lada': 86774}
  3. The third code block sorts the dictionary by value:
import operator
sorted_by_value = sorted(mean_odometr_km.items(), key=operator.itemgetter(1))
sorted_dict = dict(sorted_by_value)
print(sorted_dict)

Output:

{'trabant': 59358, 'dacia': 84728, 'lada': 86774, 'sonstige_autos': 87188, 'mini': 89375, 'porsche': 97363, 'chevrolet': 99522, 'smart': 100756, 'hyundai': 106782, 'suzuki': 109334, 'skoda': 110947, 'kia': 112640, 'daihatsu': 114843, 'toyota': 115988, 'fiat': 117037, 'land_rover': 118333, 'nissan': 118978, 'citroen': 119764, 'jaguar': 121298, 'daewoo': 121708, 'seat': 122061, 'lancia': 123157, 'honda': 123709, 'ford': 124131, 'subaru': 124449, 'mazda': 125132, 'mitsubishi': 126293, 'jeep': 126409, 'peugeot': 127352, 'renault': 128223, 'volkswagen': 128955, 'opel': 129298, 'audi': 129643, 'mercedes_benz': 130886, 'alfa_romeo': 131109, 'bmw': 132521, 'chrysler': 133149, 'rover': 136449, 'volvo': 138632, 'saab': 143750}

Don’t worry that the exact values in my dictionaries mean_odometr_km and sorted_dict differ slightly from yours (on your screenshots) for each brand: obviously, you dropped some rows while cleaning the data and I didn’t, since I was only trying to understand the underlying issue (and the issue is definitely not about cleaning the data :slightly_smiling_face:).

I think it’s the same, but I’ve only just started checking… here is my whole code:

I’m still checking your answer…
First difference noticed is, that I used in this part you mentioned:

this code:

for b in autos['brand'].value_counts().index:
    top_brands = autos[autos["brand"] == b]
    mean_odo_km = top_brands['odometer_km'].mean().astype(int)
    mean_odometr_km[b] = mean_odo_km

But it shouldn’t make a difference

Strangely, this asfloat(int) gave me an error, so I substituted with int(). But yes, I agree that it shouldn’t be an issue.

or astype(int) gave you this error?

astype(int), yes. Sorry, it was a typo above :flushed:

It works perfectly now! :smiley: I only replaced astype(int) with int() (it seems to be a conflict between our Python versions, so in your version astype(int) should work anyway). Then I added only print(mean_odometr_km) and print('\n') to your code cell [42], as below, just to display the output and separate the unsorted and sorted dictionaries. All the other code remains as it was.

mean_odometr_km = {}

for b in autos['brand'].value_counts().index:
    top_brands = autos[autos["brand"] == b]
    mean_odo_km = top_brands['odometer_km'].mean()
    mean_odometr_km[b] = int(mean_odo_km)
print(mean_odometr_km)
print('\n')

Output (of the whole cell [42] in your project, not only of the piece of code above):

{'volkswagen': 128728, 'bmw': 132434, 'opel': 129223, 'mercedes_benz': 130856, 'audi': 129287, 'ford': 124068, 'renault': 128183, 'peugeot': 127136, 'fiat': 116553, 'seat': 121563, 'skoda': 110954, 'mazda': 124745, 'nissan': 118572, 'citroen': 119580, 'smart': 99595, 'toyota': 115709, 'sonstige_autos': 87262, 'hyundai': 106511, 'volvo': 138355, 'mini': 88602, 'mitsubishi': 126930, 'honda': 122851, 'kia': 112434, 'alfa_romeo': 131399, 'porsche': 97457, 'suzuki': 109049, 'chevrolet': 99251, 'chrysler': 133181, 'daihatsu': 115284, 'dacia': 84268, 'jeep': 127546, 'subaru': 124857, 'land_rover': 118010, 'saab': 144415, 'jaguar': 120921, 'trabant': 59666, 'daewoo': 122430, 'rover': 135615, 'lancia': 122019, 'lada': 85517}


{'trabant': 59666, 'dacia': 84268, 'lada': 85517, 'sonstige_autos': 87262, 'mini': 88602, 'porsche': 97457, 'chevrolet': 99251, 'smart': 99595, 'hyundai': 106511, 'suzuki': 109049, 'skoda': 110954, 'kia': 112434, 'daihatsu': 115284, 'toyota': 115709, 'fiat': 116553, 'land_rover': 118010, 'nissan': 118572, 'citroen': 119580, 'jaguar': 120921, 'seat': 121563, 'lancia': 122019, 'daewoo': 122430, 'honda': 122851, 'ford': 124068, 'mazda': 124745, 'subaru': 124857, 'mitsubishi': 126930, 'peugeot': 127136, 'jeep': 127546, 'renault': 128183, 'volkswagen': 128728, 'opel': 129223, 'audi': 129287, 'mercedes_benz': 130856, 'alfa_romeo': 131399, 'bmw': 132434, 'chrysler': 133181, 'rover': 135615, 'volvo': 138355, 'saab': 144415}
<class 'dict'>

Please try to run your notebook now, inserting only those 2 print statements. Don’t change astype(int) with int(), if in your version of python you received no error about it.

That’s strange. When I run this project on Dataquest, the output is still incorrect (it’s not sorted).

So I’ve uploaded the files to Jupyter Notebook (via Anaconda on my PC). There it works but, same as you, I can’t use astype(int) in a few places in my code.

For example, when I write:

top_six_brands = {}
for value in mean_price_for_top_10_cars_raw:
    if mean_price_for_top_10_cars_raw[value] >= 3039: # bool filtering price for the cheapest(Peugeot)
        top_six_brands[value] = mean_price_for_top_10_cars_raw[value].astype(int)

The part top_six_brands[value] = mean_price_for_top_10_cars_raw[value].astype(int) gives me this output:

How should I understand that? And how can I convert a float to an int?

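Yes, it’s better to run guided projects directly in Anaconda: that way you get a bigger screen and can use the latest version of Python. In fact, in older versions (before Python 3.7) plain dictionaries did not guarantee that they keep insertion order, so a dictionary rebuilt from sorted items could come out unsorted; it’s a relatively new feature. (In CPython 3.6 the order was already kept, but only as an implementation detail.) Hence, DQ probably runs a version of Python earlier than 3.6.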
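
On an older interpreter, one option that does guarantee order is collections.OrderedDict from the standard library. A minimal sketch with three values taken from the output above (`mean_odometer` and `ordered` are hypothetical names):

```python
from collections import OrderedDict
import operator

mean_odometer = {"bmw": 132434, "trabant": 59666, "saab": 144415}

# OrderedDict remembers insertion order on every Python 3 version.
ordered = OrderedDict(sorted(mean_odometer.items(), key=operator.itemgetter(1)))
print(list(ordered.items()))
```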

As for converting float to int, you can modify the code above (and also the other occurrences of astype(int) in your project) in this way:

top_six_brands[value] = int(mean_price_for_top_10_cars_raw[value])
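
The error was presumably raised because the stored value was a plain Python float, which has no astype method (that method belongs to NumPy and pandas objects). A small sketch with a made-up value, `mean_price`, to show the difference:

```python
mean_price = 3039.7  # made-up plain Python float

# Plain floats have no .astype method, so calling it raises AttributeError.
print(hasattr(mean_price, "astype"))

# int() truncates toward zero.
print(int(mean_price))

# round() returns the nearest integer instead, if that is preferred.
print(round(mean_price))
```

Note that int() drops the fractional part rather than rounding it.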

thank you for help :slight_smile:

so in new Python:

It works like with other objects. Cool :slight_smile:

About that piece of code where we’re sorting an unsorted dictionary: I’ve just realized now that we actually can do it in one line, without introducing any functions and other complications :blush: Look:

sorted_dict = dict(sorted(unsorted_dict.items(), key=operator.itemgetter(1)))

This works only in Python 3.7+, of course.

I’ll check it tomorrow, but, anyway I just checked in DQ - same result. :confused:
The lesson here is: use Anaconda. :slight_smile:


On my Anaconda installation I needed to add:

import operator

Without it, the code you just mentioned failed with an error.
My python on Jupyter:

@drill_n_bass,

Oh, sorry, my fault! We have to add import operator before that line of code, since we use it there, yes.
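
As a side note, the same one-liner can be written with a lambda as the sort key, which needs no import at all. A minimal sketch using the example dictionary from earlier in the thread:

```python
unsorted_dict = {'a': 10, 'b': 28, 'c': 63, 'd': 1, 'e': 14}

# item is a (key, value) tuple, so item[1] is the value to sort by.
sorted_dict = dict(sorted(unsorted_dict.items(), key=lambda item: item[1]))
print(sorted_dict)
```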