308-1 When using groupby and .to_dict(), is there a way to avoid nested dictionaries?

Screen Link: https://app.dataquest.io/m/308/measures-of-variability/1/the-range

Hello everyone, it’s not the first time people here have used a groupby instead of the more loop-oriented solutions DQ proposes.
But even then, you still need to convert the groupby result into a dictionary with “to_dict()”, and no matter what you pass as a parameter (below I chose “index”), I always end up with a nested dictionary.

To get around that, I still need a for loop. Is there a way to drop this last step and turn the whole thing into a neat one-liner?

Here’s my code:

import pandas as pd
houses = pd.read_table('AmesHousing_1.txt')

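def range(arr):
    # Range of a set of values: largest value minus smallest value.
    # Note: this name shadows Python's built-in range().
    return max(arr) - min(arr)

range_by_year = houses.groupby(["Yr Sold"]).agg({"SalePrice": range}).to_dict("index")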

As a result, it gives me this nested dictionary:

{2006: {'SalePrice': 590000},
 2007: {'SalePrice': 715700},
 2008: {'SalePrice': 601900},
 2009: {'SalePrice': 575100},
 2010: {'SalePrice': 598868}}
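The nesting comes from the fact that `.agg({...})` with a dict returns a DataFrame, and `DataFrame.to_dict()` keys its output by index and/or column whichever `orient` is chosen. A minimal sketch, assuming a few made-up prices in place of AmesHousing_1.txt and a hypothetical `price_range` helper standing in for the `range` function above (renamed so it does not shadow the built-in), shows two of the orients side by side:

```python
import pandas as pd

# Hypothetical toy data standing in for AmesHousing_1.txt
houses = pd.DataFrame({
    "Yr Sold": [2006, 2006, 2007, 2007],
    "SalePrice": [100000, 250000, 120000, 300000],
})

def price_range(arr):
    # Hypothetical helper: largest value minus smallest value
    return max(arr) - min(arr)

# Passing a dict to agg keeps the result a one-column DataFrame
agg_df = houses.groupby(["Yr Sold"]).agg({"SalePrice": price_range})

by_index = agg_df.to_dict("index")   # shape: {year: {"SalePrice": value}}
by_column = agg_df.to_dict("dict")   # shape: {"SalePrice": {year: value}}
print(by_index)
print(by_column)
```

Either way, the column name adds one extra level, because a DataFrame can hold several columns.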

And here’s the loop that flattens it:

for k, v in range_by_year.items():
    range_by_year[k] = range_by_year[k]["SalePrice"]

Which gives me the dictionary I wanted:

{2006: 590000, 2007: 715700, 2008: 601900, 2009: 575100, 2010: 598868}
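If the `.agg({...})` call is kept as is, the flattening loop can still collapse into one expression: either a dict comprehension over the nested result, or selecting the column from the aggregated DataFrame (which gives a Series) before calling `to_dict()`. A sketch under the same assumptions (hypothetical toy data and a hypothetical `price_range` helper):

```python
import pandas as pd

# Hypothetical toy data standing in for AmesHousing_1.txt
houses = pd.DataFrame({
    "Yr Sold": [2006, 2006, 2007, 2007],
    "SalePrice": [100000, 250000, 120000, 300000],
})

def price_range(arr):
    # Hypothetical helper: largest value minus smallest value
    return max(arr) - min(arr)

nested = houses.groupby(["Yr Sold"]).agg({"SalePrice": price_range}).to_dict("index")

# Option 1: flatten the nested dict with a comprehension instead of a loop
flat_from_nested = {year: row["SalePrice"] for year, row in nested.items()}

# Option 2: pick the column out of the aggregated DataFrame; the result is a
# Series, whose to_dict() maps each year directly to its value
flat_from_column = (
    houses.groupby(["Yr Sold"]).agg({"SalePrice": price_range})["SalePrice"].to_dict()
)

print(flat_from_nested)
print(flat_from_column)
```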

Thanks in advance!


I think this should be enough:

range_by_year = houses.groupby(["Yr Sold"])["SalePrice"].agg(range).to_dict()

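Instead of naming the column inside agg, select the column first and apply agg to it. Selecting a single column gives a SeriesGroupBy, so agg returns a Series, and to_dict() on a Series produces a flat {year: value} dictionary.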
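A small sketch of the suggested approach, assuming hypothetical toy data and a hypothetical `price_range` helper in place of the `range` function. It also shows a variant that is not from the thread: computing the range with pandas' built-in per-group `max()` and `min()` reductions, which avoids calling a Python function once per group:

```python
import pandas as pd

# Hypothetical toy data standing in for AmesHousing_1.txt
houses = pd.DataFrame({
    "Yr Sold": [2006, 2006, 2007, 2007, 2007],
    "SalePrice": [100000, 250000, 120000, 300000, 200000],
})

def price_range(arr):
    # Hypothetical helper: largest value minus smallest value
    return max(arr) - min(arr)

# Selecting the column first yields a SeriesGroupBy; agg returns a Series
range_series = houses.groupby(["Yr Sold"])["SalePrice"].agg(price_range)
range_by_year = range_series.to_dict()

# Variant: subtract each group's minimum from its maximum
grouped = houses.groupby(["Yr Sold"])["SalePrice"]
range_by_year_builtin = (grouped.max() - grouped.min()).to_dict()

print(type(range_series).__name__)
print(range_by_year)
print(range_by_year_builtin)
```

Both routes produce the same flat dictionary; the second simply relies on reductions pandas already provides.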


Indeed, that's cleaner. Thank you!