I am failing to filter a grouped dataframe to display its mean

Screen Link: Exploring EBay Car Sales Data - Exploring The Date Columns | Dataquest

My Code:

by_brands = autos.groupby("brand").mean()
top_brands = brand_freq[brand_freq > .05].index

brand_mean_prices = {}
for brand in top_brands:
    brand_only = [by_brands == brand]
    mean_price = brand_only["usd_price"]
    brand_mean_prices[brand] = mean_price
    
brand_mean_prices

What I expected to happen:
I wanted to display the mean price for each brand in top_brands, as a dictionary or in whatever way is possible.

What actually happened: I get a KeyError instead.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-50-5eeea529f5a6> in <module>
     11 autos.groupby("brand").mean()'''
     12 
---> 13 (by_brands[0] == bmw)

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

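Your “brand_only” variable inside the for loop is just a Python list wrapping the boolean result of the comparison, so it cannot be indexed with a column name like "usd_price".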
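
To make that concrete, here is a minimal sketch using a small made-up stand-in for by_brands (the values are hypothetical, not taken from the project). It only shows what the square brackets produce and why indexing the result with a column name fails.

```python
import pandas as pd

# Hypothetical stand-in for the thread's by_brands: mean values indexed by brand.
by_brands = pd.DataFrame(
    {"usd_price": [8000.0, 5000.0]},
    index=pd.Index(["bmw", "opel"], name="brand"),
)

brand = "bmw"
brand_only = [by_brands == brand]  # the square brackets build a plain Python list

wrapper_type = type(brand_only).__name__    # the outer object is a list
inner_type = type(brand_only[0]).__name__   # its single element is a boolean DataFrame

# A list only accepts integer positions or slices, so a column name is rejected.
try:
    brand_only["usd_price"]
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
```

With the code exactly as posted, this line would raise a TypeError rather than the KeyError in the traceback, which suggests the traceback came from a slightly different version of the cell (the one with by_brands[0]).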

Even if I use by_brands[by_brands == brand], the result is the same.

As far as I understand, by_brands is the result of groupby("brand").mean(), so the brand names are in its index and its cells hold numeric means. Comparing those cells with a brand name (brand) can never match.

Maybe try this:

brand_only = by_brands[by_brands[your_col_of_interest] == brand]
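
Here is a rough sketch of two ways this could look, assuming a tiny made-up autos table with brand and usd_price columns (the data are hypothetical). Since groupby("brand") moves brand into the index, a column filter like the one above only works on the original, ungrouped dataframe, while the grouped means are looked up by label with .loc.

```python
import pandas as pd

# Hypothetical sample data standing in for the project's autos dataframe.
autos = pd.DataFrame({
    "brand": ["bmw", "bmw", "opel", "opel", "audi"],
    "usd_price": [9000, 7000, 3000, 5000, 10000],
})

# Grouped means: "brand" becomes the index, not a column.
by_brands = autos.groupby("brand")[["usd_price"]].mean()

# Option 1: look up the already-computed mean by its index label.
bmw_mean_from_groups = by_brands.loc["bmw", "usd_price"]

# Option 2: filter the ungrouped rows on the brand column, then average.
bmw_rows = autos[autos["brand"] == "bmw"]
bmw_mean_from_filter = bmw_rows["usd_price"].mean()
```

Both options give the same number for a brand; the first reuses the grouped result, the second skips the groupby entirely.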

Check out cell 21 in this notebook, which is from the same Dataquest project. I think it builds a dictionary similar to the one you're trying to create.
Hope it helps.
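
I can't reproduce that cell here, but a dictionary like this could be sketched roughly as follows. The sample data are made up, and brand_freq is assumed to be the normalized value counts of the brand column, since the original post doesn't show how it was defined.

```python
import pandas as pd

# Hypothetical sample data: 30 listings, with one rare brand below the 5% cutoff.
autos = pd.DataFrame({
    "brand": ["bmw"] * 15 + ["opel"] * 14 + ["trabant"],
    "usd_price": [8000] * 15 + [3000] * 14 + [500],
})

# Assumption: brand_freq is the share of listings per brand.
brand_freq = autos["brand"].value_counts(normalize=True)
top_brands = brand_freq[brand_freq > .05].index

# Loop version: filter the ungrouped rows for each brand, then take the mean.
brand_mean_prices = {}
for brand in top_brands:
    brand_only = autos[autos["brand"] == brand]
    brand_mean_prices[brand] = brand_only["usd_price"].mean()

# Equivalent version without a loop: group once, then select the top brands by label.
by_brands = autos.groupby("brand")["usd_price"].mean()
brand_mean_prices_alt = by_brands.loc[top_brands].to_dict()
```

Selecting the usd_price column before calling mean() also avoids averaging any non-numeric columns, which newer pandas versions refuse to do by default.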