Pandas Data Aggregation Mission 12

Screen Link: https://app.dataquest.io/m/343/data-aggregation/12/aggregating-multiple-columns-and-functions-with-pivot-tables

Your Code:

happiness2015.pivot_table(['Happiness Score', 'Family'], 'Region')
haphappiness2015_by_region = happiness2015.pivot_table(
    ['Country',
     'Region',
     'Happiness Rank',
     'Happiness Score',
     'Standard Error',
     'Economy (GDP per Capita)',
     'Family',
     'Health (Life Expectancy)',
     'Freedom',
     'Trust (Government Corruption)',
     'Generosity',
     'Dystopia Residual'
    ],
    'Region')
haphappiness2015_by_region.plot(kind='barh', title='Mean Happiness Scores by Region', xlim=(0,10), legend=False)

What I expected to happen:
I would like to ask the reason why removing the last two columns passed in the list parameter of pivot table causes an error. The code above works fine when I test it When I remove one column from the list of columns passed as a parameter to happiness2015_by_region it still works. However, when I remove two or more columns the following error appears:

ValueError: Grouper for ‘Region’ not 1-dimensional

What actually happened:

happiness2015.pivot_table(['Happiness Score', 'Family'], 'Region')
haphappiness2015_by_region = happiness2015.pivot_table(
    ['Country',
     'Region',
     'Happiness Rank',
     'Happiness Score',
     'Standard Error',
     'Economy (GDP per Capita)',
     'Family',
     'Health (Life Expectancy)',
     'Freedom',
     'Trust (Government Corruption)',    
    ],
    'Region')
haphappiness2015_by_region.plot(kind='barh', title='Mean Happiness Scores by Region', xlim=(0,10), legend=False)

Error full text:

ValueErrorTraceback (most recent call last)
<ipython-input-1-3243e870360a> in <module>()
     12      'Trust (Government Corruption)',
     13     ],
---> 14     'Region')
     15 haphappiness2015_by_region.plot(kind='barh', title='Mean Happiness Scores by Region', xlim=(0,10), legend=False)

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/frame.py in pivot_table(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
   4466                            aggfunc=aggfunc, fill_value=fill_value,
   4467                            margins=margins, dropna=dropna,
-> 4468                            margins_name=margins_name)
   4469 
   4470     def stack(self, level=-1, dropna=True):

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/reshape/pivot.py in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
     79         values = list(values)
     80 
---> 81     grouped = data.groupby(keys)
     82     agged = grouped.agg(aggfunc)
     83 

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   5160         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   5161                        sort=sort, group_keys=group_keys, squeeze=squeeze,
-> 5162                        **kwargs)
   5163 
   5164     def asfreq(self, freq, method=None, how=None, normalize=False,

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/groupby.py in groupby(obj, by, **kwds)
   1846         raise TypeError('invalid type: %s' % type(obj))
   1847 
-> 1848     return klass(obj, by, **kwds)
   1849 
   1850 

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)
    514                                                     level=level,
    515                                                     sort=sort,
--> 516                                                     mutated=self.mutated)
    517 
    518         self.obj = obj

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/groupby.py in _get_grouper(obj, key, axis, level, sort, mutated, validate)
   2955                         sort=sort,
   2956                         in_axis=in_axis) \
-> 2957             if not isinstance(gpr, Grouping) else gpr
   2958 
   2959         groupings.append(ping)

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/groupby.py in __init__(self, index, grouper, obj, name, level, sort, in_axis)
   2703                 if getattr(self.grouper, 'ndim', 1) != 1:
   2704                     t = self.name or str(type(self.grouper))
-> 2705                     raise ValueError("Grouper for '%s' not 1-dimensional" % t)
   2706                 self.grouper = self.index.map(self.grouper)
   2707                 if not (hasattr(self.grouper, "__len__") and

ValueError: Grouper for 'Region' not 1-dimensional

I have concluded for now that the reason for the error is a mismatch in parameters. As I understand, the parameters values and columns or indexes of DataFrame.pivot_table operate like this:

  • The column values passed to values will be the columns on which the aggregation function is applied.
  • The index parameter must be supplied with some column or index name with which the values (usually column names) passed to columns or indexes will be grouped or “aggregated” by.
  • If the index parameter has a value, the unique values found in the column names passed to index will be used as the indexes of the returned data frame.
  • If columns has a value, the unique values found in the columns passed to columns will be used as the column names of the returned data frame.
  • The reason my code throws an error is because the values aren’t being passed to the proper parameters.

I assume these because I managed to get the code to output a very ugly graph by adding parameter names to my call to DataFrame.pivot_table.

happiness2015.pivot_table(['Happiness Score', 'Family'], 'Region')
haphappiness2015_by_region = happiness2015.pivot_table(
    values = 
    ['Country',
     'Region',
     'Happiness Rank',
     'Happiness Score',
     'Standard Error',
     'Economy (GDP per Capita)',
     'Family',
     'Health (Life Expectancy)',
     'Freedom',
     'Trust (Government Corruption)',
     'Generosity',
     'Dystopia Residual'
    ],
    index = 'Region',
    aggfunc = np.mean)
haphappiness2015_by_region.plot(kind='barh', title='Means of Scores by Region', xlim=(0,10), legend=False)

For some or all indexes in DataFrame , you are assigning MORE THAN just one label, groupby() doesn’t know which label it should use for grouping. DataFrame groupby() doesn’t need to care about df or which columns , groupby() only cares about one thing, a lookup table that tells it which df.index is mapped to which label (ie. group name). To solve this, get rid of the multi-index by using