What does the argument 'group' in 'def dif(group):' code mean?

Screen Link:
https://app.dataquest.io/m/343/data-aggregation/9/introduction-to-the-agg-method

My Code:

def dif(group):
    return (group.max() - group.mean())

mean_max_diff = happy_grouped.agg(dif(group))

What I expected to happen:
The dif function takes ‘group’ as its argument. What is ‘group’?

What actually happened:
The correct code is
mean_max_diff = happy_grouped.agg(dif).
Where does the ‘group’ argument come from?

NameError: name 'group' is not defined

Hi @DebbiePappas. In this exercise, you pass the “dif” function itself to agg, and agg calls it once for each group, supplying the “group” argument for you. That is why happy_grouped.agg(dif(group)) raises a NameError: no variable named “group” exists at the point where you write that call. The “dif” function calculates the difference between the max and the mean happiness scores. You can think of “group” as the happiness scores for one specific region in the ‘happiness2015’ dataframe (e.g. Western Europe, Eastern Asia, etc.). Since the “happy_grouped” object isolates the “Happiness Score” column, the ‘dif’ function subtracts the mean happiness score from the max happiness score for each region and returns the value. Remember, the name you give a function's parameter is arbitrary. It can be called “region” instead of “group” to make it easier to remember:


def dif(region):
    return region.max() - region.mean()
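
To see where the argument comes from, here is a minimal sketch using a tiny made-up stand-in for the happiness2015 data (the rows and scores below are invented for illustration, not taken from the real dataset). It assumes happy_grouped was created by grouping on “Region” and selecting the “Happiness Score” column, as in the exercise.

```python
import pandas as pd

# Hypothetical miniature stand-in for happiness2015 (invented values).
happiness = pd.DataFrame({
    "Region": ["Western Europe", "Western Europe", "Eastern Asia", "Eastern Asia"],
    "Happiness Score": [7.5, 6.5, 6.0, 5.0],
})
happy_grouped = happiness.groupby("Region")["Happiness Score"]

def dif(region):
    # agg passes in one region's scores (a pandas Series) as the argument.
    return region.max() - region.mean()

# Pass the function object itself, without parentheses; agg calls it per group.
mean_max_diff = happy_grouped.agg(dif)

# Writing dif(group) tries to call dif immediately with a variable that
# does not exist yet, which is what produces the NameError.
try:
    happy_grouped.agg(dif(group))
except NameError as err:
    error_message = str(err)
```

Here mean_max_diff is a Series indexed by region, with one value per region.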


Hope this helps!


I understand. Thanks for explaining!

Could someone please help me to understand the difference between

mean_max_dif = dif(happy_grouped)

and

mean_max_dif = happy_grouped.agg(dif)

Both lines give me the same outcome but the platform doesn’t accept the 1st one as a correct answer.

Hello @npekceti!

In your case, there is no difference, because the groupby object has its own max and mean methods, so dif(happy_grouped) works directly. But suppose you chose the Economy (GDP per Capita) column and wanted to know the maximum value for each region. If you pass the groupby object to Python's built-in max function, you'll get rather strange results (a tuple containing the values for Western Europe), so you want to use the agg method to calculate the maximum values instead.

In any case, the agg method also lets you apply multiple functions at once and write the code in just one line.
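
A rough sketch of these points, again on invented data rather than the real happiness2015 values. The name gdp_grouped is a hypothetical variable introduced only for this example.

```python
import pandas as pd

# Hypothetical miniature stand-in for happiness2015 (invented values).
happiness = pd.DataFrame({
    "Region": ["Western Europe", "Western Europe", "Eastern Asia", "Eastern Asia"],
    "Economy (GDP per Capita)": [1.4, 1.3, 1.2, 1.0],
})
gdp_grouped = happiness.groupby("Region")["Economy (GDP per Capita)"]

def dif(group):
    return group.max() - group.mean()

# Both give the same Series here: the groupby object has its own
# max() and mean() methods, so dif works on it directly.
direct = dif(gdp_grouped)
via_agg = gdp_grouped.agg(dif)

# Built-in max iterates over (region_name, values) pairs and compares
# the names first, so it returns the pair whose name sorts last.
builtin_result = max(gdp_grouped)

# agg computes the maximum within each region instead.
max_per_region = gdp_grouped.agg("max")

# Several aggregations in one line: one column per function.
summary = gdp_grouped.agg(["max", "mean"])
```

In this sketch builtin_result is a tuple whose first element is the region name and whose second is that region's values, while max_per_region holds one maximum per region.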
