Can you explain the function here

Screen Link:
https://app.dataquest.io/m/343/data-aggregation/9/introduction-to-the-agg-method

My Code:

import numpy as np
grouped = happiness2015.groupby('Region')
happy_grouped = grouped['Happiness Score']
def dif(group):
    return (group.max() - group.mean())
happy_mean_max = happy_grouped.agg([np.mean, np.max])
mean_max_dif = happy_grouped.agg(dif)

What I expected to happen:
I don’t understand how the value is assigned to the variable group in the function dif
def dif(group):
    return (group.max() - group.mean())
What actually happened:
I don’t understand how a value gets assigned to the parameter group when dif is passed to agg()
def dif(group):
    return (group.max() - group.mean())


As I understand it, the argument given to happy_grouped.agg() can be any function or list of functions, like np.mean and np.max. agg() then passes the Series of Happiness Score values for each region as the parameter of that function.

So, for example, np.mean is first passed a Series with the values from Australia and New Zealand, calculates its mean, and then moves on to the next region. During that call, the parameter group is the Series for Australia and New Zealand. On the next call, group is the Series for Central and Eastern Europe, and so forth.

The function dif(group) does the same and is passed the group of rows which is the result of groupby, so first the values of Australia and New Zealand and so on.
I hope this helps a little. I’m not very good at explaining.
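
If it helps to see it spelled out, here is a rough sketch of what agg(dif) does for you. The toy dataframe and its scores are made up (it is not the real happiness2015 data), and the loop is only an approximation of the idea, not how pandas is actually implemented:

```python
import pandas as pd

# Hypothetical stand-in for happiness2015, with made-up scores.
toy = pd.DataFrame({
    'Region': ['Western Europe', 'Western Europe',
               'Southern Asia', 'Southern Asia', 'Southern Asia'],
    'Happiness Score': [7.5, 6.5, 5.0, 4.0, 3.0],
})

def dif(group):
    return group.max() - group.mean()

happy_grouped = toy.groupby('Region')['Happiness Score']

# Roughly what agg(dif) does: loop over the groups and call dif on each one.
manual = {}
for region, scores in happy_grouped:
    # scores is a Series with this region's values; it becomes `group` in dif.
    manual[region] = dif(scores)
manual = pd.Series(manual)

# The same calculation, letting agg make the calls.
via_agg = happy_grouped.agg(dif)

print(manual)
print(via_agg)
```

So nothing is ever assigned to group by hand. Each call to dif binds group to the Series for one region, just like in any ordinary function call.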

Hi!
In this mission we discover groupby objects. We grouped the happiness2015 data frame by “Region” and then saved only the “Happiness Score” data for each group (Region in our case) to happy_grouped. Then we applied a custom function dif to happy_grouped using the GroupBy.agg() method. The values used are the happiness scores of every country in each group (region).

But what I need to understand is how an argument gets passed to the function dif.

I don’t see a value being assigned to the parameter anywhere.

Yeah, you never assign it explicitly, as far as I understand it. agg() calls the function itself, once per group, and passes that group's values in as the first argument. You can test this for yourself by writing your own simple function. For example, we first create a small dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Region': ['Australia', 'New Zealand', 'Australia', 'New Zealand'],
                   'value': [2, 3, 5, 0]})


Then write a simple function that returns either the maximum or the minimum of a set of rows, depending on the flag you pass in:

def findExtreme(series, maxi):
    if maxi:
        return np.max(series)
    else:
        return np.min(series)

Using df.groupby('Region').agg(findExtreme, True) will now return the maximum value for each region (5 for Australia, 3 for New Zealand).

If you pass maxi = False instead, so df.groupby('Region').agg(findExtreme, False), you get the minimum value for each region (2 for Australia, 0 for New Zealand).

You can see that you supply the second parameter explicitly yourself, while the first one is filled in by agg(). I don’t really know what happens under the hood, since I am also just trying to learn here.
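
Here is the whole experiment as one runnable sketch. One assumption on my side: I select the 'value' column before calling agg, so that the function is clearly handed a Series for each region. Extra positional arguments given to agg after the function are forwarded to the function:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Region': ['Australia', 'New Zealand', 'Australia', 'New Zealand'],
                   'value': [2, 3, 5, 0]})

def findExtreme(series, maxi):
    # series is supplied by agg, one region at a time; maxi is supplied by us.
    if maxi:
        return np.max(series)
    else:
        return np.min(series)

grouped_values = df.groupby('Region')['value']

# The True / False after the function name is forwarded to findExtreme as maxi.
maxima = grouped_values.agg(findExtreme, True)
minima = grouped_values.agg(findExtreme, False)

print(maxima)
print(minima)
```

The first parameter is always the group that agg is currently working on, and anything else you list after the function name lands in the remaining parameters.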

Hi! @allurivijay
First, welcome to the community!
Hope you are enjoying it.
Now, let’s try to answer your question.

You can use the dif function in two ways to calculate mean_max_dif and get the same result:

  1. You can pass it as an argument to the GroupBy.agg() method, i.e. GroupBy.agg(dif). Here you are passing dif without parentheses, which refers to the function object itself and treats it like a variable. Without the parentheses you are not calling the function, you are only referring to it.
    So, when you do happy_grouped.agg(dif), the GroupBy.agg() method takes that reference and, for each group, calls the function with that group as the argument and collects the value it returns. So, you did
    mean_max_dif = happy_grouped.agg(dif)

Note: A method is just a function defined in a class, so it can take another function as an argument. If you wrote dif() with parentheses, Python would call dif right away and hand agg() the return value instead of the function (here it would actually fail with a TypeError, because dif needs a group argument). So we pass only the name of the function, without parentheses, which refers to the function itself.

  2. You can also call the function yourself and pass happy_grouped as the argument (remember you defined group as a parameter when defining the function dif).
    mean_max_dif = dif(happy_grouped)
    Here group is the whole GroupBy object rather than a single region's Series. GroupBy.max() and GroupBy.mean() each return one value per region, so their difference gives the same numbers. Nothing else calls the function in this case, because you are calling it directly.

Both will give you the same result, it’s just two different pythonic ways of doing things.
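
To check this for yourself, here is a small sketch that runs both ways side by side. The toy dataframe and its scores are made up for illustration (not the real happiness2015 data), and the last part shows what goes wrong if you add the parentheses:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for happiness2015, with made-up scores.
toy = pd.DataFrame({
    'Region': ['Western Europe', 'Western Europe',
               'Southern Asia', 'Southern Asia', 'Southern Asia'],
    'Happiness Score': [7.5, 6.5, 5.0, 4.0, 3.0],
})

def dif(group):
    return group.max() - group.mean()

happy_grouped = toy.groupby('Region')['Happiness Score']

# Way 1: hand the function object to agg; agg calls it once per region.
by_agg = happy_grouped.agg(dif)

# Way 2: call dif ourselves on the whole GroupBy object. Inside dif,
# group.max() and group.mean() each return one value per region.
by_call = dif(happy_grouped)

print(np.allclose(by_agg, by_call))

# With parentheses, dif() is called on the spot with no argument,
# so Python raises a TypeError before agg even starts.
call_error = None
try:
    happy_grouped.agg(dif())
except TypeError as err:
    call_error = err
print(call_error)
```

In Way 1 the parameter group is a Series for one region at a time, and in Way 2 it is the GroupBy object, but the numbers come out the same.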
You can also refer to this post about the .apply() method.
If you want to learn more about the difference between an object and a variable, I highly encourage you to read this.
Hope this helps!


Thanks very much for the explanation. It would be helpful if the mission mentioned earlier that when a function is passed as an argument to a method, we shouldn’t use the parentheses.

Hi! @allurivijay
If the explanation helped you, it would be great if you marked it as the solution, for the sake of the community and to help other learners.

Happy Learning!