Screen Link:
In the lesson, this custom function was created:
def dif(group):
    return(group.max() - group.mean())
happy_grouped.agg(dif)
I tried to understand it, but failed.
Can someone explain how this function works?
What exactly are you having trouble understanding?
group?
group.max()?
max()?
group.mean()?
mean()?
group.max() - group.mean()?
.agg()?
happy_grouped.agg(dif)?
I listed these out because that’s pretty much what you get if you break the code down piece by piece, which is essentially the first step in figuring out what some piece of code is doing.
So, looking at the above individual parts of that code, what exactly are you having trouble understanding? Take your time, and think about it.
Also, share the link to the Mission. Otherwise it’s difficult to refer to the source and give an answer or have a discussion that can help you out.
When we write happy_grouped.agg(dif), we use the .agg method to perform aggregation using the dif function. But we didn’t pass any argument to dif. How could dif compute anything if no arguments were passed?
What is ‘group’ in dif(group)? Is it of type pandas.core.groupby.SeriesGroupBy?
What do the parentheses after return mean?
return ( group.max() - group.mean() )
https://app.dataquest.io/m/343/data-aggregation/9/introduction-to-the-agg-method
1.
That’s something which is briefly touched upon in that Step itself:
"Note that when we pass the functions into the agg() method as arguments, we don’t use parentheses after the function names. For example, when we use np.mean, we refer to the function object itself and treat it like a variable, whereas np.mean() would be used to call the function and get the returned value."
So you hand dif itself to agg(), and agg() is what calls it and supplies the argument. How that actually works comes down to how it is all implemented in Python and pandas, which is not necessary to understand here. Just think of this as a specific use case for functions that are passed to agg().
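To make the difference between a function object and a function call concrete, here is a minimal sketch. The helper apply_func is hypothetical and only stands in for the idea of what agg() does with the function it receives; it is not how pandas is actually implemented.

```python
import numpy as np

values = [1, 2, 3, 4]

# Hypothetical helper (not part of pandas): it receives a function object
# and decides for itself when to call it, much like agg() does.
def apply_func(func, data):
    return func(data)

# No parentheses: we pass the function itself, like a variable.
passed_as_object = apply_func(np.mean, values)

# With parentheses: we call the function ourselves and get a number back.
called_directly = np.mean(values)

print(passed_as_object, called_directly)
```

Both lines end up computing the same mean; the only difference is who makes the call.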
2.
group is just a parameter name. happy_grouped holds the values split into groups. When you use happy_grouped.agg(dif), agg calls dif once for each group and passes that group’s values in as group (a pandas Series of the values, not the SeriesGroupBy object itself). It’s similar to:
def print_name(name):
    print(name)
print_name("the_doctor")
So, when print_name gets called, the parameter name takes the argument "the_doctor" and prints it. name plays the same role as group, and passing the value "the_doctor" is the same as agg passing a group’s values from happy_grouped to dif.
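As a rough illustration of how agg() hands data to dif, the sketch below uses a small made-up DataFrame (the Region and Score values are assumptions for the example, not the lesson’s dataset) and compares agg() with a manual loop that calls dif once per group.

```python
import pandas as pd

# Made-up data for illustration only; not the lesson's dataset.
df = pd.DataFrame({
    "Region": ["A", "A", "B", "B", "B"],
    "Score": [1.0, 3.0, 2.0, 4.0, 9.0],
})
happy_grouped = df.groupby("Region")["Score"]

def dif(group):
    # group is the Series of Score values belonging to one Region
    return group.max() - group.mean()

# agg() calls dif once per group and collects the results
result = happy_grouped.agg(dif)

# Roughly the same thing written by hand: iterate over the groups
# and call dif on each group's values ourselves
manual = {name: dif(group) for name, group in happy_grouped}

print(result)
print(manual)
```

For region A the maximum is 3.0 and the mean is 2.0, so dif returns 1.0; for region B the maximum is 9.0 and the mean is 5.0, so it returns 4.0.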
If functions are confusing you, I would recommend going through the Python Missions on functions again.
3.
It’s just parentheses. You could write it as return group.max() - group.mean() and it would mean the same. In this particular case, the parentheses don’t have any significance. It’s just one way to write the code so that it’s clear you are returning the difference between the maximum value and the mean value of the input, group.
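A quick way to convince yourself: return is a statement, not a function, so the parentheses only group the expression. The two toy functions below (hypothetical names, using plain lists instead of pandas) return the same value.

```python
def with_parens(values):
    # Parentheses just wrap the expression being returned
    return(max(values) - min(values))

def without_parens(values):
    # Same expression, no parentheses
    return max(values) - min(values)

sample = [1, 5, 3]
print(with_parens(sample), without_parens(sample))
```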
Do I understand correctly that when I use any function in the GroupBy.agg() method, the function takes the values in the GroupBy object as arguments?
Yes, that’s correct.
Thanks a lot! Now it’s clear to me.