How to read the custom function dif in the lesson

Screen Link:

In the lesson, this custom function was created:

def dif(group):
    return(group.max() - group.mean())
happy_grouped.agg(dif)

I tried to understand it but failed.
Can someone explain how this function works?


What exactly are you having trouble understanding?

  • Is it group?
  • group.max()?
  • max()?
  • group.mean()?
  • mean()?
  • group.max() - group.mean()?
  • .agg()
  • happy_grouped.agg(dif)

I have listed these out because that’s pretty much what you get if you break the code down piece by piece, which is essentially the first step in figuring out what some piece of code is doing.

So, looking at the above individual parts of that code, what exactly are you having trouble understanding? Take your time, and think about it.

Also, share the link to the Mission. Otherwise it’s difficult to be able to refer to the source and provide an answer/have a discussion that can help you out.

  1. When we write happy_grouped.agg(dif), we use the .agg method to perform aggregation using the dif function. But we didn't pass any argument to dif. How can dif compute anything if no arguments were passed?

  2. What is ‘group’ in dif(group)? Is it pandas.core.groupby.SeriesGroupBy type?

  3. What do the parentheses after return mean?
    return ( group.max() - group.mean() )

https://app.dataquest.io/m/343/data-aggregation/9/introduction-to-the-agg-method


1.
That’s something which is briefly touched upon in that Step itself -

Note that when we pass the functions into the agg() method as arguments, we don’t use parentheses after the function names. For example, when we use np.mean, we refer to the function object itself and treat it like a variable, whereas np.mean() would be used to call the function and get the returned value.

How that actually works comes down to how it is all implemented in Python and pandas, which I think is not necessary to understand. Just think of this as a specific use case for functions that are used in agg().
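If it helps, here is a rough sketch of the general idea in plain Python. It is not how pandas implements agg(); call_for_me is a hypothetical stand-in I made up, and this dif works on a plain list rather than on pandas data. The point is only that a function can be handed over without parentheses and then called later by whoever received it.

```python
def dif(values):
    # Same idea as the lesson's dif, but written for a plain list
    return max(values) - sum(values) / len(values)

def call_for_me(func, values):
    # Hypothetical stand-in for agg(): it receives the function object
    # and supplies the argument itself when it calls the function
    return func(values)

# dif is passed WITHOUT parentheses, so it is not called on this line by us;
# call_for_me calls it internally and passes [1, 2, 6] as the argument
result = call_for_me(dif, [1, 2, 6])
print(result)
```

So we never pass an argument to dif ourselves; the code that receives dif does that for us.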

2.
group is just a parameter name, not a pandas type. When you use happy_grouped.agg(dif), happy_grouped holds several groups of values (for simplicity, think of one column split up by group). For each of those groups, agg calls dif and passes that group's values to it, so dif receives the values of one group (as far as I know, a pandas Series) rather than the SeriesGroupBy object itself. Those values are what group refers to while dif runs. It’s similar to -


def print_name(name):
    print(name)

print_name("the_doctor")

So, when print_name gets called, the parameter name takes the argument "the_doctor" and the function prints it. Here, name plays the same role as group, and passing the value "the_doctor" is the same as agg passing a group of values from happy_grouped to dif.
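To make that concrete, here is a small sketch using made-up data (the DataFrame, the region names, and the scores are assumptions for illustration, not the lesson's dataset). It compares agg(dif) with calling dif by hand on each group's values.

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame({
    "Region": ["A", "A", "B", "B", "B"],
    "Happiness Score": [7.0, 5.0, 6.0, 3.0, 3.0],
})
happy_grouped = df.groupby("Region")["Happiness Score"]

def dif(group):
    # group holds the scores belonging to one region
    return group.max() - group.mean()

# agg calls dif once per region and collects the results
result = happy_grouped.agg(dif)

# Doing the same thing by hand: loop over the groups and call dif ourselves
by_hand = {name: dif(values) for name, values in happy_grouped}

print(result)
print(by_hand)
```

Both approaches should give the same number per region, which is the sense in which agg "passes the argument" to dif.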

If functions are confusing you, I would recommend going through the Python Missions on functions again.

3.
It’s just parentheses. You could write it as return group.max() - group.mean() and it would mean the same. In this particular case, the parentheses don’t have any significance. It’s just one way to write the code so that it’s clear you are returning the difference between the maximum value and the mean value of the input, group.
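A quick way to convince yourself is to write both versions and compare them. The two function names and the sample list below are made up for this check, and plain lists are used instead of pandas to keep it short.

```python
def with_parens(values):
    # return is a statement, not a function; the parentheses only group the expression
    return (max(values) - min(values))

def without_parens(values):
    return max(values) - min(values)

sample = [1, 5, 3]
same = with_parens(sample) == without_parens(sample)
print(same)
```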


Do I understand correctly that when I use any function in the GroupBy.agg() method, the function takes the values in the GroupBy object as arguments?


Yes, that’s correct.


Thanks a lot! Now it's clear to me :)
