What is the logic here in the agg function?

I don't really understand how the three lines below relate to each other. First, we define a function called dif. Then we use this function for mean_max_dif, which returns one column. In other posts, a community moderator explained that if we do not put () after a function name, it is called by itself. What does "itself" mean here? If we don't pass any argument, does that mean the function receives no value?
How does happy_mean_max link to the other two lines, and why do we need it? Thank you!!

My Code:

def dif(group):
    return (group.max() - group.mean())

happy_mean_max = happy_grouped.agg([np.mean, np.max])

Let me use my code to explain.

One good thing about agg is that you can pass several functions to it, and it computes all of them in a single call.

You group happiness2015 by Region and select the Happiness Score column from the result as happy_grouped. That makes happy_grouped a grouped Series (a SeriesGroupBy object): the Happiness Score values split up by region.

There is no built-in function that calculates and returns the difference between the max and the mean, so we define the function dif ourselves. This shows that we can pass custom functions of our own to agg.
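As a minimal sketch of how agg uses a custom function, assuming a small made-up DataFrame named toy in place of happiness2015 (the regions and scores are invented for illustration): writing dif without parentheses hands the function itself to agg, and agg then calls it on each region's scores.

```python
import pandas as pd

# Hypothetical toy data standing in for happiness2015.
toy = pd.DataFrame({
    "Region": ["A", "A", "B", "B"],
    "Happiness Score": [7.0, 5.0, 4.0, 3.0],
})
happy_grouped = toy.groupby("Region")["Happiness Score"]

def dif(group):
    # group is a Series holding the scores of a single region
    return group.max() - group.mean()

# No parentheses: dif is passed as a function object.
# agg supplies the argument by calling dif(group) for each region.
result = happy_grouped.agg(dif)

# Calling dif by hand on region A's scores gives the same number.
by_hand = dif(toy.loc[toy["Region"] == "A", "Happiness Score"])

print(result)
print(by_hand)
```

So the function does receive a value: agg passes each region's scores in as group. Writing dif() instead would try to call the function immediately with no argument, which raises a TypeError.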

happy_mean_max holds the mean and maximum values of Happiness Score for each region, so we expect as many rows as there are regions. For mean_max_dif, the dif function is passed into agg alongside mean and max.
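To check the "one row per region" expectation, here is a small sketch on made-up data (the toy DataFrame is hypothetical). It uses the string names "mean" and "max", which pandas also accepts in place of np.mean and np.max.

```python
import pandas as pd

# Hypothetical toy data with three regions.
toy = pd.DataFrame({
    "Region": ["A", "A", "B", "B", "C"],
    "Happiness Score": [7.0, 5.0, 4.0, 3.0, 6.0],
})
happy_grouped = toy.groupby("Region")["Happiness Score"]

# Two functions in, two columns out; one row for each region.
happy_mean_max = happy_grouped.agg(["mean", "max"])

print(happy_mean_max)
print(happy_mean_max.shape)  # (3, 2): 3 regions, 2 functions
```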

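import numpy as np

# Group by region, then select the column to aggregate.
grouped = happiness2015.groupby('Region')
happy_grouped = grouped['Happiness Score']

# Custom aggregation: how far a region's maximum is above its mean.
def dif(group):
    return (group.max() - group.mean())

# Mean and max of Happiness Score per region (one row per region).
happy_mean_max = happiness2015.groupby('Region')['Happiness Score'].agg((np.mean, np.max))

# Mean, max, and dif per region; select the dif column.
mean_max_dif = happiness2015.groupby('Region')['Happiness Score'].agg((np.mean, np.max, dif))['dif']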

Your explanation is clear, thank you! But there is one last thing I need to understand about the last line. You already put dif in agg(), so why do you still need to put ['dif'] at the end? Thank you again!

mean_max_dif = happiness2015.groupby('Region')['Happiness Score'].agg((np.mean, np.max, dif))['dif']

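The exercise only asks me to return the dif column. Because agg was given three functions, it returns a table with one column per function, and ['dif'] at the end selects just the column produced by dif.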
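A small sketch of the difference, again on hypothetical toy data and with the string names "mean" and "max" standing in for np.mean and np.max: with a list of functions agg returns a DataFrame, so ['dif'] is needed to pull out one column, while passing dif alone returns that column directly.

```python
import pandas as pd

# Hypothetical toy data standing in for happiness2015.
toy = pd.DataFrame({
    "Region": ["A", "A", "B", "B"],
    "Happiness Score": [7.0, 5.0, 4.0, 3.0],
})
happy_grouped = toy.groupby("Region")["Happiness Score"]

def dif(group):
    return group.max() - group.mean()

# Three functions: a DataFrame with three columns.
all_three = happy_grouped.agg(["mean", "max", dif])

# ['dif'] selects the single column named after the custom function.
mean_max_dif = all_three["dif"]

# Passing only dif skips the selection step and returns a Series directly.
direct = happy_grouped.agg(dif)

print(all_three)
print(mean_max_dif)
print(direct)
```

Both routes give the same values per region; the list version is useful when the mean and max columns are also wanted.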