Why df.shape doesn't need () and the difference between sum() and .sum()

1. Why do you sometimes need to put () after a name to make the code work, while in other cases you don't?
e.g. If you want to check the head of a table, you write df.head(),
but when you want to check its shape, you can only use df.shape

----> 6 combined.shape()

TypeError: 'tuple' object is not callable

2. Why, in some cases, when you want to sum things up, do you use sum(xxx), putting things inside the parentheses, while in other cases .sum() is appended after xxx?

outliers_low = sum(wnba['Games Played'] < lower_bound)


3. Similar to (2), why is “mean” sometimes used as df.mean(), while at other times it’s used as np.mean?

class_size = class_size.groupby('DBN').agg(np.mean)

Also, do we have to put np before mean here?

Thank you so much!!

You would have learned this in the Object-oriented Python lesson. Classes can have methods and attributes. Methods are functions that belong to an object, so you call them with (). Attributes are like variables stored on the object, so you just read them, without ().
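A minimal sketch of that idea with a hypothetical Table class (it is not part of pandas, just a toy): the method needs () because it runs code, while the attribute is a stored value you simply read.

```python
class Table:
    """Hypothetical toy class, only to illustrate methods vs. attributes."""

    def __init__(self, rows):
        self.rows = rows                            # data stored on the object
        self.shape = (len(rows), len(rows[0]))      # attribute: a plain stored value (a tuple)

    def head(self, n=2):                            # method: a function attached to the object
        return self.rows[:n]


t = Table([[1, 2], [3, 4], [5, 6]])
print(t.shape)    # (3, 2): reading an attribute, no ()
print(t.head())   # [[1, 2], [3, 4]]: calling a method, with ()
```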

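For a DataFrame, head() is a method and shape is an attribute. df.shape stores a tuple of (rows, columns), and a tuple can't be called like a function, which is why combined.shape() raises TypeError: 'tuple' object is not callable.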
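Here is a small sketch reproducing that error on a made-up DataFrame standing in for combined (its columns and values are invented for illustration):

```python
import pandas as pd

# Made-up stand-in for `combined`
combined = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print(combined.head(2))   # head is a method, so it is called with ()
print(combined.shape)     # shape is an attribute holding the tuple (3, 2)

try:
    combined.shape()      # tries to call the tuple (3, 2) as if it were a function
except TypeError as err:
    print(err)            # 'tuple' object is not callable
```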

This, again, delves more into how those libraries are implemented and goes well beyond the level of understanding you would require.

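A very broad overview: Python has built-in functions like sum(), which you call as sum(something); it works on any iterable by adding its items one by one. Libraries like Pandas separately give their own objects methods, so a Series has its own .sum() that you call as something.sum(). Importing Pandas doesn't change the built-in sum(); they are two different functions that often give the same answer. For example, sum(wnba['Games Played'] < lower_bound) and (wnba['Games Played'] < lower_bound).sum() both count the True values, because True counts as 1 when added. They can differ in details, such as how missing values (NaN) are handled.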
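A minimal sketch of the difference, using a hypothetical Series of games played in place of the real wnba data (the values and lower_bound below are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for wnba['Games Played']
games_played = pd.Series([2, 15, 30, 31, 5])
lower_bound = 10  # hypothetical threshold

is_low = games_played < lower_bound   # boolean Series: True, False, False, False, True

# Built-in sum(): iterates over the values and adds them (True counts as 1)
count_builtin = sum(is_low)
# Series.sum(): the pandas method, called on the object itself
count_method = is_low.sum()
print(count_builtin, count_method)    # 2 2

# Where they differ: missing values
with_nan = pd.Series([1.0, np.nan, 2.0])
print(sum(with_nan))      # nan (built-in sum just adds, so NaN spreads)
print(with_nan.sum())     # 3.0 (pandas skips NaN by default, skipna=True)
```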

It’s pretty complicated to explain this more because this deals with a side of software engineering we don’t need to focus on. Over time you will intuitively start using these functions/methods without much issue.

These two posts try to explain this (but with some contradictory terminology) if you do want to understand things more -

Those are two different use cases.

The first one computes mean() over the whole DataFrame df (by default, one mean per numeric column). The second one computes the mean within each group after the DataFrame has been grouped by DBN.
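To see the two use cases side by side, here is a sketch on a small made-up stand-in for class_size; the DBN values and the AVERAGE CLASS SIZE column name are assumptions, not taken from the real dataset. numeric_only=True is passed because the text DBN column can't be averaged.

```python
import pandas as pd

# Made-up stand-in for class_size: a DBN key plus one numeric column
class_size = pd.DataFrame({
    "DBN": ["01M015", "01M015", "01M019"],
    "AVERAGE CLASS SIZE": [20.0, 30.0, 24.0],
})

# Use case 1: mean over the whole DataFrame, one number per numeric column
overall = class_size.mean(numeric_only=True)
print(overall)

# Use case 2: mean computed separately inside each DBN group, one row per DBN
per_school = class_size.groupby("DBN").mean()
print(per_school)
```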

You could also instead do something like -

And the above will work the same as -

But the latter offers more flexibility because of the aggregate function, agg(), which can take other functions as well, or several at once. You can check the documentation of agg() to see what else it can be used for.
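A sketch comparing the plain groupby mean with agg(), on a small made-up stand-in for class_size (column name and values assumed). The three versions should give the same per-DBN means; agg() additionally accepts a list of aggregations. Note that recent pandas versions may emit a FutureWarning for agg(np.mean) and suggest passing the string "mean" instead.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for class_size
class_size = pd.DataFrame({
    "DBN": ["01M015", "01M015", "01M019"],
    "AVERAGE CLASS SIZE": [20.0, 30.0, 24.0],
})

grouped = class_size.groupby("DBN")

via_method = grouped.mean()        # the GroupBy object's own mean() method
via_agg = grouped.agg(np.mean)     # agg() given NumPy's mean function
via_string = grouped.agg("mean")   # agg() also accepts the function's name as a string
print(via_method)

# The extra flexibility of agg(): several aggregations in one call
summary = grouped.agg(["mean", "max"])
print(summary)
```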

Yes. Otherwise, you will get a NameError, because mean is not a built-in function in Python. So, you need to use the one that NumPy provides instead.

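Pandas also has a mean, but only as a method on its objects (df.mean(), or .mean() on a Series), not as a top-level function in the pandas module. That is why pd.mean doesn't exist and can't replace np.mean. Inside agg() you can also pass the string 'mean', and pandas will use its own mean method. Quirks like these are expected to crop up, partly because Pandas is developed on top of NumPy.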
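A quick sketch for checking where mean does and doesn't exist (assuming a standard Python, NumPy, and pandas install):

```python
import builtins

import numpy as np
import pandas as pd

print(hasattr(builtins, "mean"))       # False: not a Python built-in
print(hasattr(pd, "mean"))             # False: no top-level pd.mean function
print(hasattr(pd.DataFrame, "mean"))   # True: pandas' mean is a method on its objects
print(np.mean([1, 2, 3]))              # 2.0: NumPy provides a module-level function

try:
    mean([1, 2, 3])                    # bare name, nothing defined under it
except NameError as err:
    print(type(err).__name__)          # NameError
```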

Over time, you will get used to the common use cases and accept them as they are, and for the odd ones you will learn to search for solutions online, where websites like Stack Overflow can help. But you don’t need to understand how everything works “underneath”.