Confused about axes in pandas

Screen Link: https://app.dataquest.io/m/293/data-cleaning-basics/10/dropping-missing-values

I expected df.dropna(axis=0) to remove columns containing NaN, but it removes rows. This seems to be the opposite of how axis behaves with methods like sum() and mean().

I have tried searching Google, but didn't find any satisfactory answer.

Here is one video, which demonstrates that the behavior of axes varies with the method being used.

1 Like

Hi @prateek:

Does this help clarify your doubt?


As the article says, you can use the alternative syntax to reduce the confusion.

For added clarity, one may choose to specify axis='index' (instead of axis=0 ) or axis='columns' (instead of axis=1 ).
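
As a minimal sketch of that alternative syntax: the small frame below is made up for illustration (it is not from the lesson), and it only checks that the integer and string spellings of axis give the same result with dropna().

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a single missing value, invented for this example.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# axis=0 and axis="index" are two spellings of the same axis.
rows_dropped_int = df.dropna(axis=0)
rows_dropped_name = df.dropna(axis="index")

# axis=1 and axis="columns" are likewise interchangeable.
cols_dropped_int = df.dropna(axis=1)
cols_dropped_name = df.dropna(axis="columns")

print(rows_dropped_int.equals(rows_dropped_name))  # True
print(cols_dropped_int.equals(cols_dropped_name))  # True
```

The named form does not change what the method does; it only makes the chosen axis easier to read.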

2 Likes

Even if I use the named values, the roles of the axes still seem to be reversed, don't they?

1 Like

What do you understand by the roles of the axes?

Are you referring to an x-y plot?

1 Like

Consider the following dataframe:

oo = pd.DataFrame([ [1, 2, 1], [3, 4, 3], [5, 6, 5], [7, 8, 7]  ])

When axis=0 is passed, as in oo.sum(axis=0), it works like this:

image

However, when axis=0 is used with the dropna() method, it removes rows, that is:

image

These are two different directions. That’s what confuses me.
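
The two calls can be reproduced with a rough sketch like the one below. The frame oo is the one from this post; the NaN placed at row 1, column 2 is an assumption added only so that dropna() has something to remove.

```python
import numpy as np
import pandas as pd

oo = pd.DataFrame([[1, 2, 1], [3, 4, 3], [5, 6, 5], [7, 8, 7]])

# sum(axis=0) adds the values down the rows and returns one total per column.
col_totals = oo.sum(axis=0)
print(col_totals.tolist())  # [16, 20, 16]

# sum(axis=1) adds the values across the columns and returns one total per row.
row_totals = oo.sum(axis=1)
print(row_totals.tolist())  # [4, 10, 16, 22]

# Assumption for illustration: inject one missing value at row 1, column 2.
with_nan = oo.astype(float)
with_nan.loc[1, 2] = np.nan

# dropna(axis=0) removes row 1, the row that holds the NaN.
rows_kept = with_nan.dropna(axis=0)
print(rows_kept.shape)  # (3, 3)

# dropna(axis=1) removes column 2, the column that holds the NaN.
cols_kept = with_nan.dropna(axis=1)
print(cols_kept.shape)  # (4, 2)
```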

1 Like

Hello @prateek, when it comes to removing null values, axis refers to what we want to drop:

  • 0, or ‘index’ : Drop rows which contain missing values.
  • 1, or ‘columns’ : Drop columns which contain missing values.

Actually, it depends on the method being used. For most computations (mean, median, etc.) it means:

  • Axis 0 will act on all the ROWS in each COLUMN
  • Axis 1 will act on all the COLUMNS in each ROW
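
One possible way to reconcile the two lists, offered as a sketch rather than an official rule: for a computation, axis names the axis that gets collapsed, while for dropna() it names the axis whose labels may be removed. The frame below is hypothetical, and the manual filter is just one equivalent way to write dropna(axis=0).

```python
import numpy as np
import pandas as pd

# Hypothetical frame, invented for this example.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# Computation: axis=0 collapses the rows, leaving one value per column.
# NaN is skipped by default, so column "a" averages 1.0 and 3.0.
col_means = df.mean(axis=0)
print(col_means.tolist())  # [2.0, 5.0]

# Computation: axis=1 collapses the columns, leaving one value per row.
row_means = df.mean(axis=1)
print(row_means.tolist())  # [2.5, 5.0, 4.5]

# dropna: axis=0 removes labels from the index, i.e. whole rows.
kept_rows = df.dropna(axis=0)

# The same rows can be selected by hand: keep a row only if every
# value across its columns is present.
manual = df[df.notna().all(axis=1)]
print(kept_rows.equals(manual))  # True
```
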
4 Likes

Thanks for clarifying.

1 Like

Your reply was so immaculate <3

1 Like