How To Interpret Pandas Axis?

Pandas axis can be confusing to understand, so let me make it easier for you. With axis=0, the operation moves down the rows (along the index), and with axis=1, it moves across the columns. Here is an example:

import numpy as np
import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3, np.nan],
    'B': [4, 5, 6, 7],
    'C': [8, 9, np.nan, 1],
    'D': [1, 2, 4, 3]
})
df
     A  B    C  D
0  1.0  4  8.0  1
1  2.0  5  9.0  2
2  3.0  6  NaN  4
3  NaN  7  1.0  3

Let’s try using axis = 0 first:

print(df.dropna(axis=0))
     A  B    C  D
0  1.0  4  8.0  1
1  2.0  5  9.0  2
print(df.sum(axis=0))
A     6.0
B    22.0
C    18.0
D    10.0
dtype: float64

As you can see, with axis=0, pandas.DataFrame.dropna dropped every row that contains a NaN value, while pandas.DataFrame.sum added up the values in each column. I think this is what creates the confusion: one method seems to act on rows and the other on columns. Instead of looking at which values are affected, we can look at the direction in which the operation is performed.

If we use axis = 0, the operation is performed in the downward direction, moving from one row to the next.

 |      A  B    C  D
 | 0  1.0  4  8.0  1
 | 1  2.0  5  9.0  2
 | 2  3.0  6  NaN  4
 | 3  NaN  7  1.0  3
 v

That is, if we use pandas.DataFrame.dropna, it walks down the rows, checks each one for NaN values, and drops the rows that contain any. If we use pandas.DataFrame.sum, it adds the rows together, which leaves one total per column.
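To make the downward direction concrete, here is a minimal sketch that reproduces both axis = 0 results by hand, walking down the rows one at a time. The names manual_sum and kept_rows are only illustrative, and this is not how pandas implements these methods internally; it just mirrors the idea.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, np.nan],
    'B': [4, 5, 6, 7],
    'C': [8, 9, np.nan, 1],
    'D': [1, 2, 4, 3]
})

# sum(axis=0): for each column, walk down the rows and add up the values.
# NaN values are skipped, which matches the default skipna=True of DataFrame.sum.
manual_sum = {}
for col in df.columns:
    total = 0.0
    for row_label in df.index:  # moving downwards, row by row
        value = df.loc[row_label, col]
        if not pd.isna(value):
            total += value
    manual_sum[col] = float(total)

# dropna(axis=0): walk down the rows and keep only those without any NaN.
kept_rows = [row_label for row_label in df.index
             if not df.loc[row_label].isna().any()]

print(manual_sum)  # {'A': 6.0, 'B': 22.0, 'C': 18.0, 'D': 10.0}
print(kept_rows)   # [0, 1]
```

Both results agree with df.sum(axis=0) and df.dropna(axis=0) shown earlier.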

If we use axis = 1, the operation is performed in the rightward direction, moving from one column to the next.

----------------->
     A  B    C  D
0  1.0  4  8.0  1
1  2.0  5  9.0  2
2  3.0  6  NaN  4
3  NaN  7  1.0  3

That is, if we use pandas.DataFrame.dropna, it moves across the columns, checks each one for NaN values, and drops the columns that contain any. If we use pandas.DataFrame.sum, it adds the columns together, which leaves one total per row.

print(df.dropna(axis=1))
   B  D
0  4  1
1  5  2
2  6  4
3  7  3
print(df.sum(axis=1))
0    14.0
1    18.0
2    13.0
3    11.0
dtype: float64
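
If the numbers are hard to remember, pandas also accepts the names 'index' and 'columns' in place of 0 and 1. The sketch below checks that they give the same results on the frame above. It also uses a two-column slice named small (an illustrative name, not part of the example above) to show that the result of sum is labelled by whichever axis was not collapsed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, np.nan],
    'B': [4, 5, 6, 7],
    'C': [8, 9, np.nan, 1],
    'D': [1, 2, 4, 3]
})

# axis=0 is the same as axis='index'; axis=1 is the same as axis='columns'.
sum_matches = df.sum(axis=0).equals(df.sum(axis='index'))
drop_matches = df.dropna(axis=1).equals(df.dropna(axis='columns'))
print(sum_matches, drop_matches)  # True True

# `small` is an illustrative 4-row, 2-column slice of the frame.
small = df[['A', 'B']]
down = small.sum(axis=0)    # collapses the rows: one total per column
across = small.sum(axis=1)  # collapses the columns: one total per row
print(list(down.index))     # ['A', 'B']
print(list(across.index))   # [0, 1, 2, 3]
```

Because small is not square, it is easy to see which dimension disappeared: axis = 0 leaves 2 totals (one per column) and axis = 1 leaves 4 totals (one per row).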

Thank you @Sahil. After reading the post, I’m less confused about the pandas axis.
