Exploring Data with pandas: Fundamentals - Page 6

medians = f500[["revenues", "profits"]].median(axis=0)

We could also use .median(axis="index").

The above code is found in the course materials [Exploring Data with pandas: Fundamentals - Page 6].
Can someone explain why axis=0 returns the median of the values in the "revenues" and "profits" columns? The axis=0 is really confusing me. I would have thought it should be axis=1.

In fact, I find the code less confusing when axis=0 is left out, even though 0 is the default value anyway.

Would appreciate a clear explanation on this. Thank you.


Hello @looc

As you probably know, the index holds the row labels of a DataFrame. Many pandas methods default to axis=0. The axis argument specifies the axis along which the function is applied. axis=0 and axis='index' are equivalent, as are axis=1 and axis='columns'.

  • Axis 0 will act on all the ROWS in each COLUMN
  • Axis 1 will act on all the COLUMNS in each ROW

Hi Victoromondi,

Thank you for your reply.

Yup, I’m aware that the index is the row label of a DataFrame, and that axis=0 and axis='index' mean the same thing, as do axis=1 and axis='columns'.

What is not intuitive to me is this:
In the DataFrame, the revenues of the various companies are all stored in the column "revenues", so finding the median of these values should just mean looking through all the values in this column (hence axis=1 comes instinctively to my mind).

I’m just trying to find a better explanation or understanding behind the code, otherwise I would just have to remember that it is just the way that this is done lol.


Using axis=1 will return the median of every row, not of every column:

  • Axis 1 will act on all the COLUMNS in each ROW

Let me explain this below:

import pandas as pd

# Small example frame: 3 columns, 5 labelled rows
df = pd.DataFrame(
    {'a': range(5), 'b': range(5, 10), 'c': range(10, 15)},
    index=['index_' + str(x) for x in range(5)]
)

axis=0: What is the median column-wise?

Suppose I want the median of every column. I use axis=0, since this acts on all the ROWS in each COLUMN. In many pandas functions, leaving the axis unspecified has the same effect, because axis=0 is the default:

df.median()

The output will be:

a     2.0
b     7.0
c    12.0
dtype: float64

See, I’ve got the medians of columns a, b and c. Passing axis=0 explicitly does the same thing:

df.median(axis=0)

The output is the same as above:

a     2.0
b     7.0
c    12.0
dtype: float64

Now let’s see the medians of just column a and column b:

medians = df[['a', 'b']].median(axis=0)
medians

The output:

a    2.0
b    7.0
dtype: float64

axis=1: What is the median row-wise?

To get the median of each ROW, I set axis=1:

df.median(axis=1)

The output will be:

index_0    5.0
index_1    6.0
index_2    7.0
index_3    8.0
index_4    9.0
dtype: float64

It has given me the median of every ROW.
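One way to double-check which direction was used is to look at the labels on the result. The sketch below rebuilds the same df and also confirms that axis="columns" behaves like axis=1. The variable names are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": range(5), "b": range(5, 10), "c": range(10, 15)},
    index=["index_" + str(x) for x in range(5)],
)

# axis=1 and axis="columns" are two spellings of the same thing.
row_by_number = df.median(axis=1)
row_by_name = df.median(axis="columns")
aliases_match = row_by_number.equals(row_by_name)

# The result is labelled by whatever survives the operation.
col_labels = list(df.median(axis=0).index)  # column names: one median per column
row_labels = list(df.median(axis=1).index)  # row labels: one median per row

print(aliases_match)
print(col_labels)
print(row_labels)
```

With axis=0 the result is labelled a, b, c, and with axis=1 it is labelled index_0 to index_4.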

I hope it is clear now.


Always REMEMBER:

  • Axis 0 will act on all the ROWS in each COLUMN
  • Axis 1 will act on all the COLUMNS in each ROW
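Another way to think about it, which may help with the original intuition: the axis you pass is the one that gets collapsed. The sketch below converts the same example to a NumPy array (my own addition, not part of the course code) and compares shapes before and after.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": range(5), "b": range(5, 10), "c": range(10, 15)},
    index=["index_" + str(x) for x in range(5)],
)

arr = df.to_numpy()                    # shape (5, 3): 5 rows, 3 columns

col_medians = np.median(arr, axis=0)   # axis 0 (the 5 rows) is collapsed
row_medians = np.median(arr, axis=1)   # axis 1 (the 3 columns) is collapsed

print(arr.shape, col_medians.shape, row_medians.shape)
# (5, 3) (3,) (5,)
```

So axis=0 does not mean "work on the rows and keep them". It means "move down the rows and squash them", which leaves one value per column.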

Yes, very clear now! Thank you for taking the time and patience to explain it to me.


Hi @looc, if Victor’s answer helped you then please mark it as a solution for the benefit of the rest of the community as well.
