The above code is found in the course materials [Exploring Data with pandas: Fundamentals - Page 6].
Can someone explain why “axis = 0” returns the median value of the columns “revenues” and “profits”? The “axis = 0” is really confusing me. I would have thought it should be “axis = 1”.

In fact, I found the code less confusing when “axis = 0” is left out, even though 0 is exactly the value pandas uses by default in that case.

Would appreciate a clear explanation on this. Thank you.

As you probably know, the index holds the row labels of the DataFrame. In many pandas methods, axis defaults to 0. The argument specifies the axis along which the operation is performed. axis=0 is equivalent to axis='index', and axis=1 is equivalent to axis='columns'.
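To make the equivalence concrete, here is a small sketch using a hypothetical toy DataFrame (the columns `a` and `b` are made up for illustration). It compares the integer and string spellings of `axis` for `DataFrame.median`:

```python
import pandas as pd

# Hypothetical toy DataFrame, just to compare the two spellings of axis
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0 and axis="index" refer to the same axis (the row labels)
by_int_0 = df.median(axis=0)
by_name_0 = df.median(axis="index")

# axis=1 and axis="columns" refer to the same axis (the column labels)
by_int_1 = df.median(axis=1)
by_name_1 = df.median(axis="columns")

print(by_int_0.equals(by_name_0))  # True
print(by_int_1.equals(by_name_1))  # True
```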

Yup, I’m aware that the index is the row label of the DataFrame, and also that axis=0 and axis='index' mean the same thing, as do axis=1 and axis='columns'.

What isn’t intuitive to me is this:
In the DataFrame, the revenues of the various companies are all stored in the column “revenues”, so finding their median should simply mean going through all the values in that column (which is why axis = 1 instinctively comes to mind) and picking out the median value.
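A minimal sketch of what actually happens, using a small hypothetical DataFrame with made-up numbers (not the course data). The key point: axis=0 names the axis that gets *collapsed*. All the rows inside each column are reduced to one number, so the result has one entry per column, which is exactly “the median of the revenues column” and “the median of the profits column”:

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only
companies = pd.DataFrame(
    {"revenues": [100, 250, 400, 900], "profits": [10, 30, 20, 80]},
    index=["Co A", "Co B", "Co C", "Co D"],
)

# axis=0 collapses the row axis: the values down each column are reduced
# to a single number, so the result is indexed by the column labels.
medians = companies[["revenues", "profits"]].median(axis=0)
print(medians)

# Same number as taking the median of the "revenues" column on its own
print(medians["revenues"] == companies["revenues"].median())  # True
```

So “look through all the values in this column” is a walk *along the rows*, which is why it is axis=0 rather than axis=1.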

I’m just trying to understand the reasoning behind the code; otherwise I’ll just have to remember that this is the way it’s done, lol.

import pandas as pd

df = pd.DataFrame(
    {'a': range(5), 'b': range(5, 10), 'c': range(10, 15)},
    index=['index_' + str(x) for x in range(5)]
)

axis=0: What is the median of each column (column-wise)?

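Suppose I want the median of every column. I use axis=0, because the operation then runs down all the ROWS within each COLUMN. In other words, axis=0 is the axis that gets collapsed: the rows are reduced away and you are left with one value per column. Many pandas methods behave this way by default when no axis is specified.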

df.median()

The output will be:

a     2.0
b     7.0
c    12.0
dtype: float64

As you can see, this gives the median of columns a, b and c.
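One way to confirm that axis=0 means “one result per column” is to compare it with `DataFrame.apply`, which also defaults to axis=0 and hands the function one column at a time. This is a sketch on the same example DataFrame; the lambda is just an illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": range(5), "b": range(5, 10), "c": range(10, 15)},
    index=["index_" + str(x) for x in range(5)],
)

# apply with its default axis=0 passes each column (a Series whose
# index is the row labels) to the function, one column at a time.
per_column = df.apply(lambda col: col.median())

# df.median() with its default axis=0 reduces the same way,
# so both results are indexed by the column labels a, b, c.
print(per_column.tolist() == df.median().tolist())
print(list(per_column.index))
```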

df.median(axis=0)

The output is the same as above:

a     2.0
b     7.0
c    12.0
dtype: float64
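pandas borrows this axis convention from NumPy, where `axis=k` means the operation runs along dimension k and that dimension disappears from the result. A short sketch on the underlying array of the same example DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": range(5), "b": range(5, 10), "c": range(10, 15)},
    index=["index_" + str(x) for x in range(5)],
)

values = df.to_numpy()  # shape (5, 3): axis 0 = rows, axis 1 = columns

# np.median with axis=0 collapses axis 0 (the 5 rows),
# leaving one value per column, i.e. shape (3,)
col_medians = np.median(values, axis=0)
print(col_medians.shape)     # (3,)
print(col_medians.tolist())  # [2.0, 7.0, 12.0]
```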

Now let’s look at the medians of just columns a and b:

medians = df[['a', 'b']].median(axis=0)
medians

The output:

a    2.0
b    7.0
dtype: float64
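Selecting columns does not change what axis=0 does. It only limits which columns get reduced. As a sketch, selecting first and reducing afterwards gives the same numbers as reducing every column and then picking a and b:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": range(5), "b": range(5, 10), "c": range(10, 15)},
    index=["index_" + str(x) for x in range(5)],
)

# Select columns a and b, then collapse the rows of each one
subset_first = df[["a", "b"]].median(axis=0)

# Collapse the rows of every column, then select the a and b results
reduce_first = df.median(axis=0)[["a", "b"]]

print(subset_first.tolist() == reduce_first.tolist())  # True
```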

axis=1: What is the median of each row (row-wise)?

To get the median across each ROW, I set axis=1. Now the columns axis is the one that gets collapsed, so you are left with one value per row.
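A short sketch of the row-wise case on the same example DataFrame. Row i holds the values i, i+5 and i+10, so each row’s median is the middle value i+5, and the result is indexed by the row labels instead of the column labels:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": range(5), "b": range(5, 10), "c": range(10, 15)},
    index=["index_" + str(x) for x in range(5)],
)

# axis=1 collapses the columns: a, b and c in each row are reduced
# to one number, so there is one result per row label.
row_medians = df.median(axis=1)
print(row_medians)
```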