Why use the index, 0, instead of the label, "0"?

Screen Link: https://app.dataquest.io/m/381/exploring-data-with-pandas%3A-fundamentals/5/method-chaining

My Code:

zero_previous_rank = f500.loc[:,"previous_rank"].value_counts().loc['0']

What I expected to happen:

I expected an output of 33, but got a KeyError. I don’t understand why it wouldn’t accept “0”, which looks like the label of the row I want when I look at

f500.loc[:,"previous_rank"].value_counts()

It seems to me that the correct answer (below) could give a wrong answer (though it happens not to here), since we don’t know a priori that the “0” row is at index 0. For example, the “159” row is at index 1.

zero_previous_rank = f500["previous_rank"].value_counts().loc[0]
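A minimal sketch that reproduces the error without the f500 dataset, assuming a small made-up series in place of f500["previous_rank"] (the values are hypothetical, not the real data). It shows that value_counts() on an integer column produces integer labels, so the string "0" is simply not one of them:

```python
import pandas as pd

# Hypothetical stand-in for f500["previous_rank"]: integer ranks, several of them 0
previous_rank = pd.Series([0, 0, 0, 159, 147, 0, 148], name="previous_rank")

counts = previous_rank.value_counts()

# The labels of `counts` are the unique values of the column, so they are integers too
print(pd.api.types.is_integer_dtype(counts.index))  # True

# The integer label 0 exists, so this lookup works
print(counts.loc[0])  # 4

# The string "0" is a different label that does not exist, so pandas raises KeyError
try:
    counts.loc["0"]
except KeyError as err:
    print("KeyError:", err)
```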

Hi @ThompsonLBen,

Welcome to the Community!

Everything is OK with your piece of code, except for that 0. Here, 0 is the index label of the first row of the series your code produces, f500.loc[:,"previous_rank"].value_counts(). This 0 is an integer, not a string. Hence, you should use loc[0] to select that row, which here happens to be the first (well, actually the “zero-th”, considering Python’s indexing) row of the series.


Thanks @Elena_Kosourova !

I’m still a little confused. The line

zero_previous_rank = f500.loc[:,"previous_rank"].value_counts().loc[490]

returns 1, even though the series has fewer than 490 rows. So it seems to be returning the element with the label 490, not the element at index 490 (otherwise it should throw an error).
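A minimal sketch of the same behaviour on a made-up three-row series (labels and values are hypothetical): .loc finds the label 490 regardless of how long the series is, while the positional accessor .iloc fails because there is no position 490:

```python
import pandas as pd

# Hypothetical series: 3 rows whose integer labels are not 0, 1, 2
s = pd.Series([33, 1, 1], index=[0, 490, 159])

# .loc looks up the *label* 490, even though the series has only 3 rows
print(s.loc[490])  # 1

# .iloc looks up the *position*; position 490 does not exist, so it raises IndexError
try:
    s.iloc[490]
except IndexError as err:
    print("IndexError:", err)
```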

So say I, for some reason, wanted the element at index 1. It seems like the only way to do this would be to print

f500.loc[:,"previous_rank"].value_counts()

which gives

0      33
159     1
147     1
148     1
149     1
       ..
321     1
322     1
323     1
324     1
235     1
Name: previous_rank, Length: 468, dtype: int64

then observe that the 2nd “row” is labeled with 159 and then evaluate

f500["previous_rank"].value_counts().loc[159]
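The two-step workaround above can be done in code instead of by reading the printout. This is a sketch assuming a small made-up series with integer labels in place of the real counts: an Index supports positional indexing, so counts.index[1] is the label sitting at position 1, which can then be passed to .loc:

```python
import pandas as pd

# Hypothetical stand-in for the value_counts() result: counts with integer labels
counts = pd.Series([33, 2, 1], index=[0, 159, 147])

# Step 1: the label at position 1, read from the index instead of the printout
label_at_1 = counts.index[1]
print(label_at_1)  # 159

# Step 2: look that label up with .loc
print(counts.loc[label_at_1])  # 2
```

This still ends in a label lookup; .iloc, which appears further down in the thread, reaches the position directly.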

So is there a better way to select from a series by index? Or am I wrong in thinking that loc uses the label and not the index? Though in that case, how would I look up by label (if the label is an int)?

Thanks :)

Hi!

I'll start with your last question:

The series f500["previous_rank"].value_counts() is not very illustrative, so let's consider another one whose labels are also integers:

f500['years_on_global_500_list'].value_counts()

whose output is

23    177
1      22
2      21
6      20
5      18
     ... 
10     11
20     11
14     11
15     11
11      9
Name: years_on_global_500_list, Length: 23, dtype: int64 

Try the following in your terminal and see what they return:

  1. f500['years_on_global_500_list'].value_counts().loc[6]
  2. f500['years_on_global_500_list'].value_counts().iloc[6]
  3. f500['years_on_global_500_list'].value_counts()[6]

Can you now tell which method selects an element of a series by its label and which one by its index (position)?

If it seems somewhat complicated, don't worry. Selecting by index is covered in the next course of the path, Exploring Data with pandas: Intermediate.
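For anyone without the f500 dataset at hand, the comparison can be sketched on a small made-up series. The first five labels and counts mirror the output shown above; the last two rows are invented so that position 6 exists. With an integer index, .loc and plain [] both look up the label 6, while .iloc takes the seventh row by position:

```python
import pandas as pd

# Hypothetical counts with integer labels; rows 0-4 copy the printed output,
# rows 5-6 are made up for illustration
counts = pd.Series([177, 22, 21, 20, 18, 15, 12], index=[23, 1, 2, 6, 5, 4, 3])

print(counts.loc[6])   # 20  (label 6)
print(counts.iloc[6])  # 12  (position 6, i.e. the seventh row)
print(counts[6])       # 20  (with an integer index, [] looks up the label)
```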

And now coming back to your original question:

Why use the index, 0, instead of the label, “0”?

I guess you were confused by the fact that in this mission .loc[] takes an integer as an argument, not a string. But no matter what type of value you pass, it's still going to be treated as a label, not as a position. pandas.Series.value_counts() returns a Series whose values are the counts of unique values and whose labels are those unique values of the Series you apply .value_counts() to. So, if the original values were strings, the labels of the .value_counts() Series will be strings, and if they were integers, they will remain integers.
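A minimal sketch of that last point, assuming two made-up columns that hold the same ranks once as integers and once as strings: the labels of each value_counts() result keep the type of the original values, so each one only answers to keys of its own type:

```python
import pandas as pd

# Hypothetical columns: the same ranks stored as integers and as strings
as_int = pd.Series([0, 0, 159])
as_str = pd.Series(["0", "0", "159"])

int_counts = as_int.value_counts()
str_counts = as_str.value_counts()

# Each result is looked up with a key of the original type
print(int_counts.loc[0])    # 2
print(str_counts.loc["0"])  # 2

# A key of the other type is a different label, so it is not found
print("0" in int_counts.index)  # False
print(0 in str_counts.index)    # False
```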
