Please help me understand this logic

Screen Link:

My Code:

industry_usa = f500["industry"][f500["country"] == "USA"].value_counts().head(2)

sector_china = f500["sector"][f500["country"] == "China"].value_counts().head(3)

Question:
f500["industry"][f500["country"] == "USA"]

Can someone explain how this gives us a Series of all the industries in the US?
Technically, it's f500[" "][ ]; I didn't know we were allowed to put a second pair of brackets right after the first.
I don't think we were taught this way of selecting data. If we were, can someone point me to the exact lesson?

Sorry if my question is difficult to understand, but please help. Thanks!

Hi!

To return specific values from a dataframe, you can use either of the following two methods:

  1. df.loc[row, column]. I think this one was taught in the first missions of this course.
  2. The one used in the solution, with two pairs of brackets chained together.
    With f500["industry"] you select a Series of all industries in all countries, and by adding [boolean mask] you keep only the values of that Series where the mask is True. I would write it the other way around, though: f500[boolean mask]["column"], which first selects the rows and then names the column you are interested in.
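To make the comparison concrete, here is a minimal sketch showing that the two chained forms and df.loc select the same values. The small f500 dataframe below is a made-up stand-in for the course dataset (the company rows are illustrative only), and usa_mask is just a name chosen for this example.

```python
import pandas as pd

# Hypothetical stand-in for the course's f500 dataframe (illustrative rows only)
f500 = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E"],
    "country": ["USA", "China", "USA", "China", "USA"],
    "industry": ["Retail", "Utilities", "Computers", "Refining", "Refining"],
})

# Boolean mask: one True/False value per row
usa_mask = f500["country"] == "USA"

# Column first, then filter the resulting Series with the mask
a = f500["industry"][usa_mask]

# Rows first, then pick the column from the filtered dataframe
b = f500[usa_mask]["industry"]

# Rows and column in a single .loc call
c = f500.loc[usa_mask, "industry"]

# All three hold the same values with the same index labels
print(a.equals(b) and b.equals(c))
```

The second pair of brackets is nothing special: f500["industry"] returns a Series, and a Series can be indexed with a boolean mask just like a dataframe can.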

At this level it makes no difference which method you use, although later you'll see that in some cases the classic df.loc[row, column] is the better choice, for example to avoid SettingWithCopyWarning when assigning values.
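As a rough illustration of where the two methods stop being interchangeable, the sketch below assigns a value through each form. The dataframe and the "Unknown" value are made up for the example, and the warning is silenced only to keep the output clean; the exact warning raised depends on the pandas version.

```python
import warnings
import pandas as pd

# Hypothetical stand-in data, not the real course dataset
f500 = pd.DataFrame({
    "country": ["USA", "China", "USA"],
    "sector": ["Retailing", "Energy", "Technology"],
})
usa_mask = f500["country"] == "USA"

# Chained form: f500[usa_mask] builds a temporary copy, so the
# assignment lands on that copy and f500 itself is left unchanged
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    f500[usa_mask]["sector"] = "Unknown"
after_chained = f500["sector"].tolist()

# .loc form: one indexing operation directly on f500, so the assignment sticks
f500.loc[usa_mask, "sector"] = "Unknown"
after_loc = f500["sector"].tolist()

print(after_chained)
print(after_loc)
```

For reading values either form works, but for writing values .loc is the one that reliably changes the original dataframe.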


Ohh this makes sense.
So you are saying f500[boolean mask]["column"] is the same concept as df.loc[row, column]?

Thank you!!

I'd rather say that these are different concepts that lead to the same result.


Well, at first sight, I would tell you that both methods work, and for teaching purposes, I believe there is no difference.

However, when selecting rows and columns in pandas, as far as I have learned, it is generally better to use the .loc[row, column] format (even when using boolean masks). Why?

Because .loc performs the whole selection in a single indexing operation on the original dataframe. The chained form, under the hood, runs two separate indexing operations one after the other and can create an intermediate copy along the way. That costs extra time and memory when you are dealing with datasets much larger than the ones we use for learning purposes, and it makes assigning values through the chained form unreliable.
