What is the difference between these two lines of code?

industry_usa = f500.loc[f500["country"] == "China", "sector"].value_counts()


sector_china = f500["sector"][f500["country"] == "China"].value_counts()

They show the same result, but I don’t understand how the second one is written.

When using loc, you can specify both the rows and the columns you want to access. So, the first line returns the values from the "sector" column, but only for the rows where the country is "China".

So, if you had a simple example like -

country sector
China Energy
India Education
USA Wholesale
China Finance

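running the above code (before the value_counts() call) would only return -

Energy
Finance

Because only the rows with the sectors "Energy" and "Finance" correspond to the country "China". The value_counts() call then counts how many times each of those sectors appears.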
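
Here is a minimal sketch of that loc selection on the toy table above. The DataFrame below is a hypothetical stand-in for f500 built from the four example rows, and the variable names china_sectors and sector_counts are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for f500, built from the four example rows
f500 = pd.DataFrame({
    "country": ["China", "India", "USA", "China"],
    "sector": ["Energy", "Education", "Wholesale", "Finance"],
})

# Row selector: country is China. Column selector: "sector".
china_sectors = f500.loc[f500["country"] == "China", "sector"]
print(china_sectors.tolist())  # ['Energy', 'Finance']

# value_counts() tallies how often each sector appears in that selection
sector_counts = china_sectors.value_counts()
print(sector_counts)
```

In this toy data each sector appears once, so every count is 1.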

Now, the concept behind the second line is similar.

Instead of using loc, you are chaining two indexing steps.

f500["country"] == "China"

will return a boolean Series with one value per row, each either True or False. The boolean indicates whether or not that row’s country was China.
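
To see what that comparison produces, here is a small sketch using the same four example rows (the DataFrame is a hypothetical stand-in for f500, and mask is just an illustrative name):

```python
import pandas as pd

# Hypothetical stand-in for f500, built from the four example rows
f500 = pd.DataFrame({
    "country": ["China", "India", "USA", "China"],
    "sector": ["Energy", "Education", "Wholesale", "Finance"],
})

# Comparing a column to a value yields one boolean per row
mask = f500["country"] == "China"
print(mask.tolist())  # [True, False, False, True]
print(mask.dtype)     # bool
```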


f500["sector"]

just gives you the entire sector column. When you combine/chain the two -

f500["sector"][f500["country"] == "China"]

It’s the same operation. You access rows in the column sector for which the country was China.
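
As a quick check that the two forms select the same data, here is a sketch on the four example rows (again a hypothetical stand-in for f500; with_loc and chained are illustrative names):

```python
import pandas as pd

# Hypothetical stand-in for f500, built from the four example rows
f500 = pd.DataFrame({
    "country": ["China", "India", "USA", "China"],
    "sector": ["Energy", "Education", "Wholesale", "Finance"],
})

mask = f500["country"] == "China"

# One step: loc takes the row mask and the column label together
with_loc = f500.loc[mask, "sector"]

# Two steps: pick the column first, then filter it with the mask
chained = f500["sector"][mask]

print(with_loc.equals(chained))  # True
```

For reading data, either form works. When assigning values, the pandas documentation recommends the single loc call, because chained indexing may end up modifying a temporary copy instead of the original DataFrame.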

In Pandas, there is more than one way to index data from a dataframe. They have an entire page dedicated to covering this that I would recommend checking out - Indexing and selecting data — pandas 1.3.1 documentation

Why can’t I remember learning to write it the second way? I understand it now, but I am a bit surprised, as if I had never learned or seen this before.

Thanks, I greatly appreciate your input!

It would be difficult to go through the content and try to find this. But I do think this is covered in one way or another. Even if not, it’s difficult to try and cover everything there is related to Pandas in the content, so it’s even better to keep learning as you get new information by asking questions (just like you did)!

Glad I could help!