Industry_usa question

Hello,
I had a question about the 12th mission of the “Exploring Data with pandas: Fundamentals.”

URL: Learn data science with Python and R projects

This is the correct answer:
industry_usa = f500["industry"][f500["country"] == "USA"].value_counts().head(2)

I have NO idea how this is the correct answer. I was struggling with this and when I looked at the answer, again, I have no idea how this is correct. I thought there would be a comma somewhere but it seems like these are two series back-to-back. I can’t find anything that looks like this in previous missions. Can someone explain what this is?

This is called chained indexing. I don't remember which mission covers it, if any.

Python treats it as a sequence of operations that happen one after another. First, f500["industry"] runs and returns a Series. Then the second operation, [f500["country"] == "USA"], is applied to the Series returned by the first operation.

first_series = f500["industry"]
second_series = first_series[f500["country"] == "USA"]
industry_usa  = second_series.value_counts().head(2)

The code above is the same as:

industry_usa = f500["industry"][f500["country"] == "USA"].value_counts().head(2)
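To check this on something small, here is a minimal sketch. It assumes a toy DataFrame standing in for f500 (the rows are made up for illustration, not the real dataset), and the intermediate names industries, is_usa and usa_industries are only there to label each step.

```python
import pandas as pd

# Toy stand-in for f500; these values are invented for illustration.
f500 = pd.DataFrame({
    "country": ["USA", "China", "USA", "USA", "Japan", "USA", "USA"],
    "industry": ["Banks", "Banks", "Retail", "Banks",
                 "Motor Vehicles and Parts", "Retail", "Banks"],
})

# Step 1: selecting one column gives a Series.
industries = f500["industry"]

# Step 2: the comparison gives a boolean Series, True where country is USA.
is_usa = f500["country"] == "USA"

# Step 3: indexing the Series with that boolean Series keeps only the USA rows.
usa_industries = industries[is_usa]

# Step 4: count the values and keep the two most common.
step_by_step = usa_industries.value_counts().head(2)

# The same four steps written as one chained expression.
one_liner = f500["industry"][f500["country"] == "USA"].value_counts().head(2)

print(step_by_step.equals(one_liner))
```

So the second pair of brackets is not a second series sitting next to the first. It is an index applied to whatever the first pair of brackets returned.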

Hi @DishinGoyani,
Is chained indexing the same as method chaining? If yes, I thought it involved two methods. Something like X.value_counts().head() [where X is a series].

You seem to be saying that we can have two series side-by-side? In the form of XX.value_counts().head()?

Yes, the chaining concept is the same. In method chaining you use multiple methods side by side, and in index chaining you do multiple indexing operations side by side.

Yes, that is also right.

No, we cannot have two series side by side like XX.value_counts().head().

Index chaining means we can have indexing operations side by side, like
X[some_index][some_index].value_counts().head(). It does not mean two series like XX.value_counts().head().

@DishinGoyani,
Thank you again for your help on this.

Hi! This is a great explanation, but I'm wondering why rearranging the code does not work if it runs sequentially. For example,

industry_usa = [f500["country"]=="USA"]f500["industry"].value_counts().head(2)

Hi Dishin,

I wanted to clarify something in this thread, as it seems relevant to my question. This pertains to screen #12.
From screen #10 we learned how to do boolean indexing (specifying where the values are located), following this format:

df.loc[bool, col]

e.g. f500.loc[f500["previous_rank"] == 0, "previous_rank"] = np.nan
e.g. motor_countries = f500.loc[f500["industry"] == "Motor Vehicles and Parts", "country"]

but, in chain indexing as you explained, the format is different:

df[col][bool].method()

So, am I correct in stating that when we are selecting values to run through a method or function (e.g. mean(), max(), etc.), the second format is always used?
And the first is used more for assigning and changing values?
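
One way to test this is to try both formats on a small example. The sketch below assumes a toy DataFrame in place of f500 (made-up rows, with previous_rank stored as floats so NaN fits without a dtype change). It suggests that for selection the two formats return the same Series, so a method can follow either one, while for assignment the single .loc call is the form that reliably writes into the DataFrame itself.

```python
import numpy as np
import pandas as pd

# Toy stand-in for f500; these values are invented for illustration.
f500 = pd.DataFrame({
    "country": ["USA", "China", "USA", "Japan"],
    "industry": ["Banks", "Banks", "Retail", "Motor Vehicles and Parts"],
    "previous_rank": [1.0, 0.0, 3.0, 0.0],
})

is_usa = f500["country"] == "USA"

# Selection: df.loc[bool, col] and df[col][bool] give the same Series.
with_loc = f500.loc[is_usa, "industry"]
chained = f500["industry"][is_usa]
same_selection = with_loc.equals(chained)
print(same_selection)

# Because both results are Series, a method can be chained onto either form.
counts_from_loc = f500.loc[is_usa, "industry"].value_counts()
counts_from_chain = f500["industry"][is_usa].value_counts()

# Assignment: one .loc call selects rows and column together and
# writes the new value into f500 itself.
f500.loc[f500["previous_rank"] == 0, "previous_rank"] = np.nan
print(f500["previous_rank"].isna().sum())
```

Assigning through chained indexing, as in df[col][bool] = value, is the case pandas warns about, because the first pair of brackets may return a copy and the assignment can miss the original DataFrame.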