Writing Assignment Differently

I had a question on the 7th mission of “Exploring Data with Pandas: Intermediate.”

URL: https://app.dataquest.io/m/292/exploring-data-with-pandas%3A-intermediate/7/pandas-index-alignment

The correct code is:
previously_ranked = f500[f500["previous_rank"].notnull()]

rank_change = previously_ranked["previous_rank"] - previously_ranked["rank"]

My code (that I tried) was:
previously_ranked=f500.loc[f500.loc[:,"previous_rank"].notnull()]-f500.loc[f500.loc[:,"rank"].notnull()]

I ended up getting a (pretty lengthy) error message, and I don’t know why. Why can’t you combine these?

Each side of your subtraction returns a whole DataFrame, not a single column. You can cross-check by printing one side:

print(f500.loc[f500.loc[:,"previous_rank"].notnull()])

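So you are subtracting one whole DataFrame from another. pandas then tries to subtract every column, including the non-numeric ones, which is most likely where the error comes from.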
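To make the difference concrete, here is a minimal sketch using a toy stand-in for f500. The three rows and the company column are made-up values for illustration, not the real dataset. It shows that a boolean mask alone selects whole rows, while a mask plus a column label selects one column.

```python
import pandas as pd

# Toy stand-in for f500; the values are made up for illustration.
f500 = pd.DataFrame({
    "company": ["A", "B", "C"],
    "rank": [1, 2, 3],
    "previous_rank": [2.0, None, 1.0],
})

mask = f500.loc[:, "previous_rank"].notnull()

rows = f500.loc[mask]                     # mask only: all columns -> DataFrame
column = f500.loc[mask, "previous_rank"]  # mask + column label -> Series

print(type(rows).__name__)    # DataFrame
print(type(column).__name__)  # Series

# Subtracting whole DataFrames also tries to subtract the text column.
try:
    rows - rows
    failed = False
except Exception as err:
    failed = True
    print(type(err).__name__, err)
```

In this sketch the subtraction fails because of the text column, which is probably similar to what happens with the real f500.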

Hi @DishinGoyani,
Thanks for the reply. I just had another question about this (I’m not really understanding this mission).

What does previously_ranked["rank"] mean? Why do we put “rank” in brackets when the variable previously_ranked has “previous_rank” in it? Like I said, I’m just really confused about this.

# f500.loc[f500.loc[:,"previous_rank"].notnull()] broken down into 2 lines for clarity

# 1st
# Build a boolean mask for the column `previous_rank`
# Returns a `Series` containing `True` where the value is not null, otherwise `False`
bool_idx = f500["previous_rank"].notnull()

# 2nd
# Now use the boolean index `bool_idx` to get the rows of `f500` that have a non-null `previous_rank`.
# Returns a dataframe; `previously_ranked` has the same columns as `f500` but only those rows
previously_ranked = f500[bool_idx]

# Here we subtract the `rank` column from the `previous_rank` column of the `previously_ranked` dataframe.
# The difference between `previous_rank` and `rank` is stored as a series
rank_change = previously_ranked["previous_rank"] - previously_ranked["rank"]

# Finally, add the new column to the original dataframe `f500`
# (rows that were filtered out have no matching index label, so they get NaN)
f500["rank_change"] = rank_change
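If you want to run these steps yourself, here is a small self-contained sketch. The four-row f500 below is a hypothetical stand-in with invented values, so only the mechanics carry over to the real data.

```python
import pandas as pd

# Hypothetical miniature f500; values are invented for illustration.
f500 = pd.DataFrame({
    "company": ["A", "B", "C", "D"],
    "rank": [1, 2, 3, 4],
    "previous_rank": [3.0, None, 1.0, 4.0],
})

# Keep only the rows where previous_rank is not null.
previously_ranked = f500[f500["previous_rank"].notnull()]

# Series - Series: values are paired up by index label.
rank_change = previously_ranked["previous_rank"] - previously_ranked["rank"]

# Assignment also aligns on the index, so the filtered-out row gets NaN.
f500["rank_change"] = rank_change
print(f500)
```

Here rank_change has three values (one per previously ranked company), but after the assignment f500 still has four rows, with NaN in the row whose previous_rank was null.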

Hope it helps.

Hi @DishinGoyani!
Good news, I think I understand now since you broke it up into two steps above. Bad news…one more question! You piqued my interest when you said series. Why couldn’t we do this:

previously_ranked = f500.loc[:,"previous_rank"].notnull()-f500.loc[:,"rank"].notnull()

These are series, correct? Asked another way, why do we need to create that previously_ranked dataframe?

Yes, they are series, but boolean ones: notnull() returns True/False values, not the ranks themselves. So this line does not make sense here, as our goal (per the mission instructions) is

… to select all rows from f500 that have a non-null value for the previous_rank column…

And this line is definitely not doing that.
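A quick way to see this is to look at what notnull() gives back. This is a minimal sketch on a made-up three-value series, not the real previous_rank column.

```python
import pandas as pd

# Made-up values standing in for the previous_rank column.
previous_rank = pd.Series([3.0, None, 1.0])

mask = previous_rank.notnull()

print(mask.dtype)                    # bool
print(mask.tolist())                 # [True, False, True]

# The mask is only useful for selecting; the selection holds the actual ranks.
print(previous_rank[mask].tolist())  # [3.0, 1.0]
```

So the mask answers "which rows have a value?", and you still need to use it to pull out the rank values before subtracting anything.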


We don’t have to create it; we can also do it in a single line like this (this is similar to your first post)

f500["rank_change"] = f500.loc[f500.loc[:,"previous_rank"].notnull(), "previous_rank"]-f500.loc[f500.loc[:,"previous_rank"].notnull(), "rank"]

But does it look good? It is harder to read and more confusing, and that is one of the reasons we use the temporary dataframe previously_ranked.
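As a middle ground, one option is to store the boolean mask in a variable (called mask here, a name I made up) so the single-line version stays readable. The sketch below uses invented data and checks that it gives the same series as the temporary-dataframe approach.

```python
import pandas as pd

# Invented data standing in for f500.
f500 = pd.DataFrame({
    "rank": [1, 2, 3, 4],
    "previous_rank": [3.0, None, 1.0, 4.0],
})

mask = f500["previous_rank"].notnull()

# Mask plus column label selects a Series on each side of the subtraction.
one_step = f500.loc[mask, "previous_rank"] - f500.loc[mask, "rank"]

# Same calculation through a temporary dataframe.
previously_ranked = f500[mask]
two_step = previously_ranked["previous_rank"] - previously_ranked["rank"]

print(one_step.equals(two_step))
```

Both versions select the same rows and columns, so which one to use is mostly a matter of readability.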

@DishinGoyani,
Thanks so much for your help and patience! I’m in a much better place with this mission thanks to you.
