Series.isnull() and use of brackets

What technique is used to come up with this answer?

null_previous_rank = f500[f500["previous_rank"].isnull()][["company", "rank", "previous_rank"]]

To be more specific, I don’t exactly understand the use of the two sets of brackets, which I have broken up like this:
f500[f500["..."].isnull()], and [["...", "...", "..."]]

https://app.dataquest.io/m/292/exploring-data-with-pandas%3A-intermediate/5/using-pandas-methods-to-create-boolean-masks

Hello @igor.amelichev,
the first one is boolean indexing. You pass a Series of True and False values, and only the rows marked True are kept.

In the second one, you are giving a list of column names as the index instead of a single name, so several columns are selected at once.
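To see the boolean indexing step on its own, here is a minimal sketch. The small f500 dataframe below is an invented stand-in for the course dataset, so the values are assumptions made for illustration only.

```python
import pandas as pd

# Toy stand-in for f500; the values are invented for illustration.
f500 = pd.DataFrame({
    "company": ["A", "B", "C"],
    "rank": [1, 2, 3],
    "previous_rank": [1.0, None, 4.0],
})

# Step 1: isnull() returns a boolean Series, one True/False per row.
mask = f500["previous_rank"].isnull()
print(mask.tolist())  # [False, True, False]

# Step 2: indexing with that Series keeps only the rows marked True.
null_rows = f500[mask]
print(null_rows["company"].tolist())  # ['B']
```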

Hi @fedepereira,
Thanks. What confused me is that the boolean indexing is wrapped in its own set of brackets, and the list of specific columns then follows in a separate set of brackets.

You don’t have to. That’s chained indexing: the second pair of brackets is applied to the result of the first. It saves you from writing long code.

This exercise also confused me. I would add a comma between the two groups, like this:

null_previous_rank = f500[f500["previous_rank"].isnull(),["company", "rank", "previous_rank"]]

Why isn’t this possible?
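One way to see what the comma does is with plain Python, no pandas involved. In the sketch below, ShowKey is a hypothetical helper class (not part of pandas or the course) that simply returns whatever the brackets receive. It shows that a comma inside one pair of brackets produces a single tuple key. A plain dataframe's [] expects one key (a column label, a list of labels, or a row mask), not a (rows, columns) pair, whereas .loc is designed to accept such a pair.

```python
class ShowKey:
    """Hypothetical helper that reports what the brackets receive."""

    def __getitem__(self, key):
        return key


obj = ShowKey()
rows = "row filter"
cols = ["company", "rank"]

# A comma inside one pair of brackets builds a single tuple key.
single_key = obj[rows, cols]
print(type(single_key).__name__)  # tuple

# One pair of brackets with one value passes that value through unchanged.
first = obj[rows]
print(first)  # row filter
```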

Hi @Amaryllis,
instead of that, I would recommend organizing the code so that it is highly readable, for example:

# First, create helper variables
filtered_rows = f500["previous_rank"].isnull()
filtered_columns = ["company", "rank", "previous_rank"]
# Then access the data using those variables
null_previous_rank = f500[filtered_rows][filtered_columns]

You can also access the data with the .loc indexer:

null_previous_rank = f500.loc[filtered_rows, filtered_columns]
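As a quick check that the two spellings select the same data, here is a small sketch. The toy f500 dataframe and its values are invented for illustration and are not the course dataset.

```python
import pandas as pd

# Toy stand-in for f500; the values are invented for illustration.
f500 = pd.DataFrame({
    "company": ["A", "B", "C"],
    "rank": [1, 2, 3],
    "previous_rank": [1.0, None, 4.0],
    "revenues": [10, 20, 30],
})

filtered_rows = f500["previous_rank"].isnull()
filtered_columns = ["company", "rank", "previous_rank"]

# Two consecutive lookups: rows first, then columns.
chained = f500[filtered_rows][filtered_columns]
# One lookup with a (rows, columns) pair.
with_loc = f500.loc[filtered_rows, filtered_columns]

print(chained.equals(with_loc))  # True
```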

This was my first approach as well. It seems like the method chaining kind of came out of nowhere.

Right, we always used commas so far in the course and now all of a sudden it doesn’t work anymore!

Thanks! :grinning:
That worked for me. Have you ever tried it for columns? The logic should work.
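A boolean filter can be applied to columns as well, at least with .loc. The sketch below is one possible illustration, using an invented toy dataframe: it keeps only the columns that contain at least one null. Note that plain brackets treat a boolean Series as a row filter, so the column version goes through .loc with ":" in the row position.

```python
import pandas as pd

# Toy stand-in for f500; the values are invented for illustration.
f500 = pd.DataFrame({
    "company": ["A", "B", "C"],
    "rank": [1, 2, 3],
    "previous_rank": [1.0, None, 4.0],
})

# One True/False per column: does the column contain any null?
column_mask = f500.isnull().any()

# ":" keeps every row; the boolean mask picks the columns.
cols_with_nulls = f500.loc[:, column_mask]
print(cols_with_nulls.columns.tolist())  # ['previous_rank']
```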

Hello,

Yes, as in the last exercise of the last step, the use of

result = df[filterRows][filterColumns]

really comes out of nowhere.

I’m going to check the documentation on it, but a word to introduce this method would come in handy :slight_smile:

Hi @yolann.sabaux

I’m not sure if I understood your question. But this is what happened here.

The original one-liner needed some explanation, so it was rewritten as several smaller statements for better readability.

To break the code down:

filtered_rows = f500["previous_rank"].isnull()

The line above creates a boolean Series that is True wherever f500["previous_rank"] is null, and stores it in filtered_rows.

filtered_columns = ["company", "rank", "previous_rank"]

Now filtered_columns saves the names of the columns we need to work on.

Now the next line…

null_previous_rank = f500[filtered_rows][filtered_columns]

This line can again be broken into two parts.

f500[filtered_rows] returns the subset of f500’s rows for which filtered_rows is True. This is the boolean indexing step.

That leaves a smaller dataframe. Applying [filtered_columns] to it then selects just those columns.

The result is those three columns of f500, limited to the rows where f500["previous_rank"] is null.

Were you able to understand the explanation?

@yolann.sabaux @jithins123

I would actually try to avoid this double bracket notation [][] altogether and stick with df.loc[] and df.iloc[], because then it is crystal clear from the syntax that you’re accessing specific dataframe rows and columns. Syntactically, [][] could also mean a lot of other things depending on the preceding variable. I personally only use the df[] notation when I want to access specific columns (or drop the non-referenced ones).

It also leads to issues if the problem you are trying to solve gets more complicated. For instance, if you are storing several dataframes in a dictionary, you would already need three pairs of consecutive brackets: one pair to access the dataframe in the dictionary, another for the rows, and the last for the columns. That being said, I think it is nonetheless good practice to store row and column filters in separate variables first and combine them in one df.loc[] call. This is especially helpful for more complex filter conditions. Example:

filtered_rows = f500["previous_rank"].isnull()
filtered_columns = ["company", "rank", "previous_rank"]

null_previous_rank = f500.loc[filtered_rows, filtered_columns]
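The dictionary case and a more complex filter condition could look roughly like the sketch below. The frames dictionary, its "2017" key, and the data are all hypothetical names and values chosen for illustration.

```python
import pandas as pd

# Hypothetical dictionary of dataframes; names and values are invented.
frames = {
    "2017": pd.DataFrame({
        "company": ["A", "B", "C"],
        "rank": [1, 2, 3],
        "previous_rank": [None, 2.0, None],
    }),
}

df = frames["2017"]

# A compound row filter: previous_rank is null AND rank is at most 2.
filtered_rows = df["previous_rank"].isnull() & (df["rank"] <= 2)
filtered_columns = ["company", "previous_rank"]

# Three consecutive pairs of brackets: dictionary key, rows, columns.
three_brackets = frames["2017"][filtered_rows][filtered_columns]
# One dictionary lookup, then a single .loc call for rows and columns.
with_loc = frames["2017"].loc[filtered_rows, filtered_columns]

print(three_brackets.equals(with_loc))  # True
print(with_loc["company"].tolist())     # ['A']
```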

Best
htw

Hello,

Thank you for your answer.

I actually don’t have any trouble understanding the logic. I just find it quite unsettling that the double brackets are introduced out of nowhere (and I couldn’t find them in the documentation); there is no explicit link between the two sets of brackets (no parentheses or anything of the kind).

But as @htw said, I think I will stick to the classic df.loc[] and df.iloc[] indexers.

Thank you both of you for taking the time.

Best regards,
Y

@yolann.sabaux

[["company", "rank", "previous_rank"]] == [filtered_columns]

because filtered_columns = ["company", "rank", "previous_rank"]

Hence double brackets
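Put differently, the outer brackets are the indexing operation and the inner brackets are an ordinary Python list. A small sketch with an invented toy dataframe shows the difference between a string key and a list key:

```python
import pandas as pd

# Toy stand-in for f500; the values are invented for illustration.
f500 = pd.DataFrame({
    "company": ["A", "B", "C"],
    "rank": [1, 2, 3],
    "previous_rank": [1.0, None, 4.0],
})

one_column = f500["company"]      # string key: a single column as a Series
as_list = f500[["company"]]       # list key: a DataFrame with one column
print(type(one_column).__name__)  # Series
print(type(as_list).__name__)     # DataFrame

# Writing the list inline or storing it in a variable first is the same thing.
filtered_columns = ["company", "rank"]
print(f500[["company", "rank"]].equals(f500[filtered_columns]))  # True
```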

This method doesn’t seem to appear in the documentation that you linked to:
null_previous_rank = f500[filtered_rows][filtered_columns]

Please can you confirm which section to review?

Same for me. The chaining wasn’t shown very clearly in the example. The filter is just a variable, which makes it tricky to spot and trace back.