To be more specific, I don't exactly understand the use of the two sets of brackets, which I broke up like this: f500[f500["..."].isnull()] and [["...", "...", "..."]].
Hi @fedepereira,
Thanks. What confused me is that we wrapped the boolean indexing in one set of brackets and then selected specific columns as a list within a second set of brackets.
Hi @Amaryllis,
Instead of that, I would recommend organizing the code so that it is highly readable, for example:
# First, create helper variables
filtered_rows = f500["previous_rank"].isnull()
filtered_columns = ["company", "rank", "previous_rank"]
# Then access the data by using those variables
null_previous_rank = f500[filtered_rows][filtered_columns]
f500[filtered_rows]: this part returns the rows of the f500 data set for which the boolean values stored in filtered_rows are True (this is boolean indexing).
So now we have a smaller portion of the f500 dataframe. Applying [filtered_columns] to that intermediate dataframe then selects only the columns listed in filtered_columns.
The result is the chosen columns of f500, restricted to the rows where f500["previous_rank"] is null.
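To see the two steps separately, here is a minimal sketch on a made-up dataframe. The values are invented stand-ins for illustration, not the real f500 data, and rows_only is a name I introduce for the intermediate result:

```python
import pandas as pd

# Hypothetical toy data standing in for the real f500 dataset.
f500 = pd.DataFrame(
    {
        "company": ["A", "B", "C"],
        "rank": [1, 2, 3],
        "previous_rank": [2.0, None, None],
        "revenues": [100, 200, 300],
    }
)

# Boolean Series: True where previous_rank is missing.
filtered_rows = f500["previous_rank"].isnull()
filtered_columns = ["company", "rank", "previous_rank"]

# First pair of brackets: keep only the rows where the mask is True.
rows_only = f500[filtered_rows]

# Second pair of brackets: applied to the intermediate dataframe,
# it keeps only the listed columns.
null_previous_rank = rows_only[filtered_columns]

print(null_previous_rank)
```

Writing f500[filtered_rows][filtered_columns] on one line does the same thing; the second pair of brackets simply operates on whatever the first pair returned.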
I would actually try to avoid this double bracket notation [][] altogether and stick with df.loc[] and df.iloc[], because then it is crystal clear from the syntax that you're accessing specific dataframe rows and columns. Syntactically, [][] could also mean a lot of other things depending on the preceding variable. I personally only use the df[] notation when I want to access specific columns (or drop the non-referenced ones).
It also leads to issues if the problem you are trying to solve gets more complicated. For instance, if you are storing several dataframes in a dictionary, then you would already need 3 pairs of consecutive brackets: one pair to access the dataframe in the dictionary, another pair for the rows, and the last pair for the columns. That being said, I think it is nonetheless good practice to store row and column filters in separate variables first and combine them in one df.loc[] call. This is especially helpful for more complex filter conditions. Example:
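A minimal sketch of that pattern, assuming a hypothetical dictionary named frames and made-up values (neither comes from the original exercise). The revenue threshold is only there to show a compound condition:

```python
import pandas as pd

# Hypothetical dictionary holding several dataframes (names and values invented).
frames = {
    "f500": pd.DataFrame(
        {
            "company": ["A", "B", "C"],
            "rank": [1, 2, 3],
            "previous_rank": [2.0, None, None],
            "revenues": [100, 120, 300],
        }
    )
}

# One pair of brackets to pull the dataframe out of the dictionary.
df = frames["f500"]

# Store the row and column filters in separate variables first.
row_filter = df["previous_rank"].isnull() & (df["revenues"] > 150)
column_filter = ["company", "rank", "previous_rank"]

# A single df.loc[] call: rows before the comma, columns after it.
result = df.loc[row_filter, column_filter]

print(result)
```

With df.loc[rows, columns] the row and column selections sit inside one pair of brackets, so it is visible at a glance which part filters rows and which part picks columns.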
I actually don’t have an issue understanding the logic. I just find it quite disturbing that the double brackets are introduced out of nowhere (and I couldn’t find the use of the double brackets in the documentation); there is no explicit link between the two sets of brackets (no parentheses or anything similar).
But as @htw said, I think I will stick to the usual df.loc[] and df.iloc[] indexers.