Use boolean to filter the data set

I don't understand how filtering a dataset with a boolean works.

In the code below, we check whether one series from the killed dataset is equal to another series that does not belong to any dataset. Every value returned is False. So how can we use a standalone boolean series we just created to filter the killed dataset? I'm having difficulty understanding it. Thank you in advance for your help!

Screen Link: https://app.dataquest.io/m/370/working-with-missing-data/2/verifying-the-total-columns

My Code:

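# mvc is assumed to be the DataFrame loaded earlier in the lesson
killed_cols = [col for col in mvc.columns if 'killed' in col]
killed = mvc[killed_cols].copy()

# Sum the first three columns (the individual categories) row by row
killed_manual_sum = killed.iloc[:, 0] + killed.iloc[:, 1] + killed.iloc[:, 2]

# True where the recorded total differs from the manual sum
killed_mask = killed['total_killed'] != killed_manual_sum
killed_non_eq = killed[killed_mask]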

First, check whether all the values in the mask are actually False.
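One quick way to check this, as a minimal sketch. The small dataframe below is made-up stand-in data, not the real mvc data, and the column names are assumed to match the ones in the lesson.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the `killed` dataframe (not the real data).
killed = pd.DataFrame({
    "pedestrians_killed": [0, 1, 0, 0],
    "cyclist_killed": [0, 0, 0, 1],
    "motorist_killed": [0, 0, 2, 0],
    "total_killed": [0.0, 1.0, np.nan, 0.0],
})

killed_manual_sum = killed.iloc[:, 0] + killed.iloc[:, 1] + killed.iloc[:, 2]
killed_mask = killed["total_killed"] != killed_manual_sum

# Count how many True and False values the mask holds.
print(killed_mask.value_counts())

# True if at least one row has a mismatch (a missing total also counts,
# because NaN is never equal to anything).
print(killed_mask.any())
```

If `killed_mask.any()` prints True, the mask is not all False, and the True rows are the ones worth looking at.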

Then, go through the Content in that Step again. Focus specifically on -

If you think about it, the total number of people killed should be the sum of each of the individual categories. We might be able to “fill in” the missing values with the sums of the individual columns for that row.

And see how the above relates to the mask you are creating.

Not all of the returned values are False, but how can a boolean series filter the killed dataset? What's the connection between killed_mask and the killed dataset here?

Did you go through the content again? It explains what we are trying to do -

Let’s look at how we could explore the values where the total_killed isn’t equal to the sum of the other three columns.

And why we are trying to do the above -

but the total_killed column has five missing values.
.
.
.
If you think about it, the total number of people killed should be the sum of each of the individual categories. We might be able to “fill in” the missing values with the sums of the individual columns for that row.

Does that help clear things up?
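To make the "fill in" idea from the quoted content concrete, here is a rough sketch. It uses made-up stand-in data with assumed column names, and `fillna` is just one possible way to do it, not necessarily the approach the lesson takes.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the `killed` dataframe (not the real data).
killed = pd.DataFrame({
    "pedestrians_killed": [0, 2, 0],
    "cyclist_killed": [0, 0, 1],
    "motorist_killed": [0, 0, 0],
    "total_killed": [0.0, np.nan, 1.0],
})

killed_manual_sum = killed.iloc[:, 0] + killed.iloc[:, 1] + killed.iloc[:, 2]

# Replace only the missing totals with the manual sum for the same row.
# fillna matches the two series up by their row index.
filled_total = killed["total_killed"].fillna(killed_manual_sum)
print(filled_total)
```

Only the row with the missing total changes; rows that already had a value are left alone.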

You are right, killed_manual_sum is a manually created series. But to create it we used data from the killed dataframe, which means it inherited the row index of the killed dataframe. So, killed_manual_sum is a separate series that doesn't belong to any dataframe, but it shares the same row index with the killed dataframe, and so does killed_mask, which is built from it. That shared index is what lets us use the mask to filter the killed df: pandas keeps the rows whose mask value is True.
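Here is a small sketch of that index sharing. The data is made up, the column names are assumed, and the row labels 10, 20, 30 are chosen only to make the alignment easy to see.

```python
import pandas as pd

# Made-up stand-in data with a non-default row index.
killed = pd.DataFrame(
    {
        "pedestrians_killed": [0, 1, 0],
        "cyclist_killed": [0, 0, 1],
        "motorist_killed": [0, 0, 0],
        "total_killed": [0, 1, 0],
    },
    index=[10, 20, 30],
)

killed_manual_sum = killed.iloc[:, 0] + killed.iloc[:, 1] + killed.iloc[:, 2]
killed_mask = killed["total_killed"] != killed_manual_sum

# The sum and the mask carry the same row labels as the dataframe.
print(killed_manual_sum.index.equals(killed.index))
print(killed_mask.index.equals(killed.index))

# Boolean indexing keeps the rows whose label is True in the mask.
killed_non_eq = killed[killed_mask]
print(killed_non_eq)
```

In this sketch only the row labelled 30 has a total that differs from the manual sum, so it is the only row left after filtering.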
