Where does the information about which column to compare come from?


Quick question to clarify how boolean indexing works.
For this exercise, I do not understand where in the code I tell it that the tip_bool array should be used to keep only the trips with a tip greater than 50.
Somehow I need to tell the code to compare tip_bool with the tip_amount column, since I am working on the full taxi array.

Is this information implicitly contained in the tip_bool array?

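```python
tip_amount = taxi[:,12]          # every row, column index 12 (the tip amount)
tip_bool = tip_amount > 50       # True where the tip is greater than 50, else False
top_tips = taxi[tip_bool,5:14]   # rows where tip_bool is True, columns 5 to 13
```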

tip_bool is a boolean array, i.e., it contains True and False values. You can use a boolean array to index a NumPy array, a Series, or a DataFrame, as long as the boolean array has the same number of elements as the indexed object has rows.
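A minimal NumPy sketch of that rule (pandas is left out to keep it dependency-free). The mask and both arrays below are made up for illustration: the same mask selects the same row positions from any array with a matching number of rows, and a mask of the wrong length is rejected with an IndexError.

```python
import numpy as np

# A boolean mask is just a 1-D array of True/False values, one per row.
mask = np.array([True, False, True, False])

# Two unrelated, made-up arrays that both happen to have 4 rows.
trips = np.arange(12).reshape(4, 3)       # 4 rows x 3 columns
labels = np.array(["a", "b", "c", "d"])   # 4 rows, 1-D

# The same mask selects rows 0 and 2 from either array.
print(trips[mask].tolist())    # [[0, 1, 2], [6, 7, 8]]
print(labels[mask].tolist())   # ['a', 'c']

# A mask whose length does not match the number of rows is rejected.
try:
    trips[np.array([True, False])]
    mismatch_rejected = False
except IndexError as err:
    mismatch_rejected = True
    print("IndexError:", err)
```

Nothing in the mask says which array or column it is meant for; only its length has to fit.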

Thanks for your quick answer.
I do understand this.

As far as I understand, with tip_bool I create, as you said, a boolean array of True and False values. My question is: where does the information come from that this is applied to the column holding the tip amount? Why isn’t it applied to the column with the trip length?

With tip_amount = taxi[:,12] you selected every row of the 13th column (the column at index 12).

Then, after selecting the column, you checked whether each element of the tip_amount array is greater than 50. If an element is greater than 50 the result is True, otherwise False, which makes tip_bool a boolean array.
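A small sketch of those two steps, assuming a made-up 4-trip, 15-column array as a stand-in for taxi (only column 12 is filled in, with invented tip amounts). It shows that the comparison produces a plain 1-D array of booleans, one per row, with no record of which column it came from.

```python
import numpy as np

# Hypothetical stand-in for `taxi`: 4 trips x 15 columns, all zeros
# except column index 12, which holds made-up tip amounts.
taxi = np.zeros((4, 15))
taxi[:, 12] = [10.0, 75.0, 3.5, 60.0]

tip_amount = taxi[:, 12]     # 1-D array: the column-12 value of every row
tip_bool = tip_amount > 50   # element-wise comparison, one True/False per row

print(tip_amount.tolist())              # [10.0, 75.0, 3.5, 60.0]
print(tip_bool.tolist())                # [False, True, False, True]
print(tip_bool.dtype, tip_bool.shape)   # bool (4,)
```

The only things tip_bool carries are its values and its length (4, the number of rows).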

Using the boolean array (tip_bool), you selected all rows of taxi with a tip amount of more than 50, and the columns from index 5 to 13 inclusive (the slice 5:14 stops before 14).
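A sketch of that combined selection on a made-up stand-in for taxi. Each cell holds row*100 + column so it is easy to see where a value came from, and column 12 is overwritten with invented tips. The row part of the index uses the mask; the column part is an ordinary slice, independent of the mask.

```python
import numpy as np

# Hypothetical stand-in for `taxi`: 4 trips x 15 columns,
# cell value = row*100 + column, with made-up tips in column 12.
rows, cols = np.indices((4, 15))
taxi = (rows * 100 + cols).astype(float)
taxi[:, 12] = [10.0, 75.0, 3.5, 60.0]

tip_bool = taxi[:, 12] > 50
top_tips = taxi[tip_bool, 5:14]   # rows where tip_bool is True, columns 5..13

print(top_tips.shape)             # (2, 9): 2 matching trips, 9 columns
print(top_tips[:, 0].tolist())    # [105.0, 305.0] -> rows 1 and 3, column 5

# Same result in two steps: first pick rows, then slice columns.
assert np.array_equal(top_tips, taxi[tip_bool][:, 5:14])
```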

Hope it’s clear now.


Getting there :wink:
So my understanding is that tip_bool “carries” the information that it refers to the 13th column.

So what would happen if I deleted the 13th column (the one with the tip amounts), making another column the 13th, and then ran this operation:

```python
top_tips = taxi[tip_bool,5:14]
```

Would the operation be applied to the “new” 13th column, or would I get some kind of error because the reference for tip_bool is missing?

As long as you don’t change tip_bool, the same rows will be selected. What matters is that the boolean array has the same number of elements as the data has rows.
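A sketch of the "delete the tip column" scenario, again on a made-up stand-in for taxi (cell value = row*100 + column, invented tips in column 12). np.delete returns a new array without that column; the mask still fits the 4 rows and picks the same trips, but the column slice 5:14 now lands on shifted columns.

```python
import numpy as np

# Hypothetical stand-in for `taxi`: cell value = row*100 + column.
rows, cols = np.indices((4, 15))
taxi = (rows * 100 + cols).astype(float)
taxi[:, 12] = [10.0, 75.0, 3.5, 60.0]   # made-up tip amounts

tip_bool = taxi[:, 12] > 50              # [False, True, False, True]

# Remove the tip column; np.delete returns a new array with 14 columns.
taxi_no_tip = np.delete(taxi, 12, axis=1)

# tip_bool still has 4 entries, so it still matches the 4 rows and
# selects the same trips (rows 1 and 3), with no error.
selected = taxi_no_tip[tip_bool, 5:14]
print(selected[:, 0].tolist())   # [105.0, 305.0] -> rows 1 and 3, old column 5

# But every column after index 12 has shifted one place left, so
# position 7 of the slice is now old column 13, not the tips.
print(selected[:, 7].tolist())   # [113.0, 313.0]
```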

But from which column would these rows be selected if I deleted exactly the column that tip_bool was originally built from?
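One way to see why no column is "remembered": the comparison builds a brand-new array, so later changes to taxi do not affect tip_bool, and the mask always selects whole rows. This sketch uses a made-up stand-in for taxi with invented tip amounts.

```python
import numpy as np

# Hypothetical stand-in for `taxi`: 4 trips x 15 columns, made-up tips in column 12.
taxi = np.zeros((4, 15))
taxi[:, 12] = [10.0, 75.0, 3.5, 60.0]

tip_bool = taxi[:, 12] > 50
snapshot = tip_bool.copy()

# Overwrite the tip column after the mask was built.
taxi[:, 12] = 0.0

print(np.shares_memory(tip_bool, taxi))    # False: separate memory
print(np.array_equal(tip_bool, snapshot))  # True: the mask is unchanged
print(taxi[tip_bool].shape)                # (2, 15): whole rows are selected
```

The mask answers "which row positions?", and the column part of the index is chosen separately each time you use it.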

Think I solved my issue :wink: Thanks!