Boolean Indexing final challenge

I am finding it hard to understand how NumPy slicing and the subsequent boolean indexing affect a particular variable.
For example, in the final challenge, NumPy slicing is applied to taxi to get trip_mph (which is only supposed to store speeds). How come another variable, cleaned_taxi, which is a subset of trip_mph, is then used to fetch the distance and length columns, with methods applied to them?
Why does cleaned_taxi have those columns in the first place?
Could someone please explain?

Hi @aswadr093:

Please provide a mission link and format your code appropriately as per these guidelines so that we can better assist you.

Hi @aswadr093,

In that challenge, 'cleaned_taxi' is not actually a subset of 'trip_mph' (which, as you say, stores only speeds). It is a subset of 'taxi' itself, obtained by applying the boolean mask 'trip_mph < 100' to 'taxi'. In other words, we extract all the rows of 'taxi' where the speed is less than 100, which creates a new ndarray, 'cleaned_taxi', that still has every column of 'taxi'. All the other manipulations (calculating distance etc.) are then applied to this new ndarray.
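
Here is a minimal sketch of the idea, in case it helps to see it run. The tiny array and its column positions (0 = distance in miles, 1 = trip length in seconds) are made up for illustration and are not the real columns of the challenge dataset; only the names 'taxi', 'trip_mph' and 'cleaned_taxi' come from the challenge.

```python
import numpy as np

# Toy stand-in for the real `taxi` array (values invented for illustration).
# Assumed columns: 0 = trip distance in miles, 1 = trip length in seconds.
taxi = np.array([
    [10.0, 1800.0],   # roughly 20 mph
    [ 5.0,   60.0],   # roughly 300 mph, an unrealistic speed
    [ 3.0,  600.0],   # roughly 18 mph
])

# Slicing out two columns and dividing gives a 1D array of speeds only.
trip_mph = taxi[:, 0] / (taxi[:, 1] / 3600)

# The comparison produces a 1D boolean array with one value per row of `taxi`.
mask = trip_mph < 100

# The mask is applied to `taxi`, not to `trip_mph`, so whole rows are kept
# and every column survives.
cleaned_taxi = taxi[mask]

print(trip_mph.shape)      # 1D: one speed per trip
print(cleaned_taxi.shape)  # 2D: fewer rows, same number of columns as taxi

# Column slicing and methods work on the filtered array as usual.
mean_distance = cleaned_taxi[:, 0].mean()
mean_length = cleaned_taxi[:, 1].mean()
print(mean_distance, mean_length)
```

The point to notice is that 'trip_mph' is only used to decide which rows to keep; the data in 'cleaned_taxi' comes from 'taxi'.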

Hope this helps.

Thanks. This sounds plausible.
