I am finding it hard to absorb the effect of numpy slicing and the consecutive boolean indexing on a particular variable.

For example, in the final challenge, numpy slicing is applied on taxi to get trip_mph (which is only supposed to store speeds). How come another variable, cleaned_taxi, which is a subset of trip_mph, is then used to fetch the distance and length columns, and methods are applied on them?

Why does cleaned_taxi have those columns in the first place?

Could someone explain, please?

Hi @aswadr093:

Please provide a mission link and format your code appropriately as per these guidelines so that we can better assist you.

Hi @aswadr093,

In that challenge, `cleaned_taxi` is actually not a subset of `trip_mph` (which stores only speeds, you are right), but a subset of `taxi` itself, to which the boolean mask `trip_mph < 100` was applied. The mask has one True/False value per row of `taxi`, so indexing `taxi` with it extracts every row where the speed is less than 100, with all of its columns intact. This creates a new ndarray, `cleaned_taxi`, and all the other manipulations (calculating distance etc.) are then applied to this new ndarray.
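To make this concrete, here is a minimal sketch with a made-up four-row array. The column layout (`id`, distance in miles, length in seconds) and the values are assumptions for illustration only, not the actual mission data or column indexes.

```python
import numpy as np

# Hypothetical mini "taxi" array (not the real dataset).
# Assumed columns: [trip_id, distance_miles, length_seconds]
taxi = np.array([
    [1, 10.0, 1800.0],  # 20 mph
    [2,  5.0,   60.0],  # 300 mph, unrealistic
    [3,  3.0,  900.0],  # 12 mph
    [4, 50.0,  600.0],  # 300 mph, unrealistic
])

# Slicing two columns out of taxi gives a 1D array with one speed per row.
trip_mph = taxi[:, 1] / (taxi[:, 2] / 3600)

# The comparison produces a 1D boolean array, one value per row of taxi.
mask = trip_mph < 100

# The mask is applied to taxi, not to trip_mph, so whole rows are kept
# and every column is still present in the result.
cleaned_taxi = taxi[mask]

print(trip_mph.shape)      # 1D: speeds only
print(cleaned_taxi.shape)  # 2D: fewer rows, same columns as taxi

# That is why the distance and length columns can still be selected.
mean_distance = cleaned_taxi[:, 1].mean()
mean_length = cleaned_taxi[:, 2].mean()
print(mean_distance, mean_length)
```

So `trip_mph` is only used to decide which rows to keep; the rows themselves always come from `taxi`.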

Hope it was helpful.

Thanks. This sounds plausible.
