Help needed to understand why this line of code doesn't work in this case


cleaned_taxi=taxi[bool_taxi,[7,8,13]]

I really don’t understand why this particular code gives an error when I run it. I just wanted to create a cleaned_taxi array where the condition (a Boolean array) is applied and only the columns with indices 7, 8, and 13 are selected.

The same idea was used in lesson 5 of Boolean Indexing with NumPy:

# create a boolean array for trips with average
# speeds greater than 20,000 mph
trip_mph_bool = trip_mph > 20000

# use the boolean array to select the rows for
# those trips, and the pickup_location_code,
# dropoff_location_code, trip_distance, and
# trip_length columns
trips_over_20000_mph = taxi[trip_mph_bool,5:9]

print(trips_over_20000_mph)

You will have to be more specific with the code you share so that others can help you out. Right now -

  • It’s unclear what bool_taxi is
  • You haven’t specified the error that you get either.

But regardless of the above, NumPy lets us combine a boolean array with a range of columns, like 5:9 in the lesson example, but combining it with a list of specific columns is trickier and not as straightforward. You don’t have to jump into the technical details for now, but these posts share possible solutions that you can use -

To keep it simple, look into the np.ix_ function mentioned in the above resources.
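
As a rough illustration of the np.ix_ suggestion, here is a minimal sketch. The real taxi array is not shown in this thread, so the small array and the mask below are made-up stand-ins; only the column indices 7, 8, 13 come from the question.

```python
import numpy as np

# Stand-in data (assumption): 6 rows, 15 columns, not the real taxi dataset
taxi = np.arange(90, dtype=float).reshape(6, 15)

# Hypothetical boolean mask with one entry per row
bool_taxi = np.array([True, False, True, True, False, True])
cols = [7, 8, 13]

# Option 1: np.ix_ pairs every selected row with every selected column
cleaned_ix = taxi[np.ix_(bool_taxi, cols)]

# Option 2: filter the rows first, then pick the columns
cleaned_two_step = taxi[bool_taxi][:, cols]

print(cleaned_ix.shape)
print(np.array_equal(cleaned_ix, cleaned_two_step))
```

Both options should give one row per True value in the mask and three columns. The two-step version is arguably easier to read, while np.ix_ does the selection in a single indexing operation.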

trip_mph = taxi[:,7] / (taxi[:,8] / 3600)
bool_taxi=trip_mph<100
print(bool_taxi.shape)

# cleaning
cleaned_taxi=taxi[bool_taxi,(7,8,13)]

The error message:
Traceback (most recent call last):
cleaned_taxi=taxi[bool_taxi,(7,8,13)]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2004,) (3,)
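
For anyone reading later, the shapes in that message can be reproduced on a tiny example. This is only a sketch with made-up data (the array and mask are assumptions, not the real taxi dataset): when a boolean array is combined with a list of columns, NumPy turns the mask into the positions of its True values and then tries to broadcast those positions against the column list, which fails when the two lengths differ. A slice such as 5:9 is not broadcast this way, which is why the lesson example works.

```python
import numpy as np

# Stand-in data (assumption): 5 rows, 15 columns
taxi = np.arange(75, dtype=float).reshape(5, 15)

# Hypothetical mask with 4 True values
bool_taxi = np.array([True, True, False, True, True])

# The mask behaves like this integer index array of shape (4,)
row_idx = np.nonzero(bool_taxi)[0]
print(row_idx.shape)

try:
    # 4 row positions cannot be broadcast against 3 column positions
    taxi[bool_taxi, [7, 8, 13]]
    failed = False
except IndexError as exc:
    failed = True
    print(exc)

# A column slice is fine: every selected row keeps columns 5 to 8
print(taxi[bool_taxi, 5:9].shape)
```

Note that if the number of True values happened to equal the number of listed columns, the same expression would not raise an error; it would pair rows and columns element by element and return a 1-D array, which is probably not what was wanted either.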

The recommended references helped solve my issue. Thanks!