Confusion over numpy.delete

Screen Link:

Boolean indexing with numpy exercise


My Code:

trip_mph = taxi[:,7] / (taxi[:,8] / 3600)
mph_cut_bool=trip_mph>=100
cleaned_taxi=np.delete(taxi,mph_cut_bool,axis=0)
print(taxi.shape,np.sum(mph_cut_bool),cleaned_taxi.shape)

a=np.arange(12).reshape(3,4)
atest=np.delete(a,[False,False,False],axis=0)
print(a)
print(atest)

What I expected to happen:

(2013,15) 9 (2004,15)
[[0,1,2,3],
[4,5,6,7],
[8,9,10,11]]
[[0,1,2,3],
[4,5,6,7],
[8,9,10,11]]

What actually happened:

(2013, 15) 9 (2011, 15)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[ 4  5  6  7]
 [ 8  9 10 11]]

I’m testing numpy.delete with a boolean array. I’m not very familiar with numpy.delete: I was a C/C++ programmer before moving to Python, and my impression was that the main benefit of NumPy was fixed-size arrays for efficiency, much like using an array in C++ rather than a vector. However, learning numpy.delete is worthwhile and seems to be required for this exercise.

I was struggling to get consistent array sizes when deleting from the taxi array: if 9 rows match the condition, deleting them should take the array from 2013 rows to 2004 rows. So I decided to test deletion with booleans on a simpler array.

I was looking at an example of using delete on another site (which I thought was the documentation, but it is not). They didn’t have an example using booleans, but they set up an array using arange and reshape. Fine, I’ve used those before to initialize arrays. So I built the array from their example, then tested boolean arrays for deletion:

[True, False, True] produced the expected output. Only the middle row remained.
[False, False, True] produced surprising output. Only the middle row still remained.
[False, False, False] kept two rows.

At this point I was baffled. I tried resetting the online programming environment, but the problem remained. Is anyone familiar with using booleans in delete? I can think of other ways to solve this problem using a for loop, but I’m wondering if there is a better way, since this section is about boolean masks.


I’m on my cellphone right now, but this solves the problem: rather than passing a Boolean mask to the delete function, use the mask to create a list of indices and pass those indices to delete.
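A minimal sketch of that idea on the small 3x4 test array from the question. The names mask, rows_to_drop, cleaned, and kept are my own, and np.flatnonzero is just one way to turn a mask into indices (np.where would also work). The last line shows plain boolean indexing with the inverted mask, which avoids np.delete entirely.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
mask = np.array([True, False, True])   # rows we want to remove

# Convert the mask into the integer positions of its True entries.
rows_to_drop = np.flatnonzero(mask)    # array([0, 2])

# np.delete receives real row indices, so there is no ambiguity.
cleaned = np.delete(a, rows_to_drop, axis=0)
print(cleaned)                         # [[4 5 6 7]]

# Alternative: keep the rows where the mask is False.
kept = a[~mask]
print(kept)                            # [[4 5 6 7]]
```

Applied to the exercise, the same pattern would presumably be np.delete(taxi, np.flatnonzero(mph_cut_bool), axis=0) or simply taxi[~mph_cut_bool], either of which should drop exactly the 9 flagged rows.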



It looks like this happens because [False, False, False] is interpreted as [0, 0, 0], which deletes the 0th row instead of deleting no rows, as I expected. [True, False, True] is interpreted as [1, 0, 1], which deletes the 0th and 1st rows rather than the 0th and 2nd rows I expected.
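To make that cast visible, here is a small sketch that converts the booleans to integers by hand before calling np.delete, so it behaves the same on any NumPy version. The variable names are mine. Note that the NumPy documentation for np.delete says that from version 1.19 onward a boolean array is treated as a mask rather than being cast to 0 and 1, so passing the booleans directly on a newer install may give the result the original poster expected instead of the output shown in the question.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Explicitly reproduce the bool -> int cast described above.
idx_none = np.array([False, False, False]).astype(int)  # [0, 0, 0]
idx_two = np.array([True, False, True]).astype(int)     # [1, 0, 1]

# Index 0 repeated three times still removes only row 0.
drop_none = np.delete(a, idx_none, axis=0)
print(drop_none)   # [[ 4  5  6  7]
                   #  [ 8  9 10 11]]

# Indices 1, 0, 1 remove rows 0 and 1, leaving only the last row.
drop_two = np.delete(a, idx_two, axis=0)
print(drop_two)    # [[ 8  9 10 11]]
```

This also explains the taxi result: a mask of 0s and 1s can only ever remove rows 0 and 1, which matches the drop from 2013 to 2011 rows.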
