About numpy shape and concat

Backing up a question previously asked in our old Slack forum:

Mission link: https://www.dataquest.io/m/290/boolean-indexing-with-numpy/6/assignment-using-boolean-arrays

zeros = np.zeros([taxi_modified.shape[0], 1])
taxi_modified = np.concatenate([taxi, zeros], axis=1)

Hi friends, the code above is confusing. I played with it in Spyder and then read some links, but that only added to the confusion.

In this code, what exactly does np.zeros do with taxi_modified.shape[0]?

I know it converts contents to zero, but I am not sure why we have to select index 0 of shape. Also, in the second line, axis=1 is not summing or aggregating, so does it mean the change is made along the columns?

Discussion 1

Try running

import numpy as np
print(np.zeros([3, 1]), np.zeros([5, 2]), sep="\n")

Do the results help you figure out what’s going on? Do not forget that taxi_modified.shape[0] is just a number.

Regarding the second line of code, try the following:

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]]).T
print(x, y, sep="\n")
print("The concatenation of x and y is\n{}".format(np.concatenate([x, y], axis=1)))

Hopefully seeing these spelled-out examples will make it clear without further explanation.

Discussion 2

I guess it is clear to the extent that, in that array, we had to deliberately create a zero column so as to pair it with a non-zero column to show which rows' drop-off location was an airport?

This is not what’s happening — you’re creating a column of zeros, concatenating it to the original array, then using boolean indexing to change the values in the column to 1’s where the dropoff location is an airport.
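Here is a minimal sketch of those three steps on a tiny made-up array. The values in taxi, the use of column index 1 as the dropoff location code, and the codes 2 and 3 standing for airports are all assumptions for illustration, not the real dataset layout.

```python
import numpy as np

# Hypothetical stand-in for the taxi array: 4 trips, 3 columns.
# Column index 1 is assumed to hold the dropoff location code.
taxi = np.array([
    [1.0, 2.0, 10.5],
    [2.0, 3.0, 7.0],
    [3.0, 5.0, 22.0],
    [4.0, 2.0, 9.5],
])

# Step 1: a column of zeros with one row per trip.
zeros = np.zeros([taxi.shape[0], 1])

# Step 2: attach it as a new last column (axis=1 adds columns).
taxi_modified = np.concatenate([taxi, zeros], axis=1)

# Step 3: boolean indexing. Codes 2 and 3 are assumed to mean "airport".
is_airport = (taxi_modified[:, 1] == 2) | (taxi_modified[:, 1] == 3)
taxi_modified[is_airport, 3] = 1

print(taxi_modified)
```

The new last column ends up as 1 for the rows where the condition is True and stays 0 everywhere else.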

I'm still kinda shaky on why a true/false condition should be created on an array… but thanks for your explanation all the same.

Binary columns like these are actually very common — you’ll see more of this when you get into machine learning.

Discussion 3

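shape gives the dimensions of the array (here a NumPy ndarray, not a DataFrame) as a tuple. shape[0] gives the number of rows and shape[1] gives the number of columns.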
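A quick way to check this yourself is sketched below. The array a and the names n_rows and col are made up for the example.

```python
import numpy as np

a = np.zeros([5, 2])   # 5 rows, 2 columns

print(a.shape)      # (5, 2)
print(a.shape[0])   # 5, the number of rows
print(a.shape[1])   # 2, the number of columns

# shape[0] is just an integer, so it can be passed straight to np.zeros
n_rows = a.shape[0]
col = np.zeros([n_rows, 1])
print(col.shape)    # (5, 1)
```

This is why the original code uses shape[0]: the zero column needs exactly as many rows as the array it will be attached to.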

For numpy.concatenate:

Grows the resulting ndarray vertically
For example: c = numpy.concatenate((a, b), axis=0)
axis=0 => concatenate by row. Therefore, the columns must be aligned. That is, shape[1] must match across the sequence of array-like objects, e.g. a.shape[1] == b.shape[1]. The resulting c.shape[1] == a.shape[1] == b.shape[1], and c.shape[0] == a.shape[0] + b.shape[0].

Grows the resulting ndarray horizontally
For example: c = numpy.concatenate((a, b), axis=1)
axis=1 => concatenate by column. Therefore, the rows must be aligned. That is, shape[0] must match across the sequence of array-like objects, e.g. a.shape[0] == b.shape[0]. The resulting c.shape[0] == a.shape[0] == b.shape[0], and c.shape[1] == a.shape[1] + b.shape[1].
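The shape rules for both axes can be checked with a short sketch like this one. The arrays a, b and d are made up for the example.

```python
import numpy as np

a = np.ones((2, 3))

# axis=0: stack rows, so the number of columns must match (3 == 3)
b = np.zeros((4, 3))
c = np.concatenate((a, b), axis=0)
print(c.shape)   # (6, 3)

# axis=1: stack columns, so the number of rows must match (2 == 2)
d = np.zeros((2, 5))
e = np.concatenate((a, d), axis=1)
print(e.shape)   # (2, 8)

# Mismatched rows along axis=1 (2 vs 4) raises a ValueError
try:
    np.concatenate((a, b), axis=1)
except ValueError as err:
    print("Cannot concatenate:", err)
```

Only the dimension along the chosen axis is allowed to differ, and that is the dimension that grows.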

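numpy.vstack((a, b)) is similar to numpy.concatenate((a, b), axis=0). That is, it grows the resulting array vertically (by row). Note that the arrays are passed as a single sequence, not as separate arguments.

numpy.hstack((a, b)) is similar to numpy.concatenate((a, b), axis=1) for 2-D arrays. That is, it grows the resulting array horizontally (by column). For 1-D arrays, hstack concatenates along the only axis there is, axis 0.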
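For 2-D inputs the equivalence can be confirmed directly, as in this small sketch. The arrays a and b are made up for the example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Both functions take one sequence of arrays (note the inner parentheses)
v = np.vstack((a, b))
h = np.hstack((a, b))

print(v.shape)   # (4, 2)
print(h.shape)   # (2, 4)

print(np.array_equal(v, np.concatenate((a, b), axis=0)))   # True
print(np.array_equal(h, np.concatenate((a, b), axis=1)))   # True
```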