Which is the most popular airport challenge: NumPy array question

I do not understand why jfk.shape returns the tuple (11832, 15).

jfk = taxi[taxi[:,6] == 2]
jfk_count = jfk.shape[0]
print(jfk.shape)

We have extracted all rows from the column with index 6 where the boolean condition == 2 is met, which gives 11832 of the 89560 rows reported by taxi.shape, (89560, 15).

I believe we have extracted a 1-d array from the 2-d array ‘taxi’ and assigned it to the variable ‘jfk’. Since ‘jfk’ holds only one of the 15 columns, I expected its shape to be a one-element tuple, (11832,). Instead I got (11832, 15), which suggests that jfk is the whole 2-d array ‘taxi’ (with rows filtered), 11832 x 15, rather than a subset of it.

Could this be explained please?

https://app.dataquest.io/m/290/boolean-indexing-with-numpy/9/challenge-which-is-the-most-popular-airport

Hey @jamesberentsen

The reason behind the issue in your question is that you are printing jfk.shape instead of jfk_count, which is what you need to print to see the number of rows that meet the condition above. The code should look like this:

jfk = taxi[taxi[:,6] == 2]
jfk_count = jfk.shape[0]
print(jfk_count)  # prints only the number of rows, 11832 (an integer, not a tuple)

Please note that print(jfk.shape) prints the dimensions of the whole 2-d array, i.e. both the number of rows and the number of columns.
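The following minimal sketch makes the difference between jfk.shape and jfk.shape[0] concrete. It uses a small hypothetical array in place of taxi: 5 rows and 4 columns, with column index 3 playing the role of column 6. shape is always a tuple with one entry per axis, and indexing it with [0] gives a plain integer, which is the row count.

```python
import numpy as np

# Hypothetical stand-in for `taxi`: 5 rows x 4 columns (the real array is 89560 x 15).
# Column index 3 plays the role of column 6 in the real data.
taxi = np.array([
    [2016, 1, 1, 2],
    [2016, 1, 2, 3],
    [2016, 1, 3, 2],
    [2016, 2, 1, 5],
    [2016, 2, 2, 2],
])

# Keep the rows whose value in column index 3 equals 2
jfk = taxi[taxi[:, 3] == 2]

print(jfk.shape)     # (3, 4): a tuple (rows, columns)
print(jfk.shape[0])  # 3: an int, the number of rows
```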

I hope this helps
Best
K!

I believe we have extracted a 1-d array from the 2-d array ‘taxi’ and assigned it to the variable ‘jfk’. -> No. You have selected all the rows for which the value in the column with index 6 meets the condition.
So all 11832 of the 89560 rows are returned, along with all 15 columns.

jfk = taxi[taxi[:,6] == 2]

is like saying

jfk = taxi[:][taxi[:,6] == 2]

If you want to select one specific column, e.g. the column with index 1:

jfk_1 = taxi[:,1][taxi[:,6] == 2]
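Here is a minimal sketch of the difference on a hypothetical 4 x 3 array, where column index 2 plays the role of column 6. A boolean mask in the row position keeps every column, while taking a column first gives a 1-d result. taxi[mask, 1] is an equivalent one-step spelling of taxi[:, 1][mask].

```python
import numpy as np

# Hypothetical 4 x 3 array standing in for `taxi`; column index 2 plays the role of column 6.
taxi = np.array([
    [10, 100, 2],
    [11, 101, 3],
    [12, 102, 2],
    [13, 103, 5],
])

mask = taxi[:, 2] == 2        # 1-d boolean array: one True/False per row

rows = taxi[mask]             # mask in the row position: whole rows, still 2-d
same_rows = taxi[:][mask]     # taxi[:] is all of taxi, so the result is identical
col_1 = taxi[:, 1][mask]      # take column 1 first (1-d), then filter it
col_1_alt = taxi[mask, 1]     # the same selection in a single indexing step

print(rows.shape)    # (2, 3)
print(col_1.shape)   # (2,)
print(col_1)         # [100 102]
```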

Hi prasadkalyan05


Thanks for your reply. Perhaps I did not write clearly: the issue is not that I mistakenly printed jfk.shape instead of jfk_count. I printed jfk.shape deliberately, out of curiosity, to see the dimensions. I am aware that jfk.shape[0] returns the first element of the tuple. What is still unclear is why there are 15 columns in (11832, 15).


I see that taxi is a 2-d array, and a 2-d array is made up of 1-d sub-arrays, one of which I believe was extracted by this code:

jfk = taxi[taxi[:,6] == 2]

So I expected

print(jfk.shape)

to return (11832,) and not (11832, 15).

So shouldn't jfk.shape[1] return zero or one, since there is only one column in there?

Hi vasheyy,

Thanks for your reply.
Could you please explain why the 15 columns are returned as well? I thought jfk was just a one-column slice out of the 15 columns.
Where the condition == 2 is satisfied, isn't it only the rows of column 6 that are selected here, i.e. no values from the other columns (indices 0-5 and 7-14)?

jfk = taxi[taxi[:,6] == 2]

Thanks
JB

Hi again! @jamesberentsen

As I understand it, the shape attribute of a 2-d NumPy array stores the number of rows and columns as a tuple: (number of rows, number of columns).
In our case we filter the taxi array, keeping only the rows where the value in the column with index 6 equals 2. After this operation the resulting jfk array has a different number of rows, but the number of columns stays the same, because the boolean mask selects rows and leaves the columns untouched.

For a better understanding, try printing:
print(taxi.shape)  # output: (89560, 15)
print(jfk.shape)   # output: (11832, 15)
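The same effect can be reproduced with hypothetical toy data that has the real array's width (15 columns) but only 6 rows, with made-up codes in column index 6. This is a sketch, not the course dataset.

```python
import numpy as np

# Hypothetical toy data: 6 rows x 15 columns, all zeros except column index 6
taxi = np.zeros((6, 15))
taxi[:, 6] = [2, 3, 2, 5, 2, 3]   # made-up airport codes in column index 6

jfk = taxi[taxi[:, 6] == 2]       # boolean mask selects rows only

print(taxi.shape)  # (6, 15)
print(jfk.shape)   # (3, 15): fewer rows, still 15 columns
```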

Let me know if you still have any questions.
Best
K!

If you want to select all rows of column 6 only, please try the code below.

jfk_6 = taxi[:,6][taxi[:,6] == 2]


Thanks vasheyy,

If you want to select all rows of column 6 only, please try the code below.

jfk_6 = taxi[:,6][taxi[:,6] == 2]

By printing that I was able to see the difference.
I printed jfk_6 and compared the output with print(jfk).

For jfk_6, I see that it returns a numpy.ndarray:
array([ 2, 2, 2, …, 2, 2, 2])

I see now that the code below actually returns all columns of the rows that pass the filter, where the filter keeps only the rows whose value in the column with index 6 equals 2:
jfk = taxi[taxi[:,6] == 2]

print(jfk) shows:
[[  2016      1      1 ...   52.8  105.6      1]
 [  2016      1      1 ...      0   37.3      2]

Hi! I don’t understand shape[0]. Can you explain it in more detail?

By the way, I wrote my code like this:
jfk = taxi[taxi[:, 6] == 2]
jfk_count = jfk.sum()
The result is a float, not an int, so my answer does not match Dataquest's.

Hi there,

I received your question but I am just a newbie.

I think you need to start a new thread to ask the question.

You can do that by clicking on ‘new topic’.

Regards
JB


Hi @hongchi0502

If you use the shape attribute on any 2-d ndarray, it returns a tuple consisting of the number of rows and the number of columns. Likewise, for an ndarray of any dimension, it returns the corresponding shape.
Example:
import numpy as np
array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(array.shape)     # prints a tuple: (3, 3)
print(array.shape[0])  # prints the first element of the tuple, 3, i.e. the number of rows
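The next sketch illustrates "likewise for any ndarray" with made-up arrays of one, two and three dimensions. It also shows that a 1-d array such as jfk_6 has only one entry in its shape. As a result, shape[1] raises an IndexError instead of returning 0 or 1.

```python
import numpy as np

one_d = np.array([2, 2, 2, 2])              # 1 axis
two_d = np.array([[1, 2, 3], [4, 5, 6]])    # 2 axes: rows, columns
three_d = np.zeros((2, 3, 4))               # 3 axes

print(one_d.shape)     # (4,): a one-element tuple
print(two_d.shape)     # (2, 3)
print(three_d.shape)   # (2, 3, 4)
print(two_d.shape[0])  # 2: the number of rows

# A 1-d array has no second axis, so indexing its shape at [1] fails
try:
    one_d.shape[1]
except IndexError:
    print("one_d has no axis 1")
```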

Hello Everyone,

My Code:

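import numpy as np

# Load the data; genfromtxt returns a 2-d array of floats by default
taxi = np.genfromtxt('nyc_taxis.csv', delimiter=',')

# Boolean mask: True for each row whose column-6 code is 2 (JFK)
jfk = taxi[:, 6] == 2
# count_nonzero counts the True values, i.e. the matching rows
jfk_count = np.count_nonzero(jfk)

laguardia = taxi[:, 6] == 3   # code 3: LaGuardia
laguardia_count = np.count_nonzero(laguardia)

newark = taxi[:, 6] == 5      # code 5: Newark
newark_count = np.count_nonzero(newark)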

Screen Link: https://app.dataquest.io/m/290/boolean-indexing-with-numpy/9/challenge-which-is-the-most-popular-airport

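I found a very helpful function in the NumPy library, np.count_nonzero(), which counts the non-zero elements of an array. Applied to a boolean mask, it counts the True values, which here is the number of matching rows.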
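Below is a small sketch on hypothetical data, where column index 1 stands in for column 6. It compares three ways of getting the same count. It also shows why jfk.sum(), asked about earlier in the thread, does not give a count: it adds up every value in the filtered rows.

```python
import numpy as np

# Hypothetical toy data: column 0 is a fare, column 1 an airport code
taxi = np.array([
    [12.5, 2.0],
    [30.0, 3.0],
    [52.8, 2.0],
    [ 7.0, 5.0],
])

jfk_mask = taxi[:, 1] == 2

print(np.count_nonzero(jfk_mask))  # 2: number of True values
print(jfk_mask.sum())              # 2: each True counts as 1 when summed
print(taxi[jfk_mask].shape[0])     # 2: rows kept by the filter

# Summing the filtered rows adds every value in them (fares and codes),
# so it is not a count, and it is a float because taxi holds floats.
print(taxi[jfk_mask].sum())
```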