Screen Link:
Introduction To Pandas | Dataquest
My Code:
One of the correct solutions is this:
industry_usa = f500[f500['country']=='USA']['industry'].value_counts().head(2)
The above makes sense intuitively, i.e. the df is filtered with a boolean mask and then an operation is performed on one of its columns, industry.
My question is about syntax: when I write it based on my interpretation of the lessons, it throws an error.
The way I tried it, based on previous lessons, is this:
industry_usa = f500['country']=='USA' f500['industry'].value_counts().head(2)
The first part is the boolean, the second selects the column. I expected the code to run, but it threw a syntax error.
The differences between the two versions are as follows. I would appreciate some guidance, or links on the syntax, explaining why the second version did not produce any output:
- Based on the lessons, my code indexes the industry column like this: f500['industry']. However, the correct code doesn't mention 'f500' there, only ['industry'].
- My code doesn't mention the df at the start, but the correct code does. My assumption was that mentioning the df at the start was not required, as it is implicit in the boolean and the column selection: f500['country']=='USA' & f500['industry'].
I am asking these questions because there are too few pandas questions available on DQ, and the answers would help new students.
To help clarify things for you, try printing out each to see what these objects actually are. For example, try:
print(f500['country']=='USA')
and
print(f500['industry'])
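If you don't have the dataset handy, here is a minimal sketch of that experiment. It uses a tiny, made-up DataFrame as a hypothetical stand-in for f500 (not the real data), so the values are only for illustration. Both expressions come back as separate pandas Series: one full of True/False values, the other holding the industry strings.

```python
import pandas as pd

# Hypothetical stand-in for the f500 dataset (made-up rows, not the real data)
f500 = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E"],
    "country": ["USA", "China", "USA", "USA", "Japan"],
    "industry": ["Banks", "Banks", "Retail", "Banks", "Motor Vehicles"],
})

# The comparison produces a boolean Series, one True/False per row
mask = f500["country"] == "USA"
print(type(mask))     # <class 'pandas.core.series.Series'>
print(mask.dtype)     # bool
print(mask.tolist())  # [True, False, True, True, False]

# Selecting a column produces a Series holding every row's industry
column = f500["industry"]
print(column.tolist())  # ['Banks', 'Banks', 'Retail', 'Banks', 'Motor Vehicles']
```

Neither object knows about the other, which is why the mask has to be handed to something (here, the df) before it filters anything.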
In order for the boolean mask to be useful to us, we need to apply it to the df itself. Python will not assume this for us... we need to tell it directly what we want that mask to do. In other words, we need f500 "at the start of the code" in order for it to actually filter our df according to our criteria (i.e. give us the rows where country == USA).
Once we have this "new" df (one where we only have rows where country == USA), we then want to select just the industry column. We can do this by simply using ['industry'] after our newly filtered df (i.e. f500[f500['country']=='USA']). Using a combination of the solution code and your code, another way to accomplish this task could look like this:
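mask_usa = f500['country'] == 'USA'    # boolean Series: True where the country is USA
f500_usa = f500[mask_usa]              # apply the mask to the df to keep only the USA rows
industry_usa = f500_usa['industry'].value_counts().head(2)    # two most common industries among those rows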
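As a quick sanity check, the sketch below runs the step-by-step version (finished with the same .value_counts().head(2) as the solution) next to the one-line solution and confirms they agree. The DataFrame is a small, hypothetical stand-in for f500 with made-up rows, and the variable names industry_usa_steps and industry_usa_one_line are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the f500 dataset (made-up rows, not the real data)
f500 = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E"],
    "country": ["USA", "China", "USA", "USA", "Japan"],
    "industry": ["Banks", "Banks", "Retail", "Banks", "Motor Vehicles"],
})

# Step by step: build the mask, apply it to the df, then pick the column
mask_usa = f500["country"] == "USA"
f500_usa = f500[mask_usa]
industry_usa_steps = f500_usa["industry"].value_counts().head(2)

# One line: the mask is nested inside f500[...] and the column is picked afterwards
industry_usa_one_line = f500[f500["country"] == "USA"]["industry"].value_counts().head(2)

print(industry_usa_steps.equals(industry_usa_one_line))  # True
print(industry_usa_steps.to_dict())                      # {'Banks': 2, 'Retail': 1}
```

Each step here is its own statement. Writing two expressions side by side on one line with nothing joining them, as in the original attempt, is what Python rejects with a SyntaxError.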
The reason the solution code doesn't mention f500 in this part of the code is that f500[f500['country']=='USA'] is a df in and of itself! It's a "sub df" of f500. If we use f500['industry'], it will give us all the rows of f500, but we only want the rows where the country is USA.
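To see the difference in row counts, here is a minimal sketch, again using a small, hypothetical stand-in for f500 with made-up rows. Selecting the column from the full df returns every row, while selecting it from the filtered "sub df" returns only the USA rows, and those rows keep their original index labels.

```python
import pandas as pd

# Hypothetical stand-in for the f500 dataset (made-up rows, not the real data)
f500 = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E"],
    "country": ["USA", "China", "USA", "USA", "Japan"],
    "industry": ["Banks", "Banks", "Retail", "Banks", "Motor Vehicles"],
})

# Column taken from the full df: every company, whatever its country
all_industries = f500["industry"]

# Column taken from the filtered sub df: only the rows where country == USA
usa_industries = f500[f500["country"] == "USA"]["industry"]

print(len(all_industries))            # 5
print(len(usa_industries))            # 3
print(usa_industries.index.tolist())  # [0, 2, 3]  (original row labels are kept)
```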
I hope this helps clarify things a bit for you, and if it doesn't, please feel free to ask more questions and we can figure it out together!
Super, thank you! I figured out the logic later once I moved ahead... but the way you explained it made it "click" that extra bit. Thanks!
Very nice, congrats!
You're welcome, it was my pleasure. I'm glad I was able to provide some additional help. Extra "clicks" are good!
Happy learning!