Filtering a dataframe using pandas

Hi All

I’m busy with the Data Cleaning Walkthrough: Combining the Data exercise: Learn data science with Python and R projects

The specific task that my question relates to is straightforward: we have to filter a dataframe and select only the rows that have a specific value in one column.

The answer is: data['graduation'] = data['graduation'][data['graduation']['Cohort'] == '2006']

My question is: why is the data['graduation'] part repeated after the = sign?

Why is the answer not data['graduation'] = data['graduation']['Cohort'] == '2006'

Any hints or tips on how you would Google a question like this would also be appreciated, as I’ve tried a couple of searches without success.

Best,

JK

We probably don’t need Google for this one. Instead, try printing the two expressions (the right-hand sides of the equals sign) to see the difference between them:

print(data['graduation']['Cohort'] == '2006')
print(data['graduation'][data['graduation']['Cohort'] == '2006'])

What do you notice?
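If you want to try this without the course data, here is a minimal sketch on a small made-up dataframe. The name `graduation` and all the values in it are invented for illustration; only the `Cohort` column name comes from the exercise.

```python
import pandas as pd

# Toy stand-in for data['graduation']; the values are invented for illustration.
graduation = pd.DataFrame({
    "Cohort": ["2005", "2006", "2006", "2007"],
    "Total Grads": [10, 20, 30, 40],
})

# Comparing a column to a value gives a boolean Series: one True/False per row.
mask = graduation["Cohort"] == "2006"
print(mask)

# Indexing the dataframe with that Series keeps only the rows where it is True.
filtered = graduation[mask]
print(filtered)
```

The first print shows only True/False values, while the second shows the actual rows. That is why the dataframe appears twice: once to build the True/False column, and once as the thing being filtered by it.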

I would probably write it like this if I wanted to make it more readable for others:

bool_cohort_2006 = data['graduation']['Cohort'] == '2006'
data['graduation'] = data['graduation'][bool_cohort_2006]
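
Another way to write the same filter, if you prefer to be explicit that you are selecting rows, is with .loc. This is only a sketch: the `data` dictionary below is a made-up stand-in for the one in the exercise, and the `.copy()` is my own addition to avoid pandas warning about modifying a slice later on.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's dictionary of dataframes.
data = {
    "graduation": pd.DataFrame({
        "Cohort": ["2005", "2006", "2006 Aug"],
        "Total Cohort": [5, 8, 3],
    })
}

# Boolean Series: True only where Cohort is exactly the string '2006'.
is_2006 = data["graduation"]["Cohort"] == "2006"

# .loc with a boolean Series selects the matching rows; .copy() makes the
# result an independent dataframe rather than a view of the original.
data["graduation"] = data["graduation"].loc[is_2006].copy()
print(data["graduation"])
```

Note that the comparison is against the string '2006', so a value such as '2006 Aug' does not match.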