Screen Link:
My Code:
data['demographics'] = data['demographics'][data['demographics']['schoolyear'] == 20112012]
print(data['demographics'])
Why do we need to state the dataframe (data['demographics']) twice on the right-hand side of the code? Why doesn't one reference suffice?
data['demographics'][data['demographics']['schoolyear'] == 20112012]
Please also add the mission link to your post.
Hi @s.cook20,
Let's start from the rightmost mention of data['demographics'], this one: [data['demographics']['schoolyear'] == 20112012]. Here it is used to create a boolean mask for filtering the dataframe data['demographics'], so that we keep only those rows where the column 'schoolyear' has the value 20112012.
The second data['demographics'], in the middle, is the dataframe itself, to which we apply that boolean mask.
The third mention of data['demographics'] (the leftmost, before the equals sign) re-assigns the dataframe to its new, filtered version. We could create a new dataframe here instead, but since from now on we only care about rows where 'schoolyear' equals 20112012, we can simply re-assign the dataframe.
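To make the three roles easier to see, here is a minimal sketch that splits the one-liner into separate steps. The small dataframe and its 'DBN' values are made up for illustration (the real data['demographics'] has many more rows and columns), and the names demo, mask and filtered are just hypothetical helpers.

```python
import pandas as pd

# Hypothetical stand-in for data['demographics']; values are invented for the example.
data = {
    "demographics": pd.DataFrame(
        {
            "DBN": ["01M015", "01M015", "01M019"],
            "schoolyear": [20102011, 20112012, 20112012],
        }
    )
}

demo = data["demographics"]

# Step 1 (rightmost mention): comparing a column to a value gives a boolean Series,
# one True/False per row.
mask = demo["schoolyear"] == 20112012

# Step 2 (middle mention): indexing the dataframe with the mask keeps only the True rows.
filtered = demo[mask]

# Step 3 (leftmost mention): store the filtered result back under the same key.
data["demographics"] = filtered

print(mask.tolist())  # [False, True, True]
print(data["demographics"])
```

The mask on its own is only a column of True/False values and does not know which dataframe it should filter, which is why the dataframe has to be named again when the mask is applied.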