Why is the dataframe stated twice?

Screen Link:

My Code:

data['demographics'] = data['demographics'][data['demographics']['schoolyear'] == 20112012]

Why do we need to state the dataframe (data['demographics']) twice in the code? Why doesn't one reference suffice?

Please also add the mission link to your post.

Hi @s.cook20,

Let's start from the rightmost mention of data['demographics'], this one: [data['demographics']['schoolyear'] == 20112012]
Here it's used to create a boolean mask: a Series of True/False values, one per row, that is True only where the column 'schoolyear' has the value 20112012.

Then the second data['demographics'], in the middle: it's the dataframe itself, to which we apply that boolean mask so that only the rows marked True are kept.
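It may help to see the one-liner split into separate steps. This is only a minimal sketch: the small dataframe below is a made-up stand-in for the real data['demographics'], and the names mask and filtered are just illustrative.

```python
import pandas as pd

# Toy stand-in for data['demographics']; the values are invented for illustration.
data = {
    "demographics": pd.DataFrame({
        "DBN": ["01M015", "01M015", "01M019"],
        "schoolyear": [20102011, 20112012, 20112012],
    })
}

# Rightmost mention: compare the column to the value to get a boolean Series,
# one True/False per row.
mask = data["demographics"]["schoolyear"] == 20112012

# Middle mention: index the dataframe with the mask to keep only the True rows.
filtered = data["demographics"][mask]

# Leftmost mention: store the filtered dataframe back under the same key.
data["demographics"] = filtered
```

The mask on its own is not a dataframe, just a column of True/False values, which is why the dataframe has to be named again when the mask is applied.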

The third mention of data['demographics'] (the leftmost, before the equal sign) is used to re-assign our dataframe with its new, filtered version. We could create a new dataframe here instead. But since from now on we'll care only about the rows where 'schoolyear' equals 20112012, we can just re-assign the dataframe.
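To illustrate the difference between the two choices, here is a small sketch with a made-up dataframe. The variable names (demographics, demographics_2012 and the row counters) are hypothetical and not part of the mission's code.

```python
import pandas as pd

# Invented example data, standing in for the real demographics dataframe.
demographics = pd.DataFrame({"schoolyear": [20102011, 20112012, 20112012]})
original_rows = len(demographics)

# Option A: store the filtered rows under a new name; the original is untouched.
demographics_2012 = demographics[demographics["schoolyear"] == 20112012]
rows_after_option_a = len(demographics)

# Option B: re-assign the same name, so only the filtered rows remain reachable.
demographics = demographics[demographics["schoolyear"] == 20112012]
rows_after_option_b = len(demographics)
```

With Option A the unfiltered data is still available if it is needed later; with Option B there is one less variable to keep track of.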