Guided Project: Clean And Analyze Employee Exit Surveys [Step 4]

Screen Link:
https://app.dataquest.io/m/348/guided-project%3A-clean-and-analyze-employee-exit-surveys/4/filter-the-data

Clean And Analyze Employee Exit Surveys.ipynb (11.6 KB)

My Code:

resignation_pattern = "Resignation"
dete_resignations = dete_survey_updated["separationtype"].str.contains(resignation_pattern).copy()
tafe_resignations = tafe_survey_updated["separationtype"].str.contains(resignation_pattern).copy()

What I expected to happen:
I expected the results to be saved as two DataFrames (dete_resignations and tafe_resignations).
I used the string “Resignation” because it is common to all three resignation types.

What actually happened:
Not showing any results when I add .info() or .head()

What am I doing wrong?


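Well, the problem is that you just created a boolean mask, which is a Series of True/False values, not a filtered DataFrame. A Series does have .head() (and, depending on your pandas version, .info()), which is probably why you didn’t get an error, but those calls only describe the booleans, not the resignation rows.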

Now try to use that boolean mask to filter the data
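
To make the difference concrete, here is a minimal sketch on a made-up toy DataFrame (the `toy` name and its values are only for illustration and are not the real survey data). It shows that `str.contains` returns a boolean Series, and that passing that Series back into the DataFrame is the step that actually filters the rows.

```python
import pandas as pd

# Hypothetical toy data standing in for dete_survey_updated
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "separationtype": [
        "Resignation-Other reasons",
        "Age Retirement",
        "Resignation-Other employer",
        "Contract Expired",
    ],
})

# Step 1: str.contains builds a boolean mask (a Series of True/False)
mask = toy["separationtype"].str.contains("Resignation")

# Step 2: indexing the DataFrame with the mask keeps only the True rows
toy_resignations = toy[mask]

print(type(mask).__name__)              # Series
print(mask.tolist())                    # [True, False, True, False]
print(toy_resignations["id"].tolist())  # [1, 3]
```

So `dete_resignations` in the original code was the mask from step 1, and step 2 was the missing piece.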

Ok.
Would I just filter the data like this then?
dete_survey_updated["separationtype"].str.contains(resignation_pattern).copy()
tafe_survey_updated["separationtype"].str.contains(resignation_pattern).copy()

How would I assign the dete_resignations and tafe_resignations?

dete_resignations = dete_survey_updated[dete_survey_updated['separationtype'].str.contains(resignation_pattern)]
tafe_resignations = tafe_survey_updated[tafe_survey_updated['separationtype'].str.contains(resignation_pattern)]

You don’t need to use .copy() because you’re using a bool mask to find which rows in each dataframe contain “Resignation” in the separationtype column.

You can use the following expression as well:

dete_resignations = dete_survey_updated.loc[(dete_survey_updated.separationtype.str.startswith('Resignation', na=False))].copy()

As for .copy(), I think we do need it: we can’t be sure whether filtering with the bool mask returns a view or a copy of the original dataframe, so we add .copy() to make sure we get an independent copy.

Take a look: https://www.dataquest.io/blog/settingwithcopywarning/
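
Here is a small sketch of that idea, again on a made-up toy DataFrame (`toy` and its `cease_date` values are assumptions for illustration only). With `.copy()`, the filtered result is its own DataFrame, so changing it later leaves the original untouched.

```python
import pandas as pd

# Hypothetical toy data; the real survey has many more columns
toy = pd.DataFrame({
    "separationtype": ["Resignation-Other reasons", "Age Retirement"],
    "cease_date": ["2012", "2013"],
})

mask = toy["separationtype"].str.startswith("Resignation", na=False)

# .copy() makes the filtered result an independent DataFrame
toy_resignations = toy.loc[mask].copy()

# Later edits are applied to the copy only
toy_resignations["cease_date"] = toy_resignations["cease_date"].astype(int)

print(toy_resignations["cease_date"].tolist())  # [2012]
print(toy["cease_date"].tolist())               # ['2012', '2013']
```

Since the next steps of the project modify the resignation dataframes, keeping the explicit .copy() seems like the safer habit.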

I did this:

pattern = r'[Rr]esignation'
dete_resignations = dete_survey_updated.loc[dete_survey_updated['separationtype'].str.contains(pattern)]
tafe_resignations = tafe_survey_updated.loc[tafe_survey_updated['separationtype'].str.contains(pattern, na=False)]
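
A note on the na=False part, as a sketch on made-up data (the `toy` values below are assumptions, not the real survey): when the column has a missing value, the mask from `str.contains` may carry a missing value for that row as well, and boolean indexing can reject such a mask. Passing na=False treats missing values as non-matches, so the mask is all True/False.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value
toy = pd.DataFrame({
    "separationtype": ["Resignation", np.nan, "Retirement", "resignation-move"],
})

pattern = r"[Rr]esignation"

# Without na=False the missing row may stay missing in the mask
# (exact behavior depends on the column dtype and pandas version)
mask_default = toy["separationtype"].str.contains(pattern)

# With na=False the missing row is simply False
mask_filled = toy["separationtype"].str.contains(pattern, na=False)

toy_resignations = toy.loc[mask_filled]

print(mask_filled.tolist())   # [True, False, False, True]
print(len(toy_resignations))  # 2
```

Using na=False on both dataframes costs nothing and avoids surprises if a column turns out to have missing values.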