Attempting to resolve SettingWithCopyWarning causes some issues

Screen Link:
https://app.dataquest.io/m/348/guided-project%3A-clean-and-analyze-employee-exit-surveys/9/clean-the-service-column

I’ve finished the Guided Project: Clean And Analyze Employee Exit Surveys. To tidy it up I wanted to get rid of the SettingWithCopyWarning messages, but I have not managed to resolve them, and I ran into a couple of extra issues while trying. Any help would be appreciated.

The 9th challenge out of 11 requires us to clean up the institute_service column.

Issue 1:

# Extracts the number of years in the institute from the column by using the specified pattern and
# converting the extracted value to float
combined_updated.loc[:,"institute_service"] = combined_updated["institute_service"].astype(dtype='str').str.extract(pattern)
print(combined_updated["institute_service"])

To resolve the warning, I followed its suggestion and used .loc. The output was:

3     NaN
5     NaN
8     NaN
9     NaN
11    NaN
       ..
696   NaN
697   NaN
698   NaN
699   NaN
701   NaN
Name: institute_service, Length: 651, dtype: float64

However, if I use the following code instead:

combined_updated["institute_service"] = combined_updated["institute_service"].astype(dtype='str').str.extract(pattern)
print(combined_updated["institute_service"])

The output is as follows:

3        7
5       18
8        3
9       15
11       3
      ... 
696      5
697      1
698    NaN
699      5
701      3
Name: institute_service, Length: 651, dtype: object

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Why do I get NaN values even though I am using .loc as the warning suggests?

Issue 2:
I’m unsure why I keep getting the warning. I used the following code to make sure the combined data frame is built from fresh copies of the DETE and TAFE dataframes, but I still get the warning.

dete_resignations_up = dete_resignations.copy(deep = True)
tafe_resignations_up = tafe_resignations.copy(deep = True)

Any idea why this might be?

I tried your code as well and got the correct output, so this likely depends on the rest of your code, or on whether you ran some cell twice. You could share your Notebook here (attach it as a file or share the GitHub link to a repo).

Did you make sure to use copy() when you created combined_updated as well? That’s most likely the only cause here.
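
As a minimal sketch of the copy() suggestion (the sample data and the boolean filter are assumptions, since the thread does not show how combined_updated was built): calling copy() on the derived frame makes it independent of combined, so later column assignments act on its own data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the combined DETE/TAFE data.
combined = pd.DataFrame({
    "institute_service": ["7-10", "Less than 1 year", np.nan, "18.0"],
    "separationtype": ["Resignation", "Resignation", "Retirement", "Resignation"],
})

# Filtering derives a new frame from `combined`; .copy() makes it an
# independent object, so assignments below cannot touch `combined`.
combined_updated = combined[combined["separationtype"] == "Resignation"].copy()

# Assigning a column on the independent copy.
combined_updated["cleaned"] = True

print(combined_updated.columns.tolist())
print(combined.columns.tolist())
```

The deep copies of dete_resignations and tafe_resignations do not help here, because the warning refers to the frame being assigned to, not to the frames it was originally built from.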

@the_doctor You are right :smiley: I did miss the copy() when creating combined_updated. That took a significant amount of time to figure out. I should have asked earlier.
However, the NaN issue still persists. I’ve attached my file here: Basics.ipynb (236.8 KB)
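
One likely explanation for the NaN result, assuming pattern contains a single capture group such as r"(\d+)" (the thread does not show it): by default str.extract returns a DataFrame whose only column is labelled 0, and assigning a DataFrame through .loc aligns on column labels, so nothing matches "institute_service". Passing expand=False returns a Series instead, which aligns on the index alone. A sketch with made-up sample values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample values and pattern; the thread shows neither.
df = pd.DataFrame({"institute_service": ["7-10", "18.0", "Less than 1 year", np.nan]})
pattern = r"(\d+)"

as_str = df["institute_service"].astype(str)

# Default expand=True: the result is a DataFrame whose single column is labelled 0.
extracted_frame = as_str.str.extract(pattern)
print(type(extracted_frame).__name__, list(extracted_frame.columns))

# expand=False with one capture group: the result is a Series sharing df's index.
extracted_series = as_str.str.extract(pattern, expand=False)

# A Series value has no column labels to mismatch, so .loc aligns on the row index only.
df.loc[:, "institute_service"] = extracted_series
print(df["institute_service"].tolist())
```

The extracted values are still strings at this point; converting them to float would need a further .astype(float).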