While working on my Guided Project: Clean And Analyze Employee Exit Survey, I stumbled upon the famous “SettingWithCopyWarning” error, already for the second time in my code. I don’t understand how to get rid of it, or even why it’s there.
First, it appeared when I tried to clean one column to get just the year (e.g. my aim was to turn something like “08/2010” into “2010.0”). I tried with and without .loc, but I always get the warning. Here is my code line:
The second time, it appeared when I just wanted to create a new column that is simply the difference of two columns in my dataframe. The code is this:
I am stuck on a different piece of code from step 9 and cannot avoid this error. I tried .copy(), resetting the index plus .copy(), and intermediate variables plus .copy(). Every time I assign the result back to the institute_service column in combined, it gives me the error.
# resetting index as there were some duplicate values
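For the two cases from the first post (turning “08/2010” into 2010.0, and creating a column by subtracting two columns), here is a minimal sketch that does not trigger the warning. It assumes the subset is created with .copy(); the column names cease_date and dete_start_date and the sample rows are hypothetical, not taken from the posts above.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions for illustration.
survey = pd.DataFrame({
    "separationtype": ["Resignation-Other reasons", "Age Retirement", "Resignation-Move overseas/interstate"],
    "cease_date": ["08/2010", "2012", "05/2013"],
    "dete_start_date": [2005.0, 1980.0, 2011.0],
})

# Filter and take an explicit copy, so the new values are set on an
# independent DataFrame instead of something pandas may treat as a slice.
resignations = survey[survey["separationtype"].str.contains("Resignation")].copy()

# Case 1: keep only the four-digit year ("08/2010" -> 2010.0).
resignations["cease_date"] = (
    resignations["cease_date"].str.extract(r"(\d{4})", expand=False).astype(float)
)

# Case 2: a new column computed by subtracting one column from another.
resignations["institute_service"] = (
    resignations["cease_date"] - resignations["dete_start_date"]
)

print(resignations[["cease_date", "institute_service"]])
```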
Hi!
First of all, it’s not an error; it’s a warning. Here is a really good explanation of it:
Second, do you still get the warning if you try this?
combined_updated['institute_service'] = combined_updated['institute_service'].astype(str).str.extract(r'(\d+)').astype(float).copy()
Yes, the warning is still there:
/dataquest/system/env/python3/lib/python3.4/site-packages/ipykernel/__main__.py:1: FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame) but in a future version of pandas this will be changed to expand=True (return DataFrame)
if __name__ == '__main__':
/dataquest/system/env/python3/lib/python3.4/site-packages/ipykernel/__main__.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Thank you! The warning changed slightly: the expand part disappeared.
/dataquest/system/env/python3/lib/python3.4/site-packages/ipykernel/__main__.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Hi! To avoid the warning, you need to apply .copy() before you try to make changes. I’m not talking about every possible case in which the warning appears, only about this project. Here the warning appears mostly because of chained assignment, described in the article @raisa.jerin.sristy79 shared above. What leads to the warning?
First, you create a new dataframe from the original dataframe based on some condition (the code below is an example from my project):
dete_resignations = dete_survey_updated[dete_survey_updated['separationtype'].str.contains(r'Resignation')]
Then you continue working with dete_resignations, and at some point you obviously need to introduce some changes into it:
That’s when pandas gets confused. When you created dete_resignations, you didn’t explicitly create an independent dataframe: pandas treats the result as a possible “view” of the original dataframe, containing only the rows that meet your condition. So pandas (by the way the library is designed) can’t guarantee where the changes will be applied: only to the new dataframe, or to the original one as well. That’s why it raises the SettingWithCopyWarning.
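The one-line form of this pattern is chained assignment, and a small sketch shows why pandas is cautious about it. The data and the flag column are made up for illustration. Assigning through df[mask]['flag'] = value writes into an intermediate object, so df itself is not updated, while a single .loc call that takes both the row condition and the column label updates df directly.

```python
import warnings
import pandas as pd

# Made-up data; the column names are hypothetical.
df = pd.DataFrame({
    "separationtype": ["Resignation-Other", "Age Retirement", "Resignation-Move"],
    "flag": [0, 0, 0],
})
mask = df["separationtype"].str.contains("Resignation")

# Chained assignment: df[mask] produces an intermediate object and the
# assignment lands there, not in df. pandas warns about this pattern;
# the warning is silenced here only to keep the demo output clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[mask]["flag"] = 1
after_chained = df["flag"].tolist()
print(after_chained)  # [0, 0, 0] (df was not changed)

# One .loc call selects rows and column in a single step and updates df itself.
df.loc[mask, "flag"] = 1
after_loc = df["flag"].tolist()
print(after_loc)  # [1, 0, 1]
```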
So, what should have been done to avoid the warning? You should indeed create a copy, but not at the moment you introduce changes; do it at the moment you create the new dataframe:
I strongly recommend reading the article provided, and maybe watching a couple of videos on YouTube where this issue is explained in more detail and more technically than I do.
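The copy-at-creation fix described above can be sketched as follows. The rows stand in for the real dete_survey_updated data and are made up, as is the split-based cleaning step; the sketch records any warnings raised during the later assignment so you can check that no SettingWithCopyWarning appears.

```python
import warnings
import pandas as pd

# Made-up stand-in for dete_survey_updated.
dete_survey_updated = pd.DataFrame({
    "separationtype": ["Resignation-Other reasons", "Age Retirement", "Resignation-Other employer"],
    "cease_date": ["05/2012", "2012", "07/2013"],
})

# Copy at the moment the subset is created, not when it is later modified.
dete_resignations = dete_survey_updated[
    dete_survey_updated["separationtype"].str.contains(r"Resignation")
].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Later changes now apply to dete_resignations only (keep the part after "/").
    dete_resignations["cease_date"] = dete_resignations["cease_date"].str.split("/").str[-1]

warning_names = [w.category.__name__ for w in caught]
print("SettingWithCopyWarning" in warning_names)  # False
print(dete_survey_updated["cease_date"].tolist())  # ['05/2012', '2012', '07/2013']
```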
Thank you for your detailed response. I will definitely read through the article again, more carefully.
I tried a few permutations, and for some reason a chained .copy() didn’t work, but this did:
combined_updated = combined_updated.copy()
# Effectively overwrites the old dataframe with the new one. I was slightly surprised this worked, because I would have thought it was trying to reference itself, but the internals probably keep their own references for views and dataframes. Or .copy() creates an intermediate object and the name is then “swapped” to point to this new object.
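Rebinding works because the right-hand side is evaluated first: combined_updated.copy() returns a new, independent DataFrame, and the name combined_updated is then pointed at it, so later assignments go into that new object. By contrast, a .copy() at the end of the right-hand side (as in the earlier suggestion) copies the values being assigned, not the DataFrame being assigned into, so that DataFrame is still treated as a possible slice. A minimal sketch, assuming made-up data and a hypothetical dissatisfied column used for filtering:

```python
import warnings
import pandas as pd

# Made-up stand-in for the project's combined data.
combined = pd.DataFrame({
    "institute_service": ["1-2", "3-4", "Less than 1 year", "More than 20 years", None],
    "dissatisfied": [True, False, True, False, None],
})

# A filtered subset; pandas may treat it as derived from `combined`.
combined_updated = combined[combined["dissatisfied"].notna()]

# Evaluate .copy() first, then rebind the name to the new, independent object.
combined_updated = combined_updated.copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # First run of digits in each value, converted to float.
    combined_updated["institute_service"] = (
        combined_updated["institute_service"]
        .astype(str)
        .str.extract(r"(\d+)", expand=False)
        .astype(float)
    )

warning_names = [w.category.__name__ for w in caught]
print(combined_updated["institute_service"].tolist())  # [1.0, 3.0, 1.0, 20.0]
```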
The FutureWarning clearly states that the problem lies in the implicit expand setting (expand=None currently means expand=False), hence set expand=True explicitly.
If True, the expand parameter makes extract return a DataFrame with one column per capture group. If False, it returns a Series/Index if there is one capture group, or a DataFrame if there are multiple capture groups.
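A quick sketch of the difference, using made-up values. With a single capture group, expand=True gives a one-column DataFrame (its column is labeled 0, so selecting [0] gives back a Series for assigning to a single column), while expand=False gives a Series. Passing expand explicitly is what removes the FutureWarning; it does not affect the SettingWithCopyWarning, which depends on the DataFrame being assigned into.

```python
import pandas as pd

s = pd.Series(["1-2", "3-4", "More than 20 years"])

# One capture group, expand=False: the result is a Series.
as_series = s.str.extract(r"(\d+)", expand=False)

# One capture group, expand=True: the result is a one-column DataFrame.
as_frame = s.str.extract(r"(\d+)", expand=True)

# Two capture groups: a DataFrame either way (NaN where the pattern fails).
two_groups = s.str.extract(r"(\d+)-(\d+)", expand=False)

print(type(as_series).__name__, type(as_frame).__name__, two_groups.shape)
# Series DataFrame (3, 2)
```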