Star Wars survey project - SettingWithCopyWarning

Screen Link:

My Code:

bool_map = {'Yes': True, 'No': False}
for col in [
    "Have you seen any of the 6 films in the Star Wars franchise?",
    "Do you consider yourself to be a fan of the Star Wars film franchise?"
    ]:
    star_wars[col] = star_wars[col].map(bool_map)

What I expected to happen:
I expected Series.map() to work and the values in the DataFrame to be replaced accordingly.

What actually happened:

/dataquest/system/env/python3/lib/python3.4/site-packages/ipykernel/__main__.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

To be clear, the map did work, but it still throws this warning. I’ve read the DQ blog post on SettingWithCopyWarning and have researched elsewhere online, and I can’t find the fix. I’ve tried to append .copy() to the end of my code but this doesn’t work either. I’ve also tried star_wars.loc[:, col] = star_wars.loc[:, col].map(bool_map) with the same result.

To make things more confusing, the solution set uses my exact same method!
Thanks in advance.

I tried your code and it did not reproduce the warning.

Can you share your full code?

Thanks. I’ve attached my notebook and star_wars.csv.
Basics.ipynb (157.2 KB) star_wars.csv (521.8 KB)

Hi! Try adding .copy() in cell #4, where you drop the rows with a null RespondentID.
This one:
star_wars = star_wars[pd.notnull(star_wars['RespondentID'])] #drop rows with null RespondentID

That did work, thanks! Could you explain why it made a difference, for my understanding? Again, the solution file seemed to do the same as I had done, but without using copy().

Hi @mikedbjones

Cell #4 in conjunction with cell #7 should be the culprit. As @ksenia.kustanovich mentioned, adding copy() to cell #4 should resolve the issue:

star_wars = star_wars[pd.notnull(star_wars['RespondentID'])].copy()
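
To illustrate, here is a minimal sketch of the same pattern on a tiny made-up frame. The data and the shortened column name `seen_any` are assumptions for illustration only, not the real star_wars.csv. It filters, takes an explicit copy, applies the map, and records any warnings so you can check that no SettingWithCopyWarning shows up.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical miniature of the survey data; "seen_any" stands in for the long question column.
raw = pd.DataFrame({
    "RespondentID": [1.0, np.nan, 3.0],
    "seen_any": ["Yes", "No", "No"],
})

# Filter out null RespondentID rows, then copy so the result is an independent DataFrame.
star_wars = raw[pd.notnull(raw["RespondentID"])].copy()

bool_map = {"Yes": True, "No": False}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of printing it
    star_wars["seen_any"] = star_wars["seen_any"].map(bool_map)

# Keep only warnings whose class is named SettingWithCopyWarning (matched by name,
# since where pandas exposes that class varies between versions).
copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(star_wars)
print("SettingWithCopyWarning raised:", bool(copy_warnings))
```

Because of the copy, the assignment only touches `star_wars`; the original `raw` frame keeps its "Yes"/"No" strings.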

Alternatively, you could operate on the original DataFrame without making a copy by basing your implementation on DataFrame.loc[]. However, I don’t think this leads to a simpler solution in this case.
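
One way to read the .loc alternative, as a rough sketch only: keep the original frame and write the mapped values back through a single .loc call with a row mask. The data, the name `valid`, and the shortened column name `seen_any` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the survey data; "seen_any" stands in for the long question column.
star_wars = pd.DataFrame({
    "RespondentID": [1.0, np.nan, 3.0],
    "seen_any": ["Yes", "No", "No"],
})

bool_map = {"Yes": True, "No": False}

# Boolean mask for the rows we care about, instead of creating a filtered frame.
valid = star_wars["RespondentID"].notnull()

# One .loc call selects rows and column on the original frame and assigns in place.
star_wars.loc[valid, "seen_any"] = star_wars.loc[valid, "seen_any"].map(bool_map)
print(star_wars)
```

Note that the rows with a null RespondentID are still in the frame and still hold their original strings, so every later step has to carry the mask along. That is part of why the copy() approach is simpler here.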

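Some additional thoughts on this. What you are in essence doing is chained assignment: cell #4 selects rows from the original DataFrame, and cell #7 then assigns to that selection. pandas cannot be sure whether the selection is a view or a copy of the original, so it warns. This can lead to unexpected behavior and is therefore not considered a good idea. The details are explained in this excellent DQ blog post: settingwithcopywarning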

Best
htw

Thanks for this, much appreciated.