I was wondering if someone can explain the SettingWithCopy warning to me. This comes up in the guided project for the Data Cleaning and Analytics mission, but it is still unclear to me. How do we know when we should use .copy() on a dataframe? At what times should we be making a copy of our dataframe, as opposed to using the original one?
Everything in Python is an object.
Set y to x
x = [1, 2, 3, 5]
y = x
Changing a value in y
y[0] = 2
A change in y is reflected in x.
>>> x
[2, 2, 3, 5]
>>> y
[2, 2, 3, 5]
If x should not be affected by changes made through y, make y a copy instead.
x = [1, 2, 3]
y = x.copy()
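To see what the copy buys you, here is a small sketch that modifies the copy and then checks the original. The value 99 is just an illustration.

```python
x = [1, 2, 3]
y = x.copy()      # y is a new list holding the same elements

y[0] = 99         # change the copy only

print(x)          # [1, 2, 3]
print(y)          # [99, 2, 3]
```

Note that list.copy() is a shallow copy: it is enough for a flat list like this one, but nested lists inside x would still be shared.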
In Python, when you assign a label (a name) to an object, the label only references that object. For a mutable object (one whose values can change in place), any change to the object is visible through every label that references it.
Therefore, when you point a new label at a mutable object in order to modify its values, it is best to make a copy of the object first so the original stays untouched.
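If you are ever unsure whether two labels point at the same object, the `is` operator tells you. A minimal sketch (the label z is only for illustration):

```python
x = [1, 2, 3]
y = x             # second label for the same list
z = x.copy()      # separate list with equal contents

print(y is x)     # True: same object
print(z is x)     # False: different object
print(z == x)     # True: equal values
```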
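The same idea carries over to pandas, which is where the warning in the question comes from. Below is a rough sketch, assuming a made-up DataFrame with hypothetical columns name and score. When you filter a DataFrame and then plan to modify the result, calling .copy() makes it explicit that you want an independent object. Whether the warning actually appears without .copy() depends on your pandas version, so treat this as an illustration rather than a rule.

```python
import pandas as pd

# Hypothetical data, only for illustration
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [10, 20, 30]})

# Take an explicit copy of the filtered rows before modifying them
high = df[df["score"] > 15].copy()
high["passed"] = True          # adds a column to `high` only

print(list(df.columns))        # ['name', 'score']
print(list(high.columns))      # ['name', 'score', 'passed']
```

If you actually want to change the original DataFrame, skip the intermediate variable and assign through df.loc in a single step, for example df.loc[df["score"] > 15, "score"] = 0.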
This is an excellent blog post that breaks down what happens with this warning: https://www.dataquest.io/blog/settingwithcopywarning/