Hi @Bruno,
I wanted to ask a follow-up question on this discussion. I’ve been reading the Dataquest article, SettingwithCopyWarning: How to Fix This Warning in Pandas, and am trying to better understand their example of Common issue #2: Hidden chaining.
Let’s say that I wanted to create a new DataFrame based on some transformation of an existing DataFrame, but I also wanted to preserve the old DataFrame without modifying it. Should I apply copy() before or after the transformation? For example, let’s say df1 is my original DataFrame.
import pandas as pd
# Seven integers from -3 to 3, plus their cubes and fifth powers
linear = [x for x in range(-3, 4, 1)]
cubic = [x ** 3 for x in range(-3, 4, 1)]
quintic = [x ** 5 for x in range(-3, 4, 1)]
df1 = pd.DataFrame({'lin': linear, 'cub': cubic, 'quin': quintic})
print(df1)
lin cub quin
0 -3 -27 -243
1 -2 -8 -32
2 -1 -1 -1
3 0 0 0
4 1 1 1
5 2 8 32
6 3 27 243
Now, I’d like to create a new DataFrame from a portion of df1 so I can transform it later without running the risk of modifying df1. One option is to copy df1 before slicing it…
df2 = df1.copy().loc[df1.lin == df1.quin]
print(df2)
lin cub quin
2 -1 -1 -1
3 0 0 0
4 1 1 1
…so that when I modify df2…
df2.loc[3, 'cub'] = "Hello"
print(df2)
lin cub quin
2 -1 -1 -1
3 0 Hello 0
4 1 1 1
…df1 stays the same.
print(df1)
lin cub quin
0 -3 -27 -243
1 -2 -8 -32
2 -1 -1 -1
3 0 0 0
4 1 1 1
5 2 8 32
6 3 27 243
Now, let’s say I create a different DataFrame called df3 from the same portion of df1, but this time I apply copy() after slicing it.
df3 = df1.loc[df1.lin == df1.quin].copy()
print(df3)
lin cub quin
2 -1 -1 -1
3 0 0 0
4 1 1 1
The result looks the same as df2, so now I want to see if modifying df3 will affect df1.
df3.loc[3, 'cub'] = 'Greetings'
print(df3)
lin cub quin
2 -1 -1 -1
3 0 Greetings 0
4 1 1 1
print(df1)
lin cub quin
0 -3 -27 -243
1 -2 -8 -32
2 -1 -1 -1
3 0 0 0
4 1 1 1
5 2 8 32
6 3 27 243
It doesn’t seem to matter whether I apply copy() before or after the transformation, even though logically it seems you’d want to apply it before, as when creating df2. Is that right?
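One way I could check this more systematically is to keep a snapshot of df1, build the subset both ways, modify both results, and then compare df1 against the snapshot. This is only a minimal sketch; the names snapshot, mask, copy_first, and copy_last are my own illustrative labels, and I use integer values for the modification so the column dtype is not a factor.

```python
import pandas as pd

# Rebuild the original DataFrame from the example above
df1 = pd.DataFrame({
    'lin': [x for x in range(-3, 4)],
    'cub': [x ** 3 for x in range(-3, 4)],
    'quin': [x ** 5 for x in range(-3, 4)],
})

# Keep an independent snapshot to compare against later
snapshot = df1.copy(deep=True)

mask = df1.lin == df1.quin
copy_first = df1.copy().loc[mask]   # copy, then slice (like df2)
copy_last = df1.loc[mask].copy()    # slice, then copy (like df3)

# Modify both derived DataFrames
copy_first.loc[3, 'cub'] = 999
copy_last.loc[3, 'cub'] = -999

# True if df1 still matches the snapshot taken before the modifications
unchanged = df1.equals(snapshot)
print(unchanged)
```

If unchanged comes back True, then neither ordering let the modification leak into df1 for this particular example. One difference that might still matter: copying first duplicates the whole of df1, while copying after the slice duplicates only the selected rows.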
I’m going to try one last experiment in which I do not use copy() to see if it modifies the original DataFrame.
df4 = df1.loc[df1.lin == df1.quin]
print(df4)
lin cub quin
2 -1 -1 -1
3 0 0 0
4 1 1 1
df4.loc[3, 'cub'] = "Whassup!"
print(df4)
lin cub quin
2 -1 -1 -1
3 0 Whassup! 0
4 1 1 1
/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
I got the SettingWithCopyWarning, which I expected, but nothing seemed to happen to the original DataFrame (this is also expected, since it’s ambiguous whether get operations return views or copies).
print(df1)
lin cub quin
0 -3 -27 -243
1 -2 -8 -32
2 -1 -1 -1
3 0 0 0
4 1 1 1
5 2 8 32
6 3 27 243
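To look into why df1 was untouched, here is a small check I could run. It is a sketch that rests on an assumption of mine: that selecting rows with a boolean mask hands back newly allocated data and not a view, which the warning itself does not confirm. np.shares_memory reports whether two NumPy arrays overlap in memory.

```python
import numpy as np
import pandas as pd

# Rebuild the original DataFrame from the example above
df1 = pd.DataFrame({
    'lin': list(range(-3, 4)),
    'cub': [x ** 3 for x in range(-3, 4)],
    'quin': [x ** 5 for x in range(-3, 4)],
})

# Same selection as df4: boolean mask, no explicit copy()
df4 = df1.loc[df1.lin == df1.quin]

# Compare the underlying arrays of the 'cub' column in both DataFrames
shares = np.shares_memory(df1['cub'].to_numpy(), df4['cub'].to_numpy())
print(shares)
```

If this prints False, df4 holds its own data, which would explain why writing to it left df1 alone. As I understand it, the warning only signals that pandas cannot promise this in general.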
So, it’s really unclear to me when it’s necessary to use copy() and whether you should use it before a transformation, as in df2, or after the transformation, as in df3… or am I missing something essential here?
Thanks.