Advice on SettingWithCopyWarning (Exploring Ebay Car Sale Data)

Screen Link:

My Code:

date_format = '%Y-%m-%d'
def convert_dt(series):
    date = series.str[:10]
    date_dt = pd.to_datetime(date, format=date_format)
    return date_dt

# Adding converted columns
autos['date_crawled_dt'] = convert_dt(autos['date_crawled'])
autos['ad_created_dt'] = convert_dt(autos['ad_created'])
autos['last_seen_dt'] = convert_dt(autos['last_seen'])

What I expected to happen:
Hi all, the guided project asked me to do some cleaning and analysis on columns with date information. While the prompt only instructed me to do some checks using the string-format columns (date_crawled, ad_created, last_seen), I wanted to convert them to datetime format and add the converted values as new columns (date_crawled_dt, ad_created_dt, last_seen_dt) to the existing autos data frame before analyzing them.

What I did was create a function to do the conversion, then use my function (convert_dt) to add new converted columns.
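
As a minimal, self-contained sketch of the conversion described above, the snippet below applies the same function to a small made-up Series. The sample timestamps are assumptions for illustration, not rows from the actual autos data set.

```python
import pandas as pd

date_format = '%Y-%m-%d'

def convert_dt(series):
    # Keep only the first 10 characters ('YYYY-MM-DD') and parse them as dates.
    date = series.str[:10]
    return pd.to_datetime(date, format=date_format)

# Hypothetical sample values in a 'YYYY-MM-DD HH:MM:SS' string layout.
sample = pd.Series(['2016-03-26 17:47:46', '2016-04-04 13:38:56'])
converted = convert_dt(sample)
print(converted.dtype)
```

Slicing to the first 10 characters discards the time of day, so the result holds dates only.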

What actually happened:
I was able to create the new columns, but I got the SettingWithCopyWarning. The first few lines of the warning (it is repeated three times) can be seen below:

/dataquest/system/env/python3/lib/python3.4/site-packages/ipykernel/ SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation:

My question

Is it a good idea to just ignore this warning? Or should I create a new data frame containing my newly converted columns? Creating a new data frame, at least for this guided project, is easy enough, but I’m worried that when dealing with other data sets I might want to keep working with a single data frame, so that it’s easier to analyze these transformed columns alongside other columns from the original data frame.

Is it also the case that every time I add a column that is a copy or a transformation of another existing column, I will get this warning?

Hi @philiplibre,

The SettingWithCopyWarning is a frequent topic of discussion here in the Community. You may find the previous discussions of this issue helpful. Also, this article is a great resource.

In your case specifically, to avoid the SettingWithCopyWarning without creating a new dataframe (you are right, that’s not always a good decision), try using .loc[row_indexer,col_indexer] = value to create the new columns, as the warning message suggests.
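
For reference, here is a minimal sketch of the .loc form mentioned above, assuming a small made-up DataFrame that is not derived from any other DataFrame. The column names mirror the ones in the question, but the values are invented for illustration.

```python
import pandas as pd

# Made-up DataFrame built directly, not sliced from another DataFrame.
df = pd.DataFrame({'date_crawled': ['2016-03-26 17:47:46', '2016-04-04 13:38:56']})

# Row indexer ':' selects all rows; the column indexer names the new column.
df.loc[:, 'date_crawled_dt'] = pd.to_datetime(
    df['date_crawled'].str[:10], format='%Y-%m-%d'
)
print(df.dtypes)
```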

Hi @Elena_Kosourova, thanks for the response. My solution was to actually just create a copy of the dataframe containing only the series I was currently interested in. It can be seen in my Guided Project No. 3 notebook which I’ve also shared (in the Exploring the Date Columns section).

Anyway, I tried the .loc[row_indexer,col_indexer] = value using the following code:
autos.loc[autos.index, 'date_crawled_dt'] = convert_dt(autos['date_crawled'])

I still got the SettingWithCopyWarning. I also tried the following code:
autos['date_crawled_dt'] = convert_dt(autos['date_crawled']).copy()
The SettingWithCopyWarning persisted.

In the interest of conserving memory, especially when dealing with larger data sets, there may be cases where I would prefer not to create a copy of my main dataframe. I’m still not sure why the warning keeps popping up. Perhaps I’ll read more of the documentation on this particular warning to understand its inner workings.
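
To put a rough number on the memory concern, one option is to measure a DataFrame before deciding whether a copy is affordable. This is only a sketch on a made-up DataFrame; the column names and sizes are assumptions for illustration.

```python
import pandas as pd

# Made-up data standing in for a larger data set.
df = pd.DataFrame({
    'date_crawled': ['2016-03-26 17:47:46'] * 1000,
    'price': range(1000),
})

# deep=True also counts the memory held by Python string objects.
total_bytes = df.memory_usage(deep=True).sum()
subset_bytes = df[['date_crawled']].memory_usage(deep=True).sum()
print(total_bytes, subset_bytes)
```

Comparing the two figures gives a sense of how much a copy of only the columns of interest would cost relative to copying the whole DataFrame.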

Can you try this code instead?

autos.loc[:, 'date_crawled_dt'] = convert_dt(autos['date_crawled'])

Hi @Elena_Kosourova,

The same SettingWithCopyWarning still pops up when I try to use

autos.loc[:, 'date_crawled_dt'] = convert_dt(autos['date_crawled'])

Here’s the warning I get (I used my local jupyter notebook this time):

C:\Users\Philip\anaconda3\lib\site-packages\pandas\core\ SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation:

I suspect the warning results from the way my convert_dt function operates under the hood: when I tried to recreate the warning using a smaller dataframe (and a simpler conversion function), it did not appear:

df = pd.DataFrame([['a',1], ['b',2], ['c',3]], columns=['letter','number'])

def add_two(series):
    added_two = series + 2
    return added_two

df['plustwo'] = add_two(df['number'])
df['number_copy'] = df['number'].copy()

Anyway, I’ll consider this solved for now. I still welcome any comments if someone figures out exactly why the warning is raised in this particular case. Thanks for the suggestions.
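
One possible explanation, offered as an assumption because the earlier notebook cells are not shown here: pandas can raise this warning when the DataFrame being assigned to was itself created by slicing or filtering another DataFrame, for example when removing outlier rows. In that case the function producing the new values would be irrelevant, which would be consistent with the small test dataframe above not triggering the warning. The sketch below uses a made-up `raw` DataFrame and a hypothetical price filter; taking an explicit `.copy()` at the filtering step marks the result as independent. Newer pandas versions that use Copy-on-Write may not emit the warning at all.

```python
import warnings
import pandas as pd

# Made-up data standing in for the real data set.
raw = pd.DataFrame({
    'date_crawled': ['2016-03-26 17:47:46', '2016-04-04 13:38:56', '2016-03-12 16:58:10'],
    'price': [5000, 0, 8500],
})

# Hypothetical earlier cleaning step: filtering rows yields a DataFrame
# that pandas may treat as a slice of `raw`.
filtered = raw[raw['price'] > 0]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    filtered['date_crawled_dt'] = pd.to_datetime(
        filtered['date_crawled'].str[:10], format='%Y-%m-%d'
    )
# Depending on the pandas version, this may list SettingWithCopyWarning.
print([w.category.__name__ for w in caught])

# Taking an explicit copy when filtering makes the result independent.
autos = raw[raw['price'] > 0].copy()

with warnings.catch_warnings(record=True) as caught_after_copy:
    warnings.simplefilter('always')
    autos['date_crawled_dt'] = pd.to_datetime(
        autos['date_crawled'].str[:10], format='%Y-%m-%d'
    )
print([w.category.__name__ for w in caught_after_copy])
```

The copy here covers only the filtered rows and is made once, at the point where the filtered DataFrame is created, so later column assignments can keep working on that single DataFrame.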
