Generate Column in Pandas Dataframe from sliced/cleaned Column

Hi

While trying to generate a new column based on a sliced and cleaned column in a Pandas DataFrame, I keep getting this warning:

“A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead”

The issue I am working on is the 6th Guided Project: Employee Exit Survey, in the Data Cleaning Module

Screen Link:

My respective code is the following:

yrs_str = combined_updated['institute_service'].astype('str')  # convert to string
patt = r"([0-9]+)[-\.]?([0-9]+)?"      # regex pattern to extract up to 2 numbers
yrs_extr = yrs_str.str.extract(patt)   # extract the numbers into two columns
yrs_cln = yrs_extr.dropna(how='all')   # drop rows where both columns are NaN (88 rows, ref above)
yrs_calc = yrs_cln.fillna('0').astype('float').apply(calc_yrs, axis=1)  # fill the remaining NaN with '0', convert to float, apply calc_yrs row by row

combined_updated.loc[:,'service_cat'] = yrs_calc.map(carr_stage).copy()

I do understand that the Series “yrs_calc” I generated is shorter than the DataFrame, because I removed the 88 NaN elements from it. Nevertheless, I would expect to be able to generate a new column with the .loc assignment quoted above.

But I still get the “copy of a slice” warning. Therefore I assume that I have misunderstood the explanations given at the link below:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

I would appreciate a hint or explanation of where I applied the slicing incorrectly.

thanks for reading
bender

Hi @bender38

I may not be able to explain why, as I am also confused by the details of this warning. But here’s a workaround I usually use:

# make a copy of the data frame first
combined_updated = combined_updated.copy()

# then make the changes to the dataframe
combined_updated.loc[:,'service_cat'] = yrs_calc.map(carr_stage)

The key here is the .copy() method. More details on this are here. Hope this helps.
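As a rough illustration of the workaround: one common cause of this warning in older pandas versions is that the DataFrame was itself created by filtering another DataFrame. The sketch below assumes that is how combined_updated came about, which the code above does not show. The data, the separationtype filter and the category labels are made up for the example.

```python
import pandas as pd

# Hypothetical stand-in for the survey data; columns and values are illustrative only.
combined = pd.DataFrame({
    "separationtype": ["Resignation", "Retirement", "Resignation"],
    "institute_service": ["3-4", "11-20", None],
})

# Filtering rows produces a new object that older pandas versions track as a
# possible slice of `combined`. Calling .copy() here makes it independent.
combined_updated = combined[combined["separationtype"] == "Resignation"].copy()

# Adding a column to the independent copy leaves `combined` untouched.
combined_updated["service_cat"] = ["New", "Unknown"]

print(combined_updated)
```

Taking the copy at the point where the subset is created, rather than right before the assignment, keeps every later modification on an independent DataFrame.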

Hi Rucha

Thanks for your hint. This has worked for me in other cases as well. My issue here is that the new column I want to generate has 88 fewer values than the DataFrame has rows, because those elements are NaN.
Also, as you can see in my code, I already use “copy()”, so that does not seem to be the issue in my case.

Maybe a workaround would be to use the “fillna” function instead of dropping those NaN values, in order to keep the shape intact?
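On the shape question, a minimal sketch with made-up data (the column name years and the simplified regex are assumptions for illustration, not the project code): as far as I understand, pandas aligns an assigned Series on its index labels, so a Series that is shorter because rows were dropped should still fit, with NaN in the dropped rows.

```python
import pandas as pd

# Hypothetical data: two of the four rows are missing.
df = pd.DataFrame({"institute_service": ["3-4", None, "7", None]})

# Simplified cleaning step: drop missing values, then keep the first number.
cleaned = (
    df["institute_service"]
    .dropna()
    .str.extract(r"(\d+)")[0]
    .astype(float)
)

# `cleaned` has 2 rows while `df` has 4. The assignment matches rows by index
# label, so the rows that were dropped receive NaN in the new column.
df["years"] = cleaned

print(df)
```

If this holds for the real data as well, the length difference of 88 rows would not by itself explain the warning.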

regards
Bender