Feature Preparation, Selection and Engineering


I am trying to update two DataFrames, train and holdout. I wrote two nested for loops to do this, but unfortunately it is not working the way I expected. Please help me understand the logic.


My Code:

import pandas as pd

def create_dummies(df, column_name):
    # One-hot encode column_name and append the dummy columns to df
    dummies = pd.get_dummies(df[column_name], prefix=column_name)
    df = pd.concat([df, dummies], axis=1)
    return df

df_cat = [train, holdout]
categories = ['Age_categories', 'Pclass', 'Sex']

for d in df_cat:
    for c in categories:
        d = create_dummies(d, c)


What I expected to happen:
I was expecting the train and holdout datasets to be updated with the new columns, but that has not happened. When I print the variable ‘d’, I can see that the new columns were added, but the change is not reflected in the actual datasets.

What actually happened:
The actual datasets train and holdout are not updated.
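A minimal reproduction may help show what is going on. It uses small hypothetical DataFrames in place of the real train and holdout data (and drops Age_categories, which the toy data lacks). Inside the loop, d = create_dummies(d, c) rebinds the loop variable d to the new DataFrame that pd.concat returns; the list df_cat and the names train and holdout still point at the original, unchanged objects.

```python
import pandas as pd


def create_dummies(df, column_name):
    # pd.concat builds and returns a NEW DataFrame; the df passed in is not modified
    dummies = pd.get_dummies(df[column_name], prefix=column_name)
    df = pd.concat([df, dummies], axis=1)
    return df


# Hypothetical toy data standing in for the real train/holdout sets
train = pd.DataFrame({"Pclass": [1, 2, 3], "Sex": ["male", "female", "male"]})
holdout = pd.DataFrame({"Pclass": [3, 1], "Sex": ["female", "male"]})

df_cat = [train, holdout]
categories = ["Pclass", "Sex"]

for d in df_cat:
    for c in categories:
        d = create_dummies(d, c)  # rebinds the local name d only
    print(list(d.columns))  # the new frame does have the dummy columns

print(list(train.columns))  # the original train frame is unchanged
print(d is holdout)         # False: d now refers to a different object
```

Printing d therefore shows the new columns, while train and holdout keep only their original ones.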


Hi @sreekanthac,

This should help you.


Thank you @Rucha for the quick response. That helped.


@sreekanthac @Rucha
The problem in the Stack Overflow link was solved because the drop function has an inplace argument.
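For comparison, here is a sketch of why inplace=True sidesteps the problem, assuming a hypothetical column named Unused. With d = d.drop(...) each iteration only rebinds d to a new DataFrame, whereas d.drop(..., inplace=True) modifies the object d refers to, which is the very same object that train or holdout refers to.

```python
import pandas as pd

# Hypothetical toy frames; "Unused" is an illustrative column name
train = pd.DataFrame({"Pclass": [1, 2], "Unused": [0, 0]})
holdout = pd.DataFrame({"Pclass": [3], "Unused": [0]})

# Rebinding: drop returns a new frame, which is bound to d only
for d in [train, holdout]:
    d = d.drop(columns="Unused")
print(list(train.columns))  # "Unused" is still present

# Mutating: inplace=True changes the frame d points at, i.e. train/holdout themselves
for d in [train, holdout]:
    d.drop(columns="Unused", inplace=True)
print(list(train.columns), list(holdout.columns))
```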
How do we solve this particular problem, where the custom function “create_dummies” has no inplace option?
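Two possible fixes, sketched with the same kind of hypothetical toy data. Option 1 keeps create_dummies as written and stores each returned DataFrame back into the list by index, then rebinds train and holdout to the updated frames. Option 2 uses a hypothetical mutating helper, add_dummies_inplace (not part of pandas), that assigns each dummy column into the existing DataFrame, much like inplace=True would, so no reassignment is needed.

```python
import pandas as pd


def create_dummies(df, column_name):
    # Returns a NEW DataFrame with the one-hot columns appended
    dummies = pd.get_dummies(df[column_name], prefix=column_name)
    return pd.concat([df, dummies], axis=1)


def add_dummies_inplace(df, column_name):
    # Hypothetical mutating variant: writes the one-hot columns into df itself
    dummies = pd.get_dummies(df[column_name], prefix=column_name)
    for col in dummies.columns:
        df[col] = dummies[col]


def make_frames():
    # Hypothetical toy data standing in for the real train/holdout sets
    train = pd.DataFrame({"Pclass": [1, 2, 3], "Sex": ["male", "female", "male"]})
    holdout = pd.DataFrame({"Pclass": [3, 1], "Sex": ["female", "male"]})
    return train, holdout


categories = ["Pclass", "Sex"]

# Option 1: keep create_dummies and store the results back in the list by index
train, holdout = make_frames()
df_cat = [train, holdout]
for i in range(len(df_cat)):
    for c in categories:
        df_cat[i] = create_dummies(df_cat[i], c)
train, holdout = df_cat  # rebind the original names to the updated frames
print(list(train.columns))

# Option 2: mutate each frame in place, so the loop variable never needs reassigning
train2, holdout2 = make_frames()
for d in [train2, holdout2]:
    for c in categories:
        add_dummies_inplace(d, c)
print(list(holdout2.columns))
```

Option 1 is closer to the usual pandas style of returning new objects; Option 2 mirrors the inplace behaviour asked about. With only two DataFrames, simply writing train = create_dummies(train, c) and holdout = create_dummies(holdout, c) inside a loop over categories works as well.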

@sreekanthac - were you able to modify your code to get the desired result?
