Rescale or dummy a column in feature preparation?

During feature preparation, we rescale some columns and create dummy variables for others. But how do we know when to rescale and when to create dummies?

Hey @nileshsuryavanshi395,

Rescaling and dummy-column construction are two different steps of the pipeline, so it is not a choice between one and the other. Dummy variables are usually created for categorical columns.
Say we have a regression problem in hand and the final feature set contains categorical features. If we apply a LabelEncoder, each category is replaced by an integer code. Since the model only sees the numeric value, it treats the category coded 0 as smaller than the category coded 1, which implies an ordering that does not actually exist. So the categorical column is instead converted to dummy variables: every category gets its own 0/1 column and no category is ranked above another.
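To make the difference concrete, here is a minimal pure-Python sketch of both encodings. The `cities` column and its values are made up for illustration; in practice you would more likely reach for something like `pandas.get_dummies` or scikit-learn's `OneHotEncoder` rather than writing this by hand.

```python
# Hypothetical toy column; the category names are made up for illustration.
cities = ["Delhi", "Mumbai", "Pune", "Mumbai"]

# Fix a column order for the categories (alphabetical here).
categories = sorted(set(cities))

# Label encoding: one integer per category.
# The model would read this as Delhi < Mumbai < Pune, an order that is not real.
label_codes = {cat: i for i, cat in enumerate(categories)}
label_encoded = [label_codes[c] for c in cities]

# Dummy (one-hot) encoding: one 0/1 column per category, no implied order.
dummy_encoded = [[1 if c == cat else 0 for cat in categories] for c in cities]

print(label_encoded)  # [0, 1, 2, 1]
print(dummy_encoded)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

Each row of `dummy_encoded` contains exactly one 1, so every category is represented the same way and none looks "bigger" than another.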
Once all the features are ready after the above step, the features are rescaled. Say we have two features, the area of a house and the number of bedrooms. The former will typically be larger than the latter by a factor of several hundred or more. This mismatch in scale can bias the model's coefficients toward one feature, particularly with regularized, distance-based, or gradient-based models. To avoid this, rescaling is done so that all features are on roughly the same scale.
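As a rough illustration of the rescaling step, the sketch below standardizes two features to zero mean and unit standard deviation using only the standard library. The numbers and the `standardize` helper are hypothetical; this is the same kind of transformation that scikit-learn's `StandardScaler` applies, and min-max scaling would be an equally valid choice.

```python
from statistics import mean, pstdev

# Hypothetical values, made up for illustration.
area_sqft = [800.0, 1200.0, 1500.0, 2500.0]
bedrooms = [1.0, 2.0, 3.0, 4.0]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


area_scaled = standardize(area_sqft)
bedrooms_scaled = standardize(bedrooms)

# Before scaling the two ranges differ by orders of magnitude;
# after scaling both are centered on 0 with comparable spread.
print(area_scaled)
print(bedrooms_scaled)
```

In a real pipeline the mean and standard deviation should be computed on the training data only and then reused to transform the test data.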
I hope this solves your doubt. If not, do let me know what was unclear.

Thanks
Raj Tulluri


Thanks for this wonderful explanation. :slightly_smiling_face:
