Solution Recommendation

Screen Link: https://app.dataquest.io/m/188/guided-project%3A-creating-a-kaggle-workflow/5/selecting-the-best-performing-features

Hi guys,

Noticed in the recommended solution that we have used RFECV to identify the best features - but unfortunately this results in columns with a high level of multicollinearity e.g. ‘Title_Miss’, ‘Title_Mr’, ‘Title_Mrs’, ‘Sex_female’, ‘Sex_male’. Instead of these 5 different feature columns, we could have just used one any one single column: Sex_female or Sex_male, or Title_Mr
Wouldn’t that have been better?

Thanks