Guided Project: Clean and Analyze Employee Exit Surveys - Question regarding dropping columns

I am currently doing this project, but I cannot understand why we are dropping dete_survey.columns[28:49]. I understand dropping the TAFE columns, since they make sense, but why the DETE ones?

We are also missing a lot of data in the last 5 columns, so why are we keeping those?


Hi Manuel,

Those columns from the DETE dataset contain coded values, and we don’t have a legend to decipher them. Later, while analyzing the TAFE dataset, we’ll see some columns with similar values (“Agree”, “Neutral”, etc.), so these codes most probably mean the same thing. In any case, since each of the DETE columns selected by columns[28:49] has 6 possible values, translating them into “satisfied/dissatisfied” terms would not be straightforward, especially with the neutral values. Furthermore, we already have plenty of quite informative columns about dissatisfaction in that institute, and those columns are already boolean. So there is simply no need to decipher and rescale the other columns.
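If it helps to see this step in isolation, here is a minimal sketch on a made-up DataFrame (the real survey file is not reproduced; the 56-column shape, the col_N names and the c1…c6 codes are all hypothetical placeholders). It checks how many distinct values the coded columns hold and then drops them by position. Note that columns[28:49] is a Python slice, so it covers positions 28 through 48; position 49 is not included.

```python
import pandas as pd

# Hypothetical stand-in for dete_survey: 6 rows x 56 columns named col_0 ... col_55.
# The six codes below are placeholders, not the survey's real codes.
codes = ["c1", "c2", "c3", "c4", "c5", "c6"]
data = {f"col_{i}": (codes if 28 <= i < 49 else list(range(6))) for i in range(56)}
dete_survey = pd.DataFrame(data)

# Positional slice of the column labels: positions 28 through 48 (49 is excluded).
coded_cols = dete_survey.columns[28:49]

# Count distinct values in each coded column before deciding to drop them.
print(dete_survey[coded_cols].nunique().unique())  # [6]

# Drop the coded columns; drop() returns a new DataFrame and leaves the original intact.
dete_survey_updated = dete_survey.drop(coded_cols, axis=1)
print(len(coded_cols), dete_survey_updated.shape)  # 21 (6, 35)
```

On the real data you would run the same nunique() check on dete_survey directly to confirm the value counts before dropping.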

And yes, it’s better to delete the last columns as well. In the solution notebook, you’ll find the thresh=500 argument used for this purpose (dropping columns with fewer than 500 non-null values).
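As a rough illustration of how thresh behaves (this is not the solution notebook's code), here is a sketch on a tiny hypothetical 4-row frame. The column names and values are invented, and the threshold is scaled down to 3 because there are only 4 rows. With axis=1, dropna keeps a column only if it has at least thresh non-null values.

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row survey: two complete columns and two mostly-empty ones.
survey = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "cease_date": ["2012", "2012", "2013", "2013"],
    "sparse_a": [np.nan, "Yes", np.nan, np.nan],
    "sparse_b": [np.nan, np.nan, np.nan, "Yes"],
})

# Non-null count per column, useful for picking a sensible threshold.
print(survey.notnull().sum())

# axis=1 drops columns; thresh=3 keeps only columns with at least 3 non-null values.
# On the real DETE data the threshold would be 500 instead.
cleaned = survey.dropna(axis=1, thresh=3)
print(list(cleaned.columns))  # ['id', 'cease_date']
```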