Thank you for bringing this up. Mission screen 5, "Removing Low Variance Features", seems to be missing some instructions.
The last technique we’ll explore is removing features with low variance. When the values in a feature column have low variance, they don’t meaningfully contribute to the model’s predictive capability. On the extreme end, let’s imagine a column with a variance of 0. This would mean that all of the values in that column were exactly the same. This means that the column isn’t informative and isn’t going to help the model make better predictions.
Based on this, we have to find the columns with low variance. We can call the .var() method on a pandas DataFrame to calculate variance easily.
Wood Deck SF 0.033064
Open Porch SF 0.013938
Full Bath 0.018621
1st Flr SF 0.025814
Garage Area 0.020347
Gr Liv Area 0.023078
Overall Qual 0.024496
Based on the result, the Open Porch SF column has the lowest variance, so removing it makes sense. However, it’s not clear what threshold we are using to remove columns with low variance. I will get this issue logged.
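In the meantime, here is a minimal sketch of how a threshold-based version of this step might look. The data values, the min-max rescaling step, and the cutoff of 0.1 are all assumptions made for illustration; they are not taken from the mission, and only the column names are borrowed from the output above.

```python
import pandas as pd

# Toy data: column names come from the output above, the values are made up.
df = pd.DataFrame({
    "Open Porch SF": [0, 50, 50, 50, 50, 50, 50, 100],
    "Gr Liv Area": [800, 1000, 1200, 1400, 1600, 1800, 2000, 2200],
    "Full Bath": [1, 1, 2, 2, 1, 2, 3, 2],
})

# Rescale every column to the range [0, 1] so that variances are
# comparable across columns measured in different units (an assumption
# about how the mission prepares the data).
scaled = (df - df.min()) / (df.max() - df.min())

# Sample variance of each rescaled column, lowest first.
variances = scaled.var().sort_values()
print(variances)

# Hypothetical cutoff: drop every column whose variance falls below it.
threshold = 0.1
low_variance_cols = variances[variances < threshold].index
reduced = df.drop(columns=low_variance_cols)
print(list(reduced.columns))
```

With an explicit cutoff like this, the decision to drop a column no longer depends on eyeballing which variance looks smallest, though the right threshold value is still a judgment call for the dataset at hand.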