Feature Selection in Linear Regression

On this page, I learned about rescaling the values, but nothing in the instructions or the explanation told us how to check for features with low variance.

On this page, we were told that Open Porch SF was dropped because it has the lowest variance.

Can someone explain how we are checking for features with low variance? Thanks in advance.

Hi @amol,

Thank you for bringing this up. The mission screen "5. Removing Low Variance Features" seems to be missing some instructions.

The last technique we’ll explore is removing features with low variance. When the values in a feature column have low variance, they don’t meaningfully contribute to the model’s predictive capability. On the extreme end, let’s imagine a column with a variance of 0. This would mean that all of the values in that column were exactly the same, so the column isn’t informative and isn’t going to help the model make better predictions.
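To make the extreme case concrete, here is a minimal sketch on a made-up dataframe (the column names and values are hypothetical, not from the mission's dataset). A column whose values are all identical has a variance of exactly 0.

```python
import pandas as pd

# Hypothetical toy data: one constant column, one column that varies.
df = pd.DataFrame({
    "constant": [3, 3, 3, 3],
    "varied": [1, 2, 3, 4],
})

# DataFrame.var() returns the sample variance of each column.
variances = df.var()
print(variances)
```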

Based on this, we have to find the columns with low variance. Calling the .var() method on a pandas DataFrame returns the variance of each column, so we can run it on the rescaled training data:

unit_train.var()

Wood Deck SF     0.033064
Open Porch SF    0.013938
Fireplaces       0.046589
Full Bath        0.018621
1st Flr SF       0.025814
Garage Area      0.020347
Gr Liv Area      0.023078
Overall Qual     0.024496
dtype: float64
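For anyone who wants to try the same check without the mission's data, here is a rough sketch. It assumes min-max rescaling to the 0 to 1 range (my guess at what the rescaling step does), and the dataframe, column names, and values are all made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the training features (not the mission's data).
train = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5],
    "feature_b": [10, 50, 50, 50, 90],
    "feature_c": [0, 100, 0, 100, 0],
})

# Assumed min-max rescaling so every column lies between 0 and 1.
# Without this step, variances would mostly reflect each column's units.
rescaled = (train - train.min()) / (train.max() - train.min())

# Per-column sample variance, sorted so the lowest comes first.
variances = rescaled.var().sort_values()
print(variances)

# Name of the column with the lowest variance, then drop it.
lowest = variances.idxmin()
reduced = rescaled.drop(columns=[lowest])
```

Sorting the result makes the lowest-variance column easy to spot, which is harder to do by eye in the unsorted output above.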

Based on the result, the Open Porch SF column has the lowest variance (0.013938), so removing it makes sense. However, it’s not clear what threshold we are using to remove columns with low variance. I will get this issue logged.
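Until the mission states a cutoff, one way to think about it is the sketch below. It reuses the variances printed above and applies a hypothetical threshold of 0.015. That number is only an illustration, not the mission's value: any cutoff between 0.013938 (Open Porch SF) and 0.018621 (Full Bath, the next lowest) would drop the same single column.

```python
import pandas as pd

# Variances copied from the unit_train.var() output in this thread.
variances = pd.Series({
    "Wood Deck SF": 0.033064,
    "Open Porch SF": 0.013938,
    "Fireplaces": 0.046589,
    "Full Bath": 0.018621,
    "1st Flr SF": 0.025814,
    "Garage Area": 0.020347,
    "Gr Liv Area": 0.023078,
    "Overall Qual": 0.024496,
})

# Hypothetical cutoff, chosen only for illustration.
threshold = 0.015

# Columns below the cutoff are candidates for removal; the rest are kept.
low_variance = variances[variances < threshold].index.tolist()
kept = variances[variances >= threshold].index.tolist()

print("drop:", low_variance)
print("keep:", kept)
```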

Best,
Sahil

@Sahil - What threshold value for variance is used when excluding columns?