Handling multicollinearity


I’m trying to understand the different ways to handle multicollinearity in ML models. From searching online, I understand that:

  1. You can use Pearson’s correlation between two variables and then produce a heatmap of the correlations.
  2. There is also the variance inflation factor (VIF), which can identify collinearity involving three or more variables.
  3. You can drop one column from each set of dummy variables (I think!).
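For the numeric-only case, my understanding is that (1) and (2) look roughly like this. This is a toy, numpy-only sketch (the data is made up; in practice you'd run it on your own feature matrix), computing VIF from first principles as VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing column j on the remaining columns:

```python
import numpy as np

def correlation_matrix(X):
    """Pairwise Pearson correlations between the columns of X (n_samples, n_features)."""
    return np.corrcoef(X, rowvar=False)

def vif(X):
    """Variance inflation factor for each column of X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing
    column j on all the other columns (plus an intercept).
    """
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # add an intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Toy data: x2 is nearly a copy of x0, so both should show a high VIF,
# while x1 is independent and should sit near VIF = 1.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
x2 = x0 + 0.01 * rng.normal(size=200)
X = np.column_stack([x0, x1, x2])

print(correlation_matrix(X).round(2))
print(vif(X).round(1))
```

In practice you'd likely use `pandas.DataFrame.corr()` for the heatmap and `statsmodels.stats.outliers_influence.variance_inflation_factor` instead of hand-rolling it, but the numbers should match.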

If I have a mix of numerical and categorical features (independent variables), can I use Pearson’s correlation and/or VIF? Are these methods valid for categorical features, or do I need another method such as the chi-square test?


Hey @Roya

I believe this article should help you.

PS: As far as I know, the relationship between a categorical and a continuous variable can be assessed with a t-test (if the categorical variable has two categories) or ANOVA (more than two categories).
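To make that concrete, here is a rough numpy-only sketch of the one-way ANOVA F-statistic (toy data, made-up variable names). Since F = t² for two groups, the same function also covers the two-category t-test case:

```python
import numpy as np

def anova_f(values, groups):
    """One-way ANOVA F-statistic for a continuous variable split by a
    categorical one: ratio of between-group to within-group variance.
    For exactly two categories this equals the squared pooled t-statistic."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand_mean = values.mean()
    levels = np.unique(groups)
    ss_between = sum(
        values[groups == g].size * (values[groups == g].mean() - grand_mean) ** 2
        for g in levels
    )
    ss_within = sum(
        ((values[groups == g] - values[groups == g].mean()) ** 2).sum()
        for g in levels
    )
    df_between = len(levels) - 1
    df_within = values.size - len(levels)
    return (ss_between / df_between) / (ss_within / df_within)

# Toy example: group "b" has a clearly higher mean, so F should be large,
# signalling a strong association between the categorical and continuous variable.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
g = np.array(["a"] * 100 + ["b"] * 100)
print(anova_f(y, g))
```

In practice `scipy.stats.f_oneway` (or `scipy.stats.ttest_ind` for two groups) gives the same statistic plus a p-value.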

Let me know if that helps.



Yes, thank you @prasadkalyan05!
