I’m trying to understand the different ways to handle multicollinearity in ML models. Having searched online, I understand that:
- You can compute Pearson’s correlation between pairs of variables and produce a heatmap of the correlation matrix.
- There is also the variance inflation factor (VIF), which regresses each feature on all of the others, so it can identify multicollinearity involving 3 or more variables (not just pairs).
- When one-hot encoding a categorical variable, you drop one of the dummy columns to avoid the dummy variable trap (I think!).
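To make sure I understand the first two points, here is a sketch of how I’d compute them in Python with pandas and numpy (the data and column names are made up for illustration; VIF is computed by hand rather than via a library):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=n)  # deliberately collinear with x1
x3 = rng.normal(size=n)
df = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Pairwise Pearson correlations -- this matrix is what a heatmap displays
corr = df.corr(method="pearson")
print(corr.round(2))

# VIF for each column: regress it on the remaining columns and take
# 1 / (1 - R^2). A high VIF (rules of thumb: > 5 or > 10) flags
# multicollinearity even when no single pairwise correlation is large.
def vif(frame: pd.DataFrame) -> pd.Series:
    out = {}
    for col in frame.columns:
        y = frame[col].to_numpy()
        X = frame.drop(columns=col).to_numpy()
        X = np.column_stack([np.ones(len(X)), X])  # add intercept
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        r2 = 1 - resid.var() / y.var()
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out)

print(vif(df).round(1))
```

With data like this I’d expect x1 and x2 to show both a high pairwise correlation and high VIFs, while x3 stays near 1.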
If I have a mix of numerical and categorical features (independent variables) can I use Pearson’s correlation and/or VIF? Are these methods valid for categorical features? Or do I need to use another method such as chi-square test?
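For context, this is how I’d currently apply a chi-square test of independence between two categorical features (the feature names and data are invented; the statistic and the Cramér’s V normalization I’ve seen mentioned alongside it are computed by hand from the contingency table):

```python
import numpy as np
import pandas as pd

# Hypothetical categorical features; "size" is generated to depend on
# "color", so the test should detect an association between them.
rng = np.random.default_rng(1)
color = rng.choice(["red", "blue"], size=300)
size = np.where(color == "red",
                rng.choice(["S", "L"], size=300, p=[0.8, 0.2]),
                rng.choice(["S", "L"], size=300, p=[0.3, 0.7]))

table = pd.crosstab(pd.Series(color, name="color"),
                    pd.Series(size, name="size"))

# Chi-square statistic of independence from the contingency table
observed = table.to_numpy()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
chi2 = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
print(f"chi2 = {chi2:.1f}, dof = {dof}")

# Cramér's V rescales chi-square to [0, 1] so that pairs of
# categorical features can be compared, heatmap-style
n_obs = observed.sum()
cramers_v = np.sqrt(chi2 / (n_obs * (min(observed.shape) - 1)))
print(f"Cramér's V = {cramers_v:.2f}")
```

Is something like this the right tool for categorical-categorical pairs, or is there a standard way to mix it with Pearson/VIF on the numerical columns?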