Hi guys,
So I have just finished the first 4 missions in Machine Learning Fundamentals course on K-Nearest Neigbors.
What I have understood is that:
- We need to identify the most relevant features to fit the model. More is not the better always.
- Then identify and select the ideal k value using hyperparameterization for the selected features
Now I am not sure if the following would be covered in future courses of the learning path, but anyways:
- How do we determine the most relevant features in cases where we have tens of features? Is there any statistical technique that we can utilize? Or is it just trial and error (utilizing domain knowledge) through the use of MSE and RSME?
- Should we first identify the set of features and then identify the ideal k value for these features using hyperparameterization?
If there are 15 distinct features, then we can have hundreds of different feature combinations to evaluate first on a standalone basis and then in conjunction with the ideal k value. That is a lot of trial and error and not efficient I believe. I am hoping that there is a more efficient way to do this?
Thanks for any help!