Optimal features for K-Nearest Neighbors

Hi guys,

So I have just finished the first four missions of the Machine Learning Fundamentals course on K-Nearest Neighbors.

What I have understood is that:

  • We need to identify the most relevant features before fitting the model; more features are not always better.
  • Then, for the selected features, identify the ideal k value through hyperparameter tuning.

Now I am not sure whether the following will be covered in future courses of the learning path, but anyway:

  • How do we determine the most relevant features in cases where we have tens of features? Is there a statistical technique we can utilize, or is it just trial and error (guided by domain knowledge) using MSE and RMSE?
  • Should we first identify the set of features and then find the ideal k value for that set through hyperparameter tuning?

If there are 15 distinct features, then there are 2^15 − 1 = 32,767 possible feature combinations to evaluate, first on a standalone basis and then in conjunction with the ideal k value. That is a lot of trial and error and does not seem efficient. I am hoping there is a more efficient way to do this?
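For illustration, here is a rough sketch of the kind of combined search I am imagining, using scikit-learn's Pipeline with SelectKBest and GridSearchCV (I don't know if this is what the later courses actually teach; the dataset and parameter choices below are made up):

```python
# Hypothetical sketch: search the number of features to keep and the
# k value together, instead of hand-testing every feature subset.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline

# Synthetic stand-in for a dataset with 15 candidate features
X, y = make_regression(n_samples=300, n_features=15, n_informative=5,
                       noise=10, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_regression)),  # keep the best-scoring features
    ("knn", KNeighborsRegressor()),
])
grid = {
    "select__k": [2, 4, 6, 8, 10],       # how many features to keep
    "knn__n_neighbors": range(1, 21),    # candidate k values
}
search = GridSearchCV(pipe, grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)
print(search.best_params_)  # best feature count and k found by the search
```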

Thanks for any help!

To find the features you can use three methods. I'm still learning that part because I had the same question, but this article has a really awesome and easy explanation.
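For example, here is a minimal sketch of one common statistical approach, univariate selection with scikit-learn's SelectKBest (I'm not sure whether this is one of the article's three methods, and the data below is made up):

```python
# Score each feature against the target and keep the top 5.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))                         # 15 candidate features
y = 3 * X[:, 0] - 2 * X[:, 4] + rng.normal(size=200)   # target depends on 2 of them

selector = SelectKBest(score_func=f_regression, k=5)
selector.fit(X, y)
print(selector.get_support(indices=True))  # indices of the 5 best-scoring features
```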

Now, to find the ideal k, the best thing you can do is test different k values. To do that, use a for loop where you try each k in a range against your test data and save the accuracy in an empty list. After that you plot ks vs. accuracy, and the best k is where the curve flattens. Something like this:
[plot of accuracy vs. k, flattening around k = 15]
In this example the best k is 15.
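Here is a rough sketch of that loop in code (assuming scikit-learn; the iris dataset is just a placeholder for whatever data you're working with):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

ks = range(1, 31)
accuracies = []                      # empty list to collect the scores
for k in ks:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    accuracies.append(knn.score(X_test, y_test))  # accuracy on the test data

plt.plot(ks, accuracies)             # plot ks vs. accuracy
plt.xlabel("k")
plt.ylabel("accuracy")
plt.show()                           # pick the k where the curve flattens
```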

I don’t know if there’s another way to find the best k, but this is the one I use.

Thanks mate - definitely helpful!