Question about feature selection in k-nearest neighbor regression

Hi fellow learners,

I’m doing the Guided Project: Predicting Car Prices using k-nearest neighbors and am wondering about the feature selection in the solution. Why do we have to use only continuous data? Why isn’t discrete data like num-of-doors and symboling used as features? I get that num-of-doors is categorical, but it’s ordinal and can easily be converted to numbers.
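
For example, something along these lines (just a sketch with made-up values; in the project the data would come from imports-85.data):

```python
import pandas as pd

# Tiny made-up frame standing in for the imports-85 data; the column
# names match the guided project, the values are invented.
cars = pd.DataFrame({
    "num-of-doors": ["two", "four", "four", "?"],
    "symboling": [3, 1, -1, 0],
})

# num-of-doors is ordinal, so a simple mapping makes it numeric.
# "?" is how this dataset marks missing values; map() turns it into NaN.
cars["num-of-doors"] = cars["num-of-doors"].map({"two": 2, "four": 4})
print(cars)
```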

I’ve been googling but haven’t found anything that helps; I’d really appreciate some help clarifying feature selection for k-nearest neighbors regression. :innocent:

Thanks!

Hi @veratsien

I believe it’s probably up to the user to explore more and try to improve the RMSE score with other combinations of features. Since the data is composed of 26 columns, if I’m not wrong, there are 2^26 possible feature combinations to train on.
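
Of course you wouldn’t brute-force all 2^26 subsets, but searching the smaller combinations of a few candidate features for the best RMSE is doable. Here’s a rough sketch; it assumes a cleaned, already-normalized numeric DataFrame with a price column, uses scikit-learn’s KNeighborsRegressor, and best_feature_subset is just a name I made up:

```python
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

def best_feature_subset(df, candidates, target="price", max_size=3, k=5):
    """Return the feature subset (up to max_size columns) with the lowest
    cross-validated RMSE. Assumes df is numeric, normalized, and has no
    missing values."""
    best_rmse, best_subset = np.inf, None
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            # cross_val_score returns negative MSE, so flip the sign
            mse = -cross_val_score(
                KNeighborsRegressor(n_neighbors=k),
                df[list(subset)], df[target],
                scoring="neg_mean_squared_error", cv=5,
            ).mean()
            rmse = np.sqrt(mse)
            if rmse < best_rmse:
                best_rmse, best_subset = rmse, subset
    return best_subset, best_rmse
```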

It also depends on the algorithm used. Some algorithms accept only continuous data, others accept both discrete and continuous variables, and others work only with categorical features.

Thank you for your reply. I’m actually doing the project with columns like num-of-doors just to explore a bit more.

I understand that feature selection varies with the algorithm used; what I’m wondering about is feature selection in this specific case, k-nearest neighbors regression. I looked at a few other projects shared in the community and they all seem to go with the continuous columns only. Just wondering if I’m missing something in this particular case. :thinking:

Well, k-nearest neighbors just computes distances between numeric feature vectors, so any feature can be used as long as it’s numeric and scaled comparably to the others; the solution sticks to the continuous columns mostly for simplicity. If you converted your categorical feature into a numeric one (by mapping the categories to numbers and then normalizing, I guess) and your model improved with this new feature, well done!
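
If anyone wants to check that themselves, here’s a rough sketch of the comparison. It assumes a cleaned DataFrame (I’ll call it numeric_cars) with price as the target and num-of-doors already mapped to 2/4; holdout_rmse is just a name I made up, and I’m using scikit-learn’s KNeighborsRegressor rather than a hand-written knn function:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

def holdout_rmse(df, features, target="price", k=5):
    """RMSE on a simple train/test split for the given feature list."""
    # Min-max normalize so large-scale features (e.g. curb-weight)
    # don't dominate the Euclidean distance.
    X = (df[features] - df[features].min()) / (df[features].max() - df[features].min())
    X_train, X_test, y_train, y_test = train_test_split(
        X, df[target], test_size=0.25, random_state=1
    )
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    return np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

# Compare the score with and without the encoded column, e.g.:
# base = ["horsepower", "curb-weight", "highway-mpg"]
# holdout_rmse(numeric_cars, base)
# holdout_rmse(numeric_cars, base + ["num-of-doors"])
```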
