Hyperparameter Optimization in KNN


I have a question regarding the size of k (hyperparameter) in the KNN algorithm (AirBnb data):

What happens if we have only 4 instances in our dataset with let’s say 3 bathrooms and we use this feature (bathrooms) in our training with k = 5? Does it mean that 4 of our neighbours will have 3 bathrooms and 1 will not? Is there any way we can access/visualize situations like this within the KNeighborsRegressor?

Thank you

I’m not sure I understand the assumptions in your question, but I’ll take a stab at answering this question anyway.

Not necessarily. Given an apartment, you’ll be looking at the closest neighbors bathroom-wise. If the given apartment has one bathroom, it may very well be the case that the five closest neighbors also have only one bathroom, and those instances you mentioned never come into play.
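Here’s a quick sketch of that situation with made-up numbers (not the actual AirBnb data). The `kneighbors()` method of scikit-learn’s `KNeighborsRegressor` lets you see exactly which rows were chosen as neighbors:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up bathroom counts and prices, not the real AirBnb data.
# Five listings have 1 bathroom, the rest have more.
bathrooms = np.array([[1], [1], [1], [1], [1], [2], [3], [3], [3], [3]])
price = np.array([100, 110, 105, 95, 100, 150, 200, 210, 190, 205])

knn = KNeighborsRegressor(n_neighbors=5).fit(bathrooms, price)

# For a one-bathroom listing, all five nearest neighbors also have one
# bathroom, so the three-bathroom listings never come into play.
distances, indices = knn.kneighbors([[1]])
print(distances[0])  # all zeros: five exact matches exist
print(knn.predict([[1]]))
```

Since there are five exact matches, all five distances are zero and the prediction is just the average price of the one-bathroom listings.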

Hello Bruno,

Thank you very much for your answer. I think my question was very confusing - I will try to give you a better example: In the chapter “Introduction to K-Nearest Neighbors” we pick the five (k=5) closest neighbours to calculate the “price” of our listing for one feature (“accommodates” - univariate case). What if our listing accommodates 10 people, but there are only 4 OTHER listings which also accommodate 10 people? This means at least one of the five neighbours (4 < 5) will necessarily have “accommodates != 10”.

Is it not possible to account for that when choosing the value of k? How does the KNeighborsRegressor deal with this?

Thank you again

I also thought of that interpretation, but I didn’t want to overcomplicate my reply.

This algorithm doesn’t care whether the closest neighbors have matching values or not; it cares about “who” the closest neighbors are.

Then we’ll use four listings that accommodate ten people, and one more that doesn’t.

In the example in the second set of diagrams of the third screen, if the number of neighbors we decided to use was 5, then we’d use all of them regardless of the number of bedrooms.
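To make this concrete, here’s a sketch of your exact scenario with toy numbers (again, not the actual AirBnb data): only four listings accommodate ten people, so with k=5 the fifth neighbor necessarily has a different value, and `KNeighborsRegressor` simply includes it in the average anyway. `kneighbors()` shows you which rows were used:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy data: only four listings accommodate exactly 10 people.
accommodates = np.array([[2], [4], [10], [10], [10], [10], [12]])
price = np.array([80, 120, 300, 310, 290, 305, 400])

knn = KNeighborsRegressor(n_neighbors=5).fit(accommodates, price)

query = np.array([[10]])
distances, indices = knn.kneighbors(query)
print(indices[0])    # row indices of the 5 nearest listings
print(distances[0])  # four distances of 0, plus one of 2 (the 12-person listing)
print(knn.predict(query))  # average price of those five neighbors
```

The prediction averages the four exact matches and the 12-person listing; that fifth neighbor is used regardless of its non-matching value, which is exactly the behavior described above.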

I hope this clarifies it.