In this lesson, could you guys clarify that this is standardization not normalization. In addition, can you clarify why you chose to standardize the data instead of normalizing it using min-max normalization like you did for the weighted sum problem?
“Normalization” is sometimes used as an umbrella term for the “adjusting values measured on different scales to a notionally common scale”. In my experience it most commonly refers to min-max normalization, though.
This is to say that I’m inclined to agree with you, but I do not commit to the opinion using “normalization” is actually wrong — it’s a defensible (if weak) point of view.
Now, regarding which to choose, I actually think that min-max scaling is more appropriate here; kNN is heavily based on distances and standardization doesn’t allow you to properly compare different features.
For example, the maximum value of
maximum_nights after standardizing is around 61. The maximum value of
number_of_reviews post-standaridization is roughly 12.
Using just this two features (assuming the maximum is representative of the column, which isn’t necessarily the case — this just an example to convey the idea), the former would crush the latter, it would have a much stronger influence.
Min-max normalization allows us to escape from this problem. The content of this screen explains this pictorially in a different context.
At the end of the day, what performs better is the best choice, so you can even try both and see where it leads you.
I hope this helps.