Multivariate KNN & Standardization vs Normalization

Screen Link:

In this lesson, could you guys clarify that this is standardization not normalization. In addition, can you clarify why you chose to standardize the data instead of normalizing it using min-max normalization like you did for the weighted sum problem?

Hey, Melissa.

Normalization” is sometimes used as an umbrella term for the “adjusting values measured on different scales to a notionally common scale”. In my experience it most commonly refers to min-max normalization, though.

This is to say that I’m inclined to agree with you, but I do not commit to the opinion using “normalization” is actually wrong — it’s a defensible (if weak) point of view.

Now, regarding which to choose, I actually think that min-max scaling is more appropriate here; kNN is heavily based on distances and standardization doesn’t allow you to properly compare different features.

For example, the maximum value of maximum_nights after standardizing is around 61. The maximum value of number_of_reviews post-standaridization is roughly 12.

Using just this two features (assuming the maximum is representative of the column, which isn’t necessarily the case — this just an example to convey the idea), the former would crush the latter, it would have a much stronger influence.

Min-max normalization allows us to escape from this problem. The content of this screen explains this pictorially in a different context.

At the end of the day, what performs better is the best choice, so you can even try both and see where it leads you.

I hope this helps.