In the Multivariate KNN mission under the Machine Learning course they refer to the z-score formula as normalization. Wouldn’t that be standardization? And why was standardization chosen over normalization (x-min)/(max-min)?

Hey Barry,

You are correct that technically the formula used is standardization (z-score transformation).

As for your other question, I do not know why they used standardization over normalization, but this did pique my curiosity. I did a little reading and found this article that I satisfied my curiosity and helped me learn some new things as well.

Hope this will be helpful. Thanks for the question!

I don’t know why the author chose one over the other, but I’ll try to provide some intuition into how this works.

A better answer would do this and illustrate the relevant points. Hopefully, after reading my answer, you feel empowered enough to investigate and illustrate them yourself.

The kNN algorithm is based on the distance between data points. Consider an example like classifying whether a customer is good or bad, based on the number of purchases and total amount spent — we have two features: amount spent and number of purchases.

The amount spent feature tends to have much higher values, so a customer that has maybe purchased one time, but made a large purchase can be considered good, while a regular customer that has only made small purchases can be considered bad if they haven’t spent much money.

Standardization centers the data around the mean and makes it so that the standard deviation is 1. It doesn’t care about the range of the data. If you standardize the data, the range of values will most likely still be greater for the amount spent than it will be for the number of purchases, so the algorithm will still be biased towards the amount spent (because it will contribute much more to distance formula).

Min-max scaling, however, limits the range of the data, so that both the number of purchases and the total amount spent will range between 0 and 1. This way kNN won’t be biased towards any of the features.

Therefore, min-max scaling will work better for kNN. Having said this, I need to contradict it partially. As often is the case in data science, this ends up depending on my factors. If you only have one feature, then it’s unlikely to matter, since there is no bias towards another feature. If the original features have the same range of values, then standardization may not cause any harm either.