Z-scores: 8. Using Standardization for Comparisons

Screen Link:
https://app.dataquest.io/m/309/z-scores/8/using-standardization-for-comparisons

Here our task is to help a client choose between the first and the second house (from the table above). We need to compare the index scores, which are provided by two different companies. The first house has an index_1 of NaN and an index_2 of -0.41111, whereas the second house has an index_1 of 38.05 and an index_2 of NaN. In such a scenario, how can we make a comparison and conclude that one house is better than the other, since NaN makes no sense here?

As I understand it, one company has not rated house one (index_1 is NaN) while the other company has not rated house two (index_2 is NaN). That means one house each has not been rated by both the companies. So how can we make a comparison in such a case?

Please elaborate.

That means one house each has not been rated by both the companies

You reached the right conclusion here, but that’s exactly the problem we’re trying to solve.

We want to compare the first house with the second house to choose the better one. The first house has a NaN index_1 and a -0.41111 index_2. The second house has a 38.05 index_1 and a NaN index_2.

Unfortunately, we can’t directly compare -0.41111 with 38.05 because they come from different measurement systems: each company evaluates houses on its own scale.

The solution here is to standardize the index_1 and index_2, which means transforming all values in these two columns to z-scores.
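As a rough sketch of what standardizing means: each value x in a column becomes z = (x - mean) / standard deviation, where the mean and standard deviation are computed from that same column. The ratings below are made up for illustration, the `standardize` helper is a hypothetical name, and using the population standard deviation (`ddof=0`) is an assumption on my part.

```python
import pandas as pd

def standardize(column: pd.Series) -> pd.Series:
    """Turn every value in a column into a z-score; NaN entries stay NaN."""
    mean = column.mean()          # pandas skips NaN by default
    std = column.std(ddof=0)      # population standard deviation (assumption)
    return (column - mean) / std

# Hypothetical ratings from one company, with one house left unrated
ratings = pd.Series([2.0, 4.0, float("nan"), 6.0])
z_scores = standardize(ratings)
print(z_scores)
```

The unrated house stays NaN after the transformation, so standardizing does not invent a rating; it only rescales the ratings that exist.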

As a result of the standardization, the -0.41111 score of the first house becomes +0.429742 (this is a z-score), and the 38.05 of the second house becomes -0.93592 (also a z-score).

In terms of z-scores, the higher the z-score, the better the house, so the first house is better: its z-score of about +0.43 is greater than the second house's z-score of about -0.94.
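To make the comparison concrete, here is a minimal sketch on a tiny made-up table (not the mission's dataset, so the numbers will differ from the ones above). It assumes the population standard deviation (`ddof=0`) and that a higher index means a better house in both systems.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each company rated a different subset of houses
houses = pd.DataFrame({
    "index_1": [np.nan, 40.0, 50.0, 60.0],   # company 1 skipped house 0
    "index_2": [1.0, np.nan, -1.0, 0.0],     # company 2 skipped house 1
})

# Standardize each column against its own mean and standard deviation
z = (houses - houses.mean()) / houses.std(ddof=0)

# Each of the two houses has exactly one usable z-score
first_z = z.loc[0, "index_2"]
second_z = z.loc[1, "index_1"]

better = "first" if first_z > second_z else "second"
print(f"first: {first_z:.3f}, second: {second_z:.3f}, better: {better}")
```

Once both columns are expressed in standard deviations from their own mean, the two scores share a scale, which is what makes the final comparison meaningful.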

Thanks for the explanation.
