
Z-scores: Converting Back from Z-scores

Screen Link:

My Code:

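# houses is the DataFrame from the earlier screens; z_merged holds the combined z-scores
mean = 50
st_dev = 10
# Convert back from z-scores with x = z * st_dev + mean (vectorized, no apply needed)
houses["dis"] = houses["z_merged"] * st_dev + mean
mean_transformed = houses["dis"].mean()
stdev_transformed = houses["dis"].std(ddof=0)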

What I expected to happen:
I don't understand this point:
"We are actually free to choose any values we want for the mean and standard deviation. We want some more intuitive values for our two standardized distributions of index values, so let's choose mean = 50 and standard deviation = 10."


On what basis were these values chosen?


Yeah, this wasn't clear to me either: why are those values more intuitive, and how would we identify suitable values in a case like this?

@Sahil If it’s not too much trouble, is it possible to have a response from the content author(s) about this?


I think in this case it's trying to set up something like a percentage (out of 100) scale. If 50 is the 'middle' or mean value, then anything greater than 50 is higher quality and anything lower than 50 is lower quality. That kind of scale leaves room for really good values to be near (or maybe even exceed) 100, and vice versa for low values.

What I was a little confused about is why we're switching back to ddof=0, but I'm guessing it's because we're again treating the values as the entire population (ddof=0) rather than as a sample (ddof=1, the sample standard deviation).
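To make the ddof point concrete, here is a minimal sketch with a small made-up series (not the houses data). It assumes the z-scores were computed with the population standard deviation (ddof=0). Under that assumption, the transformed values have a population standard deviation of exactly the chosen st_dev, while the sample standard deviation (ddof=1, which is pandas' default for `Series.std()`) comes out slightly larger.

```python
import pandas as pd

# Made-up values for illustration only
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)

# Standardize with the population standard deviation (ddof=0)
z = (values - values.mean()) / values.std(ddof=0)

mean, st_dev = 50, 10
transformed = z * st_dev + mean  # x = z * sigma + mu

print(transformed.mean())        # 50.0
print(transformed.std(ddof=0))   # 10.0, matches the chosen st_dev
print(transformed.std(ddof=1))   # about 10.69 (sample SD, pandas' default)
```

So if you want to check that the new distribution really has the σ you picked, compare it with `std(ddof=0)`, the same convention used when standardizing.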


Hi @the_doctor, @esramgamal, @ghighcove,

To answer this, we have to start with the previous screen. Initially, the index values of the two companies were like this:

   index_1   index_2  SalePrice
0      NaN -0.411111     215000
1    38.05       NaN     105000
2      NaN -0.888889     172000
3    39.44       NaN     244000
4      NaN -0.690000     189900

The initial problem was that the measurement system used by company 1 (index_1) was not comparable to the system used by company 2 (index_2). So we standardized both columns by transforming them into z-scores.
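A minimal sketch of that standardization step, assuming each column is standardized on its own with z = (x − μ)/σ using the population standard deviation, and the two z-score columns are then merged into one. The DataFrame is a made-up, six-row stand-in loosely based on the table above, the helper `to_z` is hypothetical, and `fillna` is just one way to do the merge, not necessarily the lesson's method.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each house is rated by only one of the two companies
houses = pd.DataFrame({
    "index_1": [np.nan, 38.05, np.nan, 39.44, np.nan, 41.20],
    "index_2": [-0.41, np.nan, -0.89, np.nan, -0.69, np.nan],
})

def to_z(col):
    """Standardize a column with z = (x - mean) / std, using the population SD (ddof=0)."""
    # mean() and std() skip NaN by default, so only the rated houses are used
    return (col - col.mean()) / col.std(ddof=0)

houses["z_1"] = to_z(houses["index_1"])
houses["z_2"] = to_z(houses["index_2"])

# Each row has exactly one non-missing z-score, so fill z_1's gaps from z_2
houses["z_merged"] = houses["z_1"].fillna(houses["z_2"])
print(houses[["z_1", "z_2", "z_merged"]])
```

Because each standardized column has mean 0 and population standard deviation 1, the merged column does too, which is what puts both companies' ratings on the same scale.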

However, now the issue is that our values look like this:

        z_1       z_2
0       NaN  0.429742
1 -0.935920       NaN
2       NaN -0.114456
3  0.786063       NaN
4       NaN  0.112082

While these values are good enough for comparison, they are not easy to communicate to non-technical audiences. So here, we use the formula x = zσ + μ to convert the z-scores into something a non-technical audience can easily understand. Our choice of μ = 50 and σ = 10 is somewhat arbitrary (more on that below). This is what our new values look like:
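As a minimal sketch of the formula x = zσ + μ, the snippet below applies it to the five z-scores shown above (merged into one Series) for each (μ, σ) pair discussed in this reply. The helper name `from_z` is hypothetical. Because the printed z-scores are rounded to six decimals, the results agree with the outputs in this thread only to about four decimal places.

```python
import pandas as pd

def from_z(z, mu, sigma):
    """Convert z-scores back to a scale with mean mu and standard deviation sigma."""
    return z * sigma + mu

# First five merged z-scores from the z_1 / z_2 table above
z = pd.Series([0.429742, -0.935920, -0.114456, 0.786063, 0.112082])

for mu, sigma in [(50, 10), (10, 50), (50, 7.0054)]:
    print(f"mu={mu}, sigma={sigma}:", from_z(z, mu, sigma).round(4).tolist())
```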

0    54.297418
1    40.640797
2    48.855438
3    57.860626
4    51.120821
Name: transformed, dtype: float64
Min:  29.217360116843054 Max:  121.37299126210257

That said, the choice is not completely arbitrary. We have to make sure the new values make sense to a non-technical audience. For example, using μ = 10 and σ = 50 would not make our scores intuitive:

0    31.487092
1   -36.796016
2     4.277190
3    49.303132
4    15.604103
Name: transformed, dtype: float64
Min:  -93.91319941578475 Max:  366.86495631051287

Here, we have to experiment with a few values to make sure that at least the minimum value is greater than 0, which makes the scores somewhat more intuitive. While there are many ways to find an intuitive range of values, here is the approach I would suggest:

  1. Define the value range (Ex: 0 to 100)
  2. Set the middle of the range as the mean (Ex: μ = 50)
  3. Adjust the standard deviation until the minimum and maximum values fall within the defined range (Ex: σ = 7.0054)
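Step 3 can also be done without trial and error; this is a swapped-in closed-form shortcut, not the method used above. A minimal sketch, assuming the data's most extreme z-scores are known (here recovered from the Min/Max printed for μ = 50, σ = 10 via z = (x − μ)/σ): since x = zσ + μ increases with z, the largest σ that keeps every value inside [low, high] is set by whichever extreme z-score reaches its bound first. The helper `choose_sigma` is hypothetical.

```python
def choose_sigma(z_min, z_max, low=0.0, high=100.0):
    """Return (mu, sigma): mu is the middle of [low, high], sigma is the largest
    standard deviation that keeps z_min and z_max inside the range.
    Assumes z_min < 0 < z_max, which holds for standardized data with any spread."""
    mu = (low + high) / 2
    # Need z_max * sigma + mu <= high and z_min * sigma + mu >= low
    sigma = min((high - mu) / z_max, (mu - low) / -z_min)
    return mu, sigma

# Extreme z-scores recovered from the Min/Max printed for mu = 50, sigma = 10
z_min = (29.217360116843054 - 50) / 10
z_max = (121.37299126210257 - 50) / 10

mu, sigma = choose_sigma(z_min, z_max)
print(mu, sigma)                               # 50.0 and roughly 7.00545
print(z_min * sigma + mu, z_max * sigma + mu)  # roughly 35.44 and 100.0
```

The exact boundary value is about 7.00545; rounding it down slightly (e.g. to 7.0054) keeps the maximum just under 100.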

Here is what we would get with the above values:

0    53.010513
1    43.443504
2    49.198189
3    55.506683
4    50.785180
Name: transformed, dtype: float64
Min:  35.44092945625323 Max:  99.99963529875333

Rounding them makes them look even better (note: once the values are rounded, we can no longer convert them back exactly to the original values, so make sure to keep a copy of the unrounded values):

0    53
1    43
2    49
3    56
4    51
Name: transformed, dtype: int64
Min:  35 Max:  100
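A minimal sketch of why the unrounded copy matters, using the five z-scores from the z_1 / z_2 table above with μ = 50 and σ = 7.0054. The reverse transformation z = (x − μ)/σ recovers the z-scores from the unrounded values (up to floating-point error), but not from the rounded integers.

```python
import pandas as pd

mu, sigma = 50, 7.0054
# First five merged z-scores from the z_1 / z_2 table above
z = pd.Series([0.429742, -0.935920, -0.114456, 0.786063, 0.112082])

transformed = z * sigma + mu
rounded = transformed.round().astype(int)
print(rounded.tolist())  # [53, 43, 49, 56, 51]

# Reverse with z = (x - mu) / sigma
z_back_exact = (transformed - mu) / sigma
z_back_rounded = (rounded - mu) / sigma

print((z_back_exact - z).abs().max())    # only tiny floating-point error
print((z_back_rounded - z).abs().max())  # noticeably off
```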

Let me know if it's still not clear. I will also ask the content author to comment on this.

Best,
Sahil


Hi,
Thanks for the reply.
So does that mean we can choose any values for the mean and standard deviation, or do they depend on each other?
For example, can I use mean = 50 and standard deviation = 20, or if I choose mean = 50, does the standard deviation have to be 10?
I also found other choices for the mean and standard deviation on the slide:
"One practical example includes transforming test scores for the SAT test using μ = 500 and σ = 110, or transforming IQ scores from different measurement systems using μ = 100 and σ = 15."

Thanks

Hi @esramgamal,

The main goal in selecting the mean and standard deviation here is to make sure that our minimum and maximum values fall within our value range while also staying close to its limits. You can test any mean and standard deviation values you like, but personally, I find that setting the mean to the middle of the range and then adjusting the standard deviation makes this process faster.
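To check a candidate pair such as μ = 50, σ = 20 from the question, one quick sketch is to map the extreme z-scores through x = zσ + μ and see where the minimum and maximum land. The extreme z-scores are recovered from the Min/Max printed for μ = 50, σ = 10 earlier in the thread, and `value_range` is a hypothetical helper name.

```python
# Extreme z-scores recovered from the Min/Max printed for mu = 50, sigma = 10
z_min = (29.217360116843054 - 50) / 10
z_max = (121.37299126210257 - 50) / 10

def value_range(mu, sigma):
    """Min and max of the transformed values for a given mean and standard deviation."""
    # With sigma > 0, x = z * sigma + mu preserves the order of the z-scores
    return z_min * sigma + mu, z_max * sigma + mu

for mu, sigma in [(50, 10), (50, 20), (50, 7.0054)]:
    low, high = value_range(mu, sigma)
    inside = 0 <= low and high <= 100
    print(f"mu={mu}, sigma={sigma}: min={low:.2f}, max={high:.2f}, within 0-100: {inside}")
```

In the formula itself, μ and σ are independent: σ = 20 still centers the values on 50, but on this data the maximum lands around 193, so it is the chosen value range that effectively ties the two together.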