mean = 50
st_dev = 10
# Convert each z-score to the new scale: x = z * st_dev + mean
houses["dis"] = houses["z_merged"] * st_dev + mean
mean_transformed = houses["dis"].mean()
# ddof=0 treats the column as the whole population
stdev_transformed = houses["dis"].std(ddof=0)
What I expected to happen:
I don’t understand this point:
We are actually free to choose any values we want for the mean and standard deviation. We want some more intuitive values for our two standardized distributions of index values, so let’s choose mean = 50 and standard deviation = 10.
I think in this case it’s trying to set up something close to a percent (out of 100) value system. If 50 is the ‘middle’ or mean value, then anything greater than 50 is higher quality and anything lower than 50 is lower quality. That kind of scale leaves room for really good values to be near (or maybe even exceed) 100, and for really poor values to be near 0.
What I was a little confused about is why we’re switching to ddof=0 again, but I’m guessing it’s because we’re treating the data as the entire population again and not as a sample, which is where ddof=1 would apply.
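To make the ddof point concrete, here is a minimal sketch in plain Python. It assumes the usual definition, in which the sum of squared deviations is divided by N - ddof, so ddof=0 gives the population standard deviation and ddof=1 gives the sample standard deviation. The `std` helper and the values are hypothetical and only for illustration; they are not the course data.

```python
import math

def std(values, ddof=0):
    """Standard deviation that divides the squared deviations by N - ddof."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))

# Hypothetical values, for illustration only
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_std = std(values, ddof=0)  # divides by N
sample_std = std(values, ddof=1)      # divides by N - 1, so it is slightly larger

print(population_std)
print(sample_std)
```

Because the z-scores here were computed from every row we have, treating them as a population (ddof=0) keeps the check consistent with how they were standardized.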
To answer this, we have to start with the previous screen. Initially, the index values of the two companies were like this:
The initial problem was that the measurement system used by company 1 (index_1) was not comparable to the system used by company 2 (index_2). So we standardized both by transforming their values into z-scores.
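As a rough sketch of that standardization step, assuming two made-up lists of index values on unrelated scales (these are not the real index_1 and index_2 columns, and `z_scores` is a hypothetical helper):

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Return the z-score of every value, using the population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical index values on two unrelated scales
index_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
index_2 = [120.0, 140.0, 160.0, 180.0, 200.0]

z_1 = z_scores(index_1)
z_2 = z_scores(index_2)

# Both lists now have mean 0 and standard deviation 1, so they can be compared directly
print(z_1)
print(z_2)
```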
However, now the issue is that our values look like this:
While these values are good enough for comparison, they are not easy to communicate to non-technical audiences. So here, we have used the formula x = zσ + μ to convert each z-score into something that a non-technical audience can easily understand. Our choice of μ = 50 and σ = 10 is somewhat arbitrary (more on it below). And this is what our new values will look like:
It’s not completely arbitrary, though. We have to ensure that the new values make sense to the non-technical audience. For example, if we use μ = 10 and σ = 50, our scores will not be intuitive:
Here, we have to experiment with a couple of values to ensure that the minimum value is at least greater than 0, which makes the scale more intuitive. While there are many ways to find an intuitive range of values, we can use an approach similar to the one above. Here is what I would suggest doing:
Define the value range (e.g. 0 - 100)
Set the middle value of the range as the mean (e.g. μ = 50)
Play around with the standard deviation values to ensure that the minimum and maximum values are within our defined range (e.g. σ = 7.0054)
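The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the z-scores are made up (chosen so that their mean is 0 and their population standard deviation is 1), and `rescale` and `largest_sigma_for_range` are hypothetical helper names, not part of the course code.

```python
from statistics import fmean, pstdev

def rescale(z_values, mu, sigma):
    """Apply x = z * sigma + mu to every z-score."""
    return [z * sigma + mu for z in z_values]

def largest_sigma_for_range(z_values, low, high):
    """Set the mean to the midpoint of [low, high] and return it together with
    the largest sigma that keeps every rescaled value inside the range."""
    mu = (low + high) / 2
    half_width = (high - low) / 2
    most_extreme = max(abs(z) for z in z_values)
    return mu, half_width / most_extreme

# Hypothetical z-scores with mean 0 and population standard deviation 1
z_values = [-2.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0]

# Steps 1 and 2: define the range and take its midpoint as the mean
mu, sigma_max = largest_sigma_for_range(z_values, low=0, high=100)

# Step 3: any sigma up to sigma_max keeps all values inside the range
widest = rescale(z_values, mu, sigma_max)
course_choice = rescale(z_values, 50, 10)
swapped = rescale(z_values, 10, 50)  # mean 10, sigma 50: values fall outside 0 - 100

print(mu, sigma_max)
print(min(widest), max(widest))
print(min(course_choice), max(course_choice))
print(min(swapped), max(swapped))
```

In practice you would probably round the standard deviation down to a convenient number, which is one reason a choice like σ = 10 is attractive.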
Thanks for the reply.
Does that mean we can choose any values for the mean and standard deviation, or do they depend on each other?
For example, can I use mean = 50 and standard deviation = 20, or if I choose mean = 50, does the standard deviation have to be 10?
I also found other choices for the mean and standard deviation, like this one on the slide:
Practical examples include transforming test scores for the SAT test using μ = 500 and σ = 110, or transforming IQ scores from different measurement systems using μ = 100 and σ = 15.
The main goal when selecting the mean and standard deviation here is to make sure that our minimum and maximum values fall within our value range while still staying close to its limits. While we can test any combination of mean and standard deviation values, personally, I found that setting the mean to the middle value of the range and then playing with the standard deviation makes this process faster.
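As a quick check on the question of whether the two values depend on each other, here is a small sketch that applies x = zσ + μ to the same made-up z-scores with several (μ, σ) pairs, including the mean = 50, standard deviation = 20 case and the SAT and IQ conventions quoted above. The z-scores and labels are hypothetical. In each case the transformed values end up with exactly the mean and standard deviation that were chosen, so the two can be picked independently; only the resulting minimum and maximum change.

```python
from statistics import fmean, pstdev

# Hypothetical z-scores with mean 0 and population standard deviation 1
z_values = [-2.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0]

# (mu, sigma) pairs to compare
scales = {
    "mean 50, sd 10": (50, 10),
    "mean 50, sd 20": (50, 20),
    "SAT-style": (500, 110),
    "IQ-style": (100, 15),
}

results = {}
for label, (mu, sigma) in scales.items():
    scaled = [z * sigma + mu for z in z_values]
    # Store mean, population standard deviation, minimum and maximum
    results[label] = (fmean(scaled), pstdev(scaled), min(scaled), max(scaled))
    print(label, results[label])
```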