Link to the screen:
Hello everyone, this is my first time posting in the community, so apologies if my post does not follow the expected structure.
I have an alternative solution to the assignment, or perhaps I should say a corrected one, since the assignment text includes this note: "Make sure your function is flexible enough to compute z-scores for both samples and populations."
This is my code. The difference from the DQ solution is in the definition of the zscores function:
min_val = houses['SalePrice'].min()
mean_val = houses['SalePrice'].mean()
max_val = houses['SalePrice'].max()

def zscores(value, array, population_or_sample):
    mean_val = sum(array)/len(array)
    st_dev = array.std(ddof = population_or_sample)
    distance = value - mean_val
    return distance/st_dev

min_z = zscores(min_val, houses['SalePrice'], 0)
max_z = zscores(max_val, houses['SalePrice'], 0)
mean_z = zscores(mean_val, houses['SalePrice'], 0)
Passing ‘population_or_sample’ as the third argument and forwarding it to ddof makes the function flexible: 0 gives the population standard deviation and 1 gives the sample standard deviation.
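To illustrate the effect of that third argument, here is a minimal sketch on made-up numbers (the `prices` Series is hypothetical and stands in for houses['SalePrice']; the parameter is renamed `ddof` only for brevity). It assumes the input is a pandas Series, as in my code above.

```python
import pandas as pd

def zscores(value, array, ddof):
    """Z-score of `value` relative to `array`; ddof=0 for a population, ddof=1 for a sample."""
    mean_val = array.mean()
    st_dev = array.std(ddof=ddof)
    return (value - mean_val) / st_dev

# Hypothetical data, not the houses dataset
prices = pd.Series([100, 200, 300, 400, 500])

z_population = zscores(500, prices, 0)  # divides the squared deviations by n
z_sample = zscores(500, prices, 1)      # divides the squared deviations by n - 1

print(z_population, z_sample)
```

The sample version gives a slightly smaller z-score, because dividing by n - 1 makes the standard deviation a bit larger.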
My question is: why should we use numpy’s std() and mean() instead of the ones from pandas?
If I understood correctly, the numpy functions accept any array-like data, while the pandas std() and mean() methods are only available on pandas objects such as Series?
But in this case houses['SalePrice'] is a Series, so is there any need for numpy’s functions?
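For context, here is a small sketch of the comparison I am asking about, again with made-up numbers rather than the houses data. One thing I did notice while testing is that the defaults differ: numpy’s std() uses ddof=0 unless told otherwise, while the pandas Series std() uses ddof=1.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the houses dataset
values = [100, 200, 300, 400, 500]
series = pd.Series(values)

np_std_list = np.std(values)             # works on a plain list; default ddof=0 (population)
pd_std = series.std()                    # Series method; default ddof=1 (sample)

np_std_sample = np.std(values, ddof=1)   # numpy with the sample correction
pd_std_population = series.std(ddof=0)   # pandas without the sample correction

print(np_std_list, pd_std)
print(np_std_sample, pd_std_population)
```

With ddof set explicitly, the two libraries agree, so for a Series the pandas method seems to be enough as long as ddof is passed.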
Thank you in advance for the answer.