Z-scores : 3. Alternate solution and a question

Link to the screen:

Hello everyone, this is my first time posting to the discussion community, so apologies if my post does not follow the expected structure.

I have an alternate solution to the assignment, or perhaps I should say a corrected one, since the assignment text includes this note: "Make sure your function is flexible enough to compute z-scores for both samples and populations."

Here is my code; the difference from the DQ solution is in the definition of the zscores function:

min_val = houses['SalePrice'].min()
mean_val = houses['SalePrice'].mean()
max_val = houses['SalePrice'].max()

def zscores(value, array, population_or_sample):
    # population_or_sample is passed to ddof: 0 for a population, 1 for a sample
    mean_val = sum(array)/len(array)
    st_dev = array.std(ddof = population_or_sample)
    distance = value - mean_val
    return distance/st_dev

min_z = zscores(min_val, houses['SalePrice'], 0)
max_z = zscores(max_val, houses['SalePrice'], 0)
mean_z = zscores(mean_val, houses['SalePrice'], 0)

Passing ‘population_or_sample’ as the third argument and forwarding it to ddof makes the function flexible: it can compute the standard deviation for either a population (ddof = 0) or a sample (ddof = 1).
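To make the effect of ddof concrete, here is a minimal sketch on made-up numbers. The `prices` Series is a hypothetical stand-in for houses['SalePrice'], and the third parameter is renamed `ddof` only for this illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for houses['SalePrice']
prices = pd.Series([100.0, 200.0, 300.0, 400.0, 500.0])

def zscores(value, array, ddof):
    """Return the z-score of value relative to array.

    ddof=0 divides the squared deviations by n (population),
    ddof=1 divides them by n - 1 (sample).
    """
    mean_val = array.mean()
    st_dev = array.std(ddof=ddof)
    return (value - mean_val) / st_dev

pop_z = zscores(500.0, prices, 0)     # treats prices as a whole population
sample_z = zscores(500.0, prices, 1)  # treats prices as a sample

print(pop_z, sample_z)
```

The sample standard deviation is slightly larger than the population one, so the sample z-score comes out slightly smaller in magnitude for the same value.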

And my question is: why should we use numpy’s std() and mean() instead of the ones from pandas?
If I understood correctly, numpy’s functions accept any array-like data, while the pandas std() and mean() methods operate only on Series?
But in this case houses['SalePrice'] is a Series, so is there any need for numpy’s functions?

Thank you in advance for the answer.

Kind regards,



I think numpy is just faster, and that will make a difference in the future (when your datasets have billions of rows). By the way, the two can produce different results; here’s an article about it:

An article about the speed of numpy vs pandas:
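One likely source of the differing results mentioned in this reply is the default ddof: numpy's std() defaults to ddof=0 (population), while the pandas Series.std() method defaults to ddof=1 (sample). The sketch below uses made-up values, not the houses data, and is only meant to illustrate that default.

```python
import numpy as np
import pandas as pd

# Hypothetical values, not taken from the houses dataset
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

numpy_std = np.std(values.to_numpy())   # numpy default: ddof=0
pandas_std = values.std()               # pandas default: ddof=1

# Setting ddof explicitly makes the two libraries agree
pandas_pop_std = values.std(ddof=0)
numpy_sample_std = np.std(values.to_numpy(), ddof=1)

print(numpy_std, pandas_std)
print(pandas_pop_std, numpy_sample_std)
```

Passing ddof explicitly, as the zscores function above does, avoids depending on either library's default.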

Thank you very much for the answer, I will definitely read the articles you sent.
Best regards,