Distance above and below the mean of a distribution

Screen Link:
https://app.dataquest.io/m/305/the-mean/3/the-mean-as-a-balance-point

My Code:

from numpy.random import randint, seed

equal_distances = 0
for i in range(5000):
    seed(i)
    distribution = randint(0, 1000, 10)
    dist_mean = sum(distribution) / len(distribution)
    low_sum = 0
    up_sum = 0
    for value in distribution:
        if value == dist_mean:
            continue
        if value < dist_mean:
            low_sum += round((dist_mean - value), 1)
        elif value > dist_mean:
            up_sum += round((value - dist_mean), 1)
        
    
    if low_sum == up_sum:
        equal_distances += 1

What I expected to happen:

equal_distances should be 5000

What actually happened:

equal_distances = 4021

I need to measure the total distance below the mean and the total distance above the mean for 5000 different distributions and check whether the two are equal. For some reason they only match for 4021 of the distributions, not all 5000…

Try calling round() only after you have finished summing the distances; rounding each distance before adding it may be affecting the result. Try it like this:

from numpy.random import randint, seed

equal_distances = 0
for i in range(5000):
    seed(i)
    distribution = randint(0, 1000, 10)
    dist_mean = sum(distribution) / len(distribution)
    low_sum = 0
    up_sum = 0
    for value in distribution:
        if value == dist_mean:
            continue
        if value < dist_mean:
            low_sum += dist_mean - value
        elif value > dist_mean:
            up_sum += value - dist_mean

    if round(low_sum, 1) == round(up_sum, 1):
        equal_distances += 1

Thanks otavios, that worked. I still don’t understand why my approach didn’t haha.

That was really tough to figure out actually :sweat_smile:

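What I think is happening: the distances are floats, and most one-decimal values can't be stored exactly as floats. Rounding each distance before adding it doesn't remove that error, so tiny differences build up in low_sum and up_sum, and the exact == comparison fails for some distributions. Rounding only once, after both sums are complete, cleans up those leftovers before the comparison.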
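Here is a small sketch of that effect, in case it helps. It uses plain Python floats instead of the NumPy values from the exercise, and the helper name distances_balance and its rel_tol default are just my own illustration, not something from the course:

```python
import math

# Ten terms that each have exactly one decimal place.
terms = [0.1] * 10

# Rounding every term first changes nothing: round(0.1, 1) is still 0.1,
# and the float error only appears when the terms are added together.
total = sum(round(t, 1) for t in terms)

print(total == 1.0)              # False: total is 0.9999999999999999
print(round(total, 1) == 1.0)    # True: rounding once, after summing
print(math.isclose(total, 1.0))  # True: comparing with a tolerance


def distances_balance(distribution, rel_tol=1e-9):
    """Hypothetical helper: compare the total distance below the mean
    with the total distance above it, using a tolerance instead of ==."""
    mean = sum(distribution) / len(distribution)
    below = sum(mean - v for v in distribution if v < mean)
    above = sum(v - mean for v in distribution if v > mean)
    return math.isclose(below, above, rel_tol=rel_tol)
```

With something like this you could write `equal_distances += distances_balance(distribution)` inside the loop. Comparing with a tolerance is an alternative to rounding the two sums; both avoid relying on exact float equality.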
