Screen Link:
My Code:
```python
import numpy as np
import matplotlib.pyplot as plt

chi_squared_values = []
for n in range(1000):
    # draw 32561 values, each 0 or 1 with equal probability
    random_numbers = np.random.choice([0, 1], size=32561, replace=True)
    # count how many 0s and 1s were drawn
    binary_counts = np.bincount(random_numbers)
    male_count = binary_counts[0]
    female_count = binary_counts[1]
    # expected count for each category is 32561 / 2 = 16280.5
    male_diff = ((male_count - 16280.5) ** 2) / 16280.5
    female_diff = ((female_count - 16280.5) ** 2) / 16280.5
    chi_squared = male_diff + female_diff
    chi_squared_values.append(chi_squared)

plt.hist(chi_squared_values)
plt.show()
```
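For reference, the loop computes Pearson's chi-squared statistic for two categories, with the expected count $E$ set to half the sample size $N = 32561$. Because the two observed counts always sum to $N$, the two terms are equal and the statistic simplifies:

$$
\chi^2 = \frac{(O_{\text{male}} - E)^2}{E} + \frac{(O_{\text{female}} - E)^2}{E},
\qquad E = \frac{N}{2} = 16280.5,
\qquad O_{\text{female}} = N - O_{\text{male}}
\;\Longrightarrow\;
\chi^2 = \frac{4\,(O_{\text{male}} - N/2)^2}{N}.
$$

If each draw is 0 or 1 with probability $1/2$, then $O_{\text{male}}$ is binomial with variance $N/4$, so $\mathbb{E}[\chi^2] = 1$.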
What I expected to happen:
The correct chi_squared_values
What actually happened:
Different values
I (sloppily) recreated the code above using np.random.rand() and got the correct values, but I don’t understand what happens under the hood that gives a different output. Here is the code that works:
```python
import numpy as np
import matplotlib.pyplot as plt

chi_squared_values = []
for n in range(1000):
    # draw 32561 floats uniformly from [0, 1)
    random_numbers = np.random.rand(32561)
    # map each float to 1 if it is >= 0.5, otherwise to 0
    binary_numbers = []
    for number in random_numbers:
        if number >= 0.5:
            number = 1
            binary_numbers.append(number)
        else:
            number = 0
            binary_numbers.append(number)
    male_count = binary_numbers.count(0)
    female_count = binary_numbers.count(1)
    # expected count for each category is 32561 / 2 = 16280.5
    male_diff = ((male_count - 16280.5) ** 2) / 16280.5
    female_diff = ((female_count - 16280.5) ** 2) / 16280.5
    chi_squared = male_diff + female_diff
    chi_squared_values.append(chi_squared)

plt.hist(chi_squared_values)
plt.show()
```
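To compare the two versions like for like, here is a minimal sketch that runs both ways of generating 0s and 1s through the same chi-squared calculation. It assumes NumPy's newer `np.random.default_rng` Generator API, where `rng.random` plays the role of `np.random.rand` and `rng.choice` the role of `np.random.choice`. The helper names `chi_squared_from_bits` and `simulate` and the seed value are hypothetical, chosen only for illustration. Because every draw is random, individual chi-squared values will not match between runs (or between the two methods) unless a seed is fixed, so the sketch compares the two sets of results through their means rather than value by value.

```python
import numpy as np

N = 32561          # number of observations, as in the question
EXPECTED = N / 2   # 16280.5 under a 50/50 null hypothesis


def chi_squared_from_bits(bits):
    """Two-category chi-squared statistic for an array of 0s and 1s."""
    # minlength=2 guarantees two counts even if one category never appears
    counts = np.bincount(bits, minlength=2)
    return float(np.sum((counts - EXPECTED) ** 2 / EXPECTED))


def simulate(method, trials=1000, seed=0):
    """Hypothetical helper: 'choice' mirrors the first snippet, 'rand' the second."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(trials):
        if method == "choice":
            # pick 0 or 1 uniformly, like np.random.choice([0, 1], ...)
            bits = rng.choice([0, 1], size=N, replace=True)
        else:
            # threshold uniform floats at 0.5, like the np.random.rand loop,
            # but vectorized instead of appending to a Python list
            bits = (rng.random(N) >= 0.5).astype(int)
        values.append(chi_squared_from_bits(bits))
    return np.array(values)


choice_values = simulate("choice")
rand_values = simulate("rand")
print("mean (choice):", choice_values.mean())
print("mean (rand):  ", rand_values.mean())
```

In both branches each element is 0 or 1 with probability 1/2, so the counts follow the same binomial distribution, and under that null hypothesis the statistic has an expected value of exactly 1. Both printed means should therefore land close to 1, even though the individual values differ.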