Statistics Fundamentals: Sampling 9/14

I can't get my plot to look like the images in the exercise: my sample means are not nearly as close to the actual (population) average.

```python
import pandas as pd
import matplotlib.pyplot as plt

# wnba: the exercise's WNBA DataFrame (assumed already loaded)
print(wnba['MIN'].value_counts(bins=3, normalize=True))

means = []
for i in range(100):
    low = wnba[wnba['MIN'] <= 347.333]
    mid = wnba[(wnba['MIN'] > 347.333) & (wnba['Games Played'] <= 682.667)]
    high = wnba[wnba['MIN'] > 682.667]
    low_stratum = low.sample(4, random_state=i)
    mid_stratum = mid.sample(4, random_state=i)
    high_stratum = high.sample(4, random_state=i)

    stratum = pd.concat([low_stratum, mid_stratum, high_stratum])
    means.append(stratum['PTS'].mean())  # assumed: mean total points of the 12-player sample

plt.scatter(range(1, 101), means)
```

Here is the code. Could someone explain what I am doing wrong, or show the right approach to get sample means with less sampling error (like in the picture)?

Thank you.

Hi Isaiah, welcome to our community!

Check out this line in your code.
```python
mid = wnba[(wnba['MIN'] > 347.333) & (wnba['Games Played'] <= 682.667)]
```
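For reference, here is a minimal sketch of the loop with every stratum condition on 'MIN'. It assumes the statistic being plotted is the mean of the 'PTS' (total points) column, which the original post doesn't show, and it runs on a small synthetic DataFrame standing in for wnba (all values are made up). The function name `stratified_sample_means` is hypothetical. The strata are built once, before the loop, since they don't change between iterations.

```python
import numpy as np
import pandas as pd


def stratified_sample_means(df, n_iter=100, per_stratum=4):
    """Return n_iter means of 'PTS', each from a stratified sample.

    Strata follow the three 'MIN' bins from value_counts(bins=3).
    Every condition filters on 'MIN' (the posted mid stratum used 'Games Played').
    """
    low = df[df['MIN'] <= 347.333]
    mid = df[(df['MIN'] > 347.333) & (df['MIN'] <= 682.667)]
    high = df[df['MIN'] > 682.667]

    means = []
    for i in range(n_iter):
        # Same random_state per iteration, so results are reproducible.
        sample = pd.concat([
            low.sample(per_stratum, random_state=i),
            mid.sample(per_stratum, random_state=i),
            high.sample(per_stratum, random_state=i),
        ])
        means.append(sample['PTS'].mean())
    return means


# Hypothetical stand-in for wnba: random minutes, points loosely tied to minutes.
rng = np.random.default_rng(0)
minutes = rng.uniform(11, 1018, size=150)
demo = pd.DataFrame({'MIN': minutes,
                     'PTS': minutes * 0.4 + rng.normal(0, 20, size=150)})

means = stratified_sample_means(demo)
print(len(means), round(demo['PTS'].mean(), 1))
```

With the real data, pass wnba instead of demo and plot the returned list as before.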


Hi Isaiah,

After correcting the bug April pointed out (the mid stratum filters on 'Games Played' instead of 'MIN'), try setting the x- and y-axis ranges to match those in the example picture. When I reproduced your graph, the x-axis ranged from -20 to 120 and the y-axis from 140 to 280, which makes the values look far more spread out.

As a reminder, you can set the x- and y-axis ranges with plt.xlim() and plt.ylim().
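A minimal sketch of those calls, using made-up sample means and placeholder limits (the actual ranges from the example picture aren't reproduced here). `plt.xlim(left, right)` and `plt.ylim(bottom, top)` fix the visible range, so two plots of similar data are drawn at the same scale.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt

# Hypothetical sample means and population mean, for illustration only.
means = [195.0, 210.5, 188.2, 205.9]
population_mean = 200.0

plt.scatter(range(1, len(means) + 1), means)
plt.axhline(population_mean)  # horizontal reference line at the population mean

# Fix the visible ranges instead of letting matplotlib autoscale.
# These limits are placeholders; use the ranges shown in the exercise image.
plt.xlim(0, 5)      # x-axis from 0 to 5
plt.ylim(150, 250)  # y-axis from 150 to 250
# Call plt.show() to display the figure when a display is available.
```

Autoscaling fits the axes around the data, so the same spread can look large or small depending on the range; fixed limits make your plot directly comparable with the picture.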


I couldn't get exactly the same results as the example picture, but with the axes matching the example you can see that your sample means are much closer to the population mean than they first appeared.

Hope this helps!
