Help for the variability in the plots

My Code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# wnba is the course DataFrame, assumed to be loaded already

# Split the players into three strata by minutes played
strata_1 = wnba[wnba['MIN']<=347]
strata_2 = wnba[(wnba['MIN']>347) & (wnba['MIN']<=683)]
strata_3 = wnba[wnba['MIN']>683]

sample_means = []

for i in range(100):
    # Draw 4 players from each stratum and average their points
    sample_1 = strata_1['PTS'].sample(4,random_state=i)
    sample_2 = strata_2['PTS'].sample(4,random_state=i)
    sample_3 = strata_3['PTS'].sample(4,random_state=i)
    final_sample = pd.concat([sample_1,sample_2,sample_3])
    sample_means.append(final_sample.mean())

# Outside loop
plt.scatter(x=np.arange(1,101),y=sample_means)
plt.axhline(y=wnba['PTS'].mean())  # population mean for reference
plt.title('Minutes Played')
plt.show()
print(sample_means)

What I expected to happen:
I expected less variability in the scatterplot of the stratified sample means above.

What actually happened:

The correlations are:

wnba['PTS'] and wnba['Games Played'] = 0.579
wnba['PTS'] and wnba['MIN'] = 0.911

So the scatterplot for the stratified sampling above (strata based on wnba['MIN']) should show less variability. Instead, it shows more variability than the scatterplot for the sampling stratified by wnba['Games Played'], even though Games Played has the weaker correlation with PTS.
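For reference, correlations like the two quoted above can be computed with pandas' Series.corr. The snippet below is only a sketch on made-up data: the demo DataFrame and its numbers are assumptions for illustration, not the real wnba dataset, so the printed values will not match the ones in the post.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption, NOT the real wnba dataset)
rng = np.random.default_rng(0)
games = rng.integers(1, 35, size=150)            # games played per player
minutes = games * rng.uniform(5, 30, size=150)   # minutes grow with games played
points = (0.4 * minutes + rng.normal(0, 40, size=150)).clip(min=0)

demo = pd.DataFrame({"Games Played": games, "MIN": minutes, "PTS": points})

# Pearson correlation of points with each candidate stratification variable
corr_games = demo["PTS"].corr(demo["Games Played"])
corr_min = demo["PTS"].corr(demo["MIN"])
print(round(corr_games, 3), round(corr_min, 3))
```

The variable with the stronger correlation to PTS is generally the better one to stratify on.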

Hi!
It might be somewhat late, but I'm doing this course now and I saw that your question was left without an answer.
In case you haven't figured it out yet…
The variability in your plot is actually lower than in the plots from the previous missions. The problem is the y-axis limits. In the previous plot the y-axis ran from approx. 80 to 350, while this plot is automatically zoomed in, with y-axis limits from 140 to 280.
If you check the min and max means on your plot, they are 160 and 260, while earlier they were approx. 120 and 340.

Add plt.ylim(80, 350) to your code and you'll see that the points in the scatterplot are grouped much more tightly around the horizontal mean line.
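To make that kind of comparison less dependent on the axis zoom, it can help to put both sets of sample means on the same y-axis and to compare their spread as a number. The sketch below uses made-up data (the df DataFrame, the quantile-based strata and the noise level are all assumptions, not the course data), so only the pattern carries over, not the values.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (an assumption, NOT the real wnba dataset)
rng = np.random.default_rng(0)
minutes = rng.uniform(10, 1000, size=150)
points = (0.4 * minutes + rng.normal(0, 40, size=150)).clip(min=0)
df = pd.DataFrame({"MIN": minutes, "PTS": points})

# Three strata of roughly equal size, split on minutes played
low_cut, high_cut = df["MIN"].quantile([1 / 3, 2 / 3])
strata = [
    df[df["MIN"] <= low_cut],
    df[(df["MIN"] > low_cut) & (df["MIN"] <= high_cut)],
    df[df["MIN"] > high_cut],
]

# 100 simple random samples of 12 vs. 100 stratified samples of 4 + 4 + 4
simple = [df["PTS"].sample(12, random_state=i).mean() for i in range(100)]
stratified = [
    pd.concat([s["PTS"].sample(4, random_state=i) for s in strata]).mean()
    for i in range(100)
]

# Numeric spread of the sample means, independent of any axis limits
spread_simple = np.std(simple)
spread_stratified = np.std(stratified)
print(spread_simple, spread_stratified)

# sharey=True forces identical y-limits, so the two plots are comparable by eye
fig, axes = plt.subplots(1, 2, sharey=True)
titles = ["Simple random", "Stratified by MIN"]
for ax, means, title in zip(axes, [simple, stratified], titles):
    ax.scatter(np.arange(1, 101), means)
    ax.axhline(df["PTS"].mean())  # population mean for reference
    ax.set_title(title)
# Call plt.show() to display the figure
```

With a shared axis the stratified means should visibly sit closer to the mean line, which is the same effect as setting plt.ylim by hand on a single plot.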

This should be marked as the solution, OP!