Screen Link: https://app.dataquest.io/m/283/sampling/9/choosing-the-right-strata
Hello. I’m not getting the plot I expected.
My Code:
import pandas as pd
import matplotlib.pyplot as plt

strat1 = wnba[wnba['MIN'] <= 347]
strat2 = wnba[(wnba['MIN'] > 347) & (wnba['MIN'] <= 682)]  # note: "> 348" would silently skip rows with MIN == 348
strat3 = wnba[wnba['MIN'] > 682]

proportional_sampling_means = []
for i in range(100):
    sample1 = strat1.sample(4, random_state=i)
    sample2 = strat2.sample(4, random_state=i)
    sample3 = strat3.sample(4, random_state=i)
    final_sample = pd.concat([sample1, sample2, sample3])
    proportional_sampling_means.append(final_sample['PTS'].mean())

plt.scatter(range(1, 101), proportional_sampling_means)
plt.axhline(wnba['PTS'].mean())
plt.show()
I found the post 283-9: Choosing the Right Strata, where Sahil provided the following code for the proper plot output:
under_12 = wnba[wnba['MIN'] <= 350]
btw_13_22 = wnba[(wnba['MIN'] > 350) & (wnba['MIN'] <= 700)]
over_23 = wnba[wnba['MIN'] > 700]

proportional_sampling_means = []
for i in range(100):
    sample_under_12 = under_12['PTS'].sample(4, random_state=i)
    sample_btw_13_22 = btw_13_22['PTS'].sample(4, random_state=i)
    sample_over_23 = over_23['PTS'].sample(4, random_state=i)
    final_sample = pd.concat([sample_under_12, sample_btw_13_22, sample_over_23])
    proportional_sampling_means.append(final_sample.mean())

plt.scatter(range(1, 101), proportional_sampling_means)
plt.axhline(wnba['PTS'].mean())
plt.axis([-5, 105, 100, 350])
Before looking at Sahil's code, I tried increasing the sample size, because I thought that would lower the variation, but it didn't visibly change the scatter's behavior. Can someone clarify what's wrong with my train of thought here? Answer: I discovered that I simply didn't increase it enough, which brings me to my next question:
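For the sample-size intuition: the spread of the sample means shrinks only with the square root of the sample size, so a modest increase is hard to spot in a scatter plot. A minimal sketch, assuming any skewed positive population can stand in for wnba['PTS'] (the gamma parameters below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical population standing in for wnba['PTS'];
# 143 rows to roughly match the WNBA dataset's size.
population = rng.gamma(shape=2.0, scale=100.0, size=143)

def spread_of_sample_means(sample_size, n_trials=1000):
    """Estimate the std. deviation of the sampling distribution of the mean."""
    means = [rng.choice(population, size=sample_size, replace=False).mean()
             for _ in range(n_trials)]
    return np.std(means)

small = spread_of_sample_means(12)   # like 3 strata x 4 rows each
large = spread_of_sample_means(48)   # 4x the total sample size
# Quadrupling the sample size only halves the spread (sqrt(4) = 2),
# so a small bump in sample size barely changes the scatter.
print(small, large)
```

So to cut the visual scatter in half, the total sample size has to roughly quadruple.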
Side question: with this sampling method, I have a chance of drawing the same data over and over again, right? Isn't that bad for the analysis? Also, I suppose that increasing the sample size will increase the chance of getting duplicate data. Is that correct?
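On the duplicates question: within a single .sample() call, pandas draws without replacement by default, so one sample never contains the same row twice; across the 100 loop iterations, though, the same rows can certainly be re-drawn. A small sketch with a toy 30-row stratum (an assumption, not the real data):

```python
import numpy as np
import pandas as pd

# Toy stratum of 30 rows, standing in for one of the MIN strata.
stratum = pd.DataFrame({'PTS': np.arange(30)})

# Within one .sample(...) call, pandas draws WITHOUT replacement by
# default (replace=False), so a single sample has no duplicate rows.
one_sample = stratum.sample(4, random_state=0)
print(one_sample.index.is_unique)

# Across iterations, the same rows can be re-drawn: count how often
# each row appears over 100 simulated samplings of 4 rows each.
counts = pd.concat(
    [stratum.sample(4, random_state=i) for i in range(100)]
).index.value_counts()
print(counts.max())  # at least some rows appear in several of the 100 samples
```

Note the distinction: duplicates can't occur inside one sample, but repetition across independently drawn samples is expected and is not a flaw; it's what the sampling distribution is built from.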
Every input is greatly appreciated. Thx =)