Stratified Sampling, Screen 7

I followed the instructions and wrote the code below.

My Code:

wnba["Pts_per_game"] = wnba["PTS"] / wnba["Games Played"]
grouped = wnba.groupby('Pos')
# Loop through the strata, one stratum at a time.
lst = []    # List to store row indices
samples_mean = {}   # Dict mapping each Pos value to its sample mean
for pos, index in grouped.groups.items():
    for each in index:
        lst.append(each)
    req_df = wnba.iloc[lst, :]
    sample_mean = req_df["Pts_per_game"].sample(10, random_state=0).mean()
    samples_mean[pos] = sample_mean
    
position_most_points = max(samples_mean, key=samples_mean.get)

Answer Code:

wnba['Pts_per_game'] = wnba['PTS'] / wnba['Games Played']

# Stratify the data into five strata
stratum_G = wnba[wnba.Pos == 'G']
stratum_F = wnba[wnba.Pos == 'F']
stratum_C = wnba[wnba.Pos == 'C']
stratum_GF = wnba[wnba.Pos == 'G/F']
stratum_FC = wnba[wnba.Pos == 'F/C']

points_per_position = {}
for stratum, position in [(stratum_G, 'G'), (stratum_F, 'F'), (stratum_C, 'C'),
                (stratum_GF, 'G/F'), (stratum_FC, 'F/C')]:
    
    sample = stratum['Pts_per_game'].sample(10, random_state = 0) # simple random sampling on each stratum
    points_per_position[position] = sample.mean()
    
position_most_points = max(points_per_position, key = points_per_position.get)

Please take a look at step 7 of the Stratified Sampling instructions:
https://app.dataquest.io/m/283/sampling/7/stratified-sampling

Can anyone help me understand why my code gives a different answer than the answer code? As far as I can tell, both select the same rows (by index) for each Pos value.

Thank you :slightly_smiling_face:

I think the issue is that lst is created outside the loop. If you insert print(lst) in the loop after samples_mean[pos] = sample_mean, you can see that lst gets longer with each pass through grouped.groups.items(), until on the last pass it contains every index in the dataframe. Because lst is never reset, it keeps the indices from all previous strata, so every stratum after the first is sampled from a mix of positions (and the last one from the whole dataframe). Let me know if that doesn't make sense. :slight_smile:
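To see the accumulation without the full dataset, here is a minimal sketch on a small made-up DataFrame (the `df` values and the `growing` dict are hypothetical, for illustration only). It keeps the question's loop structure and records a copy of `lst` after each stratum, which is what `print(lst)` would reveal.

```python
import pandas as pd

# Hypothetical toy data standing in for wnba: three positions, two rows each
df = pd.DataFrame({"Pos": ["G", "F", "G", "C", "F", "C"],
                   "Pts_per_game": [10.0, 8.0, 12.0, 6.0, 9.0, 7.0]})
grouped = df.groupby("Pos")

lst = []       # created once, outside the loop, and never reset
growing = {}   # hypothetical: snapshot of lst after each stratum
for pos, index in grouped.groups.items():
    for each in index:
        lst.append(each)
    growing[pos] = list(lst)  # copy, so later appends don't change the snapshot
    print(pos, len(lst))

# Each stratum has 2 rows, so the snapshots have lengths 2, 4 and 6:
# only the first stratum is "pure"; the last one holds every row of df.
```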

Yes, I forgot that the list keeps growing and retains the indices from the previous iterations. Since I use this list to extract the rows for each stratum, it doesn't select the correct rows.

Yeah, after defining the list inside the loop, it works correctly.
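For reference, a sketch of the corrected loop, with `lst` reset for every stratum, might look like the following. Since the real `wnba` data isn't reproduced here, it builds a made-up stand-in with 12 players per position (all values hypothetical) and compares the result with the answer code's approach. It also swaps `.iloc` for `.loc`: `grouped.groups` maps each position to index labels, so `.loc` is the matching accessor, while `.iloc` only selects the same rows when the labels happen to equal the positions (as with a default 0..n-1 index).

```python
import pandas as pd

# Hypothetical stand-in for wnba: 5 positions x 12 players, made-up numbers
positions = ["G", "F", "C", "G/F", "F/C"]
rows = []
for i, pos in enumerate(positions):
    for j in range(12):
        rows.append({"Pos": pos, "PTS": 100 + 10 * i + j, "Games Played": 10})
wnba = pd.DataFrame(rows)
wnba["Pts_per_game"] = wnba["PTS"] / wnba["Games Played"]

# Corrected version of the question's loop
grouped = wnba.groupby("Pos")
samples_mean = {}
for pos, index in grouped.groups.items():
    lst = []  # reset for every stratum, so it holds only this stratum's rows
    for each in index:
        lst.append(each)
    req_df = wnba.loc[lst, :]  # .loc: the groups dict holds index labels
    samples_mean[pos] = req_df["Pts_per_game"].sample(10, random_state=0).mean()

position_most_points = max(samples_mean, key=samples_mean.get)

# The answer code's approach, for comparison
answer = {}
for position in positions:
    stratum = wnba[wnba.Pos == position]
    answer[position] = stratum["Pts_per_game"].sample(10, random_state=0).mean()

print(samples_mean == answer, position_most_points)
```

With the same `random_state=0`, each stratum in both versions is sampled from identical rows in identical order, which is why the two dictionaries agree once `lst` no longer accumulates.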

Thank you @april.g :slightly_smiling_face: