Alternative solution for stratified sampling (Sampling - screen 7)

I am apparently not the first to note that the provided solution seems a bit clunky. Here is a simpler alternative that loops over the values returned by the Series.unique() method:

wnba["Points/Game"] = wnba["PTS"]/wnba["Games Played"]

points_per_position = {}
# iterate over all unique positions
for position in wnba["Pos"].unique():
    # get sample for current position
    sample = wnba[wnba["Pos"] == position].sample(10, random_state = 0)
    # get average points/game from current sample and store in dict
    points_per_position[position] = sample["Points/Game"].mean()

position_most_points = max(points_per_position, key=points_per_position.get)
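In case the last line looks cryptic: max() iterates over the dictionary's keys, and key=points_per_position.get tells it to rank those keys by their values. A tiny sketch with made-up numbers (the dictionary below is purely illustrative, not actual output from the exercise):

```python
# Made-up averages, purely for illustration (not real WNBA results).
example_means = {"G": 10.2, "C": 11.5, "F": 9.8}

# Iterating a dict yields its keys; key=example_means.get makes max()
# compare the keys by their associated values instead of alphabetically.
best = max(example_means, key=example_means.get)
print(best)  # C
```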

Screen Link:
https://app.dataquest.io/m/283/sampling/7/stratified-sampling

Hello,
Here is my alternative answer, using groupby():

wnba['scored_per_game'] = wnba['PTS'] / wnba['Games Played']

def sampling(df):
    # draw 10 players from one position group and average their points per game
    sample = df.sample(n=10, random_state=0)
    return sample['scored_per_game'].mean()

positions_scores = wnba.groupby('Pos').apply(sampling).to_dict()
position_most_points = max(positions_scores, key=positions_scores.get)
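As a sanity check, here is a rough sketch comparing the groupby() approach with an explicit loop over Series.unique(). It assumes a small synthetic DataFrame called toy (made-up numbers standing in for wnba, since the real dataset is not included here). It also uses a slight variant that selects the column before calling apply(), which should behave the same way here because each group is still sampled with the same seed.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the wnba DataFrame (made-up values, not the real data).
rng = np.random.default_rng(42)
positions = ["G", "F", "C", "G/F", "F/C"]
toy = pd.DataFrame({
    "Pos": np.repeat(positions, 20),                 # 20 players per position
    "PTS": rng.integers(50, 600, size=100),          # total points
    "Games Played": rng.integers(10, 35, size=100),  # never zero
})
toy["scored_per_game"] = toy["PTS"] / toy["Games Played"]

# Loop version: one sample of 10 per position, same seed for every stratum.
loop_means = {
    pos: toy[toy["Pos"] == pos].sample(10, random_state=0)["scored_per_game"].mean()
    for pos in toy["Pos"].unique()
}

# groupby version: select the column first, then sample within each group.
groupby_means = (
    toy.groupby("Pos")["scored_per_game"]
    .apply(lambda s: s.sample(n=10, random_state=0).mean())
    .to_dict()
)

# Both dictionaries should name the same position as the top scorer.
print(max(loop_means, key=loop_means.get))
print(max(groupby_means, key=groupby_means.get))
```

On this toy data the two dictionaries should match, since both versions sample each position's rows in the same order with the same random_state.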

Any advice is appreciated. :grinning: