Stratified sampling - alternative solution, only pandas methods used

Screen Link: Learn data science with Python and R projects

Considering that we are working with a pandas DataFrame, I decided to take full advantage of pandas.
Here's my alternative solution to this mission:

# points per game for each player
wnba['pts_per_game'] = wnba['PTS'] / wnba['Games Played']

# build a DataFrame holding a 10-row sample from each stratum (position)
stratum_samples = wnba.groupby('Pos', group_keys=False).apply(lambda x: x.sample(n=10, random_state=0))

# calculate the mean of 'pts_per_game' per position
points_per_position = stratum_samples.groupby('Pos')['pts_per_game'].mean()

# get the index label (position) with the highest mean
position_most_points = points_per_position.idxmax()

NB: in newer versions of pandas (1.1.0 and later, which is newer than the one used in the DQ environment), it's possible to call .sample() directly on a groupby object, without .apply(lambda x: ...)
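For anyone on pandas 1.1.0 or later, here is a minimal sketch of that shorter form. The small DataFrame below is a made-up stand-in for wnba (the values are invented for illustration, not real stats), and n=2 is used only because the toy data has three rows per position.

```python
import pandas as pd

# Hypothetical stand-in for the wnba DataFrame (invented values, not real stats)
wnba = pd.DataFrame({
    'Pos': ['G', 'G', 'G', 'F', 'F', 'F', 'C', 'C', 'C'],
    'PTS': [100, 200, 300, 150, 250, 350, 120, 220, 320],
    'Games Played': [10, 20, 30, 10, 20, 30, 10, 20, 30],
})
wnba['pts_per_game'] = wnba['PTS'] / wnba['Games Played']

# DataFrameGroupBy.sample (pandas >= 1.1.0) draws n rows from every group,
# so no .apply(lambda x: ...) and no group_keys=False are needed
stratum_samples = wnba.groupby('Pos').sample(n=2, random_state=0)

# same follow-up steps as in the original solution
points_per_position = stratum_samples.groupby('Pos')['pts_per_game'].mean()
position_most_points = points_per_position.idxmax()
```

The result keeps the original row index, so the rest of the solution works unchanged.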
