While working on “Finding the Best Markets to Advertise In”, I ran into a problem: `.mode()` doesn't work with `groupby`. I've solved it, but I'm wondering whether there are better solutions out there.
The whole story:
I was plotting monthly spending (`Money_month`) against age, and I started with the mean:
```python
ages_mean = normal.groupby('Age')['Money_month'].mean()
plt.scatter(ages_mean.index.tolist(), ages_mean)
```
But the mean isn't a great fit for this dataset: it has too many extreme high values, and they pull the average up.
So I moved to the median:
```python
ages_median = normal.groupby('Age')['Money_month'].median()
plt.scatter(ages_median.index.tolist(), ages_median)
```
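To make the mean-vs-median point concrete, here is a minimal sketch on a tiny made-up DataFrame (the numbers are invented for illustration, not taken from the survey data): a single extreme spender at age 25 drags that group's mean far above what a typical 25-year-old spends, while the median barely moves.

```python
import pandas as pd

# Made-up toy data (NOT from the survey): one extreme spender at age 25
toy = pd.DataFrame({
    'Age': [25, 25, 25, 25, 30, 30, 30],
    'Money_month': [50, 50, 60, 5000, 40, 40, 45],
})

# Mean and median of monthly spending for each age group, side by side
summary = toy.groupby('Age')['Money_month'].agg(['mean', 'median'])
print(summary)
```

Here the mean for age 25 comes out at 1290.0 against a median of 55.0, which is why the median plot looks more realistic.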
That plot looked much better and more realistic, so let's try the last one, the mode.
`groupby` doesn't like `mode`! It won't work, so how do we make a mode plot grouped by `Age`?
A quick search online gives us this solution:
```python
ages_mode = normal.groupby('Age').Money_month.apply(lambda x: x.mode())
```
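But Matplotlib doesn't want to swallow it (the result has a two-level index, `Age` plus the position of each mode, because `mode()` can return several tied values per group), so I came up with this: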
```python
import pandas as pd
import matplotlib.pyplot as plt

# list all ages in ascending order:
list_of_ages = normal['Age'].value_counts().sort_index().index.tolist()

# loop over them, check value_counts for every age (sorted by count,
# most frequent first) and extract the 1st value:
mode_of_age = []
for age in list_of_ages:
    mode_of_age.append(normal[normal['Age'] == age].Money_month.value_counts().index[0])

# create a DataFrame out of these two lists
age_mode_df = pd.DataFrame(list(zip(list_of_ages, mode_of_age)), columns=['Age', 'Mode'])
plt.scatter(age_mode_df['Age'], age_mode_df['Mode'])
plt.show()
```
And it works. Any ideas for improvement?
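One possible improvement, sketched under assumptions: the `normal` DataFrame below is a made-up stand-in with the same `Age` and `Money_month` columns, and both options use only standard pandas calls (`groupby`, `agg`, `apply`, `Series.mode`, `first`). Each produces a Series indexed by age with one mode per age, which Matplotlib can plot directly, so the loop and the intermediate lists are no longer needed.

```python
import pandas as pd

# Made-up toy data standing in for the survey's `normal` DataFrame
normal = pd.DataFrame({
    'Age': [20, 20, 20, 35, 35, 35, 35],
    'Money_month': [10, 10, 30, 100, 200, 200, 100],
})

# Option 1: reduce each age group to a single value. Series.mode() returns
# tied modes in sorted order, so .iloc[0] keeps the smallest tied value.
age_mode = normal.groupby('Age')['Money_month'].agg(lambda s: s.mode().iloc[0])

# Option 2: keep the apply(...) call from the search result, then collapse
# its two-level (Age, position) index by taking the first mode per age.
age_mode_alt = (
    normal.groupby('Age')['Money_month']
    .apply(lambda s: s.mode())
    .groupby(level=0)
    .first()
)

# Either Series can be plotted directly (plt = matplotlib.pyplot):
# plt.scatter(age_mode.index, age_mode)
```

For age 35 in this toy data, 100 and 200 both appear twice, and both options return 100. The loop's `value_counts().index[0]` may pick a different one of the tied values, so results can differ on ages with tied modes.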