While working on “Finding the Best Markets to Advertise In”, I ran into a problem: .mode() doesn’t work with groupby. I’ve found a workaround, but I’m wondering whether there are better solutions out there.
The whole story:
I was plotting monthly spending (‘Money_month’) against age. I started with the mean:
import pandas as pd
import matplotlib.pyplot as plt
ages_mean = normal.groupby('Age')['Money_month'].mean()
plt.scatter(ages_mean.index.tolist(), ages_mean)
But we know the mean is not a great summary for this dataset (too many extremely high values),
so I moved on to the median:
ages_median = normal.groupby('Age')['Money_month'].median()
plt.scatter(ages_median.index.tolist(), ages_median)
That plot looked much better and more realistic, so let’s try the last one, the mode:
normal.groupby('Age')['Money_month'].mode()
groupby doesn’t like mode! It won’t work, so how do we make a ‘mode’ plot grouped by ‘Age’?
A quick search online gives this solution:
ages_mode = normal.groupby('Age').Money_month.apply(lambda x: x.mode())
But Matplotlib doesn’t want to swallow the result, so I came up with this:
# list all ages in ascending order:
list_of_ages = normal['Age'].value_counts().sort_index().index.tolist()
# loop over the ages; value_counts() sorts by frequency (most frequent first),
# so the first entry of its index is the mode for that age:
mode_of_age = []
for age in list_of_ages:
    mode_of_age.append(normal[normal['Age'] == age].Money_month.value_counts().index.tolist()[0])
# create a dataframe out of these two lists
age_mode_df = pd.DataFrame(list(zip(list_of_ages, mode_of_age)), columns=['Age', 'Mode'])
plt.scatter(age_mode_df['Age'], age_mode_df['Mode'])
plt.show()
And it works. Any ideas for improvement?
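One possible shortening, as a rough sketch rather than a definitive answer: Series.mode() can return more than one value per group when there are ties, so the apply() version appears to produce a two-level index (Age plus the position of each mode), which is probably why it can't go straight into plt.scatter. Keeping a single mode per age gives a plain Series indexed by Age. The small dataframe below is made-up data standing in for `normal`, just so the snippet runs on its own.

```python
import pandas as pd

# Made-up stand-in for the `normal` dataframe from the project (assumption).
normal = pd.DataFrame({
    "Age": [20, 20, 20, 21, 21, 21, 21],
    "Money_month": [10.0, 10.0, 50.0, 5.0, 5.0, 30.0, 30.0],
})

grouped = normal.groupby("Age")["Money_month"]

# Series.mode() returns a Series (possibly several values on a tie),
# so apply() ends up with one row per (Age, mode position) pair.
all_modes = grouped.apply(lambda x: x.mode())

# Keep exactly one mode per age: mode() is sorted ascending,
# so .iloc[0] picks the smallest value among tied modes.
ages_mode = grouped.agg(lambda x: x.mode().iloc[0])
```

With that, the plot can follow the same pattern as the mean and median ones: plt.scatter(ages_mode.index.tolist(), ages_mode). Two caveats: on a tie this keeps the smallest mode, which may differ from what the value_counts() loop picks, and a group where Money_month is entirely missing would make .iloc[0] fail, so such ages would need to be dropped first.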