I noticed that in the sample code below (and in the answer provided by DQ), double brackets were used in genres_mean, which creates a DataFrame. That in turn means index 0 has to be specified in the function's if/else statement, where:
if price < genres_mean.loc[(aff, gc)]:
It's not at all necessary; it depends entirely on how you want to use the result later.
Keeping it as a DataFrame lets us then use .loc to look up a row by the two grouping columns, which become the two levels of the result's index. Their current approach is designed around that, hence the double brackets.
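To make the DataFrame-versus-Series difference concrete, here is a minimal sketch on a small made-up stand-in for affordable_apps (the data and the names genres_mean_df and genres_mean_s are hypothetical). It shows what .loc returns for a (aff, gc) key in each case, and that .loc[(aff, gc), "Price"] pulls the scalar out of the DataFrame in one step instead of relying on position 0.

```python
import pandas as pd

# Hypothetical stand-in for affordable_apps, with only the columns used here.
affordable_apps = pd.DataFrame({
    "affordability": ["cheap", "cheap", "reasonable", "reasonable"],
    "genre_count": [1, 1, 2, 2],
    "Price": [1.0, 3.0, 5.0, 7.0],
})
keys = ["affordability", "genre_count"]

# Double brackets: a DataFrame whose index is a two-level MultiIndex.
genres_mean_df = affordable_apps.groupby(keys)[["Price"]].mean()

# Single brackets: a Series with the same MultiIndex.
genres_mean_s = affordable_apps.groupby(keys)["Price"].mean()

aff, gc = "cheap", 1

# On the DataFrame, a full (aff, gc) key selects a whole row, returned as a
# Series, so the Price value still has to be taken out of that row.
row = genres_mean_df.loc[(aff, gc)]
price_by_position = row.iloc[0]          # the "index 0" step, made explicit

# Naming the column as well gives the scalar directly.
price_from_df = genres_mean_df.loc[(aff, gc), "Price"]

# On the Series, the same key lookup already yields a scalar.
price_from_s = genres_mean_s.loc[(aff, gc)]

print(type(row).__name__, price_by_position, price_from_df, price_from_s)
```

In recent pandas versions, writing row[0] on a Series with a label index relies on deprecated positional fallback, so .iloc[0] or the explicit column label is the safer choice.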
Your approach, however, is roughly 2-3 times slower than theirs when you time the two and compare, at least with that specific pandas version. On much larger datasets, that difference is worth knowing about.
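One way to run such a comparison yourself is with the standard timeit module. This is only a sketch: the exact code of both approaches isn't shown here, so variant_a and variant_b are hypothetical placeholders (a double-bracket and a single-bracket aggregation) on synthetic data; swap in the real code before drawing conclusions, since results depend on the pandas version and the data.

```python
import timeit

import pandas as pd

# Synthetic data (an assumption): 50,000 rows with a few repeating group keys.
n = 50_000
df = pd.DataFrame({
    "affordability": ["cheap" if i % 2 else "reasonable" for i in range(n)],
    "genre_count": [i % 3 + 1 for i in range(n)],
    "Price": [float(i % 50) for i in range(n)],
})
keys = ["affordability", "genre_count"]


def variant_a():
    # Placeholder for one approach: aggregate Price as a DataFrame.
    return df.groupby(keys)[["Price"]].mean()


def variant_b():
    # Placeholder for the other approach: aggregate Price as a Series.
    return df.groupby(keys)["Price"].mean()


# timeit.repeat calls each function `number` times per run, `repeat` times;
# the minimum run is the least noisy estimate of the true cost.
t_a = min(timeit.repeat(variant_a, number=10, repeat=3))
t_b = min(timeit.repeat(variant_b, number=10, repeat=3))
print(f"A: {t_a:.4f}s  B: {t_b:.4f}s  ratio A/B: {t_a / t_b:.2f}")
```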
Please create a separate post for your question, since it isn't related to this one. I'd also recommend checking the existing questions under this Mission Step's tag; a couple of them should help you out as well.
…takes Price and adds it as a column on the df.groupby() result as a DataFrame. But while experimenting with the code, I found that genres_mean = affordable_apps.groupby(["affordability", "genre_count"])[["Price"]].mean() gives exactly the same result. Do you know why that is?
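The two orderings agree because a group's mean of Price depends only on that group's Price values; averaging other columns alongside it changes nothing. A minimal sketch on hypothetical data follows. The aggregate-then-select form is an assumption about the elided sample code, and "Rating" is an extra made-up numeric column included only so there is something else to average.

```python
import pandas as pd

# Hypothetical sample data; "Rating" exists only to be averaged and discarded.
apps = pd.DataFrame({
    "affordability": ["cheap", "cheap", "reasonable", "reasonable"],
    "genre_count": [1, 1, 2, 2],
    "Price": [1.0, 3.0, 5.0, 7.0],
    "Rating": [4.0, 5.0, 3.0, 4.0],
})
keys = ["affordability", "genre_count"]

# Aggregate every non-key column first, then keep only Price.
mean_then_select = apps.groupby(keys).mean()[["Price"]]

# Keep only Price first, then aggregate just that column.
select_then_mean = apps.groupby(keys)[["Price"]].mean()

# Same MultiIndex, same single column, same values.
pd.testing.assert_frame_equal(mean_then_select, select_then_mean)
print(select_then_mean)
```

Selecting first also avoids averaging columns that are thrown away. With real data that contains text columns, recent pandas versions raise an error on a plain groupby(...).mean() unless numeric_only=True is passed, whereas selecting Price first sidesteps that entirely.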
Wouldn't it be more readable to lay the code out like this? Would it affect processing time?
Thank you in advance for your feedback. Take care!