```python
# Mean price per (affordability, genre_count) segment; the double brackets
# in [["Price"]] keep the result a one-column DataFrame rather than a Series
genres_mean = affordable_apps.groupby(
    ["affordability", "genre_count"]
).mean()[["Price"]]

def label_genres(row):
    """For each segment in `genres_mean`, labels the apps that cost
    less than its segment's mean with `1` and the others with `0`."""
    aff = row["affordability"]
    gc = row["genre_count"]
    price = row["Price"]

    if price < genres_mean.loc[(aff, gc)]:
        return 1
    else:
        return 0

affordable_apps["genre_criterion"] = affordable_apps.apply(
    label_genres, axis="columns"
)
```
I understand that the function determines whether the price of a specific app with a given affordability (AFF) and genre count (GC) is greater than or less than the mean price of apps with that same AFF and GC.
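For comparison, here is a minimal sketch of the same labelling done without `apply`, assuming a small hypothetical toy DataFrame in place of the real `affordable_apps` (the segment labels "cheap" and "reasonable" are invented for illustration). It swaps in `groupby(...).transform("mean")`, which broadcasts each segment's mean back onto every row so the comparison becomes a plain element-wise operation:

```python
import pandas as pd

# Hypothetical toy data standing in for the real `affordable_apps`
affordable_apps = pd.DataFrame({
    "affordability": ["cheap", "cheap", "cheap", "reasonable", "reasonable"],
    "genre_count": [1, 1, 2, 1, 1],
    "Price": [1.0, 3.0, 2.0, 5.0, 7.0],
})

# Mean price of each (affordability, genre_count) segment, aligned to every row
segment_mean = affordable_apps.groupby(
    ["affordability", "genre_count"]
)["Price"].transform("mean")

# 1 if the app is cheaper than its segment's mean, otherwise 0
affordable_apps["genre_criterion"] = (
    affordable_apps["Price"] < segment_mean
).astype(int)

print(affordable_apps)
```

Because both sides of `<` are Series of the same length, no `if` statement is involved, so the "truth value is ambiguous" problem never arises.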
I understand that `genres_mean` is a DataFrame with a multi-level index (MultiIndex), so we use a tuple to select the mean price for the given AFF and GC:
if price < genres_mean.loc[(AFF, GC)]: ...
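A minimal sketch of what that tuple selection actually returns, using a hypothetical toy DataFrame (the segment labels are invented). Since `genres_mean` is a one-column DataFrame, `.loc[(aff, gc)]` selects a whole row, which pandas returns as a Series with a single entry labelled `"Price"`, not a bare number. Passing the column label as well, `.loc[(aff, gc), "Price"]`, yields the scalar directly:

```python
import pandas as pd

# Hypothetical toy data standing in for the real `affordable_apps`
affordable_apps = pd.DataFrame({
    "affordability": ["cheap", "cheap", "reasonable"],
    "genre_count": [1, 1, 1],
    "Price": [1.0, 3.0, 5.0],
})

genres_mean = affordable_apps.groupby(
    ["affordability", "genre_count"]
).mean()[["Price"]]

# Row selection with the full MultiIndex key: a one-element Series
row = genres_mean.loc[("cheap", 1)]
print(type(row).__name__)
print(row)

# Row key plus column label: a single scalar value
scalar = genres_mean.loc[("cheap", 1), "Price"]
print(scalar)
```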
What I don’t understand is the need to include the
 after the selection? As I understand it, the selection without the
 returns a single data point, but if I remove the
 I get an error as follows:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')
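A minimal sketch reproducing that error, again with hypothetical toy data. Comparing a number with a Series does not produce a single `True`/`False`; it produces a boolean Series (here with one element). Python's `if` then has to call `bool()` on that Series, and pandas refuses, even when the Series holds only one value. Reducing the right-hand side to a scalar first avoids the problem:

```python
import pandas as pd

# Hypothetical toy data: one segment ("cheap", 1) with mean price 2.0
genres_mean = pd.DataFrame(
    {"Price": [2.0]},
    index=pd.MultiIndex.from_tuples(
        [("cheap", 1)], names=["affordability", "genre_count"]
    ),
)
price = 1.0

# number < Series -> element-wise comparison -> boolean Series
comparison = price < genres_mean.loc[("cheap", 1)]
print(type(comparison).__name__)

raised = False
try:
    if comparison:  # bool() on a Series raises ValueError
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Selecting the "Price" column as well gives a scalar, so `if` works
is_cheaper = price < genres_mean.loc[("cheap", 1), "Price"]
if is_cheaper:
    print("cheaper than the segment mean")
```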
The odd thing is that if I make the selection outside the boolean comparison, like this:
selection = genres_mean.loc[(AFF, GC)]
I get an error. However, if I make the regular selection without the
, it pulls the single data point I am looking for.
Any clarification of this discrepancy would be greatly appreciated.