Screen Link:
My Code:
affordable_apps["genre_count"] = affordable_apps["Genres"].str.count(";") + 1

genres_mean = affordable_apps.groupby(
    ["affordability", "genre_count"]
).mean()[["Price"]]

def label_genres(row):
    """For each segment in `genres_mean`,
    labels the apps that cost less than its segment's mean with `1`
    and the others with `0`."""
    aff = row["affordability"]
    gc = row["genre_count"]
    price = row["Price"]
    if price < genres_mean.loc[(aff, gc)][0]:
        return 1
    else:
        return 0

affordable_apps["genre_criterion"] = affordable_apps.apply(
    label_genres, axis="columns"
)

categories_mean = affordable_apps.groupby(['affordability', 'Category']).mean()['Price']

def label_categories(row):
    price = row['Price']
    category = row['Category']
    affordability = row['affordability']
    if price < categories_mean.loc[(affordability, category)][0]:
        return 1
    else:
        return 0

affordable_apps['category_criterion'] = affordable_apps.apply(label_categories, axis="columns")
What I expected to happen:
I expected the ‘category_criterion’ column to be filled with 1 or 0, as returned by the ‘label_categories’ function.
What actually happened:
IndexError: ('invalid index to scalar variable.', 'occurred at index 0')
What confuses me is that the earlier function, ‘label_genres’, works fine with the “[0]” when looking up values in its table. When I omit “[0]” from ‘label_categories’, that function works too. My hypothesis is that this is a data type issue, with “genre_count” being an integer and ‘Category’ being an object. Either way, I don't understand why one version works and the other does not.
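
To narrow this down, here is a minimal sketch that compares what each lookup returns. It uses a small made-up DataFrame (`toy`, with invented values) in place of `affordable_apps`, so it is an assumption about how the real data behaves rather than a confirmed diagnosis. It also selects the price column before calling `.mean()`, which should give the same shapes as the code above. It suggests the difference may come from `[["Price"]]` versus `['Price']` rather than from the data types of the grouping columns.

```python
import pandas as pd

# Hypothetical toy data standing in for `affordable_apps` (values are invented).
toy = pd.DataFrame({
    "affordability": ["cheap", "cheap", "reasonable", "reasonable"],
    "genre_count": [1, 1, 2, 2],
    "Category": ["GAME", "GAME", "TOOLS", "TOOLS"],
    "Price": [1.0, 3.0, 10.0, 20.0],
})

# Double brackets keep a one-column DataFrame,
# so a full-key .loc lookup returns a Series (one row).
genres_mean = toy.groupby(["affordability", "genre_count"])[["Price"]].mean()
genre_lookup = genres_mean.loc[("cheap", 1)]
print(type(genre_lookup))

# Single brackets give a Series,
# so the same kind of .loc lookup returns a single number.
categories_mean = toy.groupby(["affordability", "Category"])["Price"].mean()
category_lookup = categories_mean.loc[("cheap", "GAME")]
print(type(category_lookup))

# Indexing a single number with [0] is what fails.
try:
    category_lookup[0]
    scalar_indexing_failed = False
except (IndexError, TypeError):
    scalar_indexing_failed = True
print(scalar_indexing_failed)
```

If this matches the real data, then in ‘label_genres’ the “[0]” picks the single value out of a one-element Series, while in ‘label_categories’ the lookup is already a number and there is nothing left to index. As far as I can tell, using `["Price"]` or `.iloc[0]` on the Series is the more explicit way to get that value, since newer pandas versions warn about positional “[0]” on a label-indexed Series.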