Screen Link:
https://app.dataquest.io/m/467/communicating-results/7/price-vs-category-and-genres
My Code:
categories_mean = affordable_apps.groupby(["affordability", "Category"]).mean()[["Price"]]
def label_categories(row):
    aff = row['affordability']
    cat = row['Category']
    price = row['Price']
    if price < categories_mean.loc[(aff, cat)][0]:
        return 1
    else:
        return 0
affordable_apps['category_criterion'] = affordable_apps.apply(label_categories, axis="columns")
What I expected to happen:
I just want to understand what’s happening here: “categories_mean.loc[(aff, cat)][0]”
What actually happened:
I copied this code from the example, but I don't quite understand what is happening there. I know that 'aff' is the row and 'cat' is the column when using DataFrame.loc, but what's up with the [0] indexing? Can anyone explain what's happening there, please?
This is what categories_mean looks like -
So, for a particular affordability and Category value, categories_mean.loc[(aff, cat)] has the following output -
Price 6.823333
Name: (reasonable, 2), dtype: float64
As you can see, (reasonable, 2) is the (aff, cat) value, and then you have the corresponding Price for it.
As you know, in Python, you can use indexing to extract values at certain indices. Here, categories_mean.loc[(aff, cat)] returns a Series with a single element, so
categories_mean.loc[(aff, cat)][0]
takes its first element by position. That [0] lets us extract the value of Price (6.823333 in the example above) and then compare it to the price variable.
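To see this outside the course data, here is a minimal sketch on a made-up three-row table (the values and the names toy_apps and toy_mean are invented for illustration, not taken from the real dataset). It shows that .loc with an (aff, cat) tuple returns a one-element Series, and that the value can be pulled out by position or by label. Recent pandas versions warn about plain [0] on a label-indexed Series, so the sketch uses .iloc[0], which picks the same element.

```python
import pandas as pd

# Toy data, invented for illustration only
toy_apps = pd.DataFrame({
    "affordability": ["cheap", "cheap", "reasonable"],
    "Category": [1, 1, 2],
    "Price": [1.0, 3.0, 6.0],
})

# Select Price before aggregating so only that column is averaged
toy_mean = toy_apps.groupby(["affordability", "Category"])[["Price"]].mean()

row = toy_mean.loc[("cheap", 1)]   # one-element Series indexed by column name
by_position = row.iloc[0]          # first (and only) element, by position
by_label = row["Price"]            # same element, by label

print(row)
print(by_position, by_label)
```

Both by_position and by_label hold the same mean price, so either form could be used in the comparison inside label_categories.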
This is what my categories_mean dataframe looks like. What you posted above is for genres_mean… maybe that doesn’t matter. Would you share how you printed the output? I wasn’t able to print it, so I couldn’t even see what was going on… Thank you so much in advance!!
Yes, it doesn’t matter, because you will still get the Price value based on the (aff, cat) pair in the format I showed above.
Just a straightforward print statement in your loop -
print(categories_mean.loc[(aff, cat)])
or
print(categories_mean.loc[(aff, cat)][0])
Thank you so much! I keep forgetting that I can print in my loop… Huge help @the_doctor
So, this is a different kind of dataset? I mean, usually we have just one index per row and one per column. In this dataset, it’s like having two indexes per row, isn’t it?
You can call it a dataset, although I am not sure that’s accurate, because it’s an aggregated summary of the data you are working with.
But, yes, since you are grouping the data by two columns, you are accessing the value using both grouped columns as the index.
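What pandas builds in this situation is called a MultiIndex. A small sketch, on invented toy values (toy_apps and toy_mean are hypothetical names, not part of the course code), shows the two index levels that groupby creates from the two grouping columns and how a tuple picks one row:

```python
import pandas as pd

# Toy data, invented for illustration only
toy_apps = pd.DataFrame({
    "affordability": ["cheap", "cheap", "reasonable"],
    "Category": [1, 2, 2],
    "Price": [1.0, 3.0, 6.0],
})

toy_mean = toy_apps.groupby(["affordability", "Category"])[["Price"]].mean()

# Grouping by two columns produces a row index with two levels
print(type(toy_mean.index).__name__)
print(list(toy_mean.index.names))
print(toy_mean.index.nlevels)

# A tuple supplies one label per level; the column is chosen separately
price = toy_mean.loc[("reasonable", 2), "Price"]
print(price)
```

Passing the column name as the second argument to .loc returns the value directly, which is one way to avoid the trailing [0].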