What's happening here? .loc with indexing

Screen Link:
https://app.dataquest.io/m/467/communicating-results/7/price-vs-category-and-genres

My Code:

# Select "Price" before aggregating so that non-numeric columns are never averaged
categories_mean = affordable_apps.groupby(["affordability", "Category"])[["Price"]].mean()

def label_categories(row):
    aff = row['affordability']
    cat = row['Category']
    price = row['Price']
    
    if price < categories_mean.loc[(aff, cat)][0]:
        return 1
    else:
        return 0

affordable_apps['category_criterion'] = affordable_apps.apply(label_categories, axis="columns")

What I expected to happen:
I just want to understand what’s happening here: “categories_mean.loc[(aff, cat)][0]”

What actually happened:
I copied this code from the example, but I don’t quite understand what is happening there. I know that ‘aff’ is the row and ‘cat’ is the column when using DataFrame.loc, but what’s up with the [0] indexing? Can anyone explain what’s happening there, please?


This is what categories_mean looks like -

So, for a particular affordability and Category value,

categories_mean.loc[(aff, cat)]

Has the following output -

Price    6.823333
Name: (reasonable, 2), dtype: float64

As you can see, (reasonable, 2) is the (aff, cat) value, and then you have the corresponding Price for it.

As you know, in Python, you can use indexing to extract values at certain indices. So,

categories_mean.loc[(aff, cat)][0]

The .loc lookup returns a one-element Series, so that [0] extracts its first value, the Price (6.823333 in the example above), which is then compared with the price variable.
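
To make this concrete, here is a minimal sketch on made-up data. The apps frame, its values, and the toy_mean name are assumptions for illustration only, not the course dataset. It uses .iloc[0] for the positional step, since recent pandas versions warn about plain [0] on a Series whose index holds labels rather than integers.

```python
import pandas as pd

# Hypothetical toy data standing in for affordable_apps (values are invented).
apps = pd.DataFrame({
    "affordability": ["cheap", "cheap", "reasonable", "reasonable"],
    "Category": ["Games", "Games", "Games", "Education"],
    "Price": [1.0, 3.0, 6.0, 8.0],
})

# Same shape as categories_mean: a single "Price" column,
# with rows keyed by (affordability, Category) pairs.
toy_mean = apps.groupby(["affordability", "Category"])[["Price"]].mean()

# .loc with a tuple picks out one row; the result is a Series
# whose only label is "Price" and whose name is the tuple.
row = toy_mean.loc[("cheap", "Games")]

# Positional access to the first (and only) element of that Series.
first_value = row.iloc[0]

# Equivalent lookup that names the column instead of relying on position.
by_label = toy_mean.loc[("cheap", "Games"), "Price"]

print(row)
print(first_value, by_label)
```

Both first_value and by_label hold the mean price of the ("cheap", "Games") group. The label-based form may be easier to read because it does not depend on Price being the first column.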

This is what my categories_mean dataframe looks like. What you posted above is for genres_mean… maybe that doesn’t matter. Would you share how you printed the output? I wasn’t able to print it, so I couldn’t even see what was going on… Thank you so much in advance!!

Yes, it doesn’t matter: you will still get the Price value for the (aff, cat) pair, in the format I showed above.

Just add a straightforward print statement inside your function, which runs once per row -

print(categories_mean.loc[(aff, cat)])

or

print(categories_mean.loc[(aff, cat)][0])

Thank you so much! I keep forgetting that I can print in my loop… Huge help @the_doctor

Well explained! Thanks!

So, this is a different kind of dataset? I mean, usually we have just one index per row and one per column. In this dataset, it’s like having two indexes per row, isn’t it?

You can call it a dataset, although I am not sure that’s accurate, because it’s an aggregated summary of the data you are working with.

But, yes: since you grouped the data by two columns, those two columns together form the row index (pandas calls this a MultiIndex), and you access a value by passing both keys as a tuple.
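
As a rough illustration of the two-keys-per-row idea, the sketch below groups a small made-up frame (the apps data and variable names are assumptions, not the course dataset) and inspects the resulting index.

```python
import pandas as pd

# Hypothetical toy data (values are invented for illustration).
apps = pd.DataFrame({
    "affordability": ["cheap", "cheap", "reasonable", "reasonable"],
    "Category": ["Games", "Games", "Games", "Education"],
    "Price": [1.0, 3.0, 6.0, 8.0],
})

# Grouping by two columns yields a row index with two levels.
grouped = apps.groupby(["affordability", "Category"])[["Price"]].mean()

n_levels = grouped.index.nlevels          # number of keys per row
level_names = list(grouped.index.names)   # the grouped column names

# reset_index() turns the index levels back into ordinary columns,
# which gives the familiar one-index-per-row layout again.
flat = grouped.reset_index()

print(grouped.index)
print(flat)
```

If the tuple-based lookups feel awkward, working with the flat version is a reasonable alternative, at the cost of filtering on two columns instead of passing a single (aff, cat) key.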