Help. Unable to understand part of code


Please help me with the following lines of code and the issues described below:

  1. if price < genres_mean.loc[(aff, gc)][0]:
    In the line above, I don't understand why we pass a tuple to .loc[], and in particular what we get by indexing the result with 0.

  2. genres_mean = affordable_apps.groupby(
    ["affordability", "genre_count"]).mean()[["Price"]]
    In the code above, we could also use ["Price"] instead of [["Price"]], right? If not, what is the reason for using double square brackets?


Selecting a DataFrame using [[ ]]

We want to use DataFrame methods on the result, hence we use [["Price"]].

Using [[ ]] results in a DataFrame

genres_mean = affordable_apps.groupby(
["affordability", "genre_count"]).mean()[["Price"]]

Now, genres_mean is a DataFrame.

Using a single [ ] results in a Series

genres_mean = affordable_apps.groupby(
["affordability", "genre_count"]).mean()["Price"]

Now, genres_mean is a Series.

To check the type of the object, call type(genres_mean).
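Here is a minimal sketch of the difference between the two selections. The affordable_apps frame below is a made-up stand-in (the real one is not shown in this thread), so only the resulting types matter, not the numbers.

```python
import pandas as pd

# Hypothetical stand-in for affordable_apps; values are invented for illustration.
affordable_apps = pd.DataFrame({
    "affordability": ["cheap", "cheap", "reasonable", "reasonable"],
    "genre_count": [1, 2, 1, 2],
    "Price": [1.99, 2.99, 9.99, 7.99],
})

grouped = affordable_apps.groupby(["affordability", "genre_count"]).mean()

as_frame = grouped[["Price"]]   # list of column labels -> DataFrame
as_series = grouped["Price"]    # single column label   -> Series

print(type(as_frame).__name__)
print(type(as_series).__name__)
```

The first print should report DataFrame and the second Series.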


Using tuple to select rows from multi-index data frame

genres_mean = affordable_apps.groupby(["affordability", "genre_count"]).mean()[["Price"]]

genres_mean is a DataFrame.

genres_mean has a hierarchical index (a MultiIndex).

Note: not every DataFrame has a hierarchical index.

Here the MultiIndex comes from .groupby(["affordability", "genre_count"]).

affordability  genre_count  Price
cheap          1             2.507448
               2             3.155672
reasonable     1            12.574627
               2             6.823333

To access a row of a multi-indexed DataFrame, pass a tuple with one label per index level to indicate which row you are selecting.

A tuple is an immutable sequence written in parentheses, with its elements separated by commas. For example, ("cheap", 1) is a tuple.

Since we are using .loc, the values within the tuple must be labels.

The following are examples of how to select a particular multi-index row:

genres_mean.loc[("cheap", 1)] selects the first row in the data frame.
genres_mean.loc[("cheap", 2)] selects the second row in the data frame.

genres_mean.loc[("reasonable", 1)] selects the third row in the data frame.
genres_mean.loc[("reasonable", 2)] selects the fourth row in the data frame.
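To try the tuple lookups yourself, here is a small sketch that rebuilds the table by hand. The index and values are copied from the output above; building the frame with pd.MultiIndex.from_tuples is just a shortcut for this example, not how the original code produces it.

```python
import pandas as pd

# Rebuild the grouped means shown above (values copied from the post).
index = pd.MultiIndex.from_tuples(
    [("cheap", 1), ("cheap", 2), ("reasonable", 1), ("reasonable", 2)],
    names=["affordability", "genre_count"],
)
genres_mean = pd.DataFrame(
    {"Price": [2.507448, 3.155672, 12.574627, 6.823333]}, index=index
)

# One label per index level, wrapped in a tuple, picks exactly one row.
row = genres_mean.loc[("cheap", 1)]

print(type(row).__name__)   # Series
print(row["Price"])         # 2.507448
```

The selected row comes back as a Series whose only entry is labelled Price.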

Further reading on multi-indexing: pandas 0.22.0 documentation.

Selecting column by relative positional index

aff = row["affordability"]
gc = row["genre_count"]
genres_mean.loc[(aff, gc)][0]

(aff, gc) determines which multi-index row is selected. The result is that row as a Series with one entry per column.
[0] then selects an entry from that row by position. Position 0 is the Price column, because Price is the first (and only) column of genres_mean.
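A short sketch of the same lookup, again on a hand-built copy of the table. One caveat, as far as I know: plain [0] on a Series with string labels relies on a positional fallback that recent pandas releases (2.1 and later) warn about, so the example uses .iloc[0] for the positional version and also shows a purely label-based alternative. The values of aff and gc here are example labels chosen for illustration.

```python
import pandas as pd

# Hand-built copy of the grouped means (values copied from the post).
index = pd.MultiIndex.from_tuples(
    [("cheap", 1), ("cheap", 2), ("reasonable", 1), ("reasonable", 2)],
    names=["affordability", "genre_count"],
)
genres_mean = pd.DataFrame(
    {"Price": [2.507448, 3.155672, 12.574627, 6.823333]}, index=index
)

aff, gc = "reasonable", 2                 # example labels for one row

row_means = genres_mean.loc[(aff, gc)]    # Series with a single entry, "Price"
by_position = row_means.iloc[0]           # explicit positional access
by_label = genres_mean.loc[(aff, gc), "Price"]  # row and column both by label

print(by_position, by_label)
```

Both expressions return the same number, the mean price for the ("reasonable", 2) group.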

Further reading on indexing and selecting data: pandas 0.22.0 documentation.


Hello world, I am still confused by this function, especially aff and gc. What exactly are aff and gc? I thought aff was affordable_apps["affordability"] and gc was affordable_apps["genre_count"], but I got an error when I tried genres_mean.loc[(aff, gc)][0]. Can anyone help explain this? Thank you.

def label_genres(row):
    aff = row["affordability"]
    gc = row["genre_count"]
    price = row["Price"]
    if price < genres_mean.loc[(aff, gc)][0]:
        return 1
    return 0
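One way to see what aff and gc are: the sketch below assumes label_genres is meant to be passed to DataFrame.apply with axis=1 (the thread does not show the call, so that is an assumption). In that case row is a single row of affordable_apps, so aff and gc are single values such as "cheap" and 1, not whole columns. Passing whole columns into .loc is a different kind of lookup, which would explain the error. The sample data is made up, and .iloc[0] stands in for [0] to make the positional access explicit.

```python
import pandas as pd

# Made-up sample data standing in for affordable_apps.
affordable_apps = pd.DataFrame({
    "affordability": ["cheap", "cheap", "reasonable", "reasonable"],
    "genre_count": [1, 1, 2, 2],
    "Price": [1.0, 3.0, 5.0, 9.0],
})

# Mean price per (affordability, genre_count) group, kept as a DataFrame.
genres_mean = affordable_apps.groupby(["affordability", "genre_count"])[["Price"]].mean()

def label_genres(row):
    aff = row["affordability"]   # a single value from this row, e.g. "cheap"
    gc = row["genre_count"]      # a single value from this row, e.g. 1
    price = row["Price"]
    # Compare this row's price with the mean price of its own group.
    if price < genres_mean.loc[(aff, gc)].iloc[0]:
        return 1
    return 0

# axis=1 calls label_genres once per row.
labels = affordable_apps.apply(label_genres, axis=1)
print(labels.tolist())  # [1, 0, 1, 0]
```

With this data the group means are 2.0 for ("cheap", 1) and 7.0 for ("reasonable", 2), so the rows priced below their group mean get 1 and the others get 0.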