If a selection pulls up one data point, why do I need to stipulate "[0]"?

Screen Link:

https://app.dataquest.io/m/467/communicating-results/7/price-vs-category-and-genres

My Code:

genres_mean = affordable_apps.groupby(
    ["affordability", "genre_count"]
).mean()[["Price"]]


def label_genres(row):
    """For each segment in `genres_mean`,
    labels the apps that cost less than its segment's mean with `1`
    and the others with `0`."""

    aff = row["affordability"]
    gc = row["genre_count"]
    price = row["Price"]

    if price < genres_mean.loc[(aff, gc)][0]:
        return 1
    else:
        return 0

affordable_apps["genre_criterion"] = affordable_apps.apply(
    label_genres, axis="columns"
)

Explanation

I understand that the function checks whether the price of a specific app, with a certain affordability (AFF) and genre count (GC), is less than the average price of apps with the same AFF and GC.

I understand that the genres_mean DF has a multi-level index, therefore we use a tuple to select the mean price for the said AFF and GC conditions:

   if price < genres_mean.loc[(AFF, GC)][0]: ...

What I don’t understand is the need to include the [0] after the selection? As I understand it, the selection without the [0] returns a single data point, but if I remove the [0] I get an error as follows:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

The odd thing is, if I try to make the selection outside of the boolean comparison, so I make a selection as follows:

selection = genres_mean.loc[(AFF, GC)][0]

I get an error. However, if I make the regular selection without the [0], it pulls the single data point I am looking for.

Any clarity on this discrepancy would be forever appreciated :slight_smile:

Hi @johnedwardferreira5,

When you only use genres_mean.loc[(AFF, GC)] it returns a Series

Price 6.823333
Name: (reasonable, 2), dtype: float64
<class 'pandas.core.series.Series'>

and when you use genres_mean.loc[(AFF, GC)][0] it returns the single value stored in that Series, a numpy.float64 scalar (not an array)

6.823333333333333
<class 'numpy.float64'>

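That is the reason the if statement fails without the [0]. Comparing price with a Series is allowed, but it returns a boolean Series rather than a single True/False, and if cannot decide the truth value of a Series, hence the "truth value of a Series is ambiguous" error. Outside of an if statement, the selection without [0] works because nothing asks for its truth value.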
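Here is a minimal sketch of the difference. The `apps` DataFrame and its values are made up for illustration and only stand in for affordable_apps. It also shows two ways to get the scalar that do not rely on [0]: selecting the row and column together with .loc, or using .iloc[0] for an explicitly positional lookup. Recent pandas versions have deprecated plain [0] on a label-indexed Series, so one of these is probably the safer choice.

```python
import pandas as pd

# Hypothetical toy data standing in for `affordable_apps`.
apps = pd.DataFrame({
    "affordability": ["cheap", "cheap", "reasonable", "reasonable"],
    "genre_count": [1, 1, 2, 2],
    "Price": [1.0, 3.0, 6.0, 8.0],
})

# Same shape as in the thread: MultiIndex rows, a single "Price" column.
genres_mean = apps.groupby(["affordability", "genre_count"])[["Price"]].mean()

# Selecting only the row returns a one-element Series (index: "Price").
row = genres_mean.loc[("reasonable", 2)]

# Selecting row and column together returns the scalar directly.
value = genres_mean.loc[("reasonable", 2), "Price"]

# Explicit positional access on the Series gives the same scalar.
first = row.iloc[0]

print(type(row).__name__, type(value).__name__)

# Comparing a number with a Series is elementwise: the result is a Series.
mask = 5.0 < row

# Asking for the truth value of that Series is what raises the error.
raised = False
try:
    if mask:
        pass
except ValueError as err:
    raised = True
    print(err)

# With a scalar, the comparison yields a single boolean, so `if` works.
is_cheaper = bool(5.0 < value)
```

In label_genres, this would mean writing the condition as price < genres_mean.loc[(aff, gc), "Price"].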

Hope this answers your question!

Thank you! you da bomb!

:grinning:

Happy to help!