Not getting the new 'price_criterion' column into the affordable_apps table

Screen Link:
https://app.dataquest.io/m/467/communicating-results/6/price-vs-rating

My Code:


cheap_mean=affordable_apps[cheap]["Price"].mean()

def label(x):
    if x < cheap_mean:
        return 1
    else:
        return 0

affordable_apps[cheap]["price_criterion"]=affordable_apps[cheap]["Price"].apply(label)

What I expected to happen:
Create a new column called 'price_criterion' that holds 1 or 0 depending on the price

What actually happened:

  • The new column does not show up in the affordable_apps table at all
  • We have been taught to use df["new column"] = … as our basis when creating a new column. Why are we told to use .loc[row, column] in this case? What difference does this make?

Explaining this fully requires some understanding of how pandas works internally.

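Writing affordable_apps[cheap]["price_criterion"] = … is called chained indexing: two indexing operations in a row. The first one, affordable_apps[cheap], selects rows with a boolean mask and returns a new DataFrame (a copy). The second one then creates the column on that temporary object, not on affordable_apps. The temporary copy is discarded afterwards, so the column never shows up in the original table.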
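A small, self-contained reproduction may make this clearer. It is only a sketch: the toy DataFrame stands in for the course's affordable_apps, and the cheap mask (price below 5) is a hypothetical choice; only the "Price" column name comes from the post.

```python
import pandas as pd

# Hypothetical stand-in for affordable_apps; only the "Price" column matters here.
affordable_apps = pd.DataFrame({"Price": [0.99, 1.99, 4.99, 9.99]})
cheap = affordable_apps["Price"] < 5  # hypothetical mask for "cheap" apps

cheap_mean = affordable_apps[cheap]["Price"].mean()

def label(x):
    return 1 if x < cheap_mean else 0

# Chained indexing: affordable_apps[cheap] builds a new, temporary DataFrame,
# and the column is added to that temporary object, not to affordable_apps.
# pandas usually emits a warning on this line; which warning depends on the
# version (SettingWithCopyWarning, or ChainedAssignmentError with Copy-on-Write).
affordable_apps[cheap]["price_criterion"] = affordable_apps[cheap]["Price"].apply(label)

print("price_criterion" in affordable_apps.columns)  # False
```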

This is also the situation that the SettingWithCopyWarning, mentioned earlier in the content, is meant to flag.

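In the end, this comes down to how pandas is designed to behave. Using .loc, as indicated, selects the rows with the mask and names the column in a single indexing operation on affordable_apps itself, so the new column is created (and can later be modified) on the original DataFrame. Rows that don't match the mask are left as NaN in the new column.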
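Here is the same sketch rewritten with .loc, again using hypothetical toy data in place of the real affordable_apps and a hypothetical cheap mask. The only change that matters is the assignment line.

```python
import pandas as pd

# Hypothetical stand-in for affordable_apps, same toy data as before.
affordable_apps = pd.DataFrame({"Price": [0.99, 1.99, 4.99, 9.99]})
cheap = affordable_apps["Price"] < 5  # hypothetical "cheap" mask

# Reading through chained indexing is fine; only assigning through it is the problem.
cheap_mean = affordable_apps[cheap]["Price"].mean()

def label(x):
    return 1 if x < cheap_mean else 0

# One .loc call selects the masked rows AND names the column, so pandas
# writes into affordable_apps itself instead of into a temporary copy.
affordable_apps.loc[cheap, "price_criterion"] = affordable_apps.loc[cheap, "Price"].apply(label)

# The cheap rows are labelled 1 or 0; the non-cheap row is left as NaN.
print(affordable_apps)
```

This also answers why the familiar df["new column"] = … pattern works elsewhere: on its own it is a single indexing operation on df itself. The problem only appears once a row filter like [cheap] is chained in front of it.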

If you want a deeper understanding of this, you can check out these two resources:
