Using DataFrame.loc versus filtering with brackets

I’ve noticed that the answers in the lessons tend to use DataFrame.loc, for example:

cheap_mean = affordable_apps.loc[cheap, "Price"].mean()

whereas my inclination is to use:

cheap_mean = affordable_apps[cheap]["Price"].mean()

My answers usually pass the solution check, so I’m wondering whether there is any particular reason to use .loc instead of brackets. I did find this post, which confirms they behave the same in certain situations, but there are things you can do with .loc that you can’t do with brackets (selecting a single row or a slice of rows, slicing columns). If I don’t need to do those things, is it OK to use brackets, or is it just better practice to use .loc?
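One concrete reason .loc is usually recommended shows up when assigning rather than reading. Below is a minimal sketch using a hypothetical five-row affordable_apps (toy data, not the lesson's). For reading, both forms return the same values. For writing, affordable_apps[cheap]["Price"] = ... assigns into the temporary object produced by the boolean filter, so the original frame is left unchanged (pandas warns about this, with SettingWithCopyWarning or ChainedAssignmentError depending on the version), whereas .loc[cheap, "Price"] writes into the frame itself.

```python
import warnings

import pandas as pd

# Hypothetical toy data standing in for the lesson's affordable_apps.
affordable_apps = pd.DataFrame({"Price": [0.99, 2.99, 4.99, 6.99, 9.99]})
cheap = affordable_apps["Price"] < 5

# Reading: both forms select the same values.
loc_mean = affordable_apps.loc[cheap, "Price"].mean()
bracket_mean = affordable_apps[cheap]["Price"].mean()
print(loc_mean == bracket_mean)  # True

# Writing with chained brackets: affordable_apps[cheap] is a new object,
# so the assignment lands on that temporary copy.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the version-dependent pandas warning
    affordable_apps[cheap]["Price"] = 0.0
after_chained = affordable_apps["Price"].tolist()
print(after_chained)  # [0.99, 2.99, 4.99, 6.99, 9.99]

# Writing with .loc: the selected cells of affordable_apps itself are updated.
affordable_apps.loc[cheap, "Price"] = 0.0
print(affordable_apps["Price"].tolist())  # [0.0, 0.0, 0.0, 6.99, 9.99]
```

So for read-only selections like the mean above, brackets work; the habit of using .loc mainly pays off once the same pattern is used to write values.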

I just thought of another question; I’m not sure whether it should be a separate topic (I can split it off if needed). On that same screen, the answer key shows:

affordable_apps.loc[cheap, "price_criterion"] = affordable_apps["Price"].apply(
    lambda price: 1 if price < cheap_mean else 0)

But isn’t that redundant? Why filter for only the cheap apps when none of the reasonable apps cost less than $5? Since cheap_mean < 5, none of them would get marked with a 1 anyway. I tried:

affordable_apps["price_criterion"] = affordable_apps["Price"].apply(
    lambda price: 1 if price < cheap_mean else 0)

and even

affordable_apps.loc[:, "price_criterion"] = affordable_apps["Price"].apply(
    lambda price: 1 if price < cheap_mean else 0)

Neither one passed. Is my thinking wrong, is the code wrong, or is it just a quirk of the checking system?
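A minimal sketch of step 1 on hypothetical toy prices (not the lesson's data, and assuming "cheap" means Price < 5) suggests the reasoning holds: the cheap rows get identical values either way. What differs is the rest of the column. With the .loc filter, the non-cheap rows are left as NaN, and because pandas has to create the new column with missing values, it becomes float64; the unfiltered bracket version fills those rows with 0 and keeps an integer dtype. Whether the solution checker compares dtypes or this intermediate state is an assumption, since its logic isn't shown, but it is one concrete difference worth ruling out.

```python
import pandas as pd

# Hypothetical toy prices; "cheap" is assumed to mean Price < 5.
prices = [0.99, 1.99, 2.99, 4.99, 5.99, 9.99, 29.99]
key_version = pd.DataFrame({"Price": prices})
full_version = pd.DataFrame({"Price": prices})

cheap = key_version["Price"] < 5
cheap_mean = key_version.loc[cheap, "Price"].mean()  # about 2.74, below 5

# Answer key: only the cheap rows receive a value; the rest stay NaN.
key_version.loc[cheap, "price_criterion"] = key_version["Price"].apply(
    lambda price: 1 if price < cheap_mean else 0)

# Unfiltered version: every row receives a value.
full_version["price_criterion"] = full_version["Price"].apply(
    lambda price: 1 if price < cheap_mean else 0)

print(key_version["price_criterion"].tolist())
# [1.0, 1.0, 0.0, 0.0, nan, nan, nan]
print(full_version["price_criterion"].tolist())
# [1, 1, 0, 0, 0, 0, 0]
print(key_version["price_criterion"].dtype, full_version["price_criterion"].dtype)
# float64 int64
```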

I think that’s perfectly fine.

That’s because there is a separate instruction for the reasonable apps:

Repeat instructions 1 and 2 for the reasonable apps. Assign the mean to reasonable_mean.

The exercise ends up focusing on separate subsets of data:

  1. If price < cheap_mean it will be marked as 1.
  2. If cheap_mean <= price < 5 it will be marked as 0.
  3. If 5 <= price < reasonable_mean it will be marked as 1.
  4. If price >= reasonable_mean it will be marked as 0.
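These four buckets come from running both filtered steps. Here is a minimal sketch on hypothetical toy prices, assuming cheap means Price < 5 and reasonable means Price >= 5:

```python
import pandas as pd

# Hypothetical toy prices; cheap/reasonable assumed to split at $5.
affordable_apps = pd.DataFrame({"Price": [0.99, 1.99, 2.99, 4.99, 5.99, 9.99, 29.99]})
cheap = affordable_apps["Price"] < 5
reasonable = affordable_apps["Price"] >= 5

cheap_mean = affordable_apps.loc[cheap, "Price"].mean()            # about 2.74
reasonable_mean = affordable_apps.loc[reasonable, "Price"].mean()  # about 15.32

# Step 1 only writes cheap rows; step 2 only writes reasonable rows.
affordable_apps.loc[cheap, "price_criterion"] = affordable_apps["Price"].apply(
    lambda price: 1 if price < cheap_mean else 0)
affordable_apps.loc[reasonable, "price_criterion"] = affordable_apps["Price"].apply(
    lambda price: 1 if price < reasonable_mean else 0)

print(affordable_apps["price_criterion"].tolist())
# [1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
```

Each .loc assignment touches only the rows selected by its mask, so step 2 cannot overwrite what step 1 set for the cheap apps.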

Your approach focuses on just the following (assuming you add the code for reasonable_mean as well):

  1. If price < reasonable_mean it will be marked as 1.
  2. If price >= reasonable_mean it will be marked as 0.
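For comparison, here is a sketch of that two-bucket outcome on hypothetical toy prices, assuming both steps are written as plain column assignments with no row filter. The second assignment replaces the whole column, so the cheap_mean rule leaves no trace:

```python
import pandas as pd

# Hypothetical toy prices; cheap/reasonable assumed to split at $5.
affordable_apps = pd.DataFrame({"Price": [0.99, 1.99, 2.99, 4.99, 5.99, 9.99, 29.99]})
cheap_mean = affordable_apps.loc[affordable_apps["Price"] < 5, "Price"].mean()
reasonable_mean = affordable_apps.loc[affordable_apps["Price"] >= 5, "Price"].mean()

# Step 1 without a filter: marks every row against cheap_mean.
affordable_apps["price_criterion"] = affordable_apps["Price"].apply(
    lambda price: 1 if price < cheap_mean else 0)
# Step 2 without a filter: replaces the whole column again.
affordable_apps["price_criterion"] = affordable_apps["Price"].apply(
    lambda price: 1 if price < reasonable_mean else 0)

# Only the reasonable_mean rule survives: 1 below about 15.32, 0 otherwise.
print(affordable_apps["price_criterion"].tolist())  # [1, 1, 1, 1, 1, 1, 0]
```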

Honestly, I don’t remember the rest of the lesson well enough to know which one is better here.

At a quick glance, it’s unclear why that 2nd step in their approach would be needed. It’s possible yours is the better approach, but that depends on whether all apps cheaper than reasonable_mean should be treated the same, rather than being split into separate buckets by price. If separate buckets are needed, it might help to track the reasonable apps in a separate column instead of combining both groups in one column as the instructions do.

If you think your approach makes more sense from an analysis viewpoint than the given instructions, I would recommend writing it up and sharing that feedback with them directly so they can look into it.

I will go over the content again to see whether their approach is better, but I can’t promise a timeframe right now.

I’m thinking of submitting a suggestion, but I want to make sure I’m not missing something first. To my mind, both steps are needed since they use different means, but the first step could be simplified: if you apply the price < cheap_mean rule to all apps, none of the reasonable apps will be marked 1, because cheap_mean < $5.
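A minimal sketch on hypothetical toy prices (assuming cheap means Price < 5 and reasonable means Price >= 5) supports that reading: keeping the filter on step 2 but dropping it from step 1 gives the same 1/0 values as the answer key. The helper below() is hypothetical, just to avoid repeating the lambda. The columns still differ in dtype, which is only relevant if the checker compares dtypes (an assumption, since its logic isn't shown).

```python
import pandas as pd

# Hypothetical toy prices; cheap/reasonable assumed to split at $5.
prices = [0.99, 1.99, 2.99, 4.99, 5.99, 9.99, 29.99]
key_version = pd.DataFrame({"Price": prices})
simplified = pd.DataFrame({"Price": prices})

cheap = key_version["Price"] < 5
reasonable = key_version["Price"] >= 5
cheap_mean = key_version.loc[cheap, "Price"].mean()
reasonable_mean = key_version.loc[reasonable, "Price"].mean()


def below(mean):
    """Hypothetical helper: the lesson's 1/0 rule for a given threshold."""
    return lambda price: 1 if price < mean else 0


# Answer key: both steps filtered.
key_version.loc[cheap, "price_criterion"] = key_version["Price"].apply(below(cheap_mean))
key_version.loc[reasonable, "price_criterion"] = key_version["Price"].apply(below(reasonable_mean))

# Simplified: step 1 unfiltered (reasonable rows get 0 because cheap_mean < 5),
# step 2 still filtered, so it overwrites only the reasonable rows.
simplified["price_criterion"] = simplified["Price"].apply(below(cheap_mean))
simplified.loc[reasonable, "price_criterion"] = simplified["Price"].apply(below(reasonable_mean))

print((key_version["price_criterion"] == simplified["price_criterion"]).all())  # True
print(key_version["price_criterion"].equals(simplified["price_criterion"]))     # False: float64 vs int64
```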