Why use ".loc"?

Screen Link:
https://app.dataquest.io/m/291/introduction-to-pandas/7/selecting-columns-from-a-dataframe-by-label-continued

My Code:

countries = f500["country"]
revenues_years = f500[["revenues","years_on_global_500_list"]]
ceo_to_sector = f500[:,"ceo":"sector"]

What I expected to happen:
To get a slice of the columns from ceo to sector.

What actually happened:
I needed to use f500.loc instead of just f500

Why do I not need .loc for manipulating other dataframe data but I do to slice it?

.loc is explained in step 5 of the mission: https://app.dataquest.io/m/291/introduction-to-pandas/5/selecting-a-column-from-a-dataframe-by-label

I would recommend going through that again if you are confused about when to use .loc (note that it takes square brackets, f500.loc[...], and is not called like a function).

But I think this is more of a "how pandas works" situation. Going into more detail isn't really helpful, since it would require understanding the underlying code and how the library was designed.

These are two distinct use cases in the library. Plain brackets (f500[...]) let you select specific columns from the DataFrame with all their rows, while f500.loc[...] lets you select specific rows and specific columns by label.
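To make the two use cases concrete, here is a minimal sketch. The DataFrame is a made-up stand-in for f500 (the values and the default 0, 1, 2 index are assumptions for illustration, not the real dataset).

```python
import pandas as pd

# Hypothetical stand-in for f500; the values are invented for illustration.
df = pd.DataFrame(
    {
        "ceo": ["A", "B", "C"],
        "industry": ["Retail", "Energy", "Tech"],
        "sector": ["Retailing", "Energy", "Technology"],
        "revenues": [100, 90, 80],
    }
)

# Plain brackets: specific columns, always with all rows.
cols_only = df[["ceo", "revenues"]]

# .loc: specific rows AND specific columns, both chosen by label.
# The column slice "ceo":"sector" includes the end label "sector".
rows_and_cols = df.loc[[0, 2], "ceo":"sector"]

print(cols_only.shape)
print(rows_and_cols.shape)
```

The second selection cannot be written with plain brackets alone, because plain brackets take only one key.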

Plus, .loc is also helpful if your DataFrame's index is not just the row numbers. This is a more advanced use case, so you can ignore it for now.
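For the curious, a small sketch of that case. The company names used as the index here are hypothetical, as are the values.

```python
import pandas as pd

# Hypothetical data with string row labels instead of 0, 1, 2.
df = pd.DataFrame(
    {"revenues": [100, 90, 80], "country": ["USA", "China", "China"]},
    index=["Alpha", "Beta", "Gamma"],
)

one_row = df.loc["Beta"]                     # Series holding the row labelled "Beta"
row_slice = df.loc["Alpha":"Beta"]           # label slices include the end label
single_value = df.loc["Gamma", "revenues"]   # one row label, one column label

print(one_row)
print(row_slice)
print(single_value)
```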

A summary of techniques to select columns.

Select by Label     Explicit Syntax               Common Shorthand
Single column       df.loc[:,"col1"]              df["col1"]
List of columns     df.loc[:,["col1", "col7"]]    df[["col1", "col7"]]
Slice of columns    df.loc[:,"col1":"col4"]
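
A quick way to check the summary yourself. This is a sketch on a made-up four-column DataFrame (the column names and values are assumptions), comparing each explicit form with its shorthand.

```python
import pandas as pd

# Hypothetical DataFrame with four columns.
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6], "col4": [7, 8]})

# Single column: explicit form and shorthand give the same Series.
single_same = df.loc[:, "col1"].equals(df["col1"])

# List of columns: explicit form and shorthand give the same DataFrame.
list_same = df.loc[:, ["col1", "col4"]].equals(df[["col1", "col4"]])

# Slice of columns: only the explicit .loc form selects columns.
col_slice = df.loc[:, "col1":"col3"]

print(single_same, list_same)
print(list(col_slice.columns))
```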

How I interpreted this is as follows:
The first two commands retrieve whole columns of the DataFrame, so the shorthand works and we do not need "loc".
But the third command retrieves a smaller DataFrame, a range of columns, from the bigger f500 DataFrame, so we need loc.

Thanks for this. So simple yet exactly what I needed.

There is a more confusing case in the next section of this chapter, section 6.
columns:
df[["column1","column2"]]

row:
df.loc[["row1","row2"]]

I simply think this is how it works: you need to add loc in some cases to get the correct syntax.
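
A small sketch of the difference, using a made-up DataFrame with the row and column names from above (the values are assumptions). It also shows why a column slice needs loc: a slice in plain brackets selects rows, not columns.

```python
import pandas as pd

# Hypothetical DataFrame with labelled rows and columns.
df = pd.DataFrame(
    {"column1": [1, 2, 3], "column2": [4, 5, 6], "column3": [7, 8, 9]},
    index=["row1", "row2", "row3"],
)

cols = df[["column1", "column2"]]      # a list in plain brackets selects columns
rows = df.loc[["row1", "row2"]]        # a list in .loc selects rows
sliced = df["row1":"row2"]             # a slice in plain brackets selects ROWS
col_slice = df.loc[:, "column1":"column2"]  # so a column slice has to go through .loc

print(cols)
print(rows)
print(sliced)
print(col_slice)
```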