DataFrame.loc[] syntax when accessing pandas object

Screen Link:
https://app.dataquest.io/m/381/exploring-data-with-pandas%3A-fundamentals/9/using-boolean-indexing-with-pandas-objects

I got a little confused by the syntax we use to manipulate data in pandas. I used two different notations in this exercise and got similar-looking output from both.

motor_countries_df = f500.loc[motor_bool,["country"]]
motor_countries_series = f500.loc[motor_bool,"country"]

The first line of code returned a DataFrame. Can someone explain why a DataFrame was returned, and when one syntax is preferred over the other?

Hi @igor.amelichev,

The first line of code returned a DataFrame because the column label is wrapped in a list (["country"]), which is where the double square brackets come from. The second line passes a single label, so it returned a Series, which is essentially a one-dimensional (1D) labeled array. Both forms are useful for indexing and selecting data; which one to use depends on the type of output you need for your subsequent analysis.

Hope this helps!

  1. A DataFrame is a collection of Series.

Here ["country"] is a list of column labels, so the output will be a DataFrame: passing a list or array of labels to .loc returns a DataFrame.

Note that using [[]] returns a DataFrame. Read more in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

Here "country" is a single column label, so the output will be a Series.
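
The difference between the two selections can be checked with a small sketch. The f500 frame below is a hypothetical stand-in with a few illustrative rows, not the dataset from the exercise; the industry column and the way motor_bool is built are assumptions made only so the example runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for f500 (illustrative rows, not the exercise data).
f500 = pd.DataFrame(
    {
        "country": ["USA", "Japan", "Germany", "USA"],
        "industry": [
            "Retail",
            "Motor Vehicles and Parts",
            "Motor Vehicles and Parts",
            "Technology",
        ],
    },
    index=pd.Index(
        ["Walmart", "Toyota Motor", "Volkswagen", "Apple"], name="company"
    ),
)

# Assumed definition of the boolean mask used in the question.
motor_bool = f500["industry"] == "Motor Vehicles and Parts"

# A list of labels keeps the result two-dimensional: a DataFrame.
motor_countries_df = f500.loc[motor_bool, ["country"]]

# A single label reduces the result to one dimension: a Series.
motor_countries_series = f500.loc[motor_bool, "country"]

print(type(motor_countries_df).__name__)      # DataFrame
print(type(motor_countries_series).__name__)  # Series
print(motor_countries_df.shape)               # (2, 1)
print(motor_countries_series.shape)           # (2,)
```

A rule of thumb that follows from this: use the list form when the next step expects a DataFrame (for example, selecting more columns later or calling DataFrame-only methods), and the single-label form when you want a Series to call methods such as value_counts() on.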

Thank you!
It totally makes sense if I think of it that way.

Hi @info.victoromondi

Thanks for your explanation above. I still have a question about the output from:

motor_countries_series = f500.loc[motor_bool,"country"]

Although printing motor_countries_series shows two columns (company and country), it is still a Series because the company column is the index. Therefore, only the country column holds the actual values. If there were two columns of values, the result would be a DataFrame instead.

Is this correct?

thanks!
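
One way to check the reasoning in the question above is to look at the index and the values of a Series separately. This is only a sketch: the countries Series is a hypothetical stand-in for motor_countries_series, built by hand with illustrative values.

```python
import pandas as pd

# Hypothetical stand-in for motor_countries_series (illustrative values).
countries = pd.Series(
    ["Japan", "Germany"],
    index=pd.Index(["Toyota Motor", "Volkswagen"], name="company"),
    name="country",
)

# Printing shows the index labels next to the values, which can look
# like two columns, but the object is still one-dimensional.
print(countries)
print(countries.ndim)          # 1

# The company names live in the index, not in the data.
print(list(countries.index))   # ['Toyota Motor', 'Volkswagen']
print(list(countries.values))  # ['Japan', 'Germany']

# Moving the index into a regular column yields two columns of data,
# and the result is then a DataFrame.
as_frame = countries.reset_index()
print(type(as_frame).__name__)  # DataFrame
print(list(as_frame.columns))   # ['company', 'country']
```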