Without using loop to generate dataframe

Backing up previously asked question in our previous slack forum:

Mission reference url: https://www.dataquest.io/m/292/exploring-data-with-pandas/10/challenge-calculating-return-on-assets-by-country

Hello All! I’m working on the “Exploring data with pandas/Calculating return on Assets by Country”. I was thinking about how could I resolve it without ‘for loop’ and I found the ‘groupby’ method of DataFrame. I’ve made the following code, but I’m not sure it has any disadvantage.

f500['roa'] = f500['profits'] / f500['assets']
grouped_by_sector = f500.loc[f500.groupby('sector')['roa'].idxmax(), :]
select_company = grouped_by_sector['company']
select_company.index = grouped_by_sector['sector']
top_roa_by_sector = select_company.to_dict()

The following examples below uses .to_dict together with .set_index, .groupby, and .idxmax.

Here is the one ugly liner code without using any loop:

top_roa_by_sector = f500.loc[f500.groupby("sector")["roa"].idxmax(), ["sector", "company"]].set_index("sector", drop=True).to_dict()["company"]

or a two liner:

df =  f500.loc[f500.groupby("sector")["roa"].idxmax(), ["sector", "company"]].set_index("sector", drop=True)
top_roa_by_sector = df.to_dict()["company"]

Using groupby and idxmax to retrieve each sector’s roa max row

.set_index to set index label as sector column, in order for the .to_dict to return a dict = {column -> {index: value}}.

The inner dict ({index: value}) that column (in the code above, our column is company) references is what we want.

The dict keys for the inner dict are values for column sector. And, the dict value for the inner dict are values for column company.

Note: In this example, .to_dict returns a dictionary representation of a DataFrame object. That is, to_dict returns {column label : {index: column's value}}

Another alternative:

top_roa_by_sector = f500.loc[f500.groupby('sector')['roa'].idxmax(), :].set_index("sector")["company"].to_dict()

Or a two liner:
series = f500.loc[f500.groupby('sector')['roa'].idxmax(), :].set_index("sector")["company"]
top_roa_by_sector = series.to_dict()

Note: In this example, .to_dict returns a dictionary representation of a Series object. That is, to.dict returns {index: column}

WHat about just showing as a dataframe. I am stuck with that. Here is my code:

f500[‘roa’] = f500[‘profits’] / f500[‘assets’]

I figured it out.

f500[‘roa’] = f500[‘profits’] / f500[‘assets’]

idxmax() was the key to this. and then using loc for the row and then I specified the columns I wanted…
Thanks for the idea to use idxmax() up top