Challenge: Calculating Return on Asset by Country. Need to check first no company with "0" assets

I think that before calculating the Return on Asset we have to check that there are no companies with “0” (Zero) assets and with Null values for Profits and Assets and add the “roa” column only to those companies that fullfill that double condition.

#Check there are no company with 0 assets
companies_roa = f500[~(f500["assets"]==0)]

#Check there are not Null values for profits and assets
companies_roa = f500[(f500["profits"].notnull())&(f500["assets"].notnull())]

print (companies_roa.shape[0])
#There is one company with null value in profits or assets

f500["roa"] = companies_roa["profits"]/companies_roa["assets"]

top_roa_by_sector={}

sector= f500["sector"].unique()

for c in sector:
    selected_sector=f500[f500["sector"]==c]
    top_roa_company=selected_sector.sort_values("roa", ascending=False).iloc[0]["company"]
    top_roa_by_sector[c]=top_roa_company
1 Like

Hello,

That’s a good point.

Though, from what I can tell in this case, having null in either columns doesn’t matter much because the results will still the same. The roa will be a NaN when either assets or profit or both are NaN. For the purpose of finding the top roa by sector, those NaN roa company will be ignored and that’s expected because we don’t have enough data to quantify their roa.

And sometimes you don’t want to remove but keep the companies with null roa e.g. for reference. Division by 0 can be problematic though luckily we don’t need to worry about this because all companies have non-zero assets.

Using those checks is useful but they are normally done on a case-by-case basis. Most of the time, we’ve done some exploratory work to understand the peculiarities of each data set and we’ll do the necessary data cleaning afterwards. In this data set, those checks are unnecessary but they might be necessary for different operations/analyses or for other datasets.

What I’ve tried to see if the results for top roa companies will be affected:

f500["roa"] = f500["profits"] / f500["assets"]

# to compare with data frame with null profits/assets or 0 assets
companies_roa = f500.copy()

top_roa_by_sector = {}
for sector in f500["sector"].unique():
    is_sector = f500["sector"] == sector
    sector_companies = f500.loc[is_sector]
    top_company = sector_companies.sort_values("roa",ascending=False).iloc[0]
    company_name = top_company["company"]
    top_roa_by_sector[sector] = company_name

#Check there are no company with 0 assets
companies_roa = companies_roa[~(companies_roa["assets"]==0)]

#Check there are not Null values for profits and assets
companies_roa = companies_roa[(companies_roa["profits"].notnull())&(companies_roa["assets"].notnull())]

# print (companies_roa.shape[0])
#There is one company with null value in profits or assets

companies_roa["roa"] = companies_roa["profits"] / companies_roa["assets"]

no_null_top_roa_by_sector = {}
for sector in companies_roa["sector"].unique():
    is_sector = companies_roa["sector"] == sector
    sector_companies = companies_roa.loc[is_sector]
    top_company = sector_companies.sort_values("roa",ascending=False).iloc[0]
    company_name = top_company["company"]
    no_null_top_roa_by_sector[sector] = company_name

Anyhow, you can also share your feedback with the Content & Product teams of Dataquest. Just click the ? button in the upper-right corner of any screen of the Dataquest learning platform, select Share Feedback, fill in the form, and send it.

Cheers.

1 Like