How was the 'regions' dataframe created?

Hi there!

In ‘Working With Missing And Duplicate Data’ (step2 --> course 4/6)

Screen 5/14
I’m not sure how ‘regions’ was created so that it includes region values for the 2017 countries.

Kindly clarify. Thank you!

Can you please provide a link?

Hi Bruno!
Sure thing, here it is:
https://app.dataquest.io/m/347/working-with-missing-and-duplicate-data/5/using-data-from-additional-sources-to-fill-in-missing-values

First line under instructions…
"We’ve already created a dataframe named regions containing all of the countries and corresponding regions from the happiness2015, happiness2016, and happiness2017 dataframes."

I’d like to know how the dataframe ‘regions’ was created.

Thanks!

Here’s the code:

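import pandas as pd

# Load the three yearly files.
happiness2015 = pd.read_csv("wh_2015.csv")
happiness2016 = pd.read_csv("wh_2016.csv")
happiness2017 = pd.read_csv("wh_2017.csv")

# Standardize the column names: drop parentheses, trim whitespace, uppercase.
# regex is set explicitly because its default differs between pandas versions.
happiness2015.columns = happiness2015.columns.str.replace(r'[\(\)]', '', regex=True).str.strip().str.upper()
happiness2016.columns = happiness2016.columns.str.replace(r'[\(\)]', '', regex=True).str.strip().str.upper()
# The 2017 names use dots, so swap them for spaces and collapse repeated spaces.
happiness2017.columns = happiness2017.columns.str.replace('.', ' ', regex=False).str.replace(r'\s+', ' ', regex=True).str.strip().str.upper()

# Stack the three years, then keep each unique country/region pair.
# dropna() discards rows where the region is missing.
combined = pd.concat([happiness2015, happiness2016, happiness2017], ignore_index=True)
regions = combined[['COUNTRY','REGION']].dropna().drop_duplicates()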
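
If it helps to see why regions ends up covering countries that appear in 2017, here is a small sketch of the same steps on made-up data. The rows, the YEAR column, and the idea that the 2017 frame has no REGION column at all are assumptions for illustration, not the actual course files. The final merge is just one possible way to use the lookup, not necessarily what the course does.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the three yearly files (made-up rows).
happiness2015 = pd.DataFrame({
    "COUNTRY": ["Norway", "Chile"],
    "REGION": ["Western Europe", "Latin America and Caribbean"],
    "YEAR": [2015, 2015],
})
happiness2016 = pd.DataFrame({
    "COUNTRY": ["Norway", "Japan"],
    "REGION": ["Western Europe", "Eastern Asia"],
    "YEAR": [2016, 2016],
})
# Assumption: the 2017 data carries no REGION column.
happiness2017 = pd.DataFrame({
    "COUNTRY": ["Norway", "Japan", "Chile"],
    "YEAR": [2017, 2017, 2017],
})

# Stacking the frames leaves NaN in REGION for every 2017 row.
combined = pd.concat([happiness2015, happiness2016, happiness2017], ignore_index=True)

# dropna() removes the 2017 rows; drop_duplicates() removes repeated pairs.
# What remains is one row per country, taken from 2015 and 2016.
regions = combined[["COUNTRY", "REGION"]].dropna().drop_duplicates()
print(regions)

# One way to use the lookup: merge it back on COUNTRY to fill the gaps.
filled = combined.drop(columns="REGION").merge(regions, on="COUNTRY", how="left")
print(filled[filled["YEAR"] == 2017])
```

So the region values do not come from the 2017 data itself. They come from the earlier years, and any country that also shows up in 2017 can be matched by name.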

Got it!
Thanks Bruno :smile:

Thank you so much!! I tried figuring it out on my own for a while and could not.