Regarding the regions dataframe: how would I construct it to complete this task, given that I am working with the CSV files from Kaggle?
Recall once more that each year contains the same countries. Since the regions are fixed values - the region a country was assigned to in 2015 or 2016 won’t change - we should be able to assign the 2015 or 2016 region to the 2017 row.
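The claim that a country's region never changes between years can be checked before relying on it. The following is a minimal sketch, assuming combined has COUNTRY, YEAR and REGION columns as in the exercise; the rows are made-up stand-ins, not the real data.

```python
import pandas as pd

# Toy data in the shape of combined; values are illustrative only.
combined = pd.DataFrame({
    "COUNTRY": ["Norway", "Norway", "Norway", "Togo", "Togo", "Togo"],
    "YEAR": [2015, 2016, 2017, 2015, 2016, 2017],
    "REGION": ["Western Europe", "Western Europe", None,
               "Sub-Saharan Africa", "Sub-Saharan Africa", None],
})

# nunique() ignores missing values, so a consistent country reports exactly 1.
regions_per_country = combined.groupby("COUNTRY")["REGION"].nunique()

# Any country listed here was assigned more than one region across years.
inconsistent = regions_per_country[regions_per_country > 1]
print(inconsistent.empty)
```

If inconsistent is empty, copying the 2015 or 2016 region onto the 2017 rows is safe.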
In order to do so, we’ll use the following strategy:
Create a dataframe containing all of the countries and corresponding regions from the happiness2015, happiness2016, and happiness2017 dataframes.
Use the pd.merge() function to assign the REGION in the dataframe above to the corresponding country in combined.
The result will have two region columns - the original column with missing values will be named REGION_x. The updated column without missing values will be named REGION_y. We’ll drop REGION_x to eliminate confusion.
We’ve already created a dataframe named regions containing all of the countries and corresponding regions from the happiness2015, happiness2016, and happiness2017 dataframes.
Use the pd.merge() function to assign the REGION in the regions dataframe to the corresponding country in combined.
Set the left parameter equal to combined.
Set the right parameter equal to regions.
Set the on parameter equal to 'COUNTRY'.
Set the how parameter equal to 'left' to make sure we don’t drop any rows from combined.
Assign the result back to combined.
Use the DataFrame.drop() method to drop the original region column with missing values, now named REGION_x.
Pass 'REGION_x' into the df.drop() method.
Set the axis parameter equal to 1.
Assign the result back to combined.
Use the DataFrame.isnull() and DataFrame.sum() methods to check for missing values. Assign the result to a variable named missing.
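The steps above can be sketched end to end as follows. The two dataframes here are small made-up stand-ins for combined and regions, so only the column names match the exercise.

```python
import pandas as pd

# Toy stand-ins for the exercise's dataframes (values are illustrative only).
combined = pd.DataFrame({
    "COUNTRY": ["Norway", "Togo", "Norway", "Togo"],
    "YEAR": [2015, 2015, 2017, 2017],
    "REGION": ["Western Europe", "Sub-Saharan Africa", None, None],
})
regions = pd.DataFrame({
    "COUNTRY": ["Norway", "Togo"],
    "REGION": ["Western Europe", "Sub-Saharan Africa"],
})

# A left merge keeps every row of combined. Because both frames have a REGION
# column that is not a merge key, pandas suffixes them: REGION_x comes from
# combined and REGION_y comes from regions.
combined = pd.merge(left=combined, right=regions, on="COUNTRY", how="left")

# Drop the original column that contained the missing values.
combined = combined.drop("REGION_x", axis=1)

# Count the missing values in each column.
missing = combined.isnull().sum()
print(missing)
```

With this toy data, the 2017 rows pick up their region from regions, so REGION_y has no missing values left.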
"We’ve already created a dataframe named regions containing all of the countries and corresponding regions from the happiness2015, happiness2016, and happiness2017 dataframes."
regions is a dataframe that has only two columns, country and region. So to create it you need to combine the three original dataframes. In the exercise they did this:
regions = pd.merge(left=wh_15, right=wh_16, on=['Country', 'Region'], how='left')
regions = pd.merge(left=regions, right=wh_17, on='Country', how='left')
If you look at the files from Kaggle, the last 3 years don’t have a region column, but it doesn’t matter. You need to do 4 merges. After this, you can continue with the cell that you’re showing.
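As an alternative to chaining merges, here is a hedged sketch that builds a two-column regions dataframe by stacking the country/region pairs and removing duplicates. It assumes, as noted above, that only the earlier Kaggle files (2015 and 2016) have a Region column. The wh_15 and wh_16 frames are made-up stand-ins for what pd.read_csv would load, and the rename to COUNTRY and REGION is an assumption made to match the exercise's column names.

```python
import pandas as pd

# Stand-ins for the Kaggle 2015 and 2016 files (values are illustrative only).
wh_15 = pd.DataFrame({
    "Country": ["Norway", "Togo"],
    "Region": ["Western Europe", "Sub-Saharan Africa"],
    "Happiness Score": [7.5, 2.8],
})
wh_16 = pd.DataFrame({
    "Country": ["Norway", "Belize"],
    "Region": ["Western Europe", "Latin America and Caribbean"],
    "Happiness Score": [7.4, 5.9],
})

# Keep only the two columns of interest, stack the years, and drop repeated
# country/region pairs so each pair appears once.
regions = (
    pd.concat([wh_15[["Country", "Region"]], wh_16[["Country", "Region"]]],
              ignore_index=True)
    .drop_duplicates()
    .rename(columns={"Country": "COUNTRY", "Region": "REGION"})
    .reset_index(drop=True)
)
print(regions)
```

Selecting the two columns first matters: merging whole yearly dataframes carries every other column along, whereas this keeps regions limited to one row per country. Countries that appear only in a file without a Region column will not be in regions and will still show a missing region after the left merge.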
Hope that makes sense. I’ll be here if you need more help or have more questions.