Different Countries, Different Regions: How Do We Proceed?

Screen Link:
https://app.dataquest.io/m/347/working-with-missing-and-duplicate-data/5/using-data-from-additional-sources-to-fill-in-missing-values

Recall once more that each year contains the same countries. Since the regions are fixed values - the region a country was assigned to in 2015 or 2016 won’t change - we should be able to assign the 2015 or 2016 region to the 2017 row.

In order to do so, we’ll use the following strategy:

  1. Create a dataframe containing all of the countries and corresponding regions from the happiness2015, happiness2016, and happiness2017 dataframes.
  2. Use the pd.merge() function to assign the REGION in the dataframe above to the corresponding country in combined.
  3. The result will have two region columns - the original column with missing values will be named REGION_x. The updated column without missing values will be named REGION_y. We’ll drop REGION_x to eliminate confusion.
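The three steps above can be sketched roughly as follows. This is only a minimal illustration, not the lesson's exact code: the tiny dataframes and their values are made up, the COUNTRY and YEAR column names are assumptions, and the regions lookup is built from the concatenated data rather than loaded from anywhere.

```python
import pandas as pd

# Toy stand-ins for the lesson's dataframes (illustrative values only).
happiness2015 = pd.DataFrame({
    "COUNTRY": ["Norway", "Chad"],
    "REGION": ["Western Europe", "Sub-Saharan Africa"],
    "YEAR": 2015,
})
happiness2016 = happiness2015.assign(YEAR=2016)
happiness2017 = pd.DataFrame({
    "COUNTRY": ["Norway", "Chad"],
    "REGION": [None, None],  # the 2017 data has no region values
    "YEAR": 2017,
})
combined = pd.concat(
    [happiness2015, happiness2016, happiness2017], ignore_index=True
)

# Step 1: one row per country with its known region.
regions = (
    combined[["COUNTRY", "REGION"]]
    .dropna(subset=["REGION"])
    .drop_duplicates(subset="COUNTRY")
)

# Step 2: attach the known region to every row of combined.
# Both frames have a REGION column, so pandas adds the _x / _y suffixes.
combined = pd.merge(left=combined, right=regions, on="COUNTRY", how="left")

# Step 3: drop the original column that still has missing values.
combined = combined.drop(columns="REGION_x")
print(combined)
```

A left merge keeps every row of combined, so a country with no known region in any year would simply end up with a missing REGION_y.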

I understand that because each year contains the same countries, the regions will also be the same, so it is easy to fill in the missing 2017 region values in our case. However, I am curious: what if the countries, and therefore the regions, differ from year to year? Assume again that the 2017 region values are missing. How do we proceed in this case?

I tried to explore this case but could not figure out a solution. It would be great if someone could help me understand it.

Thanks in advance for your help.
Best
K!

hi @prasadkalyan05

For the sake of all living beings, I do hope countries don’t start changing regions, but then it is 2020 and we are only halfway through :scream:!

This question is pretty straightforward, but the answer is not!
There are multiple things you will have to consider. One is how Region and Country depend on each other: do the two columns together lead us to important information, or can we still analyze the data without one of them?

Is there a pattern between the countries listed under one region in one year and another region in another year? If so, what is the basis for the pattern? Can we interpolate/impute the values using it?
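As a rough sketch of what that check could look like in pandas: the snippet below first finds countries whose region differs between years, then fills each missing region with that country's most recent known one. The data, the country names A, B and C, and the carry-forward rule are all assumptions made for illustration. Whether "last known region" is a sensible basis depends entirely on your data.

```python
import pandas as pd

# Hypothetical data: country A changes region, C appears only in 2017.
combined = pd.DataFrame({
    "COUNTRY": ["A", "A", "A", "B", "B", "B", "C"],
    "YEAR": [2015, 2016, 2017, 2015, 2016, 2017, 2017],
    "REGION": ["North", "East", None, "South", "South", None, None],
})

# Look for a pattern: which countries were listed under more than one region?
region_counts = combined.groupby("COUNTRY")["REGION"].nunique()
changed = region_counts[region_counts > 1].index.tolist()

# One possible basis for imputation: carry the latest known region forward
# within each country, in year order.
combined = combined.sort_values(["COUNTRY", "YEAR"])
combined["REGION_FILLED"] = combined.groupby("COUNTRY")["REGION"].ffill()

# Countries with no earlier region cannot be filled this way.
still_missing = (
    combined.loc[combined["REGION_FILLED"].isna(), "COUNTRY"].unique().tolist()
)

print("Changed region:", changed)
print("Still missing:", still_missing)
```

Countries that end up in still_missing have no history to draw on, so they would need an outside source or a different rule altogether.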

And… if no pattern can be established… well, oops :open_mouth:! You may have to figure out some other basis for imputation/interpolation.

a basic understanding of the two terms is here

The importance of this question lies not so much in the technical solution as in the process of reaching that solution.