[Data Cleaning Problems] How to merge two columns into one columns?

How to merge the two columns ‘REGION_y’ and ‘REGION_x’? Since some value appear in ‘REGION_y’ column, while some values appear in ‘REGION_x’. I want to have a new column having values from two of them.

COUNTRY REGION_x REGION_y
120 Nepal Southern Asia Southern Asia
121 Ethiopia Sub-Saharan Africa Sub-Saharan Africa
122 Sierra Leone Sub-Saharan Africa Sub-Saharan Africa
123 Mauritania Sub-Saharan Africa Sub-Saharan Africa
124 Kenya Sub-Saharan Africa Sub-Saharan Africa
125 Djibouti Sub-Saharan Africa NaN
126 Armenia Central and Eastern Europe Central and Eastern Europe
127 Botswana Sub-Saharan Africa Sub-Saharan Africa
128 Myanmar Southeastern Asia Southeastern Asia
129 Georgia Central and Eastern Europe Central and Eastern Europe
130 Malawi Sub-Saharan Africa Sub-Saharan Africa
131 Sri Lanka Southern Asia Southern Asia
132 Cameroon Sub-Saharan Africa Sub-Saharan Africa
133 Bulgaria Central and Eastern Europe Central and Eastern Europe
134 Egypt Middle East and Northern Africa Middle East and Northern Africa
135 Yemen Middle East and Northern Africa Middle East and Northern Africa
136 Angola Sub-Saharan Africa Sub-Saharan Africa
137 Mali Sub-Saharan Africa Sub-Saharan Africa
138 Congo (Brazzaville) Sub-Saharan Africa Sub-Saharan Africa
139 Comoros Sub-Saharan Africa Sub-Saharan Africa
140 Uganda Sub-Saharan Africa Sub-Saharan Africa
141 Senegal Sub-Saharan Africa Sub-Saharan Africa
142 Gabon Sub-Saharan Africa Sub-Saharan Africa
143 Niger Sub-Saharan Africa Sub-Saharan Africa
144 Cambodia Southeastern Asia Southeastern Asia
145 Tanzania Sub-Saharan Africa Sub-Saharan Africa
146 Madagascar Sub-Saharan Africa Sub-Saharan Africa
147 Central African Republic Sub-Saharan Africa NaN
148 Chad Sub-Saharan Africa Sub-Saharan Africa
149 Guinea Sub-Saharan Africa Sub-Saharan Africa
150 Ivory Coast Sub-Saharan Africa Sub-Saharan Africa
151 Burkina Faso Sub-Saharan Africa Sub-Saharan Africa
152 Afghanistan Southern Asia Southern Asia
153 Rwanda Sub-Saharan Africa Sub-Saharan Africa
154 Benin Sub-Saharan Africa Sub-Saharan Africa
155 Syria Middle East and Northern Africa Middle East and Northern Africa
156 Burundi Sub-Saharan Africa Sub-Saharan Africa
157 Togo Sub-Saharan Africa Sub-Saharan Africa
158 Belize NaN Latin America and Caribbean
159 Namibia NaN Sub-Saharan Africa
160 Puerto Rico NaN Latin America and Caribbean
161 Somalia NaN Sub-Saharan Africa
162 Somaliland Region NaN Sub-Saharan Africa
163 South Sudan NaN Sub-Saharan Africa

I think it can be done the following way, assuming that you do not want to preserve the original REGION_x and REGION_y fields:

df[‘REGION_x’].fillna(df[‘REGION_y’], inplace=True)
del df[‘REGION_y’]
df.rename(columns={‘REGION_x’: ‘REGION’}, inplace=True)

2 Likes

Supposing your dataset is a list of lists (without the header row) and the NaN values are of type string you can merge the REGION_x and REGION_y columns by creating a fifth column like this:

for row in dataset:
  region_x = row[2]
  region_y = row[3]

  if region_x == 'NaN':
    row.append(region_y)
  elif region_y == 'NaN':
    row.append(region_x)
  else:
    row.append(region_x)

Thank you so much! It works