Changing Column Names

Screen Link:
https://app.dataquest.io/c/60/m/347/working-with-missing-and-duplicate-data/4/visualizing-missing-data

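import pandas as pd

# Normalize the column names: dots become spaces, parentheses are dropped, names are upper-cased.
# regex is set explicitly so '.', '(' and ')' are treated as literal characters.
happiness2017.columns = happiness2017.columns.str.replace('.', ' ', regex=False).str.replace(r'\s+', ' ', regex=True).str.strip().str.upper()
happiness2015.columns = happiness2015.columns.str.replace('.', ' ', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False).str.strip().str.upper()
happiness2016.columns = happiness2016.columns.str.replace('.', ' ', regex=False).str.replace('(', '', regex=False).str.replace(')', '', regex=False).str.strip().str.upper()
# Stack the three years and count the missing values per column.
combined = pd.concat([happiness2015, happiness2016, happiness2017], ignore_index=True)
missing = combined.isnull().sum()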

The above code is from the screen linked above.

The screen states that ‘we corrected some of the missing values by fixing the column names.’
What I don't understand is how changing the column names affected the data in those columns and turned null values into non-null values.

Hi @ch20btech11031,

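I agree with you that the phrase on that screen is quite confusing. What really happened on the previous screen was that the dataframes were not combined the way we expected, because columns that were supposed to contain the same information had different names in different years. pd.concat aligns columns by name, so every differently named column became a separate column in the combined dataframe, filled with NaN for the rows that came from the other years. Hence, we obtained a lot of unnecessary columns and missing values, while in reality many of those columns should have been stacked into one. Going back a step and changing the column names (i.e., making them uniform) resolved the issue. The data itself never changed; only the way it lined up did.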
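To make this concrete, here is a minimal sketch on toy data. The two small dataframes, their column names, and the values are made up for illustration and are not the actual happiness datasets; the idea is only to show how the missing-value count changes once the names match.

```python
import pandas as pd

# Hypothetical toy data: the same information stored under different column names.
df_2015 = pd.DataFrame({"COUNTRY": ["Norway"], "HAPPINESS SCORE": [7.5]})
df_2017 = pd.DataFrame({"Country": ["Norway"], "Happiness.Score": [7.6]})

# The names differ, so concat keeps four separate columns and pads the gaps with NaN.
mismatched = pd.concat([df_2015, df_2017], ignore_index=True)
missing_before = mismatched.isnull().sum()

# Make the 2017 names uniform (dots to spaces, upper case), then concatenate again.
df_2017.columns = (
    df_2017.columns.str.replace(".", " ", regex=False).str.strip().str.upper()
)
aligned = pd.concat([df_2015, df_2017], ignore_index=True)
missing_after = aligned.isnull().sum()

print(missing_before)  # one missing value in each of the four columns
print(missing_after)   # no missing values in the two shared columns
```

In the real datasets some missing values remain after this step, since not every column exists in every year, but the ones caused purely by the naming mismatch disappear.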

Thank you, Elena,

I completely understand it now. :grin:
