Renaming columns before combining dataframes problem

https://app.dataquest.io/m/347/working-with-missing-and-duplicate-data/3/correcting-data-cleaning-errors-that-result-in-missing-values

I have ran this to get rid of brackets , to make column names uniform before combining dataframes, however when I tried I see that this does not save result back into original datframe and just returns an index .

happiness2015=happiness2015.columns.str.replace("(", "")

However there would still be leading and trailing spaces which causes this as mentioned —

You may have also noticed that some of the column names differ only by punctuation, which caused the dataframes to be combined incorrectly:

Trust (Government Corruption)
Trust…Government.Corruption.

Therefore I still have null values and the columns have not been combined

Then I realised I have to amend column names for each year of each separate csv --2015,2016,2017 before combining to make them all same

happiness2015 = pd.read_csv("World_Happiness_2015.csv")
happiness2016 = pd.read_csv("World_Happiness_2016.csv")
happiness2017 = pd.read_csv("World_Happiness_2017.csv")

So then I tried this solution from stackoverflow to also solve a trailing space problem, but got error

AttributeError: Can only use .str accessor with string values!

happiness2015['Health (Life Expectancy)']=happiness2015['Health (Life Expectancy)'].str.replace(r"\s+\(.*\)","")

https://stackoverflow.com/questions/40836423/removing-parenthesis-from-a-string-in-pandas-with-str-replace/45256708

Even if it did work it seems like a very time consuming task to do this for each column of '16 and '17 to get same spelling and ensure no leading and trailing spaces , is there a more efficient way or if not please advise on how to fix?

Hi!
You did a great job looking for solutions. But actually you almost did it right on the step 1. There’s a small but crucial error in your code.

You save the results to the data frame itself instead of saving it to the .columns attribute:
happiness2015.columns = happiness2015.columns.str.replace("(", "")

1 Like

Hi,
@jamesberentsen

For happiness2017dataset,you could do this.

happiness2017.columns = happiness2017.columns.str.replace(’.’, ’ ‘).str.replace(’\s+’, ’ ').str.strip().str.upper()

For renaming columns in happiness2016 try this

happiness2016.columns=happiness2016.columns.str.replace(’(’,’’).str.replace(’)’,’’).str.upper().str.strip()

For renaming columns in happiness2015 try this

happiness2015.columns=happiness2015.columns.str.replace(’(’,’’).str.replace(’)’,’’).str.upper().str.strip()

You don’t have to rename each column like happiness2015[‘Health (Life Expectancy)’] .Instead you can do it together ie, rename all columns in a dataset at once like above .

hope this helps.

1 Like

Hi ksenia.kustanovich
Thanks for your feedback and correction.

Regards,
JB

1 Like