Rename columns of different datasets with a function and a for loop


I am working on a project involving several WeChat datasets from different countries.

For the curious ones, the project is dedicated to analysing the interests of the Chinese diaspora in different countries and identifying key topics that brands can use in their content strategy when trying to grow in the Chinese market. The datasets will be available on Kaggle once I have finished my kernel (I first want to see whether what I have in mind is feasible), so you can give it a try. I will make a post about it :slight_smile:.

Luckily, the website hosting the datasets had already split them by country, so I did not need to break them into subcategories myself. I have already cleaned the data a little by removing the columns that are not needed for my analysis.

The only problem is that, although the contents are the same, the column names vary according to the country where the data were collected:

France: Index(['微信名', '微信号', '标题', '摘要', '发布时间', '阅读数', '点赞数'], dtype='object')
UK: Index(['wx_nickname', 'wx_name', 'news_title', 'news_digest', 'news_posttime',
'news_read_count', 'news_like_count'],
Russia: Index(['name', 'wx_name', 'title', 'content', 'posttime', 'readnum_newest',
Australia: Index(['微信名', '微信号', '标题', '摘要', '发布时间', '阅读数', '点赞数'], dtype='object')

I’d like to define a function with a for loop that changes the column names to plain English for all of them, but I am not sure how to go about it since I am working with several different datasets at the same time. Could someone guide me?

You can store all the dataframes in a list and then iterate over them with a for loop.

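# df1, df2 are the per-country dataframes, already loaded
df_list = [df1, df2]
for df in df_list:
    # columns_name_list is a flat list of new names, in column order
    df.columns = columns_name_list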
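
To wrap that loop in a function, as the question asks, something along these lines might work. It is only a sketch: the English names in `ENGLISH_COLUMNS`, the function name `rename_columns`, and the tiny stand-in dataframes are all made up for illustration, and it assumes every dataset has the same seven columns in the same order as the France and UK listings.

```python
import pandas as pd

# Hypothetical English names (an assumption, pick whatever suits the analysis).
ENGLISH_COLUMNS = [
    "account_name", "account_id", "title", "digest",
    "post_time", "read_count", "like_count",
]

def rename_columns(dfs, new_names):
    """Rename the columns of every dataframe in `dfs` in place, by position."""
    for country, df in dfs.items():
        # Guard against a dataset with a different number of columns.
        if len(df.columns) != len(new_names):
            raise ValueError(
                f"{country}: expected {len(new_names)} columns, got {len(df.columns)}"
            )
        df.columns = new_names
    return dfs

# Stand-in dataframes with one fake row each; the real ones come from the files.
row = [["nick", "id_1", "a title", "a digest", "2020-01-01", 10, 2]]
france = pd.DataFrame(
    row, columns=["微信名", "微信号", "标题", "摘要", "发布时间", "阅读数", "点赞数"]
)
uk = pd.DataFrame(
    row,
    columns=["wx_nickname", "wx_name", "news_title", "news_digest",
             "news_posttime", "news_read_count", "news_like_count"],
)

datasets = rename_columns({"France": france, "UK": uk}, ENGLISH_COLUMNS)
print(list(france.columns))
```

Keeping the dataframes in a dict keyed by country (rather than a plain list) is just a convenience so the error message can say which dataset has an unexpected shape.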

Hi @loicchamplong

Do you want to translate/transform the content of the dataframes as well, or just the column names of each dataframe?

If it’s the latter, you can follow @DishinGoyani’s suggestion: it changes only each dataframe’s column names, so there is no need to loop over the whole dataframe.
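
If the column order is not guaranteed to match across countries, a name-based mapping with `DataFrame.rename` may be safer than assigning by position, and it leaves the cell contents untouched. A minimal sketch follows. The English target names and the helper `standardise_columns` are hypothetical, and the pairing of each UK header with a Chinese header is an assumption based on the order of the listings in the question. The Russia headers are left out because that listing is cut off; they could be added to the dict in the same way.

```python
import pandas as pd

# Old name -> new name. Targets are made-up English labels (an assumption).
COLUMN_MAP = {
    # France / Australia headers
    "微信名": "account_name",
    "微信号": "account_id",
    "标题": "title",
    "摘要": "digest",
    "发布时间": "post_time",
    "阅读数": "read_count",
    "点赞数": "like_count",
    # UK headers
    "wx_nickname": "account_name",
    "wx_name": "account_id",
    "news_title": "title",
    "news_digest": "digest",
    "news_posttime": "post_time",
    "news_read_count": "read_count",
    "news_like_count": "like_count",
}

def standardise_columns(df, mapping=COLUMN_MAP):
    """Return a copy of `df` with known headers renamed; unknown ones are kept."""
    return df.rename(columns=mapping)

# Stand-in dataframes with one fake row each.
australia = pd.DataFrame(
    [["nick", "id_1", "a title", "a digest", "2020-01-01", 10, 2]],
    columns=["微信名", "微信号", "标题", "摘要", "发布时间", "阅读数", "点赞数"],
)
uk = pd.DataFrame(
    [["nick", "id_2", "a title", "a digest", "2020-01-02", 5, 1]],
    columns=["wx_nickname", "wx_name", "news_title", "news_digest",
             "news_posttime", "news_read_count", "news_like_count"],
)

cleaned = {name: standardise_columns(df)
           for name, df in {"Australia": australia, "UK": uk}.items()}
print(list(cleaned["UK"].columns))
```

Because `rename` returns a new dataframe by default, the originals keep their headers, which makes it easy to compare before and after.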

