Pandas to Replace Column Headers vs a List of Headers

Hi all. I have a project I am working on to stretch my skills a bit and would like to see how Pandas would do this.

Say I have this DataFrame with the stock, default, column list as shown below. If I have a list with two columns, or even another DataFrame, where each of these column names has an ‘alias’, how do I make it so these names below are replaced with their respective alias? Thank you as always :slight_smile:

Example:
Default_name, Alias_header
company, companhia
rank, rango
revenue, dinero_entrada
profits, ganancia
ceo, el_chapo

#   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   company                   500 non-null    object 
 1   rank                      500 non-null    int64  
 2   revenues                  500 non-null    int64  
 3   revenue_change            498 non-null    float64
 4   profits                   499 non-null    float64
 5   assets                    500 non-null    int64  
 6   profit_change             436 non-null    float64
 7   ceo                       500 non-null    object 
 8   industry                  500 non-null    object 
 9   sector                    500 non-null    object 
 10  previous_rank             500 non-null    int64  
 11  country                   500 non-null    object 
 12  hq_location               500 non-null    object 
 13  website                   500 non-null    object 
 14  years_on_global_500_list  500 non-null    int64  
 15  employees                 500 non-null    int64  
 16  total_stockholder_equity  500 non-null    int64  
dtypes: float64(3), int64(7), object(7)
memory usage: 66.5+ KB

Hey.

You worded your question in a way that makes me unsure if I’m interpreting it correctly. I’ll be answering the question “How does one rename columns in a pandas dataframe?”

>>> tips = __import__("seaborn").load_dataset("tips").head()

Don’t worry about this line, it’s just loading a dataset. Let’s take a look at it:

>>> tips
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

Now to rename some (or all) of the columns we can use the pandas.DataFrame.rename method:

>>> tips = tips.rename(columns={"sex": "gender", "smoker": "healthy"})
>>> tips
   total_bill   tip  gender healthy  day    time  size
0       16.99  1.01  Female      No  Sun  Dinner     2
1       10.34  1.66    Male      No  Sun  Dinner     3
2       21.01  3.50    Male      No  Sun  Dinner     3
3       23.68  3.31    Male      No  Sun  Dinner     2
4       24.59  3.61  Female      No  Sun  Dinner     4

I hope this helps.

1 Like

Thanks Bruno. This helps some yes. I see in the code you used the format “ tips = tips.rename(columns={“sex”: “gender”, “smoker”: “healthy”})”

What if I have 70 columns and the values are mapped in a list or data frame?

The goal task to be completed is: compare two reports versus each other and find any errors where data is not the same given certain conditions.

I need to make sure the columns are named the same.
The first report is pulled from Say, YouTube’s social campaigns platform.
The columns in those reports are capitalized as say for example: Clicks, Impressions, Campaign ID.

The second report Is pulled with SQL queries from Redshift which has about 12 tables and the same columns have names as: clicks, impressions, campaign_id.

I have a .csv file with the column name equivalents all mapped out where Campaign ID = campaign_id
For example.

I need to replace all of the capitalized with their lower case counterpart and I cannot type in one by one as I have to do this for 6-8 reports.

What do you suggest? Let me know and I can rephrase if needed. Lots of new terminology… thanks

I think you’ve given lots of clear information, but I’m still fuzzy on the differences between the column names.

If we’re talking about switching between uppercase and lowercase, that’s one thing. If the differences can be diverse, then more information is needed.

In your example you mentioned company vs companhia. This is a translation from English to Portuguese, which is very different from changing the case. As I said, it depends on the diversity of the differences in the columns names.

1 Like

I have 3 documents in my project. In the documents I have one which just has the column headers, the remaining documents are the two documents I am working to compare. I’m essentially trying to recreate a vlookup to make sure the columns in both documents have the same name. If you see the code below I did it using a list comprehension. I would like to see how pandas handles this.

image

# update the Google Doc's headers with the headers from the column_headers document
co_no_headers = column_headers[1:]
for items in co_no_headers:
    req = items[0]
    reds = items[1]
    source = google_doc[0]
    for i, rows in enumerate(source):
        if rows == req:
            source[i] = reds

In my code here I removed the header columns from my doc and then just used enumerate to replace wherever that value existed with the redshift column. Does this help?

Yes. And now I understand the following question.

Then you just get that data into a dictionary appropriately and mimick the technique I displayed here.

If this still isn’t clear, please provide example files and code.

1 Like

Wow… that is so simple.

Is there a name for this technique I can read on google or some documentation?

Oblrigado por seguir com a minha pregunta :slight_smile::rofl:

1 Like

Let’s call it Carlos. We’ll be pioneers.

It really is nothing special. If you want to explore, a good start is searching for renaming columns pandas or variations of this.

:grin:

LOL. That’s hilarious. Thanks man.

1 Like