Function to Converting CamelCase to SnakeCase

Hi Everyone.

For the Ebay Car Sales Exploration, I’m trying to figure out an efficient way to clean the Column names. The solution provided as well as a similar query on the community showed a manual approach to doing so.

I was wondering if there’s more efficient ways to do so. Here’s how I’m approaching it (not all of it working code). I also used StackOverflow

My Code:

**Code for conversion to snakecase

import re
def camel_to_snake(name):
  name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
  return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()

snake_columns = []

for i in range(len(new_columns)):
    snake_columns.append(camel_to_snake(new_columns[i]))
    print(snake_columns[i])

That section worked perfect for me. For the other section of the exercise to rename some of the columns like:
yearOfRegistration to registration_year

Is there a way you could do something like below. I.e. Replace specific values in a list with different values?

new_columns[["A", "B"]] = new_columns.str.replace(["Y", "Z"])

@reeftim

I do not know if this fits your purpose, but I did something like this:

def edit_columns(col):
    column = col.lower()
    
    if column == 'monthofregistration' or column == 'yearofregistration':
        column = column.replace('of', ' ')
        column = column.split()
        column = column[1] + '_' + column[0]
    elif column == 'notrepaireddamage':
        column = column.replace('not', 'un')
        column = column.replace('dd', 'd_d')
    elif column == 'datecreated':
        column = column.replace('date', 'ad_')
    else:
        mm = []
        for a in col:
            if a == a.upper():
                a = a.lower()
                mm.append('_'+a)
            else:
                mm.append(a)
            column = ''.join(mm)       
    return column   

print(edit_columns('dateCreated')); 
print(edit_columns('notRepairedDamage'))
print(edit_columns('monthOfRegistration'))
print(edit_columns('yearOfRegistration'))
print(edit_columns('offerType'))

Output:
ad_created
unrepaired_damage
registration_month
registration_year
offer_type
1 Like

Hi,

I tried something similar like that. Appreciate the help. Just wondering if there’s a more efficient way to do it.

Here’s one response I saw from StackOverflow: https://stackoverflow.com/questions/35450755/replace-values-of-a-pandas-series-from-a-list

Thoughts?

I would say “efficiency” in this context is not that important. Because you are not trying to repeatedly change the column names. You only, mostly, do it once.

So, picking one method that you have a clear idea of, should be fine.

But, of course, nothing wrong with trying different methods out. In this case, using replace and a dictionary should be good enough as well.

The dictionary would save the key, value pairs where the key would be the current name, and the values would be the new column names.

And then you pass the dictionary into replace. I think the stackoverflow post also suggests something similar. I would recommend checking out the documentation for replace as well.

1 Like

Because you are not trying to repeatedly change the column names. You only, mostly, do it once.

Thanks for that explanation. Makes sense from a data analyst point of view. My sysadmin/devops brain wants to reduce any manual work/repetition of tasks.