For the Ebay Car Sales Exploration, I’m trying to figure out an efficient way to clean the Column names. The solution provided as well as a similar query on the community showed a manual approach to doing so.
I was wondering if there’s more efficient ways to do so. Here’s how I’m approaching it (not all of it working code). I also used StackOverflow
**Code for conversion to snakecase
name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()
snake_columns = 
for i in range(len(new_columns)):
That section worked perfect for me. For the other section of the exercise to rename some of the columns like:
Is there a way you could do something like below. I.e. Replace specific values in a list with different values?
new_columns[["A", "B"]] = new_columns.str.replace(["Y", "Z"])
I do not know if this fits your purpose, but I did something like this:
column = col.lower()
if column == 'monthofregistration' or column == 'yearofregistration':
column = column.replace('of', ' ')
column = column.split()
column = column + '_' + column
elif column == 'notrepaireddamage':
column = column.replace('not', 'un')
column = column.replace('dd', 'd_d')
elif column == 'datecreated':
column = column.replace('date', 'ad_')
mm = 
for a in col:
if a == a.upper():
a = a.lower()
column = ''.join(mm)
I tried something similar like that. Appreciate the help. Just wondering if there’s a more efficient way to do it.
Here’s one response I saw from StackOverflow: https://stackoverflow.com/questions/35450755/replace-values-of-a-pandas-series-from-a-list
I would say “efficiency” in this context is not that important. Because you are not trying to repeatedly change the column names. You only, mostly, do it once.
So, picking one method that you have a clear idea of, should be fine.
But, of course, nothing wrong with trying different methods out. In this case, using
replace and a dictionary should be good enough as well.
The dictionary would save the key, value pairs where the key would be the current name, and the values would be the new column names.
And then you pass the dictionary into
replace. I think the stackoverflow post also suggests something similar. I would recommend checking out the documentation for
replace as well.
Because you are not trying to repeatedly change the column names. You only, mostly, do it once.
Thanks for that explanation. Makes sense from a data analyst point of view. My sysadmin/devops brain wants to reduce any manual work/repetition of tasks.