For the Ebay Car Sales Exploration, I’m trying to figure out an efficient way to clean the Column names. The solution provided as well as a similar query on the community showed a manual approach to doing so.
I was wondering if there’s more efficient ways to do so. Here’s how I’m approaching it (not all of it working code). I also used StackOverflow
My Code:
**Code for conversion to snakecase
import re
def camel_to_snake(name):
name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()
snake_columns = []
for i in range(len(new_columns)):
snake_columns.append(camel_to_snake(new_columns[i]))
print(snake_columns[i])
That section worked perfect for me. For the other section of the exercise to rename some of the columns like: yearOfRegistration to registration_year
Is there a way you could do something like below. I.e. Replace specific values in a list with different values?
I would say “efficiency” in this context is not that important. Because you are not trying to repeatedly change the column names. You only, mostly, do it once.
So, picking one method that you have a clear idea of, should be fine.
But, of course, nothing wrong with trying different methods out. In this case, using replace and a dictionary should be good enough as well.
The dictionary would save the key, value pairs where the key would be the current name, and the values would be the new column names.
And then you pass the dictionary into replace. I think the stackoverflow post also suggests something similar. I would recommend checking out the documentation for replace as well.
Because you are not trying to repeatedly change the column names. You only, mostly, do it once.
Thanks for that explanation. Makes sense from a data analyst point of view. My sysadmin/devops brain wants to reduce any manual work/repetition of tasks.