Hi everyone, I would like a better understanding of something in the Data Cleaning Basics mission “https://app.dataquest.io/m/293/data-cleaning-basics/3/cleaning-column-names-continued”. The exercise states “The column labels still have a variety of upper and lowercase letters, as well as parentheses, which will make them harder to work with and read”. Why are they harder to work with and read? And why should all column names be in lowercase?
snake_case is a useful naming convention when assigning names to variables, largely because you don’t have to remember where in the middle of a variable’s name you used an upper-case letter.
Its popularity also means that writing your variable names in snake_case makes it more intuitive to other people who read your code.
The same goes for the parentheses - a lot of this just comes down to the fact that it’s less messy and therefore more readable to adhere to an established convention, where reasonable. Your future self will find the process of making sense of your old code much easier!
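To make that concrete, here is a minimal sketch of the kind of cleanup the exercise is asking for. The function name clean_label and the sample labels are made up for illustration and are not taken from the mission itself; the steps (strip whitespace, drop parentheses, lowercase, join words with underscores) are one reasonable way to get to snake_case, not the only one.

```python
def clean_label(label):
    """Normalize a column label to lower snake_case (illustrative helper)."""
    label = label.strip()                             # remove leading/trailing whitespace
    label = label.replace("(", "").replace(")", "")   # drop parentheses
    label = label.lower()                             # one consistent case
    label = "_".join(label.split())                   # runs of spaces become one underscore
    return label


# Hypothetical labels with the problems mentioned in the exercise
raw_labels = ["Manufacturer", " Storage", "Price (Euros)", "Operating System Version"]
clean_labels = [clean_label(label) for label in raw_labels]
print(clean_labels)
```

With a pandas data frame you could assign the cleaned list back to df.columns. After that, nobody has to remember whether a column was “Price (Euros)”, “price (euros)” or “Price(Euros)”.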
In addition to what has already been said, another way to access a pandas column is the “dot” notation. For example, if I have a data frame named “df” that has a column named “created_date”, then df.created_date is the same as df["created_date"]. Since the general Python style convention calls for lower snake_case variable names, naming the columns in lower snake_case keeps the code consistent with that convention whenever the “dot” notation is used for column access.
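A quick sketch of the two access styles, assuming pandas is installed. The data frame and its values are invented for the example. It also shows a practical reason parentheses and spaces get in the way: a label like “Price (Euros)” is not a valid Python identifier, so it cannot be written after a dot and only the bracket form works for it.

```python
import pandas as pd

# Made-up data purely for illustration
df = pd.DataFrame({
    "created_date": ["2021-01-01", "2021-01-02"],
    "Price (Euros)": [999, 1299],
})

# Dot notation and bracket notation return the same column
same = df.created_date.equals(df["created_date"])
print(same)

# Only labels that are valid Python identifiers can be typed after a dot
print("created_date".isidentifier())
print("Price (Euros)".isidentifier())

# A label with spaces or parentheses has to use brackets
prices = df["Price (Euros)"]
print(prices.tolist())
```

Bracket notation always works, so the dot form is only a convenience, but clean snake_case labels are what make that convenience available.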