eBay dataset - practical question

One of the first elements of the project is to rename the headings. I realize it’s just to emphasize what was taught in the module, but would you do this in the real world? The data dictionary was already given using existing header names. I don’t see any benefit to changing them.

Hi @robertpreseau

Sometimes datasets have really long, unhelpful column names, so yeah, it’s a good idea to change them. For example, the album ratings dataset on Kaggle has really long column names.


Here, Metacritic Critic Score could’ve been MetacriticCS or MCS.

The HCC (Hepatocellular Carcinoma) dataset has columns with short names that are difficult to understand and remember.
If you don’t know anything about medicine or the hepatitis blood test, it can be difficult to understand what those columns mean. HBsAg could’ve been Hep_B_Antigen.

Good Luck!

Hi, @robertpreseau.

Changing the header names was definitely not necessary, but it is generally good practice. I’m also just starting out, but I’ve noticed that style guides and programming conventions really help with code readability and debugging. Sticking to these conventions also makes workflows a little faster and more efficient (e.g. not having to capitalize letters when you want to access a certain column, knowing whether you’re looking at a class object or a variable, etc.), and all these small efficiencies pile up and make a difference in larger programs.

For this particular eBay practice project, renaming may indeed not be necessary or even add any efficiency, but getting used to following these conventions will help in the long run.
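
To make the convention point concrete, here is a minimal sketch of converting headers to snake_case so they can be typed without capitals. The helper name `to_snake_case` and the sample header names are my own assumptions for illustration, not something from the project instructions.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase or space-separated header to snake_case."""
    # Replace runs of whitespace or hyphens with a single underscore.
    name = re.sub(r"[\s\-]+", "_", name.strip())
    # Insert an underscore between a lowercase letter or digit and an uppercase letter.
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return name.lower()


# Hypothetical header names, chosen only for illustration.
headers = ["yearOfRegistration", "Metacritic Critic Score", "price"]
print([to_snake_case(h) for h in headers])
```

A simple rule like this will not handle every case well (acronyms such as HBsAg still need a hand-written name), so treat it as a starting point rather than a complete solution.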

Here are a few references that might come in handy:

Awesome info. So, would you keep a header mapping file somewhere so you knew you turned ThisUnhelpfulColumnName into miles_per_gallon later on in the project or would you rely on the notes you left in a notebook?


Yes, either would work, although I would personally prefer a mapping variable (a dictionary) that is self-contained in the code. I might also keep a separate file for data cleaning code, where all the cleaning and transformations performed on the raw, unprocessed data can be reviewed and recreated. For very minor changes such as renaming columns, a comment in the notebook should suffice, since having to refer to a separate file (for metadata and other information) may end up being inconvenient and unwieldy for the reader.
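
As a rough sketch of what I mean by a self-contained mapping dictionary, assuming the data is in a pandas DataFrame: the names `COLUMN_RENAMES` and `ORIGINAL_NAMES`, the second header, and the sample values are all made up for illustration.

```python
import pandas as pd

# Hypothetical mapping from raw header names to the names used in the analysis.
COLUMN_RENAMES = {
    "ThisUnhelpfulColumnName": "miles_per_gallon",
    "AnotherRawHeader": "engine_power",
}

# Made-up rows standing in for the raw data.
raw = pd.DataFrame(
    {"ThisUnhelpfulColumnName": [31.5, 27.0], "AnotherRawHeader": [90, 110]}
)

# errors="raise" makes pandas fail loudly if a key in the mapping is not a column.
clean = raw.rename(columns=COLUMN_RENAMES, errors="raise")

# Reverse lookup: find the original header for a renamed column.
ORIGINAL_NAMES = {new: old for old, new in COLUMN_RENAMES.items()}

print(list(clean.columns))
print(ORIGINAL_NAMES["miles_per_gallon"])
```

Keeping the dictionary next to the cleaning code means the rename is documented and reproducible in one place, and the reverse lookup lets you get back to the data dictionary's original names when you need them.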

It really depends on personal preference. At the end of the day, as long as the code is readable, you may choose not to rename the columns.