Jupyter notebook, any standard or best practice for including data dictionaries?

While working on my next ‘guided project’ (Data Analyst in Python), I was wondering whether there is any standard or best practice for showing a data dictionary for the data you use in a Jupyter notebook.

Typically, after importing some data (e.g. into a pandas dataframe), you’ll do some initial exploration, e.g. by showing the first 5 or 10 rows. To understand what I’m looking at, and to be able to refer back to it later, I would also want to see a description of what each column means at that point.

I can think of several ways to show such a data dictionary:

  • Add a markdown cell with this info
  • Create the data dictionary itself in a file, read it in as another dataframe, then display that dataframe
  • Refer to the data dictionary (if there is one) on the website where you got the data from in the first place
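For what it's worth, the second option could look roughly like the sketch below. The column names, descriptions, and the `cars` dataframe are made up for illustration, and the dictionary is read from an in-memory string here only so the snippet runs on its own; in practice it would presumably be a CSV file kept next to the notebook.

```python
import io

import pandas as pd

# Hypothetical data dictionary. In a real project this would be a file
# (for example a CSV stored alongside the notebook), not an inline string.
DICTIONARY_CSV = io.StringIO(
    "column,description\n"
    "price,Listing price in US dollars\n"
    "odometer,Distance driven in kilometres\n"
)

# Stand-in for the dataset being analysed (hypothetical columns and values).
cars = pd.DataFrame({"price": [5000, 8500], "odometer": [150000, 70000]})

# Read the dictionary into its own dataframe.
data_dict = pd.read_csv(DICTIONARY_CSV)

# Columns that exist in the data but have no entry in the dictionary.
undocumented = sorted(set(cars.columns) - set(data_dict["column"]))

# Show the dictionary; in a notebook, ending a cell with `data_dict`
# would render it as a table instead.
print(data_dict.to_string(index=False))
print("Undocumented columns:", undocumented)
```

One possible benefit of this route is the `undocumented` check, which flags columns that the dictionary does not describe.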

Is one of these considered the standard or best practice?


Hi Jasper,

The second way seems a bit excessive to me. The issue with the third way is that the original dataset can sometimes be removed from its source (as happened, for example, with the Exploring eBay Car Sales Data guided project). So I would say the best practice is the first way, but you should still add a link to the original dataset, or mention that it's no longer available.

When mentioning column names in markdown, consider wrapping each of them in backticks for better readability.
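If typing the table by hand gets tedious, a small helper can generate the markdown with the backticks already in place. This is only a sketch: `dictionary_to_markdown` and the example columns are hypothetical names, not something from the guided project.

```python
def dictionary_to_markdown(columns):
    """Return a markdown table with each column name wrapped in backticks."""
    lines = ["| Column | Description |", "| --- | --- |"]
    for name, description in columns.items():
        lines.append(f"| `{name}` | {description} |")
    return "\n".join(lines)


# Hypothetical column descriptions, for illustration only.
columns = {
    "price": "Listing price in US dollars",
    "odometer": "Distance driven in kilometres",
}

markdown_table = dictionary_to_markdown(columns)
print(markdown_table)
```

The printed text can be pasted into a markdown cell. Inside a notebook, passing the string to `IPython.display.Markdown` should render it directly as well.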


Hi Elena,
Thank you for the reply and advice. Makes sense, will proceed accordingly!
