Screen Link: Popular Data Science Questions - Getting the Data
What is the most straightforward way to get the output from the Stack Exchange Data Explorer into a Jupyter Notebook cell, recreating the tables (not screenshots) as they appear in the “Stack Exchange Data Explorer” and “Getting the Data” sections of the DQ solution notebook? The underlying code suggests they are simple HTML tables.
In other words, go from this:
The closest I could manage: I copied the text output from the Stack Exchange Data Explorer, then pasted that into an online markdown table converter, then pasted that into a Jupyter markdown cell. But I think I’m missing something very obvious and complicating something that should otherwise be a quick and easy task.
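For what it's worth, the online-converter step can be done locally with the standard library. This is a minimal sketch: `csv_to_markdown` is a hypothetical helper name and the sample rows are made up. It turns CSV text (such as the Data Explorer export) into a Markdown table that can be pasted into a markdown cell.

```python
import csv
import io

def csv_to_markdown(csv_text: str) -> str:
    """Convert CSV text into a Markdown (pipe) table. Hypothetical helper."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]

    def fmt(row):
        # Escape literal pipes so they don't split a cell into extra columns
        return "| " + " | ".join(cell.replace("|", "\\|") for cell in row) + " |"

    # Header row, then the --- separator row Markdown requires
    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)

# Made-up sample data, standing in for the real export
sample = "TagName,Count\npython,100\nmachine-learning,80\n"
print(csv_to_markdown(sample))
```

This still produces a Markdown table rather than the HTML markup found in the solution notebook, so it only replaces the converter website.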
There’s a Download CSV button in the top right corner of this screenshot.
Thanks, Bruno. I downloaded the CSV prior to posting my question. But looking at the solution notebook, I’m still puzzled by how those tables appear as they do, because:
- Those early tables (like the one in the image I posted above) reside in markdown cells. If I inspect the raw JSON of the .ipynb file, I see <table>, <tr>, and <th> tags in those cells. The markup does not appear to be the rendered output of printing or displaying the head of a DataFrame.
- Reading the CSV into pandas happens a step or two later, in the section entitled “Exploring the Data”. Any tables that appear after that are what I would expect: they reside in code cells marked as executed, for example the table at “Out ”.
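The inspection described above can be scripted, because an .ipynb file is plain JSON. This is a sketch, not anything from the solution notebook: `markdown_cells_with_tables` is a hypothetical helper, and the notebook path you pass it is up to you. It lists the markdown cells whose source contains an HTML `<table>`.

```python
import json

def markdown_cells_with_tables(notebook_path: str) -> list:
    """Return the source of every markdown cell containing an HTML <table>."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    found = []
    for cell in nb.get("cells", []):
        # In .ipynb JSON, "source" is usually a list of lines, sometimes a single string
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown" and "<table" in text:
            found.append(text)
    return found
```

Running it on the solution notebook should show the hand-pasted HTML in those early markdown cells, as opposed to the executed output stored with code cells.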
So, the mystery remains for me. Is there some simple step for copying table data (from the CSV, the Stack Exchange Data Explorer website, etc.) into those markdown cells that I’m missing?
Thanks for your help and patience!
Sorry, I misunderstood your question at first. To create the tables in Markdown cells, I loaded the dataset as pandas DataFrames and then used index=False to get the HTML code easily.
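The reply doesn't name the method used with `index=False`. Assuming it is pandas' `DataFrame.to_html` (which does accept an `index` parameter), the workflow might look like the sketch below. The inline CSV is made-up stand-in data, and the `QueryResults.csv` filename in the comment is an assumption about what the Data Explorer download is called.

```python
import io
import pandas as pd

# Made-up stand-in for the Data Explorer export; with the real file you would
# call pd.read_csv("QueryResults.csv") (filename assumed)
csv_text = """Id,PostTypeId,Score
1,1,5
2,1,3
"""
df = pd.read_csv(io.StringIO(csv_text))

# index=False leaves the DataFrame's row index out of the generated HTML
html = df.head().to_html(index=False)

# Copy this output and paste it into a markdown cell; Jupyter renders raw HTML there
print(html)
```

Pasting the printed markup into a markdown cell yields the static <table>/<tr>/<th> tags described in the question, which is why those cells show no "Out" prompt.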