Guided Project: Exploring Hacker News Posts .csv file

I am just starting the Guided Project: Exploring Hacker News Posts. In the Introduction it says, "You can find the data set here, but note that it has been reduced from almost 300,000 rows to approximately 20,000 rows by removing all submissions that did not receive any comments, and then randomly sampling from the remaining submissions." I went to the site and downloaded the file so I could use it on a local version of Anaconda/Jupyter Notebook.

However, the only file I found was the one with 300,000 rows, and I kept getting error messages when I tried to open the file with the csv reader. So, I went into the file and deleted all rows that had 0 comments. That left 80,000 rows, not the 20,000 referred to in the intro. I still get an error message when trying to open the file with csv reader. How do I get the same file referred to in the Introduction?

Here is my code and the error I get when trying to open the file.

Hi @skwcos, welcome to the community!

The dataset in the link is, as you found, the original dataset. The version we use for the guided project, which has already been cleaned up a bit, can be downloaded from within the Dataquest platform. See this post for how to download the necessary files.
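For anyone curious how the reduction described in the Introduction could be approximated locally, here is a minimal sketch. It drops rows with no comments and then takes a random sample. The column name num_comments, the seed, and the function name reduce_dataset are assumptions for illustration, and a fresh random sample will not match the rows in the project file, so the platform download is still the way to get the same file.

```python
import csv
import random


def reduce_dataset(src_path, dst_path, sample_size=20000, seed=0):
    """Keep rows with at least one comment, then randomly sample them.

    Returns the number of data rows written (header not counted).
    """
    with open(src_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Assumption: the comment count is stored in a column named "num_comments".
        idx = header.index("num_comments")
        # Skip rows whose comment count is missing, non-numeric, or zero.
        rows = [
            row for row in reader
            if row[idx].strip().isdigit() and int(row[idx]) > 0
        ]

    # A local Random instance keeps the sampling repeatable without
    # touching the global random state.
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))

    with open(dst_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(sample)
    return len(sample)
```

Passing encoding explicitly to open() can also help if the csv reader fails with a decoding error on some systems, though that may not be the error you are seeing.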

Thanks for your quick reply, april.g. I clicked on the post you referred to, and the first image does not show. Even so, I tried following the instructions, but it isn’t working. Not sure what I’m doing wrong.

You only need the directions under “Downloading Jupyter project data”. The first set of instructions is for the other projects (and, regrettably, I can’t see that first image either).

When you start the project, the Jupyter notebook should open in the panel on the right side. At the top there are extra buttons that you won’t have on your local copy, and that is where you will find the download button.

When you click it, it should download a .tar file that contains all the needed files.
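If you would rather unpack the archive from Python than with a desktop tool, something like the sketch below should work. The function names extract_csv and read_rows are made up for this example, and the archive and CSV file names you pass in depend on what the platform actually gives you.

```python
import csv
import os
import shutil
import tarfile


def extract_csv(archive_path, csv_name, dest_dir="."):
    """Copy one CSV member out of a .tar archive and return its local path."""
    with tarfile.open(archive_path) as tar:
        # Match on the base name so a leading folder inside the archive is fine.
        member = next(
            (m for m in tar.getmembers()
             if m.isfile() and os.path.basename(m.name) == csv_name),
            None,
        )
        if member is None:
            raise FileNotFoundError(f"{csv_name} not found in {archive_path}")
        out_path = os.path.join(dest_dir, csv_name)
        # Stream the member to disk instead of extracting the whole archive.
        with tar.extractfile(member) as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_path


def read_rows(csv_path, encoding="utf-8"):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    with open(csv_path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))
```

Usage would look like path = extract_csv("project_files.tar", "hacker_news.csv") followed by hn = read_rows(path), where both file names are placeholders for whatever your download contains.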

Let us know if it’s still causing you trouble.

Thank you so much! That worked.

