How can I download the CSV files used for the guided projects if I want to do them on my own computer? Sometimes they are not on GitHub, or they have been slightly modified, so I can’t get them myself.
Thanks for the help.

Hey @pierorubini

Is there a specific project whose file you are having difficulty downloading?

For almost all of the projects, the sources are linked on the intro page (Kaggle, GitHub, or others) or on the next page.

It is for: Clean and Analyze Employee Exit Surveys.

hi @pierorubini

I am not sure if you have tried the steps below, but this is how I downloaded the datasets to work on the project.

  1. Click on the two links provided on the intro page of the project.

  2. Scroll down to the end of the source page (this is for TAFE, the first link).

  3. Repeat step 2 for the DETE dataset using the second link from step 1.

Let us know if you run into an error or issue downloading the datasets with the steps above.

Thanks a lot!
Also, it says that they have been modified to be easier to work with. How do I know what has been changed?

hey @pierorubini

As far as I understand, we can only be certain about what changed if the course content directly mentions the changes or the clean-up process that was applied.

Otherwise, if we can get our hands on both datasets, we could compare them to see what was added and what was removed.
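If it helps, here is a rough sketch of how such a comparison could look in pandas. The function name `summarize_differences` and the tiny example frames are made up for illustration; with real files you would load both versions with `pd.read_csv` first, and differing column types or encodings may need handling that this sketch skips.

```python
import pandas as pd


def summarize_differences(original: pd.DataFrame, modified: pd.DataFrame) -> dict:
    """Report column and row differences between two versions of a dataset."""
    original_cols = set(original.columns)
    modified_cols = set(modified.columns)
    shared = sorted(original_cols & modified_cols)

    summary = {
        "columns_only_in_original": sorted(original_cols - modified_cols),
        "columns_only_in_modified": sorted(modified_cols - original_cols),
        "row_counts": (len(original), len(modified)),
    }

    if shared:
        # Compare rows on the shared columns only. indicator=True adds a
        # "_merge" column that says which side each row came from.
        merged = original[shared].drop_duplicates().merge(
            modified[shared].drop_duplicates(),
            how="outer",
            on=shared,
            indicator=True,
        )
        summary["rows_only_in_original"] = int((merged["_merge"] == "left_only").sum())
        summary["rows_only_in_modified"] = int((merged["_merge"] == "right_only").sum())
    return summary


# Hypothetical stand-ins for the original and the modified dataset.
original = pd.DataFrame({"id": [1, 2, 3], "age": [25, 31, 47], "notes": ["a", "b", "c"]})
modified = pd.DataFrame({"id": [1, 2], "age": [25, 31]})

summary = summarize_differences(original, modified)
print(summary)
```

This only shows which columns and rows differ, not why they were changed, so it will not recover the exact clean-up steps.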

This basically falls under replicating the author’s work, which may be out of scope.
As a student, you can proceed with the instructions and the updated dataset. Then, to reinforce what you have learned, you could work through the original dataset.

hey @Rucha

I wanted to add to this thread with another question. I’ve been going through the guided projects and noticed that I can typically only find and download the original files uploaded to Kaggle, NOT the ones used in the online Jupyter notebook instance. Since I like to do these projects on my desktop rather than online within the UI, is there any way to download the sampled data so that my solutions match what is shown in the solutions guide?

For example, for the Guided Project: Exploring Hacker News Posts, the download link on the first page directs me to the Kaggle page with all 300,000 rows of data, but the solutions use a randomly sampled portion of this, so I am unable to replicate the results on my end.

This seems to be the same with the Guided Project: Exploring Ebay Car Sales Data, where it says that the dataset used was dirtied up and sampled to only 50,000 rows, but when I download the data with the link provided, it has 370,000 rows, not the sampled data used in the online Jupyter instance.

This isn’t a huge deal, but I’d like to have the assurance that my code is calculating the correct results.
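To illustrate why the full download cannot simply be re-sampled to match, here is a small sketch using pandas `DataFrame.sample`. The frame and the seeds are made up for illustration; the seed and sampling method actually used for the course data are not known to me.

```python
import pandas as pd

# Hypothetical stand-in for a full download; the real files are far larger.
full = pd.DataFrame({"id": range(1000), "points": [i % 50 for i in range(1000)]})

# Assumed seeds, chosen only for this demonstration.
sample_a = full.sample(n=100, random_state=1)
sample_b = full.sample(n=100, random_state=1)
sample_c = full.sample(n=100, random_state=2)

# The same seed on the same data reproduces the same rows in the same order.
same_seed_matches = sample_a.equals(sample_b)

# A different seed will generally pick different rows.
different_seed_matches = sample_a.equals(sample_c)

print(same_seed_matches, different_seed_matches)
```

So unless the exact sampled file or the seed is shared, results computed from my own sample would only be close to the solutions, not identical.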


hey @ryanpozzi

Have you tried the workaround provided in this post?