Could someone please explain how to download a CSV file from GitHub to use it offline?

I am doing the Star Wars Survey walkthrough right now, and I want to work on it offline in Jupyter. When I try to download the file, I get an HTML file, and if I rename it to a CSV file it isn’t the same file at all and doesn’t work. I’ve been trying to find help online, but I don’t think I really understand the GitHub terminology. Could I get a walkthrough?

2 Likes

Hi @spatterss135,

All you need to do is click the ‘Raw’ button on the CSV file’s page and then save that page. Here are the steps:

  1. Go to the CSV file page and click the ‘Raw’ button.

  2. It will take you to a page that shows only the file’s plain-text contents.

Then just hit Ctrl+S or Command+S, or right-click, to save the file.
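
The ‘Raw’ button works because it swaps the file's page URL for a raw.githubusercontent.com URL that serves only the file contents. A minimal sketch of that mapping in Python is below. The function name `to_raw_url` is made up for illustration, and the github.com page URL in the example is inferred from the raw StarWars.csv URL quoted later in this thread, so treat both as assumptions.

```python
from urllib.parse import urlsplit


def to_raw_url(page_url: str) -> str:
    """Turn a github.com file-page URL into its raw.githubusercontent.com form."""
    parts = urlsplit(page_url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {page_url}")
    # A file page path looks like /<owner>/<repo>/blob/<branch>/<path to file>
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a file page URL: {page_url}")
    # Drop the "blob" segment and keep everything else in order.
    owner, repo, _, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


# Assumed page URL for the Star Wars survey file (inferred, not from the thread).
print(to_raw_url(
    "https://github.com/fivethirtyeight/data/blob/master/star-wars-survey/StarWars.csv"
))
```

Saving the normal page URL gives you the HTML of the GitHub page, which is why renaming that download to .csv does not work.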

Btw, just making sure you know that you can also download the project from DataQuest, which will automatically include the csv file.

Hope this helps! :wink:

6 Likes

These are the methods I use:

  1. As explained by @veratsien, I copy and paste the raw file contents.
  2. Python: use the urlretrieve function from urllib.request to download CSV files directly from a raw URL.

from urllib.request import urlretrieve
urlretrieve(url, filename)

Example:

from urllib.request import urlretrieve
urlretrieve("https://raw.githubusercontent.com/VictorOmondi1997/Machine-Learning-for-Software-Engineers/master/datasets/data.csv", "data.csv")

  3. R: use download.file.

download.file(url, destfile)

Example:

download.file("https://raw.githubusercontent.com/VictorOmondi1997/Machine-Learning-for-Software-Engineers/master/datasets/data.csv", "data.csv")

  4. Download the dataset using the opendatasets Python library.
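
For working offline, the Python method above could be wrapped so the file is only fetched once and then read from disk. This is just a sketch using the standard library. The helper names `download_csv` and `preview` and the `overwrite` flag are hypothetical, not part of urllib.

```python
import csv
from itertools import islice
from pathlib import Path
from urllib.request import urlretrieve


def download_csv(url, dest, overwrite=False):
    """Save the file at `url` to `dest` and return the local path."""
    dest = Path(dest)
    # Skip the download when a local copy already exists, so the code
    # keeps working offline after the first run.
    if overwrite or not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(dest))
    return dest


def preview(path, n=3, encoding="utf-8"):
    """Return the first `n` rows of a CSV file as lists of strings."""
    with open(path, newline="", encoding=encoding) as handle:
        return list(islice(csv.reader(handle), n))
```

You would call it as `download_csv(url, "data/data.csv")` with one of the raw URLs above, then `preview("data/data.csv")` to check that the rows look like CSV and not HTML. If the file is not UTF-8, pass a different `encoding`.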
4 Likes

I think R has this feature too (downloading a file directly from a URL).

1 Like

Got it! I tried this initially, but it turns out there is an extra step for users working in Safari. When you save the document, you need to change the format to ‘Page Source’, since the default option is ‘Web Archive’. I actually ended up using “curl https://raw.githubusercontent.com/fivethirtyeight/data/master/star-wars-survey/StarWars.csv -o outfile” to get the CSV.
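
If you are not sure whether a saved file is really the CSV, a quick check of its first bytes can help. The sketch below is only a heuristic. The function name `saved_file_kind` is hypothetical, and it assumes that a Safari ‘Web Archive’ is a binary property list starting with the bytes `bplist00`.

```python
def saved_file_kind(path):
    """Guess whether a saved download is 'html', 'webarchive', or 'text'."""
    # Only the start of the file is needed to tell the formats apart.
    with open(path, "rb") as handle:
        head = handle.read(512).lstrip().lower()
    # Assumption: Safari web archives are binary property lists, which
    # start with the bytes "bplist00".
    if head.startswith(b"bplist00"):
        return "webarchive"
    # An HTML page normally opens with a doctype or an <html> tag.
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    # Anything else is treated as plain text, which is what a CSV is.
    return "text"
```

A result of "text" does not prove the file is valid CSV. It only rules out the two wrong formats mentioned in this thread.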

Now that I know there are a lot of options though, I’m curious if there are strengths/weaknesses to these different methods. Is there documentation for this?

4 Likes

Congrats on figuring it out yourself!

I think the strengths and weaknesses of these methods depend on the specific scenario. For instance, the Python urllib.request method lets you keep the URL in a string variable and build it programmatically, so it has the potential to be more dynamic, and the same goes for its other parameters. The best way to find out is to try and experiment yourself. @info.victoromondi has kindly included links to the documentation in his comment.
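
As one illustration of the "more dynamic" point, the raw URL can be assembled from its parts and several files fetched in a loop. This is a sketch, not a recommended tool. The names `raw_url`, `download_all`, and `SOURCES` are made up here, and the two entries are simply the raw URLs already quoted in this thread, split into owner, repo, branch, and path.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

RAW_HOST = "https://raw.githubusercontent.com"


def raw_url(owner, repo, branch, path):
    """Build a raw-file URL from its parts (spaces in `path` get escaped)."""
    return f"{RAW_HOST}/{owner}/{repo}/{branch}/{quote(path)}"


# The two files mentioned in this thread, split into their URL parts.
SOURCES = [
    ("fivethirtyeight", "data", "master", "star-wars-survey/StarWars.csv"),
    ("VictorOmondi1997", "Machine-Learning-for-Software-Engineers", "master",
     "datasets/data.csv"),
]


def download_all(sources, dest_dir="data"):
    """Download every source into `dest_dir` and return the saved paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for owner, repo, branch, path in sources:
        target = dest / Path(path).name
        urlretrieve(raw_url(owner, repo, branch, path), str(target))
        saved.append(target)
    return saved


# Printing the URLs needs no network access; download_all(SOURCES) does.
for source in SOURCES:
    print(raw_url(*source))
```

With curl or the browser you would repeat the same step by hand for each file, which is fine for a single CSV but gets tedious for many.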

Happy coding!

2 Likes

Yes, I’ve included them in my comment.

2 Likes

Thank you so much, @info.victoromondi, for being so generous to me. You are so helpful.

2 Likes