Personal Project: Scraping Dataquest

Hi everyone! I’m the kind of person who likes writing to-do lists and tracking my study/work progress. Although most websites have some way of tracking progress, it bothers me that I can’t see all my progress in one place. So, I wrote some code to get the list of courses in the Data Scientist track here on Dataquest and save it to a CSV file. It also works for the Data Engineering track: just change path_url to "https://www.dataquest.io/path/data-engineering-v2/".
Unfortunately, I couldn’t scrape the dashboard and include the progress as well. But I guess it’s better than nothing.
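
For readers who don't want to open the notebook, here is a minimal sketch of the idea (parse the path page, pull out the course titles, write them to a CSV). It is not the code from the notebook. It assumes each course title sits in an `<h3>` element, which is a guess about the page markup that would need checking in the browser's inspector, and the names `CourseTitleParser`, `extract_course_titles` and `save_courses` are made up for this example. It uses only the standard library so it runs without extra installs; BeautifulSoup would do the same job with less code.

```python
import csv
from html.parser import HTMLParser


class CourseTitleParser(HTMLParser):
    """Collect the text of every <h3> element.

    Assumption: course titles are in <h3> tags. Adjust the tag name
    after inspecting the real page.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_title = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <h3>.
        if self._in_title:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_title:
            # Collapse runs of whitespace and skip empty headings.
            title = " ".join("".join(self._buffer).split())
            if title:
                self.titles.append(title)
            self._in_title = False


def extract_course_titles(html):
    """Return the course titles found in an HTML string, in page order."""
    parser = CourseTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


def save_courses(titles, csv_path):
    """Write the titles to a CSV file with a running order number."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["order", "course"])
        for i, title in enumerate(titles, start=1):
            writer.writerow([i, title])
```

To use it, download the HTML of path_url (for example with `urllib.request.urlopen(path_url).read().decode("utf-8")`), pass the string to `extract_course_titles`, and hand the result to `save_courses`. An empty "progress" column could be added by hand afterwards.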

Scraping_Dataquest.ipynb (30.2 KB)

@evelin.kanda: Great start! Maybe the DQ staff can share more about their website. Would love to see this deployed haha.

@Bruno maybe you could share some details?

What do you mean by this?

What would help is if Evelin would share more details. I’ll reply in another post to pursue this.

What did you try?

I’m doing both the Data Science and Data Engineering paths here, plus another course on a different platform, and I’m working on a project (actually two, including this one).
My progress is tracked separately for each, but I’d like to see it all in one place.

Well, I tried scraping it the way we learnt here, but the HTML of the dashboard is quite different from that of the usual websites because of the interactive elements. I tried googling, but I couldn’t find any solution that would work and wasn’t too difficult to understand.

One simple thing I tried was to right-click on the page → Inspect → Network → look for JSON files and get the info from there. But I couldn’t find any JSON files lol.

So I scraped a different page of Dataquest, but I’d love to know how to scrape the dashboard.

I suspect you haven’t logged into your account with the scraper. That alone would make the page you get back very different. I suggest you look into Selenium.
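
A rough sketch of what that could look like with Selenium: drive a real browser, log in, then grab the dashboard HTML after the page has rendered. Everything site-specific here is an assumption. The three CSS selectors are placeholders rather than Dataquest's actual form fields, the login and dashboard URLs are left as parameters for you to fill in, and the code assumes a successful login navigates to a different URL. The function name `fetch_dashboard_html` is made up for this example.

```python
# Placeholder selectors: replace them after inspecting the real login form.
EMAIL_SELECTOR = "input[type='email']"
PASSWORD_SELECTOR = "input[type='password']"
SUBMIT_SELECTOR = "button[type='submit']"


def fetch_dashboard_html(login_url, dashboard_url, email, password, timeout=15):
    """Log in through a browser and return the rendered dashboard HTML."""
    # Imported here so this file still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # needs Chrome installed on the machine
    try:
        wait = WebDriverWait(driver, timeout)

        # Open the login page and wait for the form to appear.
        driver.get(login_url)
        email_box = wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, EMAIL_SELECTOR))
        )
        start_url = driver.current_url

        # Fill in the credentials and submit the form.
        email_box.send_keys(email)
        driver.find_element(By.CSS_SELECTOR, PASSWORD_SELECTOR).send_keys(password)
        driver.find_element(By.CSS_SELECTOR, SUBMIT_SELECTOR).click()

        # Assumption: a successful login navigates away from the login page.
        wait.until(EC.url_changes(start_url))

        # Load the dashboard in the logged-in session.
        driver.get(dashboard_url)
        wait.until(EC.presence_of_element_located((By.TAG_NAME, "body")))
        return driver.page_source
    finally:
        driver.quit()
```

The returned string is the HTML as the browser sees it after the scripts have run, so it can be parsed the same way as any other page. Interactive dashboards often load their data a moment after the page itself, so you may need to wait for a specific element that only shows up once the progress data is there instead of waiting for `body`.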

True! haha Thanks for the tip! I’ll check it out!