My DataQuest Journey (with plots and everything)

Hello everyone,

Like the title suggests, this is an analysis project on my DataQuest journey. I’m really excited to have finished this project just in time! What’s a better way to send off 2020 than a thorough look-back at my focus of the year? :star_struck:

It has been a year of grief for the world, but here in the DataQuest community, I see people from all over the world trying their best to learn and make progress every day. My motivation for this project is not only to revisit my own learning journey but, even more, to encourage beginners by giving them a more thorough picture of what’s ahead.

This project is inspired by the people in this community, especially @otavios.s’s amazing project “I hope this is not a problem, but I scraped the Community.” I was introduced to Selenium and ChromeDriver thanks to his project. Yes, I also scraped the DQ website to get the full Data Scientist curriculum, and I also hope that’s okay… The automation powered by ChromeDriver was a lot of fun. I also tried parsing email content for the first time to collect data.

Enough talk, here’s a peek at my project:

My DataQuest Learning Curve


The main motivation for this project is to help beginners and potential learners get a better idea of how much time and work the Data Scientist path in Python on DataQuest involves. Keep in mind, though, that the time and effort needed to finish an online course depends heavily on your personal situation.


Here are the questions that get answered in this project:

  1. How many days did it take me to finish this path? (Timespan, including intervals when I wasn’t studying.)
  • 175 days. From June 19th, 2020 to December 11th, 2020.

  2. What’s my best learning streak and average learning streak?
  • My best learning streak was 20 days, and the average was 6.6875 days. From my personal experience, it’s important to get into the groove and keep going. I took a week-long break in October and it took another week to get back to the same learning efficiency as before.

  3. How much time was spent in total?
  • I spent 306.4 hours in total finishing the path. This means if I studied 24/7, the path could be finished in roughly 13 days. Instead, it took me 175 days. I’m sure the robots are laughing at us humans.

  4. How many hours did I spend on average in the weeks I studied?
  • Assuming I studied 5 days a week on average, in the 24 weeks I did study, I would have studied for 120 days. This means I spent roughly 2.5 to 3 hours a day studying on DataQuest on average. That sounds about right, but note that it’s a rough estimate. Plus, I spent quite some time in the community too, which is not counted in this project.

  5. What’s the average time spent to finish a mission?
  • 111.43 minutes, in other words close to 2 hours. That looks like a dauntingly long time to finish a mission, but it also includes time spent on guided projects, which are definitely more time-consuming than regular learning missions. It’s not uncommon to spend days on a guided project.

  6. What are the speed bumps in the curriculum?
  • Steps 2, 4, 5, and 6 took more weeks than the others to finish. Among them, Steps 2 and 6 have the most missions, and Step 2 also has the most guided projects. That makes Steps 4 and 5 the most time-consuming steps relative to their size. Between the two, Step 4 was more time-consuming than Step 5, which matches my memory pretty well. In Step 4, the time-consuming part was SQL, and in Step 5, it was the courses on probability.
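
For anyone who wants to double-check the arithmetic behind these answers, here is a minimal sketch that recomputes them from the figures quoted in this post (the dates, 306.4 hours, 165 missions, 24 weeks). The 5 study days per week is the same rough assumption made above, and the variable names are only for illustration. This is not the code from the notebook.

```python
from datetime import date

# Figures quoted in the post
start = date(2020, 6, 19)
end = date(2020, 12, 11)
total_hours = 306.4
total_missions = 165          # includes the 22 guided projects
weeks_studied = 24
assumed_days_per_week = 5     # rough assumption, not measured

# Calendar days between the first and last day of the path
timespan_days = (end - start).days

# How long the path would take if studied around the clock
nonstop_days = total_hours / 24

# Estimated study days and hours per study day
study_days = weeks_studied * assumed_days_per_week
hours_per_study_day = total_hours / study_days

# Average minutes per mission (guided projects included)
minutes_per_mission = total_hours * 60 / total_missions

print(f"Timespan: {timespan_days} days")
print(f"Non-stop equivalent: {nonstop_days:.1f} days")
print(f"Hours per study day: {hours_per_study_day:.2f}")
print(f"Minutes per mission: {minutes_per_mission:.1f}")
```

The per-mission figure comes out slightly below 111.43 here only because 306.4 hours is itself a rounded total.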

This project is my personal learning progress analysis from finishing the Data Scientist path in Python on DataQuest. The path consists of 165 missions in total, including 22 guided projects. [1]

A little context about my personal learning situations:

  • I started the Data Scientist path in Python on June 19th, 2020, and finished it on December 11th, 2020. Although I didn’t study much in the last two weeks, that time was mostly spent on finishing the last two guided projects (which count as 2 missions) and extracurricular projects. That’s probably why I didn’t get any learning progress emails after the end of November.
  • I used to be a digital marketing account manager and had almost no coding experience. I learned Python fundamentals from a data scientist course on Udemy for a couple of weeks right before I decided to switch to DataQuest.
  • I finished Andrew Ng’s Machine Learning course on Coursera a few weeks before starting the path. I learned basic Octave during the course.
  • I’m currently unemployed so I have a lot of spare time for learning.

The progress data in this project comes from the weekly accomplishment email I get from DataQuest on Mondays if I made progress the previous week. It consists of:

  • date: receiving date of the email
  • missions_completed: number of missions completed
  • missions_increase_pct: percentage increase/decrease in the number of missions completed, compared to the previous week
  • minutes_spent: minutes spent on learning
  • minutes_increase_pct: percentage increase/decrease in minutes spent, compared to the previous week
  • learning_streak(days): number of consecutive days spent on learning
  • best_streak: best learning streak
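
To make the shape of this data concrete, here is a minimal sketch of one weekly record and a few summaries built from it. The sample rows are made up for illustration and are not my real numbers. The `pct_change` helper is only one plausible way the week-over-week percentages could be computed; I don’t know how DataQuest actually calculates them.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeeklyProgress:
    """One weekly accomplishment email, using the fields listed above."""
    date: dt.date               # receiving date of the email (a Monday)
    missions_completed: int
    minutes_spent: int
    learning_streak_days: int
    best_streak: int


def pct_change(current: int, previous: int) -> Optional[float]:
    """Week-over-week change in percent (assumed formula, not DataQuest's)."""
    if previous == 0:
        return None  # undefined when there was no activity the week before
    return (current - previous) * 100 / previous


# Made-up sample rows, for illustration only
weeks = [
    WeeklyProgress(dt.date(2020, 6, 22), 10, 600, 4, 4),
    WeeklyProgress(dt.date(2020, 6, 29), 15, 900, 7, 7),
    WeeklyProgress(dt.date(2020, 7, 6), 12, 720, 3, 7),
]

total_hours = sum(w.minutes_spent for w in weeks) / 60
best_streak = max(w.best_streak for w in weeks)
mission_changes = [
    pct_change(cur.missions_completed, prev.missions_completed)
    for prev, cur in zip(weeks, weeks[1:])
]

print(f"Total hours: {total_hours:.1f}")
print(f"Best streak: {best_streak} days")
print(f"Week-over-week mission change (%): {mission_changes}")
```

Weeks without an email simply have no row, which is why the percentages here compare consecutive emails rather than consecutive calendar weeks.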


The curriculum data in this project comes from the DataQuest dashboard for the Data Scientist path. It consists of 8 Steps, 32 courses, and 165 missions in hierarchical order.


[1] Although the dashboard shows 149 missions and 31 projects, scraping the dashboard page shows there are actually 165 missions, including 22 guided projects.

GitHub and nbviewer messed up some of the formatting and have trouble showing the Plotly plots, so here are the visualizations in this project:

  • My learning curve
    [Image: dq_learning_curve]
  • Hours spent weekly and the corresponding number of missions completed and the steps they belong to
    [Image: dq_hour_mission_line]
  • Number of missions and guided projects in each learning Step
    [Image: dq_mission_num_scatter]
  • Full curriculum table of the Data Scientist in Python path on DataQuest
    [Image: curriculum_table]

Apart from answering the questions at the beginning of this project, I also want to add, for beginners of this course: what I’ve done in this project is mostly data collection, data cleaning, and imputation, which you will learn in the first 4 Steps. That means you will be equipped to do all of this halfway through the course!

@nityesh Again, I hope the scraping won’t be a problem, but I will leave that part out if it is. Speaking of which, the number of missions and projects shown in the dashboard is different from the scraping results. I wonder why? :thinking: Also, if it’s not inappropriate to ask, I’m wondering how the learning progress is tracked, and how the change rates for missions completed and minutes spent in the weekly accomplishment email are calculated.

P.S. If anyone has more questions regarding this project or the DQ Data Scientist path, feel free to ask me in the comments or reach me at veratsien@gmail.com. I will try my best to provide an answer. :relaxed:

Click here to view the project. (Note that GitHub and nbviewer messed up some formatting and the plots are not showing. Please let me know if you know a workaround. :pray:)

Oh, and happy new year, everyone! Good riddance!


Update:
I managed to show the Plotly plots on GitHub with a simple fig.show('svg'), in case anyone finds it useful. You can also set the width and height of the output SVG. So the link above to the project should show the plots just fine now.
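
Here is a minimal sketch of that workaround wrapped in a small helper. The helper name `show_as_svg` and the default sizes are mine, not part of Plotly. It assumes you pass in a Plotly figure and that a static-image backend such as kaleido is installed, since the SVG renderer needs one.

```python
def show_as_svg(fig, width=900, height=500):
    """Render a Plotly figure as a static SVG so GitHub/nbviewer can display it.

    `fig` is expected to be a Plotly figure. `width` and `height` are in
    pixels and are passed through to the static renderer.
    """
    fig.show(renderer="svg", width=width, height=height)


# Usage (requires plotly plus a static-image backend such as kaleido):
#   import plotly.graph_objects as go
#   fig = go.Figure(go.Scatter(x=[1, 2, 3], y=[2, 1, 3]))
#   show_as_svg(fig, width=800, height=400)
```

The trade-off is that the static SVG loses Plotly’s interactivity (hover, zoom), but it shows up anywhere that can display an image.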

Hello @veratsien, thanks for sharing your work. It’s a great way to measure your skills and also to encourage others in their learning.
Happy New year!

Awesome project, @veratsien! Congratulations!

I really liked the idea, the execution, and the explanation! Great visualizations, by the way. You should submit it to Dataquest Direct, I think it would be a great fit.

Also, I’m glad my project was helpful to you, that was my intention when I published it.

Congratulations again and happy new year, everyone!

@otavios.s thank you for the compliment and the suggestion to submit it to Dataquest Direct!

Your project was the biggest inspiration for this project. I really admire your thoughts and effort in helping fellow learners in this community. So I did the same. :relaxed: And btw, thanks for introducing me to Selenium and ChromeDriver. You were right, they are powerful tools for web scraping!

Hey, I’m really glad to read this, you just made my day!

And yes, Selenium is very, very powerful and scraping is so much fun! If you haven’t yet, you should also try BeautifulSoup; it’s very powerful too. I have some articles on it published on Medium if you’re interested.

@otavios.s Wow, you’ve done great publishing articles on Medium! I’m thinking about starting to do the same.

I have tried BeautifulSoup with requests before to download images. They are very handy for static pages. In this project, though, there’s signing in, clicking through steps and courses, and an image CAPTCHA, so I had to use ChromeDriver. I found this article, “Bypassing CAPTCHAs with Headless Chrome”, just now as I was searching for an alternative. There’s an unofficial port of Puppeteer for Python called Pyppeteer that I might try out later to bypass the image CAPTCHA.

Btw, I’m wondering about submitting to Dataquest Direct, do I just format this post to standard and change the category or do I draft a new post in Dataquest Direct? Thanks ahead!

You definitely should do it!

Yes, Selenium is very useful with dynamic pages, but I meant BeautifulSoup for static pages, as you said. It’s way faster and uses less memory than opening ChromeDriver.

You just need to create a topic in the Dataquest Direct section, just like you did here.

Awesome, thanks for the tips!

Hey @veratsien,

Let me know if you need any more help publishing this in Dataquest Direct. I would love to get this in there (and promote it with our Twitter audience too! :wink: )

@nityesh That would be awesome! Thank you! I’m editing the post in Dataquest Direct right now. Just want to tailor it better for the ‘school magazine’. :grinning:

Hi @veratsien, great work, and I really appreciate all the detailed analysis, especially for starters like myself! Quick question: you mentioned that you completed Andrew Ng’s Machine Learning course before starting this path on DQ. Did you see any benefits or drawbacks of doing that first? I have read in a lot of articles that beginners should not start with machine learning directly, but should do linear algebra and a few other statistics concepts first. Now that you have completed both, would you advise doing machine learning fundamentals before starting the DQ Data Science path? Thanks in advance!

Hi @apgodse,

Welcome to the community! :clap:

Personally, I definitely don’t see any drawbacks from doing the Machine learning course first.

I think the answer here really depends on what you are comfortable with. I know it doesn’t sound very constructive, so here are my thoughts based on personal experience:

  • I felt very confident getting into Data Science after finishing the Machine Learning course, because it’s not an easy course if you really try to understand all the knowledge points and finish all the assignments. Other than that, it helped me get through Steps 6 and 7 of the Data Scientist path a lot quicker and expand on the knowledge I had.

  • That being said, it’s absolutely not necessary to do the machine learning course before the Data Scientist path here. If you read my project, none of the stuff in there is remotely relevant to the machine learning course. What you will get out of Andrew Ng’s machine learning course is a good foundation and understanding of the subject.

  • If you still can’t decide, and we are specifically talking about Andrew Ng’s machine learning course: it’s free, and it has a well-planned curriculum spread out over 11 weeks. So it’s actually not hard to look into the course material and get a good assessment of the time and energy you will spend taking the course. (I followed the weekly plan faithfully and I would suggest you do the same if you take the course.) The time-consuming part for me was the programming assignments. You will use either Octave or Matlab to do the assignments. If you find the course not so challenging, there’s nothing stopping you from doing both at the same time.

  • I have recommended this YouTube channel so many times, but in case you don’t already know it, 3Blue1Brown is probably the best math channel there is. And he has a full playlist on linear algebra if you want to learn that.

I still have the same questions as you: which course or material should I learn next? How do I prioritize all the things there are to learn? It’s probably going to be a constant topic as we go further down the road in data science, considering how fast the field moves. So I would say, just start, as you are here right now. Focus on what you are doing and explore what you are interested in.

I hope this helps. If you have further questions, I would be more than happy to answer them. :grinning:

That is amazing @veratsien! Thanks for providing such a thorough look at your progress, which gives me more motivation to finish my path.

Very nice project. It helps me compare against my own speed. I think I should speed up lol

@dungvn1999 I’m glad to hear it motivates you, exactly my motivation to do this project. :wink:

@Dowreung Welcome to the community! :clap:

I’m glad it helps you. Remember, it’s only my learning speed based on my personal learning situation. If you really do need some motivation, here’s a great post to check out: “Staying Motivated: Check-in Thread to Share DQ Progress”. :wink: