How I learnt Data Science in 175 days as a complete beginner

Happy New Year everyone! As the title suggests, this is an analysis project on my DataQuest journey. I’m really excited to have finished this project just before the new year! What better way to send off 2020 than a thorough look back at my focus of the year?

It has been a year of grief, but in the DataQuest community, I see people from all over the world trying their best to learn and make progress every day. So my motivation for this project is not only to revisit my journey but, more importantly, to encourage beginners by giving them a peek at the road ahead. Please keep in mind that the time and effort needed to complete this path depend heavily on personal circumstances. I will explain mine later in this article.

This project is also inspired by the people in this community, especially @otavios.s’s amazing project “I hope this is not a problem, but I scraped the Community”. I was introduced to Selenium and ChromeDriver thanks to his project. Yes, I also scraped the DQ website to get the full Data Scientist curriculum and hope it’s okay…

Before I go into the details of this project, I want to first share my findings.

The questions that get answered in this project:

  1. How many days did it take for me to finish this path? (timespan, including intervals I didn’t spend on studying)
  • 175 days. From June 19th, 2020 to December 11th, 2020.

  2. What’s my best learning streak and average learning streak?
  • My best learning streak was 20 days, and my average streak was 6.6875 days. From my personal experience, it’s important to get into the groove and keep going. I took a week-long break in October and it took another week to get back to the same learning efficiency as before.

  3. How much time was spent in total?
  • I spent 306.4 hours in total finishing the path. This means that if I had studied 24/7, the path could have been finished in roughly 13 days. Instead, it took me 175 days. I’m sure the robots are laughing at us humans.

  4. How many hours did I spend on average in weeks I studied?
  • Assuming I studied 5 days a week on average, in the 24 weeks I did study, I would have studied for 120 days. This means I spent about 2.5 hours a day (306.4 / 120 ≈ 2.55) studying on DataQuest on average. That sounds about right, but note that it’s a rough estimate. Plus, I did spend quite some time in the community and reading extracurricular materials, and those are not counted in this project.

  5. What’s the average time spent to finish a mission?
  • 111.43 minutes, in other words close to 2 hours. That looks like a dauntingly long time to finish a mission, but it also includes time spent on guided projects, which are most definitely more time-consuming than regular learning missions. It’s not uncommon to spend days on a guided project. I wish I had more granular data on time spent on each mission so I could see the average time spent on projects and non-project missions separately, but I don’t know if that data even exists.

  6. What are the speed bumps in the curriculum?
  • Steps 2, 4, 5, and 6 took more weeks than the others to finish. Among them, Steps 2 and 6 have the most missions, and Step 2 also has the most guided projects, so their length explains the extra weeks. That makes Steps 4 and 5 the most time-consuming steps relative to their size. Between the two, Step 4 is more time-consuming than Step 5, which matches my memory pretty well. In Step 4, the time-consuming part was SQL, and in Step 5, it was the courses on probability.
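
As a quick sanity check, the arithmetic behind these findings can be reproduced in a few lines. This is only a sketch: the variable names are made up for illustration, the mission count of 165 is the curriculum size described later in the post, and the 5-days-a-week figure is the same rough assumption used above.

```python
from datetime import date

# Figures quoted in the findings above.
start, end = date(2020, 6, 19), date(2020, 12, 11)
total_hours = 306.4
total_missions = 165   # full curriculum, guided projects included
weeks_studied = 24
days_per_week = 5      # rough assumption, not measured

timespan_days = (end - start).days                      # calendar days on the path
nonstop_days = total_hours / 24                         # days needed studying 24/7
study_days = weeks_studied * days_per_week              # estimated days actually studied
hours_per_study_day = total_hours / study_days          # average hours on a study day
minutes_per_mission = total_hours * 60 / total_missions

print(timespan_days, study_days)
print(round(nonstop_days, 1), round(hours_per_study_day, 2))
print(round(minutes_per_mission, 1))
```

The per-mission figure computed this way lands within a few hundredths of a minute of the 111.43 reported above; the small gap comes from the total hours being rounded to one decimal.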

Now, a little context about my personal learning situation:

  • I started the Data Scientist path in Python on June 19th, 2020, and finished it on December 11th, 2020. I didn’t spend a lot of time in the last two weeks, and what I did spend went mostly into finishing the last two guided projects (which count as 2 missions) and extracurricular projects. That’s probably why I didn’t get any learning progress emails after the end of November.
  • I used to be a digital marketing account manager and had close to no coding experience. I learned Python fundamentals from a data science course on Udemy for a couple of weeks right before I decided to switch to DataQuest.
  • I finished Andrew Ng’s Machine Learning course on Coursera a few weeks before starting the path. I learned basic Octave during that course.
  • I’m currently unemployed so I have a lot of spare time for learning.

A closer look at the project

A) Data collection (email parsing & web scraping)

The data I used in this project are collected from two sources:

  1. The progress data in this project comes from the weekly accomplishment email I get from DataQuest on Mondays if I made enough progress the previous week. It consists of:
    • date: Receiving date of the email. Always a Monday.
    • missions_completed: Number of missions completed.
    • missions_increase_pct: Percentage increase/decrease compared to last week on the number of missions completed.
    • minutes_spent: Minutes spent on learning.
    • minutes_increase_pct: Percentage increase/decrease compared to last week on the minutes spent.
    • learning_streak(days): Number of consecutive days spent on learning.
    • best_streak: Best learning streak.

To get the weekly emails, I first created a tag in my Gmail to group the emails I wanted and then went to Google Takeout to download them. You can choose the file format in the process; what I downloaded was a .mbox file. Python has a standard-library module for parsing this type of file, called mailbox. You will find the code used in this project in the GitHub link at the end of the post.
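
For anyone who hasn’t used mailbox before, here is a minimal sketch of reading an .mbox file with it. It is not the code from the project: the function name, the sample message, and its subject line are made up for illustration, and it only pulls out the received date and subject, not the progress numbers in the email body.

```python
import mailbox
import os
import tempfile
from email.utils import parsedate_to_datetime


def read_email_dates(mbox_path):
    """Return (received_date, subject) for every message in an .mbox file."""
    rows = []
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            # The Date header is an RFC 2822 string; convert it to a datetime.
            received = parsedate_to_datetime(message["Date"])
            rows.append((received.date(), message["Subject"]))
    finally:
        box.close()
    rows.sort(key=lambda row: row[0])  # oldest email first
    return rows


# Tiny made-up mailbox so the sketch runs without a real Takeout export.
SAMPLE = (
    "From sender@example.com Mon Jun 29 08:00:00 2020\n"
    "Date: Mon, 29 Jun 2020 08:00:00 +0000\n"
    "Subject: Hypothetical weekly progress email\n"
    "\n"
    "The body would hold the missions and minutes figures.\n"
    "\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.mbox")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE)
    rows = read_email_dates(path)

for received, subject in rows:
    print(received, received.strftime("%A"), subject)
```

With a real export, you would pass the path of the downloaded .mbox file instead of the temporary sample, then parse each message body for the fields listed above.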

A screenshot of the weekly accomplishment email

  2. The curriculum data in this project comes from the DataQuest dashboard for the Data Scientist path. It consists of 8 Steps, 32 courses, and 165 missions, including 22 guided projects, in hierarchical order.
    As mentioned at the beginning of the post, I used Selenium and ChromeDriver for the first time. The dashboard page where the curriculum information resides contains a grid of steps and collapsible lists of courses and missions, so there was auto-login and a lot of clicking involved. I will probably write another article on scraping this page later.

B) Data Imputation

The weekly email dataset in this project is very small, with only 16 rows containing data from 16 weeks. But my learning span was in fact 26 weeks. There were weeks where I didn’t study at all, but still, for such a small dataset, I can’t really afford to lose 10 weeks of data.

Luckily, on the profile page, DataQuest provides the learning curve throughout a path. So I came up with an imputation strategy: fill in the blanks where possible, plot the existing data, compare it with the DataQuest-generated learning curve, and combine that with my personal experience (and memories of taking vacations & slacking :slight_smile:) to impute the missing numbers of missions completed. Then I imputed the minutes spent based on the average minutes spent on a mission. It’s explained in more detail in the project.
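
The last step of that strategy, filling in minutes from the average time per mission, could look roughly like the sketch below. All the numbers are invented for illustration, and computing the average as total minutes over total missions is my assumption here; the project may weight things differently.

```python
# Hypothetical weekly records. "minutes" is None for weeks without an email,
# where the mission count was estimated from the learning curve.
weeks = [
    {"week": "2020-07-06", "missions": 8, "minutes": 900},
    {"week": "2020-07-13", "missions": 5, "minutes": None},
    {"week": "2020-07-20", "missions": 6, "minutes": 640},
    {"week": "2020-07-27", "missions": 0, "minutes": None},
]

# Average minutes per mission, using only the weeks with real data.
known = [w for w in weeks if w["minutes"] is not None]
avg_minutes_per_mission = (
    sum(w["minutes"] for w in known) / sum(w["missions"] for w in known)
)

# Fill the gaps: estimated missions times the average minutes per mission.
imputed = []
for w in weeks:
    minutes = w["minutes"]
    if minutes is None:
        minutes = round(w["missions"] * avg_minutes_per_mission)
    imputed.append({**w, "minutes": minutes})

for w in imputed:
    print(w["week"], w["missions"], w["minutes"])
```

A week with zero estimated missions gets zero minutes this way, which fits the weeks where I didn’t study at all.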

While I think the imputation was pretty successful (in serving the needs in this project), I wish we could have more data on our learning journey from DataQuest.

C) Visualizations in this project:

I used Plotly to plot all the visualizations in this project. I’m pretty happy with the Hours Spent vs Missions Completed plot below. It helped me make quite a few interesting observations and answered the curriculum-related questions at the beginning of this post. Again, you can read the details in the GitHub link at the end of the post.

To share the plots in posts like this one, I also tried out Chart Studio. The plots below are hosted on the Chart Studio cloud and embedded using HTML generated by Chart Studio.

  • My learning curve
  • Hours spent weekly and the corresponding number of missions completed and the steps they belong to
  • Number of missions and guided projects in each learning Step
  • Full curriculum table of the Data Scientist in Python path on DataQuest

Apart from answering the questions at the beginning of this post, I also want to add, for beginners of this course: what I’ve done in this project is mostly data collection, data cleaning, and imputation, which you will learn in the first 4 Steps. That means you will be equipped to do all of this halfway through the Data Scientist path!

P.S. If anyone has more questions regarding this project or the DQ Data Scientist path, feel free to ask me in the comments. I will try my best to provide an answer. :relaxed:

Click here to view the full project.


May I ask how much time it took you to do this personal project?


Hi @sergibtrader,

I’d say about a week’s worth of working time.

I think what took this long was mostly the process of designing this project.

  • I didn’t work on it every day and didn’t start off including the web scraping part. After I was done with the email parsing, it just didn’t feel like much data to work with. That’s when I decided to scrape the curriculum.
  • It took some time to come up with a data imputation strategy that I was happy with.
  • Also, it was the first time I had worked with email parsing and scraping with Selenium.

Hi @veratsien,

Thank you so much for sharing this info! Was your ultimate goal for doing the course to start a career in Data Science? If so, have you had success? If yes, would you mind sharing your experience in the current job market?

Thank you so much.


Hi @flherron01, welcome to the community! :clap:

Sorry but I’m afraid I don’t have a successful job hunt story to share. I can’t say I had much of an ‘ultimate goal’ when I started this course. It was more the result of a combination of random things: moving to the US, a conversation with a Python programmer friend, and the pandemic.

I haven’t started job hunting yet, and I’m trying to avoid it if I can… (It’s more of a personal choice. I worked in an office setting for five years and just don’t see it as a long-term life plan.) I’m also doing Flutter programming right now and trying to combine it with what I’ve learned here. That being said, I’m still building my own projects for my portfolio (they also help build confidence). I just don’t do projects ONLY for job hunting.

So… you can tell from the babbling that I don’t have personal experience with job hunting in data science… But I did come across this great article in DataElixir’s newsletter that I think might have some answers to your question: How Can We Fix the Data Science Talent Shortage?


Got it, thank you. I appreciate the additional link!


Hi @veratsien

Thanks for sharing your journey. It’s amazing to see what you can do now in Python after starting with very little experience. I just finished the first guided project and must say, even though I thought I was comfortable with the lessons leading up to the first project, I found it quite challenging and had to refer to the solutions quite often for help.

Can you share some advice or tips that you learned on what to do when you get stuck on a guided project apart from looking at the solution?

I don’t want to lose hope so quickly but I feel a bit worried considering that I struggled with the basics and there is still so much to learn.

Kind regards


Hi @kloppersjj02,

You are welcome!

About the first guided project, I assure you, you are definitely not alone. I think I sorta dodged a bullet there by skipping the first step at the beginning of my journey. I went back to that first project as my last project and it was still a lot to handle. Things actually get a lot easier once you have learned libraries like pandas and NumPy. Now I’m excited for you lol. :laughing: Here’s my project, and you will see people agree that it’s almost too challenging for a first project: Going back to the first project. Imo, just finishing it is an achievement.

  • First, this community is one of the best for learners. Almost every problem you encounter, you will find someone with the same problem already asked a question in the community, and it might even have been discussed extensively. If not, you get to ask a unique question and benefit people after you!
  • If it’s a bug in your code, don’t overlook the power of a simple print() statement, especially as a beginner. It’s much faster to spot a bug in some testing result than just staring at your code. Also, there’s the famous ‘rubber duck’ debugging method. It may seem silly but I assure you it works like a charm!
  • The great thing about DataQuest is that they assume you can take the initiative to learn, thus forcing you to look things up, read up on extracurricular materials, and so on. Really, Google is your best friend in this case.
  • Something else I want to mention: when you are doing guided projects, expect them to be an extension instead of a validation of what you’ve learned. I think shifting the focus from “why can’t I figure this out” to “how can I figure this out” helps in a surprising way.
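
To make the print() tip above a bit more concrete, here is a tiny made-up example (the function and the numbers are purely illustrative). Printing the intermediate values inside a loop shows you exactly where a calculation starts to go wrong, instead of leaving you to guess from the final result.

```python
def average_streak(streaks):
    """Return the mean of a list of streak lengths (in days)."""
    total = 0
    for days in streaks:
        total += days
        # Temporary debug line: shows each value and the running total.
        print(f"days={days!r} running total={total!r}")
    return total / len(streaks)


result = average_streak([3, 7, 20])
print(result)
```

Using `!r` in the f-string prints the repr of each value, so a string like '7' that sneaked in where a number was expected stands out immediately. Remove the debug line once the bug is found.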

The community is your best friend for all the feelings you have on this journey, because there is no chance you will be alone here.

Keep calm and code on. :vulcan_salute:

@veratsien Thanks so much for the detailed feedback and encouragement. I really appreciate your help.
