Is there more to life than Dataquest?

Data Scientist in Python

TL;DR
Which skill should I get good at? There is so much to choose! Should I even get good at anything? Help?!

After 8 weeks and one day of sweat and lots of junk food I’ve made it to 100%!

I had fantasized about this day before, and about how I would feel. I have to admit it was most definitely a modest party, with some ice cream.


So am I a unicorn already?

Well, sadly, a few days later I started to feel a bit down, mostly because I have no clue what to do next. I know I can refine my portfolio (which I haven’t even put on GitHub yet).

IMPORTANT: It does not matter who you are reading this, if you have (big) data dreams that can make me sparkle again, please share them here. This is your place to let your data hopes and dreams out. Let’s inspire each other through great work.

Well, if I’m not a unicorn, who am I then? Let me share a very brief introduction:

David Miedema

I’ve been a professional chess player for a few years, but the pay keeps shrinking, and with Corona the tournaments have stopped completely. Besides that, I have authored some books on chess openings, and I hold an MA in Philosophy.

Now, math is amazing, and data became an immediate addiction. (Un)fortunately I love many many things, which makes it difficult to choose between what to do.

It’s not about the right skills it’s about the problems you can solve.
At this point I believe I’ve been reading a ton of articles on what I should know, what the industry needs etc. like probably most of you have read. The above quote is the thing I see the most.

So what are the problems encountered most?
I have scrolled through many pages of upwork and freelancer.com, only to find that there are actually a lot of data scientists working for $5 an hour. And those people are claiming to have years of experience in the field. This made me almost give up my freelance data dreams already.

It’s mostly Business Intelligence and Big data problems.

And what about skills to learn then?
Browsing through job descriptions all the roles seem to be very different. Python, R and SQL are asked for most. But then there are many things to learn after hitting 100%:

  • Scraping, an art in itself. This is really funny, you won’t regret clicking this!
  • Kaggle, although there seem to be only beginner’s competitions and competitions requiring big GPUs for image processing.
  • TensorFlow or PyTorch, probably on the list as well
  • Cloud computing, most notably AWS and Google cloud services.
  • Tableau, the best visualization tool ever (it seems)
  • Big data, building on the Apache Spark introductory course seems to be a great idea as well. (How do I get the computing power and disk?)
  • Power BI
  • The DQ Data Engineering path

I’m glad you’ve made it to the end of this terribly desperate post :crazy_face:. I really hope you can offer me any advice! And who knows, we might actually start having a very valuable discussion for everyone on this platform :smile:


Hi @DavidMiedema

I love chess too! Not good enough to be competitive though, maybe around 1300-1400.
Congratulations on completing the Data Scientist Path; I’m not there even after 2 years.

Looks like you’re thinking about how to handle the massive “expected to know” learning list, and satisfy an urge to use the skills you accumulated here.

For each tool, try to learn it because you want to do something meaningful with it and there is no better tool for the job. For example, most open-source unsupervised machine translation implementations are written in PyTorch, so if I join a competition involving that, I have to learn PyTorch.
Other reasons come from the limitations of existing tutorials: if the Kaggle community mostly provides notebooks in TensorFlow, then I have to learn TensorFlow.

As you get better with Python internals (https://www.youtube.com/watch?v=cKPlPJyQrt4), it will be a breeze to learn new technologies that conform to a framework. E.g., in the deep learning field there are losses, optimizers, subclassing, and forward/backward propagation, so skills from TensorFlow will transfer to PyTorch pretty easily. Maybe some people like a full book of theory first, but my mind is lacking RAM. I always try to learn just enough engineering (how to use the API) to get started tweaking, and the error messages (which lead to Google searches) teach me the theory along the way.

You indicate an interest in freelancing, how about making some projects from that to benefit yourself and others?

Example Questions

  • What are the possible tags?
  • How many types of each project are there?
  • Can I predict what projects would appear?
  • Who are the best/worst employers/employees, and on what metrics?
  • How can I define a metric to check whether it’s good to bid on a project?
  • Which people are copy-pasting proposal messages everywhere?
  • How can I summarize the FAQ section so I learn more in less time?

Each of these questions could be anything from a simple analytics project to full-blown AI research. As you can guess already, web scraping is a must-have skill for gathering data for questions that others may not have asked, so that could drive your learning. Such learning would sit in a bigger context than what short tutorials could offer you.
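To make the first two questions concrete, here is a minimal sketch of counting tags across job listings. The `listings` data and the `tag_counts` helper are made up for illustration; real listings would first have to be scraped from the freelancing site, and their structure will differ.

```python
from collections import Counter

# Hypothetical sample: in practice these rows would come from a scraper.
listings = [
    {"title": "Sales dashboard", "tags": ["power-bi", "sql"]},
    {"title": "Scrape product prices", "tags": ["python", "web-scraping"]},
    {"title": "Churn model", "tags": ["python", "machine-learning", "sql"]},
]


def tag_counts(listings):
    """Count how often each tag appears across all listings."""
    return Counter(tag for job in listings for tag in job["tags"])


counts = tag_counts(listings)

# Print tags from most to least common.
for tag, n in counts.most_common():
    print(tag, n)
```

Once the counts exist, the other questions (bid-worthiness metrics, copy-pasted proposals) are mostly a matter of collecting more fields per listing.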

You can subscribe to daily aggregator emails and filter them down to the more informative ones to keep abreast of what’s possible and keep your python skills sharp and expose yourself to what others are doing.

How about

  • Contributing to open-source: https://www.codetriage.com/
  • Publishing a package on PyPI that is useful to the world?
  • Writing a chess engine (including handicapping features) for 4 player chess with GPT-2?
  • Building an end-to-end ETL dashboard to organize news?
  • Commit to every day asking a question and getting started with answering it even if you know nothing about the tools required. Success in trying. Even a 5 min youtube intro counts.

Hi @hanqi,

Thank you so much for your kind response. It puts things in perspective! By the way, great that you like chess, it’s always great to meet another enthusiast :slight_smile:

I assume that, even though you haven’t finished here, you have managed to tackle a lot of problems already.

Thus far, the freelancing option seems extremely unattractive, and I’ve decided to focus on getting a remote job. That way I will get more peers and some financial stability.

This discussion has led me to believe that there are three important projects I would really like to show off in my portfolio:

  • An awesome deep learning project run on Google Colab (I didn’t know they actually host GPU and TPU power :smiley:).
  • A scrape of multiple chess websites, including chess.com news, the Amazon chess books top 100, Chess Stack Exchange, chessgod101, etc. This will teach me to scrape and to draw valuable insights for me and my peers.
  • Program python chess in such a way that I can query database.lichess.org and select whatever I want from it (especially to analyze the French; I’m currently writing my third volume on rare lines, and where else to find those rare birds but online!). The challenge here is parsing files that are HUGE. Ordinary chess software crashes at this point…
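For the huge-file problem in the last bullet, the usual trick is to stream the PGN one game at a time instead of loading it all into memory. Below is a rough sketch using only the standard library. The names `iter_games` and `is_french` are hypothetical helpers, the sample games are invented, and it assumes each game carries an `ECO` tag (French Defence lines fall under C00 to C19) and has non-empty movetext. The real lichess dumps are compressed, so a decompressing stream would have to feed the lines in.

```python
import io
import re

# Matches a PGN tag pair such as: [ECO "C00"]
TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def iter_games(lines):
    """Yield (headers, movetext) one game at a time from an iterable of lines.

    Only the current game is kept in memory, so the input can be arbitrarily
    large. Assumes every game has at least some movetext after its tags.
    """
    headers, moves = {}, []
    for line in lines:
        line = line.strip()
        match = TAG_RE.match(line)
        if match:
            if moves:  # a tag after movetext means a new game has started
                yield headers, " ".join(moves)
                headers, moves = {}, []
            headers[match.group(1)] = match.group(2)
        elif line:
            moves.append(line)
    if headers or moves:  # flush the final game
        yield headers, " ".join(moves)


def is_french(headers):
    """True if the ECO tag is in the French Defence range C00-C19."""
    eco = headers.get("ECO", "")
    return len(eco) == 3 and eco[0] == "C" and eco[1:].isdigit() and int(eco[1:]) <= 19


# Invented two-game sample standing in for a real database file.
SAMPLE = '''[Event "Rated Blitz game"]
[ECO "C00"]
[Opening "French Defense"]

1. e4 e6 2. d4 d5 1-0

[Event "Rated Blitz game"]
[ECO "B20"]
[Opening "Sicilian Defense"]

1. e4 c5 0-1
'''

french = [game for game in iter_games(io.StringIO(SAMPLE)) if is_french(game[0])]
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an open text stream, and the list comprehension by a loop that writes matching games straight to disk so nothing accumulates in memory.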

That’s it for now, I basically spend 6-8h a day learning new stuff so it should internalize quite quickly.

Great topic @DavidMiedema!

Data science is such a big field that sometimes I feel a little lost in its vastness.

I’m about to finish the data analyst path and looking forward to diving into the data scientist path. Although I’m already familiar with machine learning, I do like Dataquest’s approach, especially how they care about the theory behind it. Recently, I became more interested in the data engineering path, so I might start it eventually.

What I’ve been doing (besides applying for jobs) is using the skills I’ve learned in projects that interest me. For example, I wrote a web scraper and got stats from every match in the English Premier League since 2011, and now I’m trying to use it to predict football stuff (you can see a piece of the data and the scraper here). I haven’t succeeded in the predicting part yet, which reminds me how excited I am to learn about deep learning and neural networks and apply them to that data.

While helping a friend with some data issues for his master’s degree, I created a package that is now on PyPI and has helped others. It helps people read the very unfriendly data published by the Brazilian Institute of Geography and Statistics. It is not a big deal, but there was no Python solution for this yet. In fact, if you google “IBGE python” (IBGE is the institute’s initials), you’ll see this article (sorry, it’s in Portuguese) I wrote about it.

So, what I’m trying to say is that there is always room for developing new projects and solutions. While developing, I always have to learn more, or maybe I discover something that interests me, and after it is done I sometimes have something that I can publish and increase my portfolio and my network. Only positive outcomes!

So, when I finish Dataquest, that’s probably what I’ll keep doing. I became so passionate about it, that I cannot think about not working with data anymore.


Awesome! @otavios.s
Your courage and commitment to learning are truly admirable :clap: :clap:


Well @otavios.s,

In a big vast space one gets lost quite easily. And there is a lot to do.

After watching hours and hours of videos about the application process, I believe it’s best to get a job asap. Yesterday I would have given a completely different answer about what that takes.

Frankly, I am quite surprised you have not gotten a job yet. With the projects you pick up, it seems you’ve long outgrown the DQ skills. I believe that if programming is more your thing, you should definitely check out the engineering path. Especially with the scraping projects you’ve showcased and the time you like to invest in helping others optimize their code, it might be a great path for you!

Personally, I am now looking for something that would go viral. What changed my mind is that the amount you can learn as a self-learner is very limited. There is no replacement for helping customers; you can never be prepared for ML questions from horse breeders etc. You get savvy on the job.

Hence, I’ve decided to focus much more on landing an opportunity. That way you learn a lot more, like soft skills, presentation, collaboration etc. There isn’t really a replacement. Of all the versions of myself I can still be, I’d rather struggle on every job and get it done than be over-prepared (and not get paid in the meantime).

Apart from that, I am thinking about making matplotlib less of a pain by writing custom functions that implement a non-standard and pleasant style. It would be really nice to speed up the final data viz by having a template for, e.g., a double-donut plot. It’s one day of work, and a lot of pleasure. I have not found anyone who has done it before!
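As a rough idea of what such a template could look like, here is a sketch of a double-donut helper built from two nested `ax.pie` calls. The function name `double_donut`, its parameters, and the example numbers are all made up for illustration, and the styling is just one possible choice.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def double_donut(outer, inner, outer_labels=None, inner_labels=None,
                 ring_width=0.3, ax=None):
    """Draw two concentric donut rings on one Axes and return the Axes."""
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))
    # A wedge width smaller than the radius turns a pie into a ring.
    style = dict(width=ring_width, edgecolor="white")
    ax.pie(outer, radius=1.0, labels=outer_labels, wedgeprops=style)
    ax.pie(inner, radius=1.0 - ring_width, labels=inner_labels,
           labeldistance=0.6, wedgeprops=style)
    ax.set(aspect="equal")
    return ax


# Invented example: a 2-way split outside, a 4-way breakdown inside.
ax = double_donut([60, 40], [35, 25, 30, 10], outer_labels=["White", "Black"])
ax.figure.savefig("double_donut.png")
```

Colors, fonts, and a title could be fixed inside the function so that every chart in a report shares the same look.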

But yes, tell me more about your style of applying and why the heck (oh hamburgers! I feel like Butters now) you don’t have a job yet.


Hey @DavidMiedema,

I’ll take your surprise at the fact that I haven’t gotten a job yet as a compliment :sweat_smile: so thank you for that!

And I agree with you that the best thing to do is to get a job. I do believe that there are things you can only learn by working in a professional environment, dealing with real-life problems. What I said before was under the assumption that one doesn’t have a job just yet.

But as it turns out, it is not easy to get a job in my country these days. To give you more context, Brazil is a continental country with over 200 million inhabitants that has been going through a political/economic crisis since 2013, and the country just keeps digging deeper and deeper into it. Then 2020 came and everything got worse, as COVID is finishing the job of ruining the economy and the population (the country has been averaging 1,000 deaths per day for the last two months and this number won’t go down any time soon). The unemployment rate is high, there’s no confidence in the market, and the government does not exist.

So what I’ve been doing is to keep improving myself by learning new things and new tools. I’ve come a long way, from never having written a line of code in January to some machine learning and scraping projects in July. I think this is good.

I’ve been looking and applying for jobs for the last months, but most of them demand years of experience, which makes it harder. Also, there’s a bit of noise in the data science field here, which makes the entry-level jobs receive thousands of applications.

The problems I’ve described make it harder for everyone to get a job in any field. So I’d rather face them as a data scientist than as anything else. I believe an opportunity will eventually show up. I mean, it has to, right? :joy: :sweat_smile:

P.S.: I saw you’re learning Scrapy. Are you taking a course on it or something? I’d love to learn that too.


It will come; it is difficult in any country as we speak. Data is still a relatively new field: you have companies asking for a degree in computer science while others ask for a PhD in applied maths (or another field, I even saw marketing last time).

If you have not done it yet, train yourself on other platforms: Tableau (I am completely hooked on this thing), Power BI, Alteryx, etc. It would also make a difference on your CV, as not all companies have finished their digital transformation, and sometimes you have to clean your data with Python but build the dashboard in Power BI so everyone can read the results.

Hang in there :slight_smile:
