Hey, all. Today I accepted a role as a Data Scientist at a well-funded start-up. I will be the company’s first Data Scientist, and this is a mid-level/senior DS role. Before this, I worked as a Data Engineer for 7 months. You can look at this article for more info about my background to help you compare and prepare!
The role is undoubtedly a mix of data engineering and data science (modeling/analysis). Beyond data science and data engineering skills, I was expected to know AWS S3, Redshift, AWS Glue/Glue Studio, VPC, IAM, and more… read on. (If you’re scared because you don’t know these, don’t be. In one weekend you can pick up enough of them to be job-ready. It’s not rocket science.)
Backstory: I was laid off from my data engineer job a week before Christmas here in the US. Over the last 5 weeks I’ve been on ~40-50 interviews with about 20-30 companies. In the last 3 days I’ve received 8 more interview invites, which I turned down… Of all the interviews I actually took part in, I received 4 offers and narrowed them down to 1 based mostly on culture and the people. The interviews have ranged from live coding to take-home tests, codility.com tests, SQL tests in a Word doc, etc. Below are two I completed that you can use as a reference for what to expect. Take a look at how I answered questions I did not know. FYI, both are from companies that either made me an offer (Sapphire) or invited me to the last round before closing the position (ouch!) (Sumitovant).
gino_sapphire_data_engineer_test.pdf (640.5 KB)
sumitovant.py (2.7 KB)
To give you an idea, my success rate with the coding challenges was about 90%. What I did to prepare:
SQL: Dataquest’s practice problems for SQL - do them. All of them! Even if you think you’re good, you can always do better. Time is important, and getting good at writing subqueries efficiently is huge when the clock starts running down.
Python: Edabit - literally solve everything you can between Very Easy and Medium. Again, even if you think you’re amazing at coding, review. Look at how other people solve problems, then try to improve next time and make your solutions more Pythonic.
Data Structures and Algorithms: The man is a mastermind. Makes it so simple.
**Object-Oriented Programming (Python):** Again, Dataquest’s practice problems. Just make sure you run through all of them 2-3 times until you’re solid on using and building classes, etc.
HackerRank: I use HackerRank for more challenging coding problems. Believe me, these helped so much, mostly with getting efficient and having to think logically about the steps I took. Always, always, always look at the solutions others write, to learn from them and improve yourself.
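To make the SQL tip concrete, here is a minimal practice sketch of the subquery pattern that shows up a lot in timed tests: aggregate in a subquery, then filter against another aggregate. It uses Python’s built-in `sqlite3` module so it runs anywhere. The `orders` table and its values are hypothetical, made up for illustration; they are not from Dataquest’s problems or from either test above.

```python
import sqlite3

# Hypothetical practice data: an in-memory table of orders (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50.0), (1, 80.0), (2, 20.0), (3, 200.0), (3, 10.0), (2, 15.0)],
)

# Question: which customers spent more than the average customer?
# - The subquery in FROM computes each customer's total.
# - The scalar subquery in WHERE computes the average of those totals.
QUERY = """
SELECT customer_id, total
FROM (
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
) AS totals
WHERE total > (
    SELECT AVG(total)
    FROM (SELECT SUM(amount) AS total FROM orders GROUP BY customer_id)
)
ORDER BY total DESC
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(3, 210.0), (1, 130.0)]
conn.close()
```

The per-customer totals here are 130, 35 and 210, so the average is 125 and only customers 3 and 1 pass the filter. Rewriting the same query with a CTE (`WITH totals AS (...)`) removes the repeated aggregation; practicing both forms helps you write whichever one comes faster under time pressure.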
My go-to resources for learning anything Dataquest doesn’t cover are, and probably will remain: youtube.com, medium.com, udemy.com, acloudguru.com
If you have any questions or want any advice, please feel free to ask! I really want to help!! This role is the culmination of ~2 years of extremely hard work and very long hours of studying and learning. Over that time I’ve been super grateful to dataquest.io for their help, including a short ‘scholarship’ while I was unemployed after my recent lay-off. The least I can do is give back to the community that made me!
To help, I am attaching a link to my GitHub with the skill assessment I was given for this role’s interviews: the instructions, the files, and the results I turned in. You will notice the README lists the source for the data files. I highly recommend attempting to re-create something similar. The project covered data cleaning and EDA, statistics (sampling, skewness, Gaussian distributions, confidence intervals, quantiles such as quartiles/deciles), and much more! Honestly, this is probably the ‘capstone’ of my whole self-taught process. If you can do something similar, I think you will be ready.
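If you want a quick warm-up before attempting a full project like that, the sketch below touches the same statistics topics (sampling, skewness, a Gaussian distribution, a confidence interval, quartiles/deciles) using only Python’s standard library. It is a minimal sketch under stated assumptions: the data is synthetic (a made-up Gaussian population, not the assessment’s data), the skewness is the simple moment-based formula, and the confidence interval uses a normal approximation, which is reasonable for a large sample but slightly narrower than a t-interval for small ones.

```python
import math
import random
import statistics
from statistics import NormalDist


def sample_skewness(values):
    """Moment-based (Fisher-Pearson, uncorrected) coefficient of skewness."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * sem, mean + z * sem


# Hypothetical data: a synthetic Gaussian "population", then a simple random sample.
random.seed(42)
population = [random.gauss(mu=50, sigma=10) for _ in range(10_000)]
sample = random.sample(population, k=500)

quartiles = statistics.quantiles(sample, n=4)  # 3 cut points: Q1, median, Q3
deciles = statistics.quantiles(sample, n=10)   # 9 cut points
low, high = mean_confidence_interval(sample)

print(f"mean={statistics.fmean(sample):.2f}, skew={sample_skewness(sample):.3f}")
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
print("quartiles:", [round(q, 2) for q in quartiles])
print("deciles:", [round(d, 2) for d in deciles])
```

For Gaussian data the skewness should land near 0 and the interval should usually contain 50; swapping in a skewed source (for example `random.expovariate`) is an easy way to see both numbers change.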
Please note: the questions from the interview panel are not included. I was asked about tech stacks (AWS, GCP, cloud notebooks such as Google Colab, Databricks, SageMaker, etc.). I was also asked about modeling and any experience building ML models. The interviewers (a panel of 4) really liked that I had other projects to talk about and could SHOW them my modeling techniques (hyperparameter tuning, feature optimization, feature engineering, model validation, etc.).
I hope this helps. I gain nothing from sharing that I landed this role other than helping out those who feel lost!
I’m building a Discord community for self-taught folks to interact in real time. The goal is to help each other get unstuck during late-night coding sessions, network, etc.
I am still setting up the site and the rules, and could use help if anyone is interested: https://www.curricode.space/
Feel free to reach out on LinkedIn! I would love to chat!!