With this article, I aim to list a few additional resources and project ideas to refer to for each step of the Data Scientist Path. All the resources I’ve included in this post are absolutely free, accessible to everyone and created by sources that I truly believe in!
Step 1- Python Introduction
- Automate The Boring Stuff With Python — This book is a classic, and it’s great because each chapter ends with practice projects that give you a more hands-on experience.
- Python 3.4 Programming Tutorials — If you dig video tutorials, this is the playlist for you. The New Boston is definitely one of my favourite YouTube channels to learn programming from. Its founder, Bucky Roberts, makes everything so easy and fun for beginners.
- Python Tutorial for Beginners — This is another great video tutorial series, by the amazing Corey Schafer. He’s a great teacher, and his videos also cover some important topics that very few other courses touch on.
- PyGame — Once you’ve mastered the basics, making a game in Python can be a really cool way to practice the skills you just picked up while creating something fun at the same time.
- Twitter Bots — You could also try making your own Twitter bot to beef up the Projects section of your resume. It’ll be a great way to get over your fear of writing code. Also, who doesn’t want a Twitter bot of their own?
Step 2- Data Analysis and Visualization
- NumPy — After completing the NumPy missions on DQ, you can always refer to W3Schools’ tutorials whenever you can’t remember a particular piece of syntax while working on a data science project. Their short, simple tutorials make everything seem easy to learn.
- Pandas — Kaggle’s short but comprehensive guide to Pandas is the perfect course for revisiting concepts that might have seemed overwhelming at first.
- Libraries for Data Visualization in Python:
  - Matplotlib — This article on Towards Data Science summarizes the basics, covering everything from installation and general concepts to creating different kinds of plots.
  - Seaborn — Python has a lot of powerful libraries for data visualization, and I definitely recommend taking this course on Seaborn to amp up your data viz knowledge.
  - Plotly — Plotly is another really powerful library that lets you create beautiful, interactive graphs. I remember how excited I was the first time I created a Sunburst chart using Plotly. The official website explains how to create all the graphs with lots of examples, so you needn’t look elsewhere to learn all there is about this amazing library.
- WhatsApp Chat Analyser — Building a chat analyser is one of the best ways to put your newly acquired data analysis skills to use. I learnt a lot of data science concepts just by working on this project, and it gives you some regex practice too.
- Netflix Movies and TV Shows — This dataset consists of TV shows and movies available on Netflix as of 2019. Some interesting tasks you can perform on this dataset are:
a. Understanding what content is available in different countries
b. Identifying similar content by matching text-based features
c. Running a network analysis of actors and directors to find interesting insights
d. Checking whether Netflix has been increasingly focusing on TV rather than movies in recent years
You can go through the most popular notebooks for this dataset to see how others have worked on it and learn some new things.
- Dark Net Marketplace Data — The Dark Net is a fascinating place.
Description of this dataset:
“This data set was made from an HTML rip made by Reddit user “usheep” who threatened to expose all the vendors on Agora to the police if they did not meet his demands (sending him a small monetary amount, a few hundred dollars in exchange for him not leaking their info).
Most information about what happened to “usheep” and his threats is nonexistent. He posted the HTML rip and was never heard from again. Agora shut down a few months after.
It is unknown if this was related to “usheep” or not, but the raw HTML data remained.”
The dataset’s listings include items like a Facebook hacking guide, an ATM hacking tutorial, 50,000 Facebook likes, fake IDs, licenses and lots of drug- and prostitution-related entries.
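If you want a head start on the WhatsApp Chat Analyser project, here’s a minimal sketch of the regex part. It assumes your exported chat lines look like "12/31/19, 9:41 PM - Alice: Hello". The real export format varies by phone, app version and locale, so treat the pattern (and the helper names parse_chat and messages_per_author, which I made up) as a starting point rather than a finished parser.

```python
import re
from collections import Counter

# ASSUMPTION: exported lines look like "12/31/19, 9:41 PM - Alice: Hello".
# Real exports differ by phone, app version and locale, so adjust the pattern.
MESSAGE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), "
    r"(?P<time>\d{1,2}:\d{2}(?:\s?[AP]M)?) - "
    r"(?P<author>[^:]+): "
    r"(?P<text>.*)$"
)


def parse_chat(lines):
    """Turn exported chat lines into a list of message dicts (hypothetical helper)."""
    messages = []
    for line in lines:
        line = line.rstrip("\n")
        match = MESSAGE_RE.match(line)
        if match:
            messages.append(match.groupdict())
        elif messages:
            # A line without a timestamp continues the previous (multi-line) message.
            messages[-1]["text"] += "\n" + line
    return messages


def messages_per_author(messages):
    """Count how many messages each participant sent (hypothetical helper)."""
    return Counter(m["author"] for m in messages)


if __name__ == "__main__":
    sample = [  # made-up chat lines, not a real export
        "12/31/19, 9:41 PM - Alice: Happy new year!",
        "12/31/19, 9:42 PM - Bob: Same to you",
        "and to everyone else too",
        "1/1/20, 12:05 AM - Alice: See you all tomorrow",
    ]
    chat = parse_chat(sample)
    print(messages_per_author(chat))  # Counter({'Alice': 2, 'Bob': 1})
```

Once the messages are in dicts, counting words, emojis or busiest hours is just a matter of looping over the "text" and "time" fields.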
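For task d of the Netflix Movies and TV Shows project, here’s a minimal sketch using Python’s built-in csv module: for each year, it computes the share of newly added titles that are TV shows. It assumes the dataset has a type column with the values "Movie" and "TV Show" and a date_added column formatted like "September 9, 2019", so please check the actual CSV before relying on that. The function name tv_share_by_year is my own, and the sample rows are made up so the example runs on its own.

```python
import csv
import io
from collections import Counter, defaultdict


def tv_share_by_year(rows):
    """Map each year titles were added to the fraction of them that are TV shows.

    ASSUMPTION: rows have a "type" column ("Movie" or "TV Show") and a
    "date_added" column such as "September 9, 2019".
    """
    counts = defaultdict(Counter)
    for row in rows:
        date_added = (row.get("date_added") or "").strip()
        if not date_added:
            continue  # skip titles with no date_added value
        year = int(date_added[-4:])  # the year is assumed to be the last 4 characters
        counts[year][row["type"]] += 1
    return {
        year: c["TV Show"] / sum(c.values())
        for year, c in sorted(counts.items())
    }


# Made-up rows in the assumed layout; these are NOT real Netflix figures.
SAMPLE_CSV = """title,type,date_added
A,Movie,"January 1, 2017"
B,Movie,"March 3, 2017"
C,TV Show,"May 5, 2017"
D,TV Show,"July 7, 2019"
E,Movie,"August 8, 2019"
F,TV Show,
"""

if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
    for year, share in tv_share_by_year(rows).items():
        print(year, f"{share:.0%}")  # 2017 33%, then 2019 50%
```

With the real dataset, you’d feed the downloaded CSV file to csv.DictReader instead of the sample string. A share that keeps rising year over year would be some evidence of a shift towards TV.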
Step 3- The Command Line
- Learning the Shell — This is the only resource I’m including in this section, because this book really is an all-inclusive guide to the command line.
This book teaches you everything you need to know about the shell and does it with ease. It starts by giving you a solid foundation and builds from there. Its simplicity and informative structure make it ideal for beginners switching to Linux.
Step 4- Working with Data Sources
- W3Schools’ SQL Tutorial — This website can help you revisit topics with ease after you’ve completed DQ’s courses on SQL. If you’re not the kind of person who likes taking notes, this website can prove very helpful to you.
- Khan Academy’s Intro to SQL — SQL can be a tricky set of concepts to wrap your head around, particularly when it comes to conditionally displaying and grouping the results of multiple joins. You can check out these video tutorials whenever you feel a bit lost on a topic.
- Web Scraping — You can check out Chapter 11 of Automate The Boring Stuff With Python, which covers web scraping.
- SQL Project — You can go through this article and then head over to the assignment to work on some quite challenging problems.
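If joins and grouping still feel slippery after the tutorials above, it can help to experiment locally. Python ships with the sqlite3 module, so here’s a minimal sketch using two made-up tables (customers and orders): it joins them, groups the results per country, counts "big" orders conditionally with CASE, and filters whole groups with HAVING. The table names, columns and values are all hypothetical.

```python
import sqlite3

# Two tiny, hypothetical tables, kept in memory so nothing touches disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

INSERT INTO customers VALUES
    (1, 'Asha', 'India'), (2, 'Ben', 'UK'), (3, 'Chen', 'India'), (4, 'Dana', 'Canada');
INSERT INTO orders VALUES
    (1, 1, 120.0), (2, 1, 80.0), (3, 2, 15.0), (4, 2, 95.0), (5, 3, 60.0), (6, 4, 40.0);
""")

# JOIN matches each order to its customer, GROUP BY collapses rows per country,
# CASE counts only orders of 50 or more, and HAVING filters whole groups
# (WHERE would filter individual rows before grouping).
QUERY = """
SELECT c.country,
       COUNT(o.id) AS n_orders,
       SUM(o.amount) AS total,
       SUM(CASE WHEN o.amount >= 50 THEN 1 ELSE 0 END) AS big_orders
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.country
HAVING SUM(o.amount) > 100
ORDER BY total DESC;
"""

for row in conn.execute(QUERY):
    print(row)
# ('India', 3, 260.0, 3)
# ('UK', 2, 110.0, 1)
```

Canada drops out because its only order totals 40, which fails the HAVING condition. Moving that condition into a WHERE clause on o.amount is a good exercise for seeing how the results change.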
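The web scraping chapter of Automate The Boring Stuff With Python works with third-party libraries such as requests and Beautiful Soup. As a dependency-free sketch of the parsing half of scraping, the standard library’s html.parser module can collect links from HTML you’ve already downloaded. The LinkCollector class and extract_links function are my own illustrative names, and the HTML is a hard-coded snippet rather than a real page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _current_href as None and are skipped.
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text_parts.append(data)  # text inside the current link

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # In a real scraper this HTML would come from an HTTP request;
    # here it is hard-coded so the example runs offline.
    page = """
    <html><body>
      <h1>Reading list</h1>
      <a href="https://example.com/pandas">Pandas <b>guide</b></a>
      <p>No link here.</p>
      <a href="/sql">SQL tutorial</a>
    </body></html>
    """
    for href, text in extract_links(page):
        print(href, "->", text)
```

For anything beyond simple pages, the libraries the book uses will save you a lot of bookkeeping, but writing a parser like this once shows what they do for you.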
Step 5- Probability and Statistics
- Statistics Fundamentals — This awesome playlist by StatQuest will make you feel a lot more comfortable with Statistics. The explanations are really fun and easy to grasp, and the happy, fun song at the beginning of each video will surely lighten your mood.
- A Visual Introduction to Probability and Statistics — This is such a cool website that lets you explore Probability and Statistics in an interactive manner. You absolutely have to check this one out!
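A habit that pairs well with these resources is checking a probability formula against a quick simulation. Here’s a minimal sketch for one classic question, the chance of rolling at least one six in four rolls of a fair die. It is computed exactly as 1 - (5/6)^4 (one minus the chance of no six at all) and estimated with Python’s random module; the function names and the seed are arbitrary choices of mine.

```python
import random


def exact_at_least_one_six(n_rolls):
    """P(at least one six in n fair rolls) = 1 - P(no six) = 1 - (5/6)**n."""
    return 1 - (5 / 6) ** n_rolls


def simulated_at_least_one_six(n_rolls, trials=100_000, seed=42):
    """Estimate the same probability by playing the game many times."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        if any(rng.randint(1, 6) == 6 for _ in range(n_rolls)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"exact:     {exact_at_least_one_six(4):.4f}")  # exact:     0.5177
    print(f"simulated: {simulated_at_least_one_six(4):.4f}")
```

The simulated value should land close to the exact one, and increasing trials makes it closer; that shrinking gap is the law of large numbers at work.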
Step 6- Machine Learning Introduction
- Machine Learning is Fun — This article is proof that the internet is filled with really helpful people who write about complex topics in simple ways, so that everyone (including themselves) can understand them better.
- Machine Learning for Humans — Another awesome article for getting up to speed on high-level machine learning concepts in about 2–3 hours.
- Kaggle — Exploring the popular datasets here can give you lots of ideas for your next ML project.
- Data is Plural — This is an email newsletter where the author sends you a bunch of curious datasets each week.
- Machine Learning Projects — This is a list that contains project ideas for beginners. You can check this out if you’re lost as to what to work on for your first ML project.
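Most beginner introductions to machine learning come back to one idea: a model adjusts its parameters until it fits example data. As a minimal sketch of that idea (not code from any of the articles above), here’s a straight line y ≈ w * x + b fitted with batch gradient descent on mean squared error, in plain Python. The function name fit_line, the learning rate, the number of epochs and the training data are all made-up choices that happen to converge for this example.

```python
def fit_line(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y ≈ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)**2) with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step both parameters a little way downhill.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Made-up training data that follows y = 2x + 1 exactly.
    xs = [1, 2, 3, 4, 5]
    ys = [3, 5, 7, 9, 11]
    w, b = fit_line(xs, ys)
    print(f"w = {w:.3f}, b = {b:.3f}")  # w = 2.000, b = 1.000
    print(f"prediction for x = 6: {w * 6 + b:.2f}")  # prediction for x = 6: 13.00
```

Swapping in your own xs and ys is a quick way to see how the learning rate matters: set it too large and the updates overshoot and diverge instead of settling.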
Step 7- Machine Learning Intermediate
- Neural Networks and Deep Learning — This book teaches you the core concepts, and its problems can help you come up with ideas for creative personal projects of your own. The book will teach you about:
  - Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data
  - Deep learning, a powerful set of techniques for learning in neural networks
- The Best Machine Learning Resources — A curated list of all the best machine learning resources available online, including some projects.
- Cheat Sheets — These cheat sheets can help you quickly revise everything you’ve learnt before you start working on projects.
- Use Kaggle to start your ML and Data Science Journey — This article written by @nityesh talks about why he believes Kaggle is an amazing platform for beginners. He also links a few Data Science blogs at the end of the article, which you should definitely check out.
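The phrase "learn from observational data" in the Neural Networks and Deep Learning description can feel abstract at first. Here’s a minimal sketch in plain Python: a single perceptron, one of the simplest artificial neurons, learning the logical AND function purely from labelled examples with the classic perceptron learning rule. This is my own illustration, not code from the book, and train_perceptron and predict are hypothetical names.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum plus bias is positive, else output 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Learn weights and a bias with the perceptron rule.

    samples: list of (inputs, label) pairs with labels 0 or 1.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            error = label - predict(weights, bias, inputs)  # -1, 0 or +1
            # Nudge the weights toward the right answer; no change when correct.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


if __name__ == "__main__":
    # Observations of AND: the perceptron only ever sees these examples, never the rule.
    and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    weights, bias = train_perceptron(and_samples)
    for inputs, label in and_samples:
        print(f"{inputs} -> {predict(weights, bias, inputs)} (expected {label})")
```

A single perceptron can only separate classes with a straight line, so it can learn AND or OR but never XOR; stacking many neurons into layers is what gets around that limit.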
Step 8- Advanced Topics in Data Science
- Decorators — You can get an in-depth look at decorators from this question that’s been asked on Stack Overflow. If you don’t know what Stack Overflow is, it’s the place where you’ll likely find a solution to any kind of technical problem you might be facing, be it installing packages or very specific questions that are particular to your project.
- Developer Hacks for the Jupyter Data Scientist — This is a really helpful video which teaches you how to use Jupyter Notebooks effectively.
- Version Control — This link includes a list of resources for learning version control. This list is part of a guide that includes lots of other lists of resources for people who want to contribute to open source. I totally recommend reading about open source if you’re interested.
- Git-it — An amazing desktop app that teaches you how to use Git and GitHub on the command line.
- Spark and Map-Reduce — Actually, I don’t have a clue about what Spark and Map-Reduce are, so it would be great if you could share some helpful resources in the comments below for me and others to follow.
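If you’d like something concrete to look at alongside the Stack Overflow discussion of decorators, here’s a minimal sketch. A decorator is just a function that takes a function and returns a new one wrapping it, and the @ syntax is shorthand for that reassignment. The count_calls decorator and greet function are illustrative names of my own.

```python
import functools


def count_calls(func):
    """Decorator that counts how many times `func` has been called."""

    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls  # same as writing: greet = count_calls(greet)
def greet(name):
    """Return a friendly greeting."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    print(greet("Ada"))    # Hello, Ada!
    print(greet("Grace"))  # Hello, Grace!
    print(greet.calls)     # 2
    print(greet.__name__)  # greet (thanks to functools.wraps)
```

Without functools.wraps, greet.__name__ would report "wrapper", which makes debugging and documentation tools a lot less helpful.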
And that’s it! I hope this list helps you find at least one helpful resource. Do share links to your favourite data science resources in the comments below. Have a great day!