This is a new initiative where people can work on projects with real data outside of Dataquest. The goal is to get exposure to tools used in the industry (Jupyter Notebook, GitHub…), to learn by comparing our solutions with other people's, and to encourage communication and networking.
Project:
Here we have air pollution data for the city of Skopje, Macedonia.
The measurements go back ~10 years and come from 6 measuring stations. Measurements are taken every hour, but there is a lot of missing data, and some of the stations were active for only a few years.
In terms of rating, higher means worse air quality.
Anything over 50 is considered not good. Anything over 200 is considered hazardous.
Let’s try to answer the questions below for the last 5 winters.
A winter is the period from November through February (inclusive). We are looking at the winters from 2013/14 to 2017/18.
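As a starting point, the winter definition above could be turned into a filter along these lines. This is only a rough sketch: it assumes the data has already been loaded into a pandas DataFrame with a DatetimeIndex and one column per station, and the function name `select_winters` and the column name `station_a` are made up for illustration.

```python
import numpy as np
import pandas as pd

def select_winters(df, first_start=2013, last_start=2017):
    """Keep only November-February rows and label each with its winter.

    Assumes df has a DatetimeIndex. A winter is labelled by the year it
    starts in, so January 2014 belongs to winter '2013/14'.
    """
    month = df.index.month
    year = df.index.year
    is_winter = (month >= 11) | (month <= 2)
    # January and February belong to the winter that started the year before.
    start_year = np.where(month >= 11, year, year - 1)
    keep = is_winter & (start_year >= first_start) & (start_year <= last_start)
    out = df.loc[keep].copy()
    out["winter"] = [f"{y}/{str(y + 1)[-2:]}" for y in start_year[keep]]
    return out

# Small made-up example: hourly rows from October 2013 to March 2014.
idx = pd.date_range("2013-10-01", "2014-03-31 23:00", freq="h")
demo = pd.DataFrame({"station_a": range(len(idx))}, index=idx)
winters = select_winters(demo)
print(winters["winter"].unique())
```

Labelling each row with its winter makes it easy to group by season later, since a single winter spans two calendar years.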
- Which have been the top 3 worst months overall?
- Which measuring station has the highest ratings on average?
- Make a pie chart with the average rating for each station
- Which is the worst month per measuring station on average? Is it the same for them all?
- Make a horizontal bar chart showing how many days in total the measurements have been over 50 for each station.
- Same chart for over 200.
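For the two threshold questions, one possible way to count the days is sketched below. It rests on an assumption the project does not spell out: a day counts as "over" the threshold when its daily mean exceeds it (you could just as well count days where any single hourly reading exceeds it). The function name `days_over` and the station names are hypothetical, and the DataFrame is again assumed to have a DatetimeIndex with one column per station.

```python
import pandas as pd

def days_over(df, threshold):
    """Count, per station column, the days whose daily mean exceeds threshold.

    Missing hourly values are skipped when averaging; days with no data at
    all produce NaN, which never compares as greater than the threshold.
    """
    daily_mean = df.resample("D").mean()
    return (daily_mean > threshold).sum().sort_values()

# Made-up data: three days of hourly readings for two stations.
idx = pd.date_range("2017-11-01", periods=72, freq="h")
demo = pd.DataFrame(
    {
        "station_a": [60.0] * 24 + [40.0] * 24 + [250.0] * 24,
        "station_b": [10.0] * 72,
    },
    index=idx,
)
demo.iloc[5, 1] = float("nan")  # simulate a missing measurement

over_50 = days_over(demo, 50)
over_200 = days_over(demo, 200)
print(over_50)
print(over_200)
```

In a notebook with matplotlib installed, calling `.plot.barh()` on the returned Series should give the horizontal bar chart.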
This is a beginner-friendly project focusing on data wrangling and basic visualization.
Tools/Methods:
We use Jupyter Notebooks to work on the projects, as this is an industry standard, so all of your code must be entered and run there.
There are YouTube videos on how to get going if this is new to you.
Once you complete the project, you should share it via GitHub.
If you don’t have an account there, you first have to open one, and then:
- Download your notebook to your computer
- Add it to your GitHub by creating a new repository, then upload the file, and finally share the URL to it here.
Here is an example submission. (Obviously, for your own sake, you should not look at the solutions.)
Feel free to ask questions or leave comments.
More projects are going to follow if there is interest.