[Project] Air Pollution in Skopje

This is a new initiative we are starting where people can do projects on real data outside of Dataquest. The goal is to get exposure to tools used in the industry (Jupyter Notebook, GitHub…), to learn by comparing our solutions with those of other people, and to encourage communication and networking.

Project:

Here we have air pollution data for the city of Skopje, Macedonia.
The measurements go back ~10 years and cover 6 measuring stations. They are taken every hour, but there is a lot of missing data and some of the stations were active for only a few years.

In terms of rating, higher means worse air quality.
Anything over 50 is considered not good. Anything over 200 is considered hazardous.

Let’s try to answer the questions below for the last 5 winters.
A winter is the period from November through February (inclusive). We are looking at the winters 2013/14 through 2017/18.
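The winter definition above can be sketched in pandas roughly like this. It is only a minimal sketch: the column name `time` and the helper `label_winters` are made up for illustration, and the real file may be laid out differently.

```python
import pandas as pd


def label_winters(df, time_col="time", first=2013, last=2017):
    """Keep only Nov-Feb rows and tag each with a winter label such as '2013/14'.

    `time_col` is an assumed column name; `first` and `last` are the years
    in which the first and last winter start.
    """
    ts = pd.to_datetime(df[time_col])
    in_winter = ts.dt.month.isin([11, 12, 1, 2])
    # January and February belong to the winter that started the previous year.
    start_year = ts.dt.year.where(ts.dt.month >= 11, ts.dt.year - 1)
    keep = in_winter & start_year.between(first, last)

    out = df.loc[keep].copy()
    starts = start_year[keep]
    end_suffix = ((starts + 1) % 100).astype(str).str.zfill(2)
    out["winter"] = starts.astype(str) + "/" + end_suffix
    return out
```

With the defaults, a row from February 2018 ends up in winter "2017/18", while a row from November 2018 is dropped.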

  • Which have been the top 3 worst months overall?
  • Which measuring station has the highest ratings on average?
  • Make a pie chart with the average rating for each station
  • Which is the worst month per measuring station on average? Is it the same for them all?
  • Make a horizontal bar chart showing how many days in total the measurements have been over 50 for each station.
  • Same chart for over 200.
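If you are not sure where to start, here is a rough sketch of the kind of pandas helpers these questions call for. It assumes a DataFrame with a DatetimeIndex and one numeric column per station, which may not match the raw file, and all function names are made up. It also takes "a day over 50" to mean a day whose daily mean is over 50; counting days with any single hour over 50 is an equally valid reading and gives different numbers.

```python
import pandas as pd


def station_means(df):
    """Average rating per station, worst (highest) first; missing values are skipped."""
    return df.mean().sort_values(ascending=False)


def worst_months(df, n=3):
    """Top-n year-months by mean rating, averaging across stations first."""
    across_stations = df.mean(axis=1)
    monthly = across_stations.groupby(df.index.to_period("M")).mean()
    return monthly.nlargest(n)


def worst_month_per_station(df):
    """Calendar month (1-12) with the highest mean rating, for each station."""
    by_month = df.groupby(df.index.month).mean()
    return by_month.idxmax()


def days_over(df, threshold):
    """Number of days per station whose daily mean exceeds `threshold`."""
    daily = df.groupby(df.index.normalize()).mean()
    return (daily > threshold).sum()


# With matplotlib installed, the bar charts could then be drawn with:
# days_over(df, 50).plot.barh()
# days_over(df, 200).plot.barh()
```

A station with no data at all in the selected period would need to be dropped before calling `worst_month_per_station`.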

This is a beginner-friendly project focusing on data wrangling and basic visualization.

Tools/Methods:
We use Jupyter Notebooks to work on the projects, as this is an industry standard, so all of your code must be entered and run there.
There are YouTube videos on how to get going if this is new to you.

Once you complete it, you should share it via GitHub.
If you don’t have an account there you first have to open one and then:

Here is an example submission. (Obviously, for your own sake, you should not look at the solutions.)

Feel free to ask questions or leave comments.

More projects are going to follow if there is interest.


My notebook -


@universalastrostuden I went over your submission and you did a really good job. Some nice techniques/solutions in there.

What I did not like is that you did not answer the questions in order. It would have been nice if you had first stated each question and then given the answer below it.

Re the question ‘Make a pie chart with the average rating for each station’.
You made one with percentages, not with the actual ratings.
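For what it's worth, one way to show the actual ratings is to put the values into the slice labels instead of using percentage labels. This is just a sketch: it assumes `means` is a pandas Series of average ratings indexed by station name, and both function names are made up.

```python
import pandas as pd


def pie_labels(means):
    """Build labels like 'station: 85.0' so each slice shows the actual average."""
    return [f"{station}: {value:.1f}" for station, value in means.items()]


def plot_station_pie(means):
    """Draw a pie chart whose labels carry the average rating per station."""
    # Imported here so the label helper can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(means.values, labels=pie_labels(means))
    ax.set_title("Average rating per station")
    return ax
```

The slice sizes are still proportional, but the reader can see the rating itself on each slice.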

Also, I do not think you answered these questions:

  • Which have been the top 3 worst months overall? (expecting an answer like Jan’17, Dec’15…)
  • Which is the worst month per measuring station on average? Is it the same for them all?