LINK TO MY CODE
I-94 for beautifying plots.ipynb (3.6 MB)
I have a project where the analysis is essentially finished. It analyzes westbound I-94 traffic data between St. Paul and Minneapolis, Minnesota. I happen to live nearby, so I got pretty into this project.
I finished it once, then went back and cleaned the data a couple more times after discovering new issues, split the data into new segments after discovering new features in it, and ran a couple of additional analyses that turned up more patterns, trends, and correlations. Okay, it seems done to me.
I noticed that there was missing data between 2014 and 2015, so I split the data into 2012-2014 and 2015-2018. In the second segment (which I call the second era), December, January, and July have a lower average traffic volume than the other months. In the first era, only December and January have a lower traffic volume. So what made July change so dramatically between the two eras?
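A minimal sketch of the era split described above, for readers who do not want to open the notebook. It is not the notebook's actual code. The column names `date_time` and `traffic_volume` are assumptions, the helper names are hypothetical, and the toy numbers are made up purely to show the shape of the result.

```python
import pandas as pd


def label_era(frame):
    """Add 'era' and 'month' columns; rows up to 2014 go in the first era."""
    out = frame.copy()
    out["date_time"] = pd.to_datetime(out["date_time"])
    years = out["date_time"].dt.year
    out["era"] = years.map(lambda y: "2012-2014" if y <= 2014 else "2015-2018")
    out["month"] = out["date_time"].dt.month
    return out


def monthly_mean_by_era(frame):
    """Mean traffic volume per calendar month, one column per era."""
    labeled = label_era(frame)
    grouped = labeled.groupby(["era", "month"])["traffic_volume"].mean()
    return grouped.unstack("era")


# Toy rows (assumed column names, invented values).
toy = pd.DataFrame({
    "date_time": ["2013-07-01 08:00", "2013-07-02 08:00", "2013-12-25 08:00",
                  "2016-07-01 08:00", "2016-07-02 08:00", "2016-12-25 08:00"],
    "traffic_volume": [5000, 5200, 2000, 4000, 4200, 2100],
})

table = monthly_mean_by_era(toy)
print(table)
```

Putting the two eras side by side as columns makes it easy to see which months drop in one era but not the other.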
From my conclusion, here are the effects I found:
ASSESSMENT OF THE EFFECT IN JULY, JANUARY, AND DECEMBER
While there was a road closure July 22-24, 2016, and there seems to have been a similar closure on July 25, 2015, excluding these closures does not resolve the decrease in July traffic volume in 2015-2018.
The squall in the data is a tempting explanation, but not a good one, since it takes place in May 2013 rather than in July during 2015-2018.
Smoke was present on July 6, 2015; May 7, 2016; and August 18, 2018. The traffic volume was most remarkably low in May 2016. It would be challenging to claim that this accounted for the whole July effect in 2015-2018.
The low traffic on Independence Day itself cannot explain the drop in July traffic from the first era to the second, because the average traffic volume on Independence Day is higher in the second era.
July was cloudier in the second era, and clouds correlate negatively with traffic volume.
So far, I believe the lower July traffic volume in 2015-2018 correlates most strongly with Friday traffic volumes in July. It may also be due to students, especially University of Minnesota students, not driving in the summer.
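The closure check and the Friday comparison in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the notebook's code. The column names `date_time` and `traffic_volume` are assumed, `july_mean_by_weekday` is a hypothetical helper, and the toy volumes are invented. Only the closure dates come from the write-up.

```python
import pandas as pd

# Closure dates mentioned in the write-up.
CLOSURE_DATES = ["2015-07-25", "2016-07-22", "2016-07-23", "2016-07-24"]


def july_mean_by_weekday(frame, exclude_dates=()):
    """Mean July traffic volume per weekday, optionally skipping some dates."""
    data = frame.copy()
    data["date_time"] = pd.to_datetime(data["date_time"])
    days = data["date_time"].dt.normalize()  # strip the time of day
    excluded = [pd.Timestamp(d) for d in exclude_dates]
    keep = (data["date_time"].dt.month == 7) & ~days.isin(excluded)
    july = data[keep]
    weekday = july["date_time"].dt.day_name().rename("weekday")
    return july.groupby(weekday)["traffic_volume"].mean()


# Toy rows (invented values): two Fridays, one Monday, one August row.
toy = pd.DataFrame({
    "date_time": ["2016-07-15 08:00", "2016-07-22 08:00",
                  "2016-07-18 08:00", "2016-08-01 08:00"],
    "traffic_volume": [4000, 500, 5000, 4800],
})

with_closures = july_mean_by_weekday(toy)
without_closures = july_mean_by_weekday(toy, exclude_dates=CLOSURE_DATES)
print(with_closures)
print(without_closures)
```

Running the same function per era, with and without the closure dates, gives a direct way to show whether the July drop survives once the closures are removed and whether Fridays carry most of it.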
However, I’m having some trouble writing it up neatly. There’s a lot here. I’m not sure how to emphasize the plots I want or the relevant parts of the analysis. I’m not sure which parts should take the focus, or even how to make something take focus in a Jupyter notebook. I’m also not sure how to make my many lines of code less intrusive in the presentation.
Does anyone have any suggestions for how to make this a bit neater and more relevant? Also which plots I should focus on? I think probably the paragraph I excerpted above is the most relevant conclusion, but if some other part strikes you as more interesting, I’d definitely love to know!
Thanks so much for any help you can give me. The science and programming are quite achievable, but turning this into something presentable is quite a challenge for me!