This is my second guided project, and it is also a data analysis project done in a programming language (Python). While it was quite challenging for me, I tried to follow the directions closely. I did learn a few new tricks, but most significantly, I learned how to find answers.
While I hope to make progress as I continue to learn (on Dataquest), I look forward to reading your comments and suggestions. I find them very useful.
Finding Heavy Traffic Indicators on I-94.ipynb (420.6 KB)
Hey @agama432.joshua, thanks for sharing the project with the Community! I’m really glad that you learned how to find answers to your questions, as that requires formulating a concise and specific question.
A few comments from my side:
- I find it very useful when people include a short summary of their findings at the beginning of the project. Sometimes it’s the only part a future employer will read, so it’s worth including. It may also spark interest in reading the project in more detail.
- It’s better not to over-comment. You should assume the reader has a basic knowledge of Python. For example, “Read the dataset” is redundant, as it’s very clear from the code itself.
- I think you could include some higher-level headings like “Data cleaning” and “Data analysis” and put your current headings under them.
- Improve your plots: remove the grids and the top and right spines, label all axes, and change the colors to more natural-looking shades.
- Why did you decide to split the date time into 12-hour chunks? Could you elaborate on this? I believe some people start going to work at 6 am and some get back home much later than 7 pm (you can actually see it in your night-time histogram).
- A possible explanation for the drop in traffic around 2016 is that some data there is missing, as found by @anna.strahl here.
- To be fair, you cannot conclude much from the scatterplot “Temp vs. traffic volume” because there are too many data points. It would be a better idea to plot a random sample of the data (but make sure it’s representative).
- Order the frequencies of each weather type in descending or ascending order; it’ll make the plots much clearer.
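To make the suggestions about plot styling, sampling, and ordering a bit more concrete, here is a minimal sketch. It runs on made-up data, not on your notebook's dataset, and the column names `temp`, `traffic_volume`, and `weather_main`, the sample size of 500, and the colors are all my assumptions, so adjust them to whatever your DataFrame actually uses.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the real I-94 data); column names are assumptions.
rng = np.random.default_rng(0)
n_rows = 5000
traffic = pd.DataFrame({
    "temp": rng.normal(280, 12, n_rows),
    "traffic_volume": rng.integers(0, 7000, n_rows),
    "weather_main": rng.choice(
        ["Clear", "Clouds", "Rain", "Snow"], n_rows, p=[0.4, 0.3, 0.2, 0.1]
    ),
})

# Plot a random sample of rows instead of every point to reduce overplotting.
# random_state makes the sample reproducible; 500 is an arbitrary choice.
sample = traffic.sample(n=500, random_state=42)

# Count each weather type and sort the counts so the bars appear in order.
weather_counts = traffic["weather_main"].value_counts().sort_values()

fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(11, 4))

ax_scatter.scatter(sample["temp"], sample["traffic_volume"],
                   s=8, alpha=0.5, color="steelblue")
ax_scatter.set_xlabel("Temperature (K)")
ax_scatter.set_ylabel("Traffic volume")

ax_bar.barh(list(weather_counts.index), weather_counts.values, color="slategray")
ax_bar.set_xlabel("Number of records")
ax_bar.set_ylabel("Weather type")

# Remove the grid and the top and right spines on both plots.
for ax in (ax_scatter, ax_bar):
    ax.grid(False)
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)

fig.tight_layout()
```

Sampling whole rows keeps each temperature paired with its traffic volume. To check that the sample is representative, you could compare `sample.describe()` with `traffic.describe()`.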
I hope that helps. Happy coding!
Hello @artur.sannikov96,
Thank you very much for sharing your thoughts and very insightful observations. I have learned a great deal from them, and I hope to learn more as I study further.
Thank you.