
Feedback welcome: Analyzing NYC High School Data

Hello everyone,

Sharing my guided project belonging to:
https://app.dataquest.io/m/217/guided-project%3A-analyzing-nyc-high-school-data/6/next-steps

After doing the parts as suggested in the course, I added a section “SAT scores per boro and per district”.
In that section, I practiced aggregating data and creating several charts with matplotlib, including things like ‘conditional coloring of bars in a bar chart’ (which wasn’t easy…).

AnalyzingNYCHighSchoolData.ipynb (419.5 KB)

Looking forward to any feedback that anyone may have!

Kind regards,
Jasper



Hello @jasperquak! Thanks for sharing your work with the Community:)

You have a well-structured project with a good description of the steps you’ve taken. You also commented on the code cells, which is very important for helping other people understand your code.

Here are some suggestions:

  • It’s better to tell us what the surveys are about and give some examples (like the quality of education, safety, etc.). You can also tell us your hypotheses about what you expect to find
  • Make sure to tell the readers what the project’s objectives are
  • You can limit the number of subheadings in the Data Preparation section but make sure that it’s still readable
  • You did not leave any code comment in the first section. Why?
  • Did you find something interesting while exploring the correlation between the SAT score and other measures? Could you tell us about your findings?
  • It’s better to import all the libraries in the first code cell to improve the project’s readability
  • Your plots do not have titles or axis labels. I would also increase the size of the tick labels and rename them to something clearer. Also, the plot of SAT scores by boro is very crowded with narrow bars, and it’s extremely difficult to understand what it communicates. You’d also better limit the number of colors and use a less intense color palette to make the plot easier on the reader’s eyes.
  • One of your images in the "Correlation of SAT scores with survey results" section does not load
  • I’d round values in tables to improve the readability
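As a rough illustration of the plotting suggestions above (imports together at the top, a title, axis labels, larger tick labels), here is a minimal sketch. The DataFrame, its column names, and the numbers are made up for the example and are not taken from the notebook.

```python
# All imports together, as they would be in the first code cell
import matplotlib.pyplot as plt
import pandas as pd

# Made-up illustrative values, not results from the project
scores = pd.DataFrame({
    "boro": ["Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"],
    "sat_score": [1150, 1180, 1280, 1290, 1380],
})

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(scores["boro"], scores["sat_score"], color="steelblue")

# Title and axis labels remove ambiguity about what is plotted
ax.set_title("Average SAT score by borough", fontsize=14)
ax.set_xlabel("Borough", fontsize=12)
ax.set_ylabel("Average SAT score", fontsize=12)

# Larger tick labels are easier to read
ax.tick_params(axis="both", labelsize=11)
fig.tight_layout()
```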

That’s it for me. Happy coding!


Hello @artur.sannikov96, thank you so much for your feedback! I will write a more comprehensive response in due course (when I find the time), but I wanted to at least express my appreciation already!


Hi @artur.sannikov96 , once more thank you for your valuable comments! They have triggered me to dig deeper, learn a couple of new things while doing so, ask multiple questions on this forum, and make some improvements!

I uploaded an updated version in my original post (after learning how to do that in the first place, see this post).

Let me comment inline on each of your bullet points individually.

  • It’s better to tell us what the surveys are about and give some examples (like the quality of education, safety, etc.). You can also tell us your hypotheses about what you expect to find
  • Make sure to tell the readers what the project’s objectives are

Point taken. I must admit that for this guided project I focused a bit more on the Python code (which I was struggling with) than on the storytelling part. I may improve it later still.

  • You can limit the number of subheadings in the Data Preparation section but make sure that it’s still readable

Updated layout a bit. Also see my comment for the next bullet point though.

  • You did not leave any code comment in the first section. Why?

The answer is in the Notebook, in a code cell that reads like this:

Note: the code that follows next in this section is courtesy of Dataquest. That is, during the ‘guided exercises’ I did write the code to perform all the steps below myself, but for convenience I have reused the (uncommented) code provided by Dataquest in this notebook rather than copying in all of my own code snippets. Some minor changes were made.

Further down there will be another marker to indicate till where this applies. After that marker, all code was written by me.

I understand this may not be good practice. It is, however, how Dataquest set up this guided project, and it felt too time-consuming to replace all of that code with my own.

  • Did you find something interesting while exploring the correlation between the SAT score and other measures? Could you tell us about your findings?

I think I did put observations below each graph and conclusions at the end. Still, I agree that my storytelling can be improved; for this guided project I focused a bit more on the Python code. (I may update it on a later occasion.)

  • It’s better to import all the libraries in the first code cell to improve the project’s readability

Good to know!

  • Your plots do not have titles or axis labels. I would also increase the size of the tick labels and rename them to something clearer. Also, the plot of SAT scores by boro is very crowded with narrow bars, and it’s extremely difficult to understand what it communicates. You’d also better limit the number of colors and use a less intense color palette to make the plot easier on the reader’s eyes.

Yes, I found that creating good graphs isn’t easy. I spent quite some time on this for this project, and I definitely want to improve it further in the future. I have added titles now at least, and some charts have axis labels. I agree that I should use other colors. Note that the colors do have a function in the last chart. It is not a random coloring of bars: I have 32 bars for 32 districts belonging to 5 boros, so to indicate which boro each district belongs to, I used coloring and added a legend.

Creating this chart was actually a challenge I set myself to better learn (1) aggregating data in dataframes and (2) creating more complex charts using matplotlib. It definitely did its job in the sense that I learnt from it (with help, e.g. see this post), but I fully agree it can be further improved, both in terms of layout and in expressing what we actually see.
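For readers curious about the ‘conditional coloring’ idea described above, one common way to do it is to map each bar’s category to a color and build the legend by hand. This is only a sketch under assumptions: the column names `school_dist` and `boro`, the color choices, and all values are made up for illustration and are not the notebook’s actual code.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Patch

# Made-up illustrative values; the real chart has 32 districts in 5 boros
districts = pd.DataFrame({
    "school_dist": ["01", "02", "07", "08", "13", "14"],
    "boro": ["Manhattan", "Manhattan", "Bronx", "Bronx", "Brooklyn", "Brooklyn"],
    "sat_score": [1350, 1290, 1130, 1160, 1240, 1180],
})

# One color per boro (hypothetical choice of colors)
boro_colors = {"Manhattan": "tab:blue", "Bronx": "tab:orange", "Brooklyn": "tab:green"}

# Look up the color of each bar from the boro its district belongs to
bar_colors = districts["boro"].map(boro_colors).tolist()

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(districts["school_dist"], districts["sat_score"], color=bar_colors)
ax.set_title("Average SAT score per district")
ax.set_xlabel("School district")
ax.set_ylabel("Average SAT score")

# Bars carry no labels of their own, so the legend is built from proxy patches
handles = [Patch(color=color, label=boro) for boro, color in boro_colors.items()]
ax.legend(handles=handles, title="Boro")
fig.tight_layout()
```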

  • One of your images in the "Correlation of SAT scores with survey results" section does not load

Good catch! Solved now, with help of this post.

  • I’d round values in tables to improve the readability

Improved (I hope)! With help of this post
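The linked post is not reproduced here, so the following is just one possible way to round table values in pandas, using `DataFrame.round` with a per-column number of decimals. The table contents and column names are invented for the example.

```python
import pandas as pd

# Made-up illustrative values with far more decimals than a reader needs
table = pd.DataFrame({
    "boro": ["Bronx", "Brooklyn"],
    "sat_score": [1157.598304, 1181.364461],
})

# Round only the numeric column; the text column is left untouched
rounded = table.round({"sat_score": 1})
print(rounded)
```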


Hi @jasperquak - just as an FYI, that last link in your post is pointing to a private message I sent you…you may want to update that link.

It’s so great to see you taking @artur.sannikov96’s suggestions to heart. Isn’t he AMAZING?! Congrats on completing another solid project.


Hello @jasperquak! It’s a pleasure for me that you took my suggestions seriously:)

I agree that this can be time-consuming, but you’ll find that in most projects you will have to do everything by yourself, so it’s good practice to get used to it.

I’m referring to code cell [648]. You’ve just put the table of correlations. I know that down the road you talk about the correlation but putting a table just for the sake of it is not very informative.
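One way to make a correlation table more informative is to show only the stronger correlations, sorted, and then comment on them. The sketch below assumes a DataFrame called `combined` with a `sat_score` column; the other column names, the values, and the 0.5 threshold are made up for illustration.

```python
import pandas as pd

# Made-up illustrative values, not data from the project
combined = pd.DataFrame({
    "sat_score": [1100, 1200, 1300, 1400, 1500],
    "saf_s_11": [6.0, 6.5, 6.4, 7.2, 7.5],
    "ell_percent": [30, 22, 18, 9, 5],
    "total_enrollment": [900, 400, 1200, 300, 800],
})

# Correlation of every column with sat_score, excluding sat_score itself
correlations = combined.corr()["sat_score"].drop("sat_score")

# Keep only the stronger relationships (arbitrary cutoff) and sort them
strongest = correlations[correlations.abs() > 0.5].sort_values()
print(strongest.round(2))
```

A short sentence after such a table saying which of these relationships is surprising, and why, turns the output into a finding.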

As for the charts: it’s important to have all axes labeled to avoid ambiguity. It’s also good practice to increase the size of the labels. The chart from cell [674] is not very informative; we just have a bunch of narrow bars squeezed together. What does it tell us? It should be easy to read.

The chart [680] uses very intense colors, so it’s hard to read.

The chart [684] has a lot of noise and it’s hard to make sense of it. Try to use fewer colors (and make them less intense) and do not color all the bars but only the ones you think are the most important
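As a sketch of the ‘color only the important bars’ idea: draw everything in a muted gray and highlight just a few bars. Which bars count as important is a judgment call; here the highest and lowest scores are picked purely as an assumption, and the values are made up.

```python
import matplotlib.pyplot as plt

# Made-up illustrative values
districts = ["01", "02", "03", "04", "05", "06"]
sat_scores = [1350, 1290, 1130, 1160, 1240, 1180]

best, worst = max(sat_scores), min(sat_scores)

# Muted gray by default; a single accent color for the highlighted bars
colors = ["tab:blue" if score in (best, worst) else "lightgray" for score in sat_scores]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(districts, sat_scores, color=colors)
ax.set_title("Average SAT score per district (highest and lowest highlighted)")
ax.set_xlabel("School district")
ax.set_ylabel("Average SAT score")
fig.tight_layout()
```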

Happy coding :grinning_face_with_smiling_eyes:

Oops… I updated the link! Thanks!


Hello @artur.sannikov96 ,

Thanks again for your comments! I do agree with them. (I have to balance continuing to improve this project against moving on to the next courses and projects, though, so if I do not follow up straight away, please don’t misinterpret that!)

I have one question, and that is regarding the last of your comments:

The chart [684] has a lot of noise and it’s hard to make sense of it. Try to use fewer colors (and make them less intense) and do not color all the bars but only the ones you think are the most important

So there are 32 districts, and a bar for each of them in the chart showing the average SAT score for that district. What I do want to indicate for each of these 32 districts, is to which of the 5 boros this district belongs. So we can, at one glance, see something about the spread-between-districts for each of the 5 boros. Therefore I gave each of the 5 boros a color, and applied that to all districts of that boro. It seems you are not so enthusiastic about that though… How would you do that instead?

Hi! In my opinion, there is at least one good alternative. After that, you can experiment with it and maybe come up with something more beautiful.

You can sort the districts in increasing order and desaturate the colors (or choose one of the many color-blind-friendly palettes). Also, don’t forget to label the x axis.
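A minimal sketch of these suggestions, assuming the districts are sorted by average SAT score and using three colors from the Okabe-Ito color-blind-friendly palette. The column names and all values are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Patch

# Made-up illustrative values
districts = pd.DataFrame({
    "school_dist": ["01", "02", "07", "08", "13", "14"],
    "boro": ["Manhattan", "Manhattan", "Bronx", "Bronx", "Brooklyn", "Brooklyn"],
    "sat_score": [1350, 1290, 1130, 1160, 1240, 1180],
})

# Three colors from the Okabe-Ito palette (blue, orange, bluish green)
palette = {"Manhattan": "#0072B2", "Bronx": "#E69F00", "Brooklyn": "#009E73"}

# Sort so the bars rise from left to right
ordered = districts.sort_values("sat_score")

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(
    ordered["school_dist"],
    ordered["sat_score"],
    color=ordered["boro"].map(palette).tolist(),
    alpha=0.8,  # slightly transparent bars look less intense
)
ax.set_xlabel("School district")
ax.set_ylabel("Average SAT score")
ax.set_title("Average SAT score per district, sorted")
ax.legend(handles=[Patch(color=c, label=b) for b, c in palette.items()], title="Boro")
fig.tight_layout()
```

Sorting makes the overall ranking easy to read, while the boro colors still show how each boro’s districts are spread across that ranking.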


Thank you for the suggestions!
