Guided Project - Analyzing NYC High School Data

Hey everyone! It’s been a while since my last project. The last few months were difficult, but I’m back!!! :partying_face: :partying_face: :partying_face:

This one was hard to finish, especially because I wasn’t sure how to conclude it and was also kind of lost about how to take the analysis further.

I would love any feedback on the storytelling and specifically on the conclusion. Also, tell me: which next steps would you take after what has been done?

Thanks in advance! :heart:

You can check it out here: NYC SAT Data Analysis :relaxed:

NYC SAT Data Analysis.ipynb (1.3 MB)

Hi Nathalia,

Welcome back, and thanks for sharing another great project with us! :star_struck::heavy_heart_exclamation: You’ve done a really amazing job: awesome visualizations, perfect project structure, effective use of emphasis (lists, fonts, quotations, separated code cell outputs, links), cool storytelling with an introduction and a conclusion, and clean, perfectly commented code. And my absolute favorites are the centered images and the different colors applied to the output text! :star2:

Some suggestions:

  • It’s better to remove numbering from the subheadings, using only the markdown hierarchy of subheadings.
  • You can import the re library in the 1st code cell, together with the others.
  • The code cell [22]: here you can use a for-loop.
  • The code cell [30]: no need to rotate x-tick labels here.
  • On the scatter plots, consider using the alpha parameter to make the points semi-transparent, so that areas where many points overlap stand out.
  • I would drop some code comments like # configurating the size of the plot and # configurating the plot, since they repeat throughout the project and the code they describe is self-explanatory.
  • About the AP test, I noticed one interesting thing. It seems that this crazy vertical line at SAT ~1250 comes from the way the missing values were filled: by inserting the mean, as we did in the code cell [23]. Until now this approach worked well, but in the case of the AP test there were just too many missing values, and they were all filled with the same mean value. If you ignore those not-so-missing-anymore values and re-create the graph, the AP-SAT relationship is clearly positive, without any “noise”.
  • As for a potential way forward, you can take a look at SAT vs. free or reduced lunch. By the way, the insights I got there also showed that not everyone has the same chance to succeed on the SAT. You may find my project on the same topic useful. It’s not the most up-to-date version (I’m still going to upload the latest one :joy:), but the sections about the AP test and free lunch are worth reading.
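
To illustrate the AP point above, here is a minimal sketch of how mean-filling can distort a scatter plot and how to leave the filled rows out. It uses synthetic data, not the real NYC dataset, and the names `combined`, `sat_score`, `ap_takers`, and `plot_ap_vs_sat` are my assumptions, so adapt them to your notebook. The idea is to remember which rows had gaps before calling `fillna`, and then plot only the rows that were actually observed. The plotting helper also shows the `alpha` parameter from the scatter plot suggestion.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the real NYC dataset); column names are assumptions.
rng = np.random.default_rng(0)
n = 300
ap_takers = rng.uniform(20, 400, size=n)
sat_score = 1000 + 1.5 * ap_takers + rng.normal(0, 40, size=n)
combined = pd.DataFrame({"sat_score": sat_score, "ap_takers": ap_takers})

# Knock out a third of the AP values to mimic a column with many gaps.
missing_idx = rng.choice(n, size=n // 3, replace=False)
combined.loc[missing_idx, "ap_takers"] = np.nan

# Remember which rows had gaps BEFORE filling anything.
was_imputed = combined[["sat_score", "ap_takers"]].isna().any(axis=1)

# Mean-fill, the same general approach as filling NaNs with column means.
filled = combined.fillna(combined.mean())

# Keep only the rows whose values were really observed.
observed = filled[~was_imputed]

# All filled rows share one identical AP value, which shows up as a straight line.
corr_filled = filled["sat_score"].corr(filled["ap_takers"])
corr_observed = observed["sat_score"].corr(observed["ap_takers"])
print(f"correlation with mean-filled rows:    {corr_filled:.2f}")
print(f"correlation without mean-filled rows: {corr_observed:.2f}")


def plot_ap_vs_sat(frame, title):
    """Scatter plot with semi-transparent points (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so the steps above run without it

    ax = frame.plot.scatter(x="ap_takers", y="sat_score", alpha=0.3, title=title)
    plt.show()
    return ax
```

Calling `plot_ap_vs_sat(filled, "With mean-filled rows")` and then `plot_ap_vs_sat(observed, "Observed rows only")` should show the straight line in the first plot and a cleaner positive trend in the second. In the real data the effect may look different, since this sketch removes values at random.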

Hope my ideas were helpful. Great job on your project, Nathalia! I’m not surprised that it was hard to finish: you’ve put in a lot of fruitful effort and conducted a very thorough analysis. Keep up this high level, and good luck with your future projects!