Analyzing NYC High School Data - Enrique

Hello, I’m sharing my guided project analyzing NYC school data. Any review or feedback that could help me improve it or see new things is welcome:

  • I used Plotly for all the graphs, thanks to the wonderful project by @tim1albers; please check out his project, it’s the best.
  • I tried to follow the simplicity of @bhavya’s project, which did the best job of visualizing the school data.
  • I only explored the correlation between SAT scores and free lunches further; if you want to see a simpler and better way to analyze the data, check out the excellent project by @eas-sea.

Link to the project in nbviewer


Hello @jemartinezm1, thanks for sharing your work.
It would be great if you could provide links to the projects of the people named above who helped you, for quick access.
Your project looks amazing; the most amazing parts are the visualizations and the explanations you’ve provided. You guys are encouraging me to redo my projects :smile:
I don’t have much to say, since I am still staring in amazement at your project :smile:

Happy Learning!


Hello @info.victoromondi, I’ve added the links to the projects. Thank you for your kind words.


I’ve been wanting to check out your project since Monday, and I finally got some time to do it.
It’s great work: first, background research on the different aspects of the topic, with all the preparation steps from the missions explained in your own words, and then the analysis itself.
I really liked how you built the school ranking and the other intermediate categorizations and used them throughout the analysis. While working on my project, I considered implementing the school ranking proposed as an additional step, but I couldn’t find the most logical way to do it. In the end, seeing that the project was taking too long to finish, I gave up. But now I know that if I were to do it, I would do it in a similar way.


Hi @jemartinezm1,

Your project is absolutely great! It starts with an introduction that gives context on the SAT as a potentially biased test, and then your visualizations are just awesome. Plotly looks like a perfect tool for creating scatter plots, especially its feature of showing each point’s attributes on hover. I also liked your idea of including the NYC crime map for comparison with the safety and respect scores; my only additional suggestion here is to add a legend to that map. And I totally agree with you that this safety and respect score is quite a mysterious concept, especially because these two words don’t have much in common, and it’s not clear why they were combined in the same survey question. I noticed that in some projects people even renamed it “safety score”, ignoring the second part.

From the analytical part of your project, I especially liked your very interesting insights into the racial issues, which turned out to be not so racial after all, and also your analysis of the free lunch factor.

Here are some suggestions for your consideration.

  • The technical details of the methods applied (pandas.to_numeric(), df.merge(), pandas.Series.apply(), etc.) would probably look better as code comments rather than markdown text. Such comments can be quite concise, giving only the reason for applying a particular method, since the method name is already visible in the code itself.
  • Code cell [3]: here you could use the zfill() method mentioned in the markdown cell above it. With zfill() the function becomes very concise, with no if-statement needed. In addition, as @artur.sannikov96 also suggested to me, you could use a lambda function both in this code cell and in [8].
  • Regarding AP test takers, there is actually a good correlation with SAT scores. The “truncated” part of scatter plot [34] is just a result of filling missing values with the column means in code cell [7]. If you skip that step (as an experiment), you’ll see that the correlation is indeed very good.
  • The matrix plots [36] and [37] look a little less intuitive than the other plots. Well, at least I don’t understand them very well, so maybe it’s just me :rofl:
  • It’s good practice to use the same type of quote marks (either single or double) throughout the whole project, for strings, regular expressions, and column names.

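The effect described in the AP test takers suggestion can be reproduced on a small toy dataset. The numbers and column names below are made up for illustration and are not taken from the NYC data. Every row with a missing value receives the same filled-in value, so those points stack on one vertical line and pull the correlation down.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, not the NYC dataset): SAT rises with AP test takers,
# but three schools have no AP figure recorded.
df = pd.DataFrame({
    "ap_test_takers": [10, 20, 30, 40, 50, np.nan, np.nan, np.nan],
    "sat_score": [1100, 1200, 1300, 1400, 1500, 1000, 1250, 1600],
})

# Correlation on observed rows only (Series.corr skips missing pairs).
r_observed = df["ap_test_takers"].corr(df["sat_score"])

# Correlation after filling the gaps with the column mean.
filled = df["ap_test_takers"].fillna(df["ap_test_takers"].mean())
r_filled = filled.corr(df["sat_score"])

print(round(r_observed, 3), round(r_filled, 3))
```

On this toy data the observed rows are exactly linear, so `r_observed` is 1.0, while the mean-filled version is noticeably weaker because the three filled points all sit at x = 30 regardless of their SAT score.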
All in all, your project is well structured, professional, and analytically valuable. Keep up that high level! :+1:


Hi Elena, thank you so much for your detailed suggestions. Regarding plots [36] and [37], they only visualize the correlations above 0.7, but of course you are right that they are not intuitive. I’m going to review the project and apply your suggestions.
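If the goal of plots [36] and [37] is to show which correlations exceed 0.7, a sorted list can be an easier-to-read companion to a matrix plot. Below is a minimal sketch on hypothetical toy data. It assumes the correlations of interest are those with a `sat_score` column and that the 0.7 threshold applies to the absolute value, so strong negative correlations are kept too; neither detail is confirmed in the thread.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative, not the notebook's.
df = pd.DataFrame({
    "sat_score": [1000, 1100, 1200, 1300, 1400],
    "ap_test_takers": [5, 15, 20, 35, 40],
    "frl_percent": [90, 80, 60, 45, 30],
    "total_enrollment": [300, 200, 500, 250, 400],
})

# Pearson correlation of every other column with the SAT score.
sat_corr = df.corr()["sat_score"].drop("sat_score")

# Keep only strong correlations (|r| > 0.7), strongest first.
strong = sat_corr[sat_corr.abs() > 0.7].sort_values(
    key=lambda s: s.abs(), ascending=False
)
print(strong)
```

The `key` argument of `sort_values()` requires pandas 1.1 or later; it sorts by magnitude while keeping the sign visible in the output.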