NYC Public Schools and the SAT -- What Factors Lead to High SAT Scores?

Hi all!

This is my third community submission for guided projects. I tried to make this project readable, with a clear thought process and conclusion. Please take a look and let me know if any of my analysis doesn’t track/make sense or if you have any suggestions on how I could improve what I have.


Schools.ipynb (479.2 KB)


Hey @anna.strahl, thanks for sharing the project with the Community. As in your previous projects, you have a great style and a concise narrative. Some suggestions from my side:

  • Provide the links to the datasets you are using in case someone wants to reproduce your analysis
  • Import all the packages in the first code cell so that we know which ones are required to reproduce the analysis
  • You should add a few code comments or explanations. For example, what does “GEN ED” mean in [5], and why do you select it?
  • “Other than test subsections it looks like race, AP data, and socioeconomic indicators (frl (free and reduced lunch), sped (special education), and ell (english language learners)) are strong indicators of SAT results.” I’d clarify that these are strong negative indicators
  • Make sure that your figures have at least a title and axis labels. In addition, you may also remove the top and right spines, use more natural colors, rename axis tick labels to something more readable, and ensure that everything you have in the plot conveys some useful information. For example, if I look at the plot after [11] I don’t understand what it shows me. Are those correlation values? What’s that vertical line? You explain those things in the text but they should be made clear in the plot itself
  • “It is interesting to note that parents consistently rate safety at least a full-point higher than students and teachers.” This is misleading. For example, in Queens, the difference in safety score between parents and teachers is ~8.10 - 7.36 = 0.74, pretty far from a full point. In Staten Island, the difference is even lower. You can rephrase by saying that parents mostly give higher scores than students or teachers
  • As an additional analysis, you could take a map of NYC and generate several maps showing the means/medians of several quality metrics. You can use the Basemap toolkit (mpl_toolkits.basemap, installed separately from matplotlib), which can read a shapefile (.shp) of a map and plot the data on it
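
To make the figure suggestion more concrete, here is a minimal sketch of a correlation bar chart with a title, axis labels, readable tick labels, a labeled reference line, and the top and right spines removed. The function name plot_correlations, the label strings, and the numbers in example are all hypothetical placeholders, not values from your notebook, so swap in your own correlation Series or dict.

```python
import matplotlib.pyplot as plt


def plot_correlations(correlations):
    """Horizontal bar chart of correlation values (hypothetical helper)."""
    # Sort ascending so the most negative correlation ends up at the bottom.
    items = sorted(correlations.items(), key=lambda kv: kv[1])
    labels = [name for name, _ in items]
    values = [value for _, value in items]

    fig, ax = plt.subplots(figsize=(7, 4))
    ax.barh(labels, values, color="steelblue")

    # Label the vertical line so the reader knows what it marks.
    ax.axvline(0, color="black", linewidth=0.8, label="No correlation (r = 0)")
    ax.legend(frameon=False)

    # A title and axis labels say what is plotted without reading the text.
    ax.set_title("Correlation of school indicators with SAT score")
    ax.set_xlabel("Pearson correlation coefficient (r)")
    ax.set_ylabel("Indicator")

    # Remove the spines that carry no information.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

    fig.tight_layout()
    return ax


# Placeholder numbers for illustration only, NOT results from the notebook.
example = {
    "Free/reduced lunch (%)": -0.7,
    "Special education (%)": -0.4,
    "English language learners (%)": -0.4,
    "AP test takers": 0.5,
}
ax = plot_correlations(example)
```

Passing human-readable labels instead of raw column names takes care of the tick labels, and the legend entry answers the "what's that vertical line?" question directly in the plot.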

Looking forward to the further analysis of these datasets :slight_smile: Happy coding!