
NYC_HighSchool_Data_Analysis

Hi all,
I am uploading my project on ‘Analysis of NYC High School Data’. It was an interesting as well as tricky project, and I enjoyed working on it. I have tried my level best, and I want to know how I can improve it further. So please get back to me with your comments and suggestions.

Analyzing NYC High School Data Project — Next Steps | Dataquest
NYC_high_school.ipynb (734.5 KB)

Thanks,
Bhagyashree


4 Likes

Hey @Bhagyashree

Thank you for sharing a well-structured and detailed project with DQ Community! :+1:

Your plots are quite interesting and well labelled, although I didn’t understand why they switch from the default plt style to the fivethirtyeight style.

I do wish to highlight a couple of things:

  • Based on what I have studied and understood, the general rule of thumb for correlation is:

    • |corr| <= 0.3 is weak
    • 0.3 < |corr| < 0.7 is moderate
    • |corr| >= 0.7 is strong (the absolute value means the same cut-offs apply in the negative direction)
      So the correlation between Average Class Size and SAT Score may be weakly moderate, but it can’t be strong (cells 25 and 26). This is only a rule of thumb and may not apply in every scenario, so I am not sure whether it holds here.
  • Exponential has a specific meaning in statistics. Analysing two variables that show this kind of relationship is a fairly complex topic. I may be out of my league here, but if you have material that explains the claim in code cell 26, please do share. With a correlation of 0.35, I would say there is ‘steady growth in SAT Scores vs Avg Class size’.
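
To make the rule of thumb above concrete, here is a minimal sketch. The function name `correlation_strength` is my own, and the 0.3 and 0.7 cut-offs are only the convention quoted above, not a formal statistical standard.

```python
def correlation_strength(r: float) -> str:
    """Label a correlation coefficient using the 0.3 / 0.7 rule of thumb."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)  # the sign gives direction, not strength
    if magnitude >= 0.7:
        return "strong"
    if magnitude > 0.3:
        return "moderate"
    return "weak"


# The class size vs SAT score correlation discussed above is about 0.35.
print(correlation_strength(0.35))
```

Taking the absolute value first is what lets the same cut-offs cover negative correlations.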

In the conclusion section, which is again very detailed and well constructed :+1:, you have listed your observations. Try to also include the suspected implications or consequences of these. For example:

The race-wise SAT scores: schools with a higher number of economically disadvantaged students may not be getting enough funding, and so may have less infrastructure than higher-ranking schools. Disadvantaged neighbourhoods and similar factors may also be contributing to low SAT scores.

One last thing: have you thought of, or tried, combining the variables? For example, Race + Gender vs SAT Scores, or Safety + Gender vs SAT Scores. Just curious.
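
A minimal sketch of what combining two variables could look like, using only the standard library. The rows below are made up purely for illustration, and the field order (`female_per`, `safety_score`, `sat_score`) and the median split are my assumptions, not anything taken from the notebook.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up rows for illustration only: (female_per, safety_score, sat_score)
schools = [
    (45.0, 6.1, 1150),
    (62.0, 7.4, 1320),
    (38.0, 5.8, 1090),
    (71.0, 6.0, 1210),
    (55.0, 8.0, 1400),
    (49.0, 7.9, 1280),
]


def high_low(value, cutoff):
    """Bucket a value as 'high' or 'low' relative to a cutoff."""
    return "high" if value >= cutoff else "low"


def mean_sat_by_group(rows):
    """Average SAT score for each (female share, safety) combination."""
    female_cut = median(row[0] for row in rows)
    safety_cut = median(row[1] for row in rows)
    groups = defaultdict(list)
    for female_per, safety, sat in rows:
        key = (high_low(female_per, female_cut), high_low(safety, safety_cut))
        groups[key].append(sat)
    return {key: mean(scores) for key, scores in groups.items()}


for group, avg in sorted(mean_sat_by_group(schools).items()):
    print(group, avg)
```

Splitting at the median is only one way to form groups. With the real data, a pandas `groupby` on two binned columns would do the same job.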

1 Like

Hello. I can tell that you put a lot of work into the text sections as well as the code. It’s always a good idea to wait a day after completing a project and then proofread it. For notebooks with as much text as yours, it might even be worthwhile to keep a version in .doc format for proofing.

Also, check out my incomplete description of how to load an extension into your local Anaconda setup. It helps with questionable spellings in the markdown cells.

Your analysis does a good job taking the question further and you show an interesting look at property values. Thanks for being part of the community!

2 Likes

Hi @Rucha,
Thanks for the constructive and insightful comments.
I will try addressing these comments one by one.

About the plot style:- I am just trying my hand at plotting (matplotlib). I try different styles just to see how they look. I should have checked that all the plots used the same style before sharing. I will keep it in mind for my future projects.

About correlation values:- Thanks for the information on correlation values. I should have done my homework before concluding anything here. I will make the appropriate changes in the project, and I will find a good article to understand correlation better for future work.

About Exponential:- Maybe I should have been more careful before making that statement. A better way to check would be to fit both an exponential and a linear equation to the data and compare how well each fits. My knowledge of matplotlib is rudimentary. For now, I can correct the statement by saying only that the SAT score increases with average class size, without mentioning the type of curve.
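
The fit comparison described above could be sketched roughly as follows, using only the standard library. All function names and the sample numbers are hypothetical. The exponential curve is fitted as a straight line through log(y), which is a common shortcut rather than a full nonlinear fit and needs every y to be positive.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(ys, predicted):
    """Coefficient of determination on the original y scale."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def compare_fits(xs, ys):
    """R^2 of a linear fit and of an exponential fit y = A * exp(k * x)."""
    a, b = fit_line(xs, ys)
    linear_pred = [a + b * x for x in xs]
    # Fit the exponential as a line through log(y); requires y > 0.
    log_a, k = fit_line(xs, [math.log(y) for y in ys])
    exp_pred = [math.exp(log_a + k * x) for x in xs]
    return {
        "linear": r_squared(ys, linear_pred),
        "exponential": r_squared(ys, exp_pred),
    }


# Made-up numbers for illustration only, not values from the notebook.
class_size = [18, 20, 22, 24, 26, 28, 30]
sat_score = [1120, 1150, 1190, 1210, 1260, 1270, 1310]
print(compare_fits(class_size, sat_score))
```

If the two R² values come out close, the data give no real reason to call the relationship exponential rather than linear.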

About Conclusion:- I thought of making more insightful conclusions, as that is what the project is all about (we mention it in the introduction as well). But it was hard for me to connect the observations and draw a deeper conclusion (I need to work in that direction), so I made a superficial one. Let me go through it again and try to include the implications.

About combining variables:- No, it didn’t occur to me. I will definitely try it if I get time.

I hope I’ve addressed all the comments and it makes sense.

Thanks for taking the time to review this project.
Bhagyashree

1 Like

Wow! @Bhagyashree

I definitely did not deserve this detailed response. Looks like I have a looooooottttttt of homework to do. Your response is a revelation here! :+1:

See, as peer-to-peer reviewers we usually pick on certain details that may be improved a bit, or we are just curious to know what other approaches could be tried. The idea is to get fresh ideas from the submitter and also to learn something from the experience of the reviewer (this is regardless of where we are in our respective learning journeys!)

So no worries, you have addressed all my queries, but it’s not necessary to learn everything in one go and in one project. Please work at your own pace and be selective about the topics you find interesting and wish to complete first before moving on to the next one… this is so stupid :woman_facepalming: I am suggesting something you already know!

Till we interact next time, Happy Learning! :ok_hand:

1 Like