So far in the Data Analysis and Visualization portion of the course, we have focused on observing and visualizing correlations and patterns in the data.
I am wondering where in our data science learning journey we will learn about predicting a result based on the data.
Using the NYC Public Schools project as an example:
We have the SAT scores and lots of demographic data for each school. We can observe, for example, that the percentage of English language learners correlates negatively with SAT scores, while school safety scores correlate positively with SAT scores.
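For context, the kind of check we have done so far looks roughly like the sketch below. The numbers are made up for illustration and are not the real project data, and the names `ell_percent`, `safety_score`, and `sat_score` are placeholders I am assuming here. The `pearson` helper is hand-written so the snippet runs on its own; as far as I understand, it computes the same Pearson coefficient that pandas' `DataFrame.corr()` returns by default.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up values for six hypothetical schools (NOT real data)
ell_percent = [5, 10, 20, 35, 50, 70]            # percent English language learners
safety_score = [8.5, 8.0, 7.0, 6.5, 6.0, 5.5]    # survey-style safety score
sat_score = [1500, 1400, 1300, 1200, 1150, 1050]  # average SAT score

r_ell = pearson(ell_percent, sat_score)
r_safety = pearson(safety_score, sat_score)

print("ELL percent vs SAT:", round(r_ell, 2))
print("Safety score vs SAT:", round(r_safety, 2))
```

This tells me the direction and strength of each relationship one variable at a time, but not how the variables compare with each other or how to predict a score from them, which is what I am asking about.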
What I was thinking about during this project is how much effect each of these factors has on a student’s eventual SAT score. For example, if we randomly place a hypothetical student in any of the schools, what SAT score can we expect them to get? What are the chances of them getting a specific score based on the school they attend?
To say it another way: for all of the variables (race, safety scores, percent English learners, etc.), what impact does each of them have on the student’s eventual SAT score? We know percent English learners has a significant impact, but how large is that impact relative to everything else?
Which areas of data science and statistics, and which programming toolkits, focus on this?
Thank you for your time, and let me know if I can explain my question any further.