correlations = combined.corr()
correlations = correlations['sat_score']
What I expected to happen:
SAT Critical Reading Avg. Score 0.986820
SAT Math Avg. Score 0.972643
SAT Writing Avg. Score 0.987771
AP Test Takers 0.523140
What actually happened:
SAT Critical Reading Avg. Score 0.472399
SAT Math Avg. Score 0.465612
SAT Writing Avg. Score 0.472854
AP Test Takers 0.254925
Refer to the solution notebook provided by DQ here.
Refer to my notebook here.
I compared my steps against those in the solution, and there seem to be no differences. However, my correlation values don’t match; they are roughly half the expected values.
Please suggest what can be done.
Thank you for the clear question; it makes the problem much easier to understand and investigate. Had it been poorly asked, I don’t think a proper answer would have been achievable.
Take a look at the cell with execution number 8:
data['sat_results']['sat_score'] = data['sat_results'][cols].sum(axis=1)
Here, you’re summing columns that contain missing values. In fact, for some rows, all the values are missing.
With its default parameters, sum returns 0 when every value it is asked to add is null. From the documentation:
If you include min_count=3 in the call to sum, you should get the same results as the solution.
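The difference between the default behavior and min_count=3 can be sketched on a small made-up table. The column names and scores below are hypothetical and are not taken from the SAT dataset; the only thing being illustrated is how sum treats rows with missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the second school has no scores at all,
# and the third school is missing one of its three scores.
scores = pd.DataFrame({
    "math": [500.0, np.nan, 450.0],
    "reading": [480.0, np.nan, 430.0],
    "writing": [470.0, np.nan, np.nan],
})

# Default: NaN values are skipped, so an all-NaN row sums to 0.0.
default_total = scores.sum(axis=1)

# min_count=3: a row needs at least 3 valid values, otherwise the result is NaN.
strict_total = scores.sum(axis=1, min_count=3)

print(default_total.tolist())
print(strict_total.tolist())
```

With the default call, the school with no data ends up with a total of 0, which then looks like a real (and very low) score to anything computed afterwards, such as corr. With min_count=3 the total stays missing instead.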
Thank you so much for clearing this up.
I added skipna=False to the sum call, and that solved the problem.
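For anyone comparing the two fixes, here is a minimal sketch on hypothetical data (the column names and values are made up). It assumes exactly three score columns are being summed, in which case skipna=False and min_count=3 should mark the same rows as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one fully missing row and one partly missing row.
scores = pd.DataFrame({
    "math": [500.0, np.nan, 450.0],
    "reading": [480.0, np.nan, 430.0],
    "writing": [470.0, np.nan, np.nan],
})

# skipna=False: any NaN in a row makes that row's sum NaN.
with_skipna_false = scores.sum(axis=1, skipna=False)

# min_count=3: fewer than 3 valid values in a row makes that row's sum NaN.
with_min_count = scores.sum(axis=1, min_count=3)

# Series.equals treats NaN values in the same positions as equal.
same = with_skipna_false.equals(with_min_count)
print(same)
```

The two options only coincide because min_count matches the number of columns here; with a smaller min_count, partly missing rows would still get a partial total.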
Thank you for demonstrating clarity in thought and formulating questions.