Mistake: 'sat_score' is not more strongly correlated with 'ell_percent' than with 'total_enrollment'

Screen Link: Learn data science with Python and R projects
This statement in the mission doesn’t seem right: “There’s an interesting cluster of points at the bottom left where total_enrollment and sat_score are both low. This cluster may be what’s making the r value so high.”

There are only 12 points in the bottom-left cluster, out of 363 rows in the combined dataframe. How can these 12 values make the r value so high, as the Dataquest content claims?
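One way to check this would be to compute r twice, once on all rows and once with the cluster removed, and compare. The sketch below uses synthetic stand-in data (not the real dataset); the helper name `r_with_and_without`, the `demo` dataframe, and the value ranges are made up for illustration. Only the column names and the 12-out-of-363 split come from the post.

```python
import numpy as np
import pandas as pd

def r_with_and_without(df, x, y, cluster_mask):
    """Return Pearson r on all rows and on the rows outside the cluster."""
    r_all = df[x].corr(df[y])
    rest = df.loc[~cluster_mask]
    r_rest = rest[x].corr(rest[y])
    return r_all, r_rest

# Synthetic data for illustration only, NOT the real schools dataset.
rng = np.random.default_rng(0)
main = pd.DataFrame({
    "total_enrollment": rng.uniform(300, 4000, 351),  # unrelated to sat_score
    "sat_score": rng.normal(1250, 150, 351),
})
cluster = pd.DataFrame({
    "total_enrollment": rng.uniform(100, 500, 12),    # low enrollment
    "sat_score": rng.normal(950, 30, 12),             # low SAT score
})
demo = pd.concat([main, cluster], ignore_index=True)

# Mark the 12 appended rows as the bottom-left cluster.
mask = pd.Series(demo.index >= len(main), index=demo.index)

r_all, r_rest = r_with_and_without(demo, "total_enrollment", "sat_score", mask)
print(f"rows: {len(demo)}, cluster size: {int(mask.sum())}")
print(f"r with cluster: {r_all:.3f}, r without cluster: {r_rest:.3f}")
```

On the real combined dataframe, the mask would have to be built from whatever thresholds define the bottom-left cluster, and the two r values would show how much those 12 rows actually contribute.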

The same claim is made on the next screen: Learn data science with Python and R projects
"This indicates that it’s actually ell_percent that correlates strongly with sat_score, rather than total_enrollment".

The correlation of total_enrollment with sat_score is about 0.36.
The correlation of ell_percent with sat_score is about -0.39. If total_enrollment is not strongly correlated with sat_score, how can a difference of 0.03 in magnitude make ell_percent strongly correlated? I think both are about equally correlated with sat_score. At most, ell_percent is very slightly more strongly correlated than total_enrollment, but the content does not make this clear.
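To put the two numbers side by side, here is a small sketch that compares their magnitudes and their squares (r squared being the share of sat_score variance a straight-line fit on that column would explain). The r values are the approximate ones quoted above, not recomputed from the data.

```python
# Approximate r values as quoted in this post (not recomputed).
correlations = {"total_enrollment": 0.36, "ell_percent": -0.39}

for column, r in correlations.items():
    # r**2: share of sat_score variance explained by a linear fit on this column
    print(f"{column}: r = {r:+.2f}, |r| = {abs(r):.2f}, r^2 = {r ** 2:.3f}")

# Difference in strength, ignoring the sign of the correlation.
gap = abs(abs(correlations["ell_percent"]) - abs(correlations["total_enrollment"]))
print(f"difference in |r|: {gap:.2f}")
```

With these values, r squared comes out at roughly 0.13 and 0.15, so by this measure the two columns are close in strength.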

I have this exact same doubt. When we plot ell_percent vs sat_score, there seems to be no strong correlation.