Screen Link: https://app.dataquest.io/m/100/multi-category-chi-squared-tests/2/calculating-expected-values
I would like to confirm one thing about this mission.
In this mission (not in practice), we calculate
0.241 * 0.33 to get the expected proportion of people who are female and earn >50K.
When calculating this, are we assuming that being female and earning >50K are independent events, since we do not know P(earning > 50K | Female) or P(Female | earning > 50K)?
So P(Female) * P(earning > 50K) is the best estimate of the proportion we can make. Am I correct?
Thank you so much for your time.
Yes. We start from the assumption that gender should not have an impact on the income of a given employee. That is our expectation.
But once we start the analysis and see a pattern suggesting that gender does affect earnings, we can try to establish the connection (correlation) between them.
Again, we are not establishing causality, only correlation. So if the analysis shows that
P(Female) * P(earnings > 50K) is different from
P(Female) * P(earnings > 50K | Female), we can say that the two variables are dependent rather than independent. A better way to put it: income is the dependent variable and gender is the independent variable.
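As a rough illustration of that comparison, here is a minimal Python sketch. It assumes 0.33 is P(Female) and 0.241 is P(earnings > 50K), as in the question. The conditional value `p_high_given_female` is a made-up number used only for illustration, not a figure from the mission, and the function names are hypothetical too.

```python
import math


def expected_joint(p_a, p_b):
    """Joint proportion P(A and B) we would expect if A and B were independent."""
    return p_a * p_b


def actual_joint(p_a, p_b_given_a):
    """Joint proportion from the multiplication rule: P(A) * P(B | A)."""
    return p_a * p_b_given_a


p_female = 0.33             # assumed to be P(Female), from the question
p_high = 0.241              # assumed to be P(earnings > 50K), from the question
p_high_given_female = 0.11  # hypothetical P(earnings > 50K | Female), illustration only

expected = expected_joint(p_female, p_high)
observed = actual_joint(p_female, p_high_given_female)

print(round(expected, 4))  # 0.0795
print(round(observed, 4))  # 0.0363

# If the two variables were independent, these two numbers would match.
print(math.isclose(expected, observed))  # False
```

With real data the two numbers will almost never match exactly, so the chi-squared test in this mission is what tells us whether the gap is larger than we would expect from chance alone.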
Hope that helps.