How to explain the final result: chi-square, p-value, df?

I'm running the code below to get the chi-square value, p-value, and df. I don't know whether my interpretation is correct.
Null: Gender and race have no impact on income.
Alternative: Gender and race have an impact on income.

1. The p-value is larger than 0.05 in this case (5.19 > 0.05), so we accept the alternative hypothesis. Correct?
2. How do I use the chi-square value 454.27 here?
3. df = 4, but we did not choose df. Why is it 4, and how do I explain it?

chi-square:454.2671089131088
p-value:5.192061302760456e-97
df:4

Screen Link: https://app.dataquest.io/m/100/multi-category-chi-squared-tests/6/finding-expected-values

My Code:

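import numpy as np
import pandas
from scipy.stats import chi2_contingency

# `income` is assumed to be the income DataFrame already loaded in the mission
# Contingency table of counts: rows = sex, columns = race
observed = pandas.crosstab(income['sex'], income['race'])

# Returns the test statistic, p-value, degrees of freedom, and expected counts
chisq_value, pvalue_gender_race, df, expected = chi2_contingency(observed)

print(chisq_value, pvalue_gender_race, df, expected)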

Hey @candiceliu93

I don't have access to the linked course, so I'm going to base this response on my own course notes / practice notebook.

Let’s go in reverse order.

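Let's look at the example given in the course. This is what the observed values look like:
[Image: observed counts by sex (rows) and race (columns), with marginal totals; joint counts in green, margins in red]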

The green cells here hold the joint distribution (the count for each sex/race combination) and the red cells hold the marginal distribution (the row and column totals). For tabular data like this, we calculate:

dof = (rows - 1) * (columns - 1), where rows and columns do not include the marginal totals. Here there are 2 rows (sex) and 5 columns (race), so dof = (2 - 1) * (5 - 1) = 1 * 4 = 4.
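A minimal sketch of that formula, assuming a 2 x 5 table of made-up counts (not the course's data) and generic column labels. It computes dof from the table's shape and checks it against the dof that `chi2_contingency` reports:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical counts (made up for illustration, NOT the course's income data):
# 2 rows (sex) x 5 columns (race); margin totals are NOT included.
observed = pd.DataFrame(
    [[120, 15, 10, 5, 250],
     [100, 20, 8, 4, 150]],
    index=["Female", "Male"],
    columns=["race_1", "race_2", "race_3", "race_4", "race_5"],
)

rows, cols = observed.shape
dof_by_hand = (rows - 1) * (cols - 1)

# chi2_contingency returns (statistic, p-value, dof, expected counts)
_, _, dof_scipy, _ = chi2_contingency(observed)

print(dof_by_hand, dof_scipy)  # 4 4
```

So df is not something we choose; it falls out of the table's shape.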

Now, what does it actually mean? Look at the following table: given the marginal totals, we need a minimum of 4 values to work out the rest of the values (the "?" cells).
[Image: the table with the unknown cells marked "?"]

If we know fewer than 4, we won't be able to calculate the rest of the values. In other words, the minimum number of independent values we need to know is 4. (The second link gives multiple examples to elaborate on this!)
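To make the "4 free values" idea concrete, here is a small sketch with made-up counts (not the course's data): knowing only the row and column totals plus 4 cells of the first row is enough to rebuild the whole 2 x 5 table.

```python
import numpy as np

# Hypothetical counts (made up, NOT the course's data): 2 rows x 5 columns
full = np.array([[120, 15, 10, 5, 250],
                 [100, 20, 8, 4, 150]])
row_totals = full.sum(axis=1)  # marginal total of each row
col_totals = full.sum(axis=0)  # marginal total of each column

# Pretend we only know the margins plus 4 cells of the first row
known = full[0, :4]

rebuilt = np.zeros_like(full)
rebuilt[0, :4] = known
rebuilt[0, 4] = row_totals[0] - known.sum()  # last cell of row 1 from its row total
rebuilt[1, :] = col_totals - rebuilt[0, :]   # every cell of row 2 from the column totals

print(np.array_equal(rebuilt, full))  # True
```

With only 3 known cells, two cells in row 1 would be unknown and the row total alone could not separate them.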

p-value and alpha: here is where you are slightly mistaken. The calculated p-value is 5.192061302760456e-97, not 5.19. The "e-97" means "times 10 to the power -97", so the value has 96 zeros after the decimal point before the first 5: effectively 0.000.
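A quick way to see this in Python is to format the p-value from the question in a few different ways:

```python
p_value = 5.192061302760456e-97  # p-value printed in the question

print(f"{p_value:.2e}")    # 5.19e-97  (scientific notation: 5.19 x 10**-97)
print(f"{p_value:.3f}")    # 0.000     (rounded to 3 decimal places)
print(f"{p_value:.100f}")  # full decimal expansion: 96 zeros, then 519...
```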

The general rule we have is this:

  • If p-value <= $\alpha$ (which has to be decided in advance and must not be changed based on the p-value obtained!), the result is statistically significant: data this extreme would be unlikely if $H_0$ were true, so we reject $H_0$ and accept the alternative hypothesis $H_a$.
  • If p-value > $\alpha$, the result is not statistically significant, so we fail to reject $H_0$. This does not prove $H_0$; it only means the data don't give enough evidence against it.
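The rule above can be sketched as a tiny helper. The function name `decide` is hypothetical, and the second p-value is made up just to show the other branch:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the p-value decision rule; alpha must be fixed before seeing p_value."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(5.192061302760456e-97))  # reject H0  (p-value from the question)
print(decide(0.30))                   # fail to reject H0  (hypothetical p-value)
```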

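So, based on the result we got (p ≈ 5.19e-97, far below 0.05), we reject $H_0$ at the 5% significance level. One caveat: `observed` cross-tabulates `sex` against `race`, so the hypotheses this test actually checks are "sex and race are independent" vs. "sex and race are associated". Rejecting $H_0$ therefore means sex and race are associated in this dataset; on its own it does not measure their impact on income.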
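On question 2 (how to use 454.27): the chi-square statistic and df together determine the p-value, which is the probability of a chi-square(df) value at least that large if $H_0$ were true. A sketch using `scipy.stats.chi2` and the numbers from the question, plus the equivalent check against the critical value at $\alpha$ = 0.05:

```python
import math

from scipy.stats import chi2

chisq_value = 454.2671089131088  # statistic from the question
df = 4                           # degrees of freedom from the question
alpha = 0.05

# p-value = P(chi-square with df degrees of freedom >= chisq_value), assuming H0
p_value = chi2.sf(chisq_value, df)

# For df = 4 the survival function has a closed form: exp(-x/2) * (1 + x/2)
p_closed_form = math.exp(-chisq_value / 2) * (1 + chisq_value / 2)

# Equivalent test: compare the statistic with the critical value at alpha
critical_value = chi2.ppf(1 - alpha, df)

print(p_value, p_closed_form)        # both about 5.19e-97
print(critical_value)                # about 9.49
print(chisq_value > critical_value)  # True -> reject H0
```

A statistic of 454.27 is far beyond the roughly 9.49 needed for significance at 5% with df = 4, which is exactly why the p-value is so tiny.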

These links are helpful but may not clear up all your doubts, and that's okay. Do let me know if you need more help on this.

Just in case, a discussion with another DQ student is here.
