I hope I don’t confuse you more than you already are.
Let’s first state our hypotheses:
H_0 : Words in high value and low value questions are present homogeneously
H_a : Words in high value and low value questions are not present homogeneously
I don’t know if you completely understand the concepts of alpha (the level of significance) and the p-value, so I will start from there.
The p-value is the probability, assuming H_0 is true, of getting a test statistic at least as extreme as the one we observed. Equivalently, it is the smallest level of significance at which we can reject the null hypothesis. (The critical value is a different thing: it is the point on the distribution where the region of rejection starts for a chosen \alpha.)
- if \alpha >= p-value : we can reject H_0
- if \alpha < p-value : we cannot reject H_0.
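Written as a formula, the p-value of a chi-squared test is the upper-tail probability of the observed statistic. This is only a sketch in my own notation: x_{obs} (the statistic computed from the data) and \chi^2_{df} (a chi-squared random variable with df degrees of freedom) are symbols I am introducing here.

```latex
p = P\left(\chi^2_{df} \ge x_{obs} \mid H_0\right),
\qquad
\text{reject } H_0 \iff p \le \alpha
```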
(This is best understood graphically; you can either check images on Google or let me know if you want a clumsy hand-drawn one.)
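In this case X^2 = 0.4 gives us a p-value = 0.53. (Careful with the degrees of freedom here: 0.53 is what you get for X^2 = 0.4 with df = 1; with df = 9 the p-value would be above 0.99. Either way the conclusion is the same.)
That means that if H_0 is true, we would see a chi-squared statistic of 0.4 or more about 53% of the time, so our result is completely ordinary under H_0. In order to reject H_0 we would need an \alpha of at least 0.53, far above the usual 0.05.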
Now let’s check the chi-squared probability table for \alpha = 0.05 and df = 9, which gives us a critical value of X^2 = 16.92. The test statistic we got is 0.40.
Comparing 16.92 with 0.4, our test statistic is far too low. We would need a chi-squared value of more than 16.92 in order to reject H_0.
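If you would rather check these numbers than read them off a table, here is a minimal sketch using only the Python standard library. The helper names `chi2_sf` and `chi2_critical` are my own, not from any library. The upper-tail probability comes from a series expansion of the incomplete gamma function, and the critical value is found by bisection. I compute both df = 9 and df = 1 because the p-value quoted above matches df = 1.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X^2 >= x) for a chi-squared distribution."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z^a * e^(-z) / Gamma(a + 1) * sum_{n>=0} z^n / ((a+1)...(a+n))
    term = 1.0
    total = 1.0
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    cdf = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return 1.0 - cdf

def chi2_critical(alpha, df):
    """Value of X^2 whose upper-tail probability equals alpha (bisection)."""
    lo, hi = 0.0, float(df)
    while chi2_sf(hi, df) > alpha:   # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

statistic = 0.4   # test statistic from this thread
alpha = 0.05      # conventional significance level

for df in (9, 1):
    p_value = chi2_sf(statistic, df)
    critical = chi2_critical(alpha, df)
    decision = "reject H_0" if p_value <= alpha else "cannot reject H_0"
    print(f"df={df}: p-value={p_value:.4f}, critical value={critical:.2f}, {decision}")
```

Under these assumptions the critical value for df = 9 comes out at about 16.92, matching the table, and the p-value for X^2 = 0.4 is about 0.53 with df = 1 and above 0.99 with df = 9. In both cases we cannot reject H_0.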
If you look at the table, there isn’t even a column where the upper-tail p-value is 0.5. A p-value this high means that to reject H_0 we would have to accept roughly a 50% chance of rejecting it wrongly (a Type I error).
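So we cannot reject H_0: the test gives no evidence that the words in high-value questions are used any differently from those in low-value questions. A player therefore can’t base their chance of winning Jeopardy on the assumption that all they need to focus on is the set of words used in high-value questions. (Help me correct myself here if I have made a mistake.)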
Chi-squared table in image is here
Let me know if this didn’t help at all.