I just finished this project, but I don’t understand anything about the results. I understand that the chi-squared test tells you whether there’s a significant difference between observed values and expected values, but I’m not sure I understood this correctly:

if chi squared = 0, then there isn't a significant difference
if chi squared > 0, then there is a significant difference

So does that mean that in this result, for example, Power_divergenceResult(statistic=0.401962846126884, pvalue=0.5260772985705469), there is a significant difference between observed and expected values? And what does that mean?

I hope I don’t confuse you more than you already are.

Let’s first make our assumptions:

H_0 : Words in high-value and low-value questions are present homogeneously

H_a : Words in high-value and low-value questions are not present homogeneously

I don’t know whether you are fully comfortable with alpha (the level of significance) and the p-value, so I will start from there.

The p-value is the probability, assuming H_0 is true, of getting a test statistic at least as extreme as the one we observed. Equivalently, it is the smallest level of significance at which we can reject the null hypothesis. Alpha is the threshold we choose in advance (commonly 0.05); it fixes where the region of rejection starts in the probability distribution.

if \alpha >= p-value : we can reject H_0

if \alpha < p-value : we cannot reject the H_0.
(this is best understood graphically, you can either check images on google or let me know if you want a clumsy hand-drawn one )
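The decision rule above can also be written as a few lines of Python. This is only a sketch: the function name decide and the default alpha of 0.05 are my own choices for illustration, not something from the project.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: reject H_0 only when alpha >= p-value."""
    if alpha >= p_value:
        return "reject H_0"
    return "fail to reject H_0"

# p-value taken from the Power_divergenceResult in the question
p = 0.5260772985705469

usual = decide(p)               # alpha = 0.05 is below p, so we cannot reject
extreme = decide(p, alpha=0.6)  # only an unreasonably large alpha would reject
print(usual)
print(extreme)
```

With the usual alpha the rule says we cannot reject H_0; it only flips if alpha is pushed above 0.53.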

In this case X^2 = 0.40 gives us a p-value of 0.53. That pair corresponds to degrees of freedom = 1 (two categories, high and low); with df = 9 the same statistic would give a p-value very close to 1.

A p-value of 0.53 means that, if H_0 were true, we would see a statistic at least this large about 53% of the time. In order to reject H_0 we would need an alpha of at least 0.53, far above the usual 0.05.

Now let’s check the chi-squared probability table for \alpha = 0.05 and df = 1, which gives us X^2 = 3.84. The test statistic we got is 0.40.

Comparing 3.84 with 0.40, we can say that our test statistic is far too low. We would need a chi-squared value of more than 3.84 in order to reject our H_0.
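As a sanity check, the reported p-value and the table lookup can be reproduced with Python’s standard library. This is a sketch that assumes df = 1, where the upper-tail probability has the closed form erfc(sqrt(x/2)) and the critical value is the square of a normal quantile. The helper names are made up for illustration. For other df, scipy.stats.chi2.sf and scipy.stats.chi2.ppf do the same job.

```python
import math
from statistics import NormalDist

def chi2_sf_df1(x):
    """Upper-tail probability P(X^2 >= x) for a chi-squared variable with df = 1."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_critical_df1(alpha):
    """Critical value for df = 1: the square of the two-sided normal quantile."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z

statistic = 0.401962846126884          # from the result in the question
p_value = chi2_sf_df1(statistic)       # about 0.526, matching the reported pvalue
critical = chi2_critical_df1(0.05)     # about 3.84, the table entry for df = 1

print(p_value, critical)
print("reject H_0" if statistic > critical else "fail to reject H_0")
```

The recomputed p-value agrees with the one SciPy printed, which is how we can tell the test was run with df = 1.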

If you look at the table, it doesn’t even have a column where the upper-tail p-value is 0.5. A value this high means that rejecting H_0 here would require accepting roughly a 53% chance of rejecting a null hypothesis that is actually true.
So a player can’t base their chance of winning Jeopardy on the assumption that all they need to focus on is the set of words used in high-value questions. (Help me correct myself here if I have made a mistake.)

Thank you so much for this explanation. Now I understand that I need to compare my p-value and chi-squared statistic with the table to be able to accept or reject the null hypothesis.

But I don’t understand how to find the degrees of freedom. (I remember my material balance classes at college, where the degrees of freedom was the difference between the number of unknown values and the number of equations, but I’m pretty sure it is different here.)

Well, the p-value alone is sufficient to decide whether we should reject or fail to reject H_0. And it’s not just for the X^2 test; it also applies to the t-test, z-test, F-test, etc.

To better understand the concept we do need the table!

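For a chi-squared test on a contingency table we calculate df as (Columns - 1) * (Rows - 1). Columns and Rows do not include Totals. For a goodness-of-fit test on a single list of k categories, such as SciPy’s chisquare, which returns a Power_divergenceResult, df = k - 1.

A full table with Cols = 2 (high and low) and R = 10 (questions) would give df = (2-1) * (10-1) = 9. The result quoted above, however, comes from a test with two categories (high and low), so df = 2 - 1 = 1, which is the df that matches the reported p-value of 0.53.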
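To make the df bookkeeping concrete, here is a small sketch in plain Python. It covers the (Columns - 1) * (Rows - 1) rule for a table and the k - 1 rule that a goodness-of-fit test on k categories uses. The observed and expected counts are hypothetical numbers picked for illustration, not values from the project, and the function names are my own.

```python
def df_contingency(rows, cols):
    """df for a chi-squared test on a rows x cols table (totals excluded)."""
    return (rows - 1) * (cols - 1)

def chi_squared_gof(observed, expected):
    """Goodness-of-fit statistic sum((O - E)^2 / E) and its df = k - 1."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1

# Hypothetical counts of one word in high-value and low-value questions
observed = [10, 20]
expected = [12, 18]

stat, df = chi_squared_gof(observed, expected)
table_df = df_contingency(rows=10, cols=2)

print(stat, df)    # two categories, so df = 1
print(table_df)    # a 10 x 2 table would have df = 9
```

Which df applies depends on what is passed to the test: one pair of high/low counts gives df = 1, while a whole 10 x 2 table tested at once gives df = 9.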

A detailed response for this I have given is here. Hope this helps.