Hello DQ, I’m sharing my guided project on winning Jeopardy. I’d like to get feedback on the chi-square testing and on what the p-values truly represent. I know a p-value below the threshold means rejecting the null hypothesis, but I’d really love more explanation.
Here’s a link to the last mission screen: Learn data science with Python and R projects
My notebook is attached below.
winning jeopardy.ipynb (40.2 KB)
2 Likes
Hi @abomayesan! Thank you for sharing your project with the Community. Your markdown cells are clear and your conclusions are very concise, so well done!
Some feedback from my side:
 Good job using docstrings for your functions. However, consider sticking to one of the most common docstring styles, such as the NumPy/pandas style.
 Your section names are inconsistent: some end with a colon, some with a full stop, and some with no punctuation at all.
 Why didn’t you say anything about “America Revolution” making up 1/3 of all questions in the Tiebreaker round?
 You have some typos; please correct them.
 You write Jeopardy inconsistently: with and without a capital letter, with and without the exclamation mark, and in and out of backticks (``). Please unify the naming style.
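To illustrate the docstring point, here is a minimal sketch of a NumPy/pandas-style docstring. The function `count_usage` and its parameters are hypothetical and are not taken from the notebook; only the section layout (`Parameters`, `Returns`) is the point.

```python
def count_usage(term, questions):
    """
    Count how often a term appears in high- and low-value questions.

    Parameters
    ----------
    term : str
        Word to look for (compared in lowercase).
    questions : list of tuple of (set of str, bool)
        Each item holds the set of lowercase words in a question and a
        flag that is True when the question is high-value.

    Returns
    -------
    tuple of (int, int)
        Number of high-value and low-value questions containing `term`.
    """
    high_count = 0
    low_count = 0
    for words, is_high_value in questions:
        if term.lower() in words:
            if is_high_value:
                high_count += 1
            else:
                low_count += 1
    return high_count, low_count
```

Tools such as `help(count_usage)` display this text as written, so a consistent style keeps every function's documentation readable in the same way.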
As for your question: the p-value is the probability of getting results at least as extreme as the ones you observed, assuming that your null hypothesis is true. In this case, your null hypothesis is that the observed numbers of low-value and high-value questions are equal to the expected ones (this part of your reasoning is not completely clear to me). So if your p-value is 0.05, you have only a 5% chance of observing a result like yours if your null hypothesis were true. A threshold of 0.05 is the most commonly used cutoff for rejecting the null hypothesis, but it is just a convention, and sometimes you need to be more conservative and use a threshold of 0.01 or even lower.
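A small worked example may make this concrete. The sketch below uses made-up counts (30 high-value and 20 low-value questions containing some word, against an expected 25 and 25); these numbers are not from the notebook. It relies only on the standard library and on the fact that, with two categories (one degree of freedom), the chi-square p-value equals `erfc(sqrt(chi2 / 2))`. In a notebook, `scipy.stats.chisquare(f_obs, f_exp)` would normally do the same job.

```python
import math

def chi_square_two_categories(observed, expected):
    """Chi-square statistic and p-value for exactly two categories (df = 1)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For one degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, for illustration only.
observed = [30, 20]   # high-value, low-value questions containing the word
expected = [25, 25]   # what the null hypothesis predicts

chi2, p_value = chi_square_two_categories(observed, expected)
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.3f}")
```

Here the p-value comes out at roughly 0.16: if the null hypothesis were true, a deviation at least this large would show up about 16% of the time, so with a 0.05 threshold you would not reject the null hypothesis.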
Let me know if you have any further questions, and happy coding!
1 Like
Thank you, Artur. Thanks for sharing the NumPy/pandas docstring style guide.

I have corrected the inconsistent naming of my sections.

The reason I didn’t mention “America Revolution” making up 1/3 of all questions in the Tiebreaker round is that the Tiebreaker round makes up only a small percentage of the data, less than 1%. It’s the same reason I didn’t mention the Final Jeopardy round. I’m going to state my reasons in the notebook now that you’ve pointed it out.

For my null hypothesis, I assumed that the observed frequencies would be evenly split between low-value and high-value questions.
3 Likes
Got it, thank you. Do you understand what the p-value means now?
1 Like
Yes, I do. The way you explained it was very helpful. Thank you.
2 Likes