Best Way to Win Jeopardy

Hello DQ, I’m sharing my guided project on winning Jeopardy. I’d like to get feedback on the chi-square testing and on what the p-values truly represent. I know that a p-value below the threshold means we reject the null hypothesis, but I’d really love more explanation.

Here’s a link to the last mission screen: Learn data science with Python and R projects

And here is my notebook:
winning jeopardy.ipynb (40.2 KB)


Hi @abomayesan! Thank you for sharing your project with the Community :slight_smile: Your markdown cells are clear and your conclusions are very concise, so well done!

Some feedback from my side:

  • Good job trying to use docstrings for your functions. However, you may want to stick to one of the most common docstring styles, such as the NumPy/pandas style
  • Your section names are inconsistent: sometimes they end with a colon, sometimes with a full stop, and sometimes with no punctuation at all
  • Why didn’t you say anything about “America Revolution” making up 1/3 of all questions in the Tiebreaker round?
  • You have some typos; correct them
  • Sometimes you write Jeopardy with a capital letter, sometimes with or without the exclamation mark, and sometimes inside or outside backticks (``), so unify the naming style

As for your question: the p-value is the probability of getting results at least as extreme as the ones you observed, assuming that your null hypothesis is true. In this case, your null hypothesis seems to be that the observed numbers of low-value and high-value questions match the expected numbers (this part of your reasoning is not completely clear to me). So if your p-value is 0.05, there is only a 5% chance of observing a result at least as extreme as yours if your null hypothesis were true. A threshold of 0.05 is the cut-off most people use to reject the null hypothesis, but it is just a convention, and sometimes you need to be more conservative and use 0.01 or even lower as your threshold.
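To make that definition concrete, here is a minimal simulation sketch in plain Python. It assumes, purely for illustration, that under the null hypothesis a term is equally likely to appear in a high-value or a low-value question, and it uses made-up counts (a term seen in 14 of 40 questions as high-value). The function name `simulated_p_value` is hypothetical and not from the notebook. The p-value is estimated as the share of datasets simulated under the null whose result lands at least as far from the expected count as the observed one.

```python
import random


def simulated_p_value(observed_high, total, p_high=0.5, trials=10_000, seed=42):
    """Estimate a p-value by simulating many datasets in which the null hypothesis is true.

    The estimate is the share of simulated datasets whose high-value count is
    at least as far from the expected count as the observed one.
    """
    rng = random.Random(seed)
    expected_high = total * p_high
    observed_gap = abs(observed_high - expected_high)
    as_extreme = 0
    for _ in range(trials):
        # Under the null, each of the `total` questions is high-value with probability p_high.
        simulated_high = sum(rng.random() < p_high for _ in range(total))
        if abs(simulated_high - expected_high) >= observed_gap:
            as_extreme += 1
    return as_extreme / trials


p = simulated_p_value(observed_high=14, total=40)
print(f"p is approximately {p:.3f}")
for alpha in (0.05, 0.01):
    decision = "reject" if p < alpha else "fail to reject"
    print(f"alpha = {alpha}: {decision} the null hypothesis")
```

With these made-up numbers the estimate should come out close to 0.08, which is above both 0.05 and 0.01, so the null hypothesis would not be rejected. A more lopsided split, such as 8 of 40, gives a far smaller p-value.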

Let me know if you have any further questions, and happy coding :slight_smile:


Thank you, Artur, and thanks for sharing the NumPy/pandas docstring style guide.

  • I have corrected the inconsistent naming of my sections.

  • The reason I didn’t mention “America Revolution” making up 1/3 of all questions in the Tiebreaker round is that tiebreaker questions make up less than 1% of the data. That’s also why I didn’t discuss the Final Jeopardy round. Now that you’ve pointed it out, I’m going to state this reasoning in the notebook.

  • For my null hypothesis, I assumed that the observed frequencies would be evenly split between low-value and high-value questions.
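With an even-split null hypothesis, the expected count for each category is simply half of the total. Below is a minimal standard-library sketch of the chi-square goodness-of-fit calculation against that null, using made-up counts (14 high-value and 26 low-value occurrences of a term); the helper name `chi_square_even_split` is hypothetical and not taken from the notebook.

```python
import math


def chi_square_even_split(high_count, low_count):
    """Chi-square goodness-of-fit test against an even high/low split.

    Returns (statistic, p_value). With two categories there is 1 degree of
    freedom, and a chi-square variable with 1 df is the square of a standard
    normal Z, so P(chi2 >= stat) = P(|Z| >= sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    total = high_count + low_count
    expected = total / 2  # null hypothesis: half high-value, half low-value
    statistic = sum((obs - expected) ** 2 / expected for obs in (high_count, low_count))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


stat, p = chi_square_even_split(14, 26)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
print("reject" if p < 0.05 else "fail to reject", "the null hypothesis at alpha = 0.05")
```

If the notebook uses `scipy.stats.chisquare`, calling `chisquare([14, 26])` without `f_exp` should give the same statistic and p-value, because it assumes equal expected frequencies by default. Keep in mind that the chi-square p-value is a large-sample approximation, so it is less reliable when expected counts are small (a common rule of thumb is at least 5 per category).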


Got it, thank you. Do you understand what the p-value means now?


Yes, I do. The way you explained it was very helpful. Thank you.
