Sharing "Winning Jeopardy"

Hello Community,

I just completed Dataquest’s guided project “Winning Jeopardy”. Here is the link to the last page of the guidelines:

In comparison to the instructions (and solution notebook) provided by Dataquest, here are some differences:

  • I used a dataset with over 200,000 records (instead of 20,000)
  • I added a count of the terms that are used most across all questions
  • For some of the most popular of those, I did a chi-squared test to figure out whether they are under- or over-represented in high-value questions

For the last of those, my conclusion is ‘yes, that is the case’. However, my overall conclusion is that if you want to prepare for Jeopardy, you may be better off just studying general knowledge.
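
For readers who have not opened the notebook, the idea behind that test can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `chi_squared_for_term` and all counts are hypothetical and are not taken from the notebook. It compares how often a term appears in high-value and low-value questions with what the overall high/low split would predict, and uses the closed-form p-value for one degree of freedom.

```python
import math


def chi_squared_for_term(term_high, term_low, total_high, total_low):
    """Return (chi_squared, p_value) for a single term.

    term_high / term_low: high- / low-value questions that contain the term.
    total_high / total_low: high- / low-value questions in the whole dataset.
    """
    term_total = term_high + term_low
    share_high = total_high / (total_high + total_low)

    # Expected counts if the term followed the overall high/low split.
    expected_high = term_total * share_high
    expected_low = term_total - expected_high

    chi_squared = (
        (term_high - expected_high) ** 2 / expected_high
        + (term_low - expected_low) ** 2 / expected_low
    )
    # Two categories give one degree of freedom; for that case the
    # chi-squared survival function reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_squared / 2))
    return chi_squared, p_value


# Hypothetical counts, for illustration only.
chi_sq, p = chi_squared_for_term(
    term_high=30, term_low=70, total_high=5000, total_low=15000
)
print(f"chi-squared = {chi_sq:.3f}, p-value = {p:.3f}")
```

A large chi-squared value with a small p-value would suggest the term is over- or under-represented in high-value questions; with the placeholder counts above the difference would not be significant at the usual 0.05 level.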

Here is my notebook:
WinningJeopardy.ipynb (137.1 KB)

A link to show this in Notebook viewer can be found at the bottom of this post.

Any feedback is welcome and appreciated!

Best regards,

Click here to view the jupyter notebook file in a new tab


Hello @jasperquak, thanks for sharing your project with the Community! As always, great job: you research the dataset’s background very extensively and explain your decisions thoroughly, and this project is no exception :slight_smile: Also, well done on using the full dataset and answering more questions in addition to those proposed by Dataquest.

Some feedback from my side:

  • Your gameplay description is a bit confusing. I understood that participants have to come up with a question rather than an answer but I don’t understand why the dataset sample does not show that. Could you elaborate on this?
  • In the function normalize_string(input), the argument name input shadows Python’s built-in input() function. That is not a good idea, because the built-in becomes unavailable inside the function and the code gets harder to read. Could you think of another name for this argument? Do the same for the normalize_value(input) function
  • It is also a good idea to import all packages in the first code cell so we are aware of what’s used in the project
  • In the sentence “In the samples so far we see entries like $200 and $1,800”, there is a stray backslash (\)
  • In the normalize_value(input) function you have redundant commented variables that you do not use
  • In addition to the above point, you have these comments # commented out after verifying. They are of no use in the final version of the project, you just needed them for debugging, so remove them
  • “as those are the termss that typically form the ‘heart’” - a small typo
  • You use a lot of functions and comment on what they do. It’s a great practice but you can also use docstrings. A good example of their use is the pandas source code
  • In [46], use strings to describe the numbers in the output
  • “Rather, I conclude that the functions seems to work, so let’s apply it to a couple of popular terms.” - a typo: “functions” should be “function”
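
To illustrate the points about argument names and docstrings, here is a minimal sketch. The function bodies are an assumption about what such helpers typically do in this guided project (lowercasing and stripping punctuation, and turning a dollar string into a number); they are not copied from the notebook. The argument is called `text` so that the built-in `input()` is not shadowed.

```python
import re


def normalize_string(text):
    """Lowercase `text` and remove everything except letters, digits and whitespace."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower())


def normalize_value(text):
    """Convert a dollar string such as "$1,800" to an int.

    Returns 0 when the string contains no digits (for example "None").
    """
    digits = re.sub(r"[^0-9]", "", text)
    try:
        return int(digits)
    except ValueError:
        # int("") raises ValueError, so strings without digits land here.
        return 0


print(normalize_string("McDonald's!"))  # mcdonalds
print(normalize_value("$1,800"))        # 1800
```

With docstrings in place, `help(normalize_value)` shows the description directly, so separate comments explaining each function become unnecessary.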

That’s it for me. Happy coding!


Hello @artur.sannikov96 , thank you for your encouraging feedback and your detailed review comments! Much appreciated!

For most of your feedback points, my response is: “clear, agreed, up to me to improve!” (E.g. not using input as an argument name… how could I… ). I will make updates when the time is right.

For some of your feedback points, let me respond.

This is a row from the dataset:
“Question:” In 1963, live on “The Art Linkletter Show”, this company served its billionth burger
“Answer:” McDonald’s

So if a participant gets this question, then he may (or may not) come up with the answer. That I would understand.

However, the gameplay description online says that participants get an answer and should come up with a question. Now, suppose you are a participant who is confronted with the (answer) “McDonald’s”. I can imagine you would come up with a question like “Which hamburger chain has a big yellow letter?”, or “Where do you go daily in case you want to get supersized as soon as you can?”. But there is no way you would come up with “Which company served its billionth burger live on the Art Linkletter show in 1963?”.

So that is what confused me a bit. I could not match the “Question” and “Answer” columns in the dataframe with the description of the gameplay that I found online.

This one is odd. In this Markdown cell, I have this:

Next, we'll change column Value into a numeric field, to be able to manipulate it easier. In the samples so far we see entries like \\$200 and \\$1,800. Let's first check if there is more.

I added the backslashes, since without those I actually got this when running the cell:

Adding the backslashes solved it for me on my laptop. I just checked once more, and it displays correctly there. However, here in the viewer on this platform it renders differently, and there is the \ that you pointed out.

I don’t know how to solve that. Any idea…? (I regularly have such issues, where Markdown cells appear differently on my own laptop than in a Notebook viewer.)

Here I wasn’t sure what you meant. Could you elaborate? Note that in [46] I am creating a function that returns two numbers. I use that function in cell [49] to fill a dictionary with some numbers.

Yep, this is odd. I cannot understand this either.

Maybe it’s just the NB Viewer. No worries, it’s not a big issue for an overall excellent project.

Pardon, I meant the cells 47-48. Use print() to explain what these numbers mean. Otherwise, we need to refer back to the code to understand them.
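
A minimal sketch of what such labelled output could look like. It assumes the two numbers are a chi-squared statistic and a p-value, which this thread does not state explicitly; the helper name `describe_test_result`, the term and the numbers are placeholders, not output from the notebook.

```python
def describe_test_result(term, chi_squared, p_value):
    """Build a one-line, human-readable summary of a test result."""
    return f"Term '{term}': chi-squared = {chi_squared:.2f}, p-value = {p_value:.4f}"


# Placeholder values for illustration only.
print(describe_test_result("king", 1.3333, 0.2482))
# Term 'king': chi-squared = 1.33, p-value = 0.2482
```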

Thanks again, @artur.sannikov96 . All clear (and agreed) for your additional comments!
