# Winning Jeopardy (m210)

Hi all!
I'm uploading my notebook for this project.
I used the full dataset from Reddit.
Looking forward to your comments and criticism.
Winning_Jeopardy_m210.ipynb (502.9 KB)


A little note on cell 14 (the first case, the vector method):

```python
# Define a function for creating deep_question and deep_answer
def deep_clean_question(row):
    sentence_in = row.clean_question.split()
    # Iterate over a copy so removing items doesn't skip elements
    for word in sentence_in[:]:
        if word in parasite_normal_words:
            sentence_in.remove(word)
    if len(sentence_in) == 0:
        return np.nan
    else:
        return ' '.join(sentence_in)
```


```python
jeopardy["deep_question"] = jeopardy.apply(deep_clean_question, axis=1)
```

I decided to run an experiment and re-implemented this functionality with `for i, row in jeopardy.iterrows()` - the second case, a standard loop iteration:

```python
for i, row in jeopardy.iterrows():
    question = row.clean_question.split()
    for word in question[:]:
        # Collect the vocabulary of non-stop words
        if word not in parasite_normal_words and word not in question_words:
            question_words.append(word)
        if word in parasite_normal_words:
            question.remove(word)
    if len(question) == 0:
        jeopardy.at[i, "clean_question"] = np.nan
    else:
        jeopardy.at[i, "clean_question"] = ' '.join(question)
```

The run time in the second case was roughly ten to fifteen times worse than in the first case. It seems that when each cell of a 200K-row column has to be split from a string into a list and joined back into a string, the specialized vector method is far better optimized than an explicit Python loop.
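To make this kind of comparison reproducible, here is a minimal, self-contained timing sketch. The column name and the `parasite_normal_words` stop-word list mirror the thread; the data itself is synthetic, and the exact speedup will vary by machine and data size:

```python
import time
import numpy as np
import pandas as pd

# Synthetic stand-in for the Jeopardy data (names assumed from the thread)
parasite_normal_words = {"the", "a", "of"}
df = pd.DataFrame({"clean_question": ["the history of rome", "a b of c"] * 5000})

def clean(text):
    # Drop stop words; return NaN if nothing is left
    kept = [w for w in text.split() if w not in parasite_normal_words]
    return " ".join(kept) if kept else np.nan

# Case 1: apply over the column
t0 = time.perf_counter()
via_apply = df["clean_question"].apply(clean)
t_apply = time.perf_counter() - t0

# Case 2: explicit iterrows loop with per-cell .at writes
t0 = time.perf_counter()
out = df.copy()
for i, row in out.iterrows():
    out.at[i, "clean_question"] = clean(row.clean_question)
t_iter = time.perf_counter() - t0

print(f"apply: {t_apply:.4f}s  iterrows: {t_iter:.4f}s")
```

On typical hardware the `iterrows` version is the slower of the two, because each iteration materializes a full row object before the cleaning function ever runs.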

What I took away from this case: whenever possible, prefer the vector methods in pandas and NumPy over explicit row-by-row loops.
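For what it's worth, pandas can also push the whole stop-word removal into its string engine, which avoids the per-row Python function entirely. A sketch under the same assumed names (a hypothetical small Series stands in for the real column):

```python
import pandas as pd

parasite_normal_words = ["the", "a", "of"]
s = pd.Series(["the capital of france", "a short question"])

# Build one regex that matches any stop word as a whole word,
# then collapse the leftover whitespace - all via pandas .str methods
pattern = r"\b(?:" + "|".join(parasite_normal_words) + r")\b"
cleaned = (s.str.replace(pattern, "", regex=True)
             .str.split().str.join(" "))
print(cleaned.tolist())  # ['capital france', 'short question']
```

Note this variant leaves an empty string (rather than `np.nan`) when every word is removed, so a final `.replace` would be needed to match the notebook's behavior exactly.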


Thanks for sharing another great project with us! I'm impressed with the high level and very profound data analysis you conducted. My favorite part is your statistical analysis and your consideration of different chi-square significance levels (and yes, I agree with you that this approach, in general, has its intrinsic weaknesses, even if it's widely used). It was a cool idea to use the whole dataset instead of only 10%: indeed, I see numbers different from mine. It was also a very solid approach to try both the "naive" and the more meaningful (stop-word based) methods for analyzing questions and answers. All in all, your project is well-structured, the code is well-commented and easy to read, the code cell outputs are nicely formatted, and the conclusions are interesting. Great work, congratulations!

Here are some suggestions on my part, this time mostly cosmetic:

• It's better to remove the numbers before the subheadings.
• Avoid discussing technical details in markdown; consider code comments for this purpose instead.
• Visualizations: I'd advise you to use bigger fonts for titles, axis labels, and legends. Also, remove the frame around the legend.
• Be careful with typos.

Hope my feedback was helpful. Keep up this high level and fast learning pace!


Elena, thank you very much!
I'll keep your notes in mind.
I spent a lot of time not so much on the project itself as on studying statistical hypothesis testing and understanding the holy war between Fisher and Pearson about p-values :)


Glad that my ideas were useful! And yes, I completely agree with you: it was (and still is) quite a confusing concept for me as well.
