Winning Jeopardy (m210)

Hi all!
I'm uploading my notebook for this project.
I used the full dataset from Reddit.
I look forward to your comments and criticism.
Best regards, Vadim Maklakov.
Winning_Jeopardy_m210.ipynb (502.9 KB)


2 Likes

A small note
on cell 14. First case: the vector method

Define functions for creating deep_question and deep_answer

def deep_clean_question(row):
    sentence_in = row.clean_question.split()
    # Iterate over a copy so that words can be removed from the original list
    for word in sentence_in[:]:
        if word in parasite_normal_words:
            sentence_in.remove(word)
    if len(sentence_in) == 0:
        return np.nan
    else:
        sentence_out = ' '
        sentence_out = sentence_out.join(sentence_in)
        return sentence_out

def deep_clean_answer(row):
    sentence_in = row.clean_answer.split()
    # Iterate over a copy so that words can be removed from the original list
    for word in sentence_in[:]:
        if word in parasite_normal_words:
            sentence_in.remove(word)
    if len(sentence_in) == 0:
        return np.nan
    else:
        sentence_out = ' '
        sentence_out = sentence_out.join(sentence_in)
        return sentence_out

Add new columns

jeopardy["deep_question"] = jeopardy.apply(deep_clean_question, axis=1)
jeopardy["deep_answer"] = jeopardy.apply(deep_clean_answer, axis=1)

As an experiment, I tried implementing the same functionality with for i, row in jeopardy.iterrows(). This is the second case: standard loop iteration.

for i, row in jeopardy.iterrows():
    question = row.clean_question.split()
    for word in question[:]:
        if word not in parasite_normal_words and word not in question_words:
            question_words.append(word)
        if word in parasite_normal_words:
            question.remove(word)
    if len(question) == 0:
        jeopardy.at[i, "clean_question"] = np.nan
    else:
        sentence = ' '
        sentence = sentence.join(question)
        jeopardy.at[i, "clean_question"] = sentence
    answer = row.clean_answer.split()
    for word in answer[:]:
        if word not in answer_words and word not in parasite_normal_words:
            answer_words.append(word)
        if word in parasite_normal_words:
            answer.remove(word)
    if len(answer) == 0:
        jeopardy.at[i, "clean_answer"] = np.nan
    else:
        sentence = ' '
        sentence = sentence.join(answer)
        jeopardy.at[i, "clean_answer"] = sentence

The second case ran roughly ten to fifteen times slower than the first. It seems that, for 200K rows, splitting each cell's string into a list and joining the list back into a string is better optimized in the specialized vector method.

My takeaway from this case: whenever possible, prefer the vector methods in pandas and NumPy.
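For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two cases could be timed side by side. The toy DataFrame and the small word set are assumptions standing in for the real jeopardy data and parasite_normal_words, and the loop leaves out the question_words bookkeeping from the second case, so the ratio you see may well differ from the one reported above.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the notebook's parasite_normal_words and jeopardy data.
parasite_normal_words = {"the", "a", "of", "this"}
jeopardy = pd.DataFrame(
    {"clean_question": ["the capital of france", "a", "this river of egypt"] * 2000}
)


def deep_clean_question(row):
    # Drop the unwanted words; return NaN when nothing is left.
    words = [w for w in row.clean_question.split() if w not in parasite_normal_words]
    return " ".join(words) if words else np.nan


# First case: DataFrame.apply over rows.
start = time.perf_counter()
via_apply = jeopardy.apply(deep_clean_question, axis=1)
apply_seconds = time.perf_counter() - start

# Second case: explicit iterrows loop, writing each result back with .at.
via_loop = pd.Series(index=jeopardy.index, dtype=object)
start = time.perf_counter()
for i, row in jeopardy.iterrows():
    via_loop.at[i] = deep_clean_question(row)
loop_seconds = time.perf_counter() - start

print(f"apply: {apply_seconds:.3f}s, iterrows: {loop_seconds:.3f}s")
```

Both versions should produce the same cleaned column, so any difference in the printed times comes from how the rows are traversed and written back.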

1 Like

Hi Vadim,

Thanks for sharing another great project with us! I’m impressed with the high level of the very thorough data analysis you conducted. My favorite part is your statistical analysis and the way you considered different levels of chi-square (and yes, I agree with you that this approach, in general, has its intrinsic weaknesses, even if it’s widely used). It was a cool idea to use the whole dataset instead of only 10%: indeed, I see that your numbers differ from mine. Also, it was a very solid approach to try both “naive” and more significant (stop-word based) methods for analyzing questions and answers. All in all, your project is well-structured, the code is well-commented and easy to read, the outputs of the code cells are nicely formatted, and the conclusions are interesting. Great work, congratulations!

Here are some suggestions from my side, this time mostly about presentation:

  • It’s better to remove the numbers before the subheadings.
  • Avoid discussing technical details in markdown, consider only code comments for this purpose.
  • Visualizations. I’d advise you to make bigger titles, axis labels, and legend font. Also, remove the frame around the legend.
  • Watch out for typos.

Hope my feedback was helpful. Keep up this high level and fast learning pace!

1 Like

Elena, thank you very much!
I'll keep your comments in mind.
I spent a lot of time not so much on the project itself as on studying and fighting with statistical hypothesis testing and trying to understand the holy war between Fisher and Pearson over p-values))

1 Like

Glad that my ideas were useful! And yes, completely agree with you, it was (and still is) quite a confusing concept for me as well :exploding_head:

1 Like