Screen Link:
My Code:
question_overlap = []
terms_used = set()
#Sort jeopardy dataframe by ascending air date
sorted_jeopardy = jeopardy.sort_values(by='Air Date')
import re
#Loop through each row of dataframe
for i, row in jeopardy.iterrows():
    split_question = row['clean_question'].split()
    split_question = re.sub(r'[\w{6,}]', '', split_question)
    match_count = 0
    for word in split_question:
        if word in terms_used:
            match_count += 1
        terms_used = set.add(word)
    if len(split_question) > 0:
        match_count /= len(split_question)
    question_overlap.append(match_count)
jeopardy['question_overlap'] = question_overlap
print(jeopardy['question_overlap'].mean())
What I expected to happen:
I expected the mean of the `question_overlap` column to print.
What actually happened:
TypeError                                 Traceback (most recent call last)
<ipython-input-20-2b9cd06f8b25> in <module>()
      8 for i, row in jeopardy.iterrows():
      9     split_question = row['clean_question'].split()
---> 10     split_question = re.sub(r'[\w{6,}]', '', split_question)
     11     match_count = 0
     12     for word in split_question:

/dataquest/system/env/python3/lib/python3.4/re.py in sub(pattern, repl, string, count, flags)
    177     a callable, it's passed the match object and must return
    178     a replacement string to be used."""
--> 179     return _compile(pattern, flags).sub(repl, string, count)
    180
    181 def subn(pattern, repl, string, count=0, flags=0):

TypeError: expected string or buffer
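The traceback points at `re.sub` receiving the wrong type: `str.split()` returns a list, while `re.sub` expects a string as its third argument. A minimal sketch that reproduces this in isolation (the sample sentence is made up for illustration and is not from the dataset):

```python
import re

# str.split() returns a list of words, not a string
words = "what is the capital of australia".split()

try:
    # re.sub expects a string (or bytes) as its third argument
    re.sub(r'\w{6,}', '', words)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(type(words).__name__)
print(error_message)
```

The exact wording of the message depends on the Python version: 3.4 reports "expected string or buffer", and newer versions phrase it differently.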
The instructions state:
* Remove any words in `split_question` that are less than `6` characters long.
Can’t this be done with a regex pattern? I used
split_question = re.sub(r'[\w{6,}]', '', split_question)
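A regex can do this, but probably not in this form. Three things seem to get in the way: `re.sub` needs a string rather than the list returned by `split()`; inside square brackets `{6,}` is not a quantifier, so `[\w{6,}]` matches a single character; and `re.sub` removes whatever matches, so even `\w{6,}` would delete the long words instead of keeping them. Below is a sketch of two alternatives, assuming `clean_question` is already lowercased with punctuation removed (the sample string is made up):

```python
import re

# Made-up sample standing in for one row's clean_question value
clean_question = "what is the capital of australia"

# Option 1: split first, then keep only words with 6 or more characters
split_question = [word for word in clean_question.split() if len(word) >= 6]

# Option 2: run the regex on the original string (not the list);
# \b\w{6,}\b matches whole words of 6 or more word characters
regex_words = re.findall(r'\b\w{6,}\b', clean_question)

# Inside [...] the {6,} is literal: the class matches any single word
# character or one of the characters {, 6, comma, }
stripped = re.sub(r'[\w{6,}]', '', clean_question)

print(split_question)
print(regex_words)
print(repr(stripped))
```

Both options give the same list here. The list comprehension is usually the simpler choice, since the question has to be split into words anyway.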
PatternsinJeopardyQuestions-Copy2.ipynb (22.4 KB)
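Two other lines in the loop look likely to fail once the regex line is fixed. `terms_used = set.add(word)` calls `add` on the `set` class rather than on `terms_used`, and even `terms_used.add(word)` returns `None`, so assigning its result would overwrite the set. The loop also iterates over `jeopardy` instead of `sorted_jeopardy`. A minimal sketch of the loop logic without pandas, using a hypothetical list of strings in place of the sorted `clean_question` column:

```python
# Hypothetical stand-in for sorted_jeopardy['clean_question'], in air-date order
clean_questions = [
    "which planet is closest to the sun",
    "this planet is known as the red planet",
    "the capital of australia",
]

question_overlap = []
terms_used = set()

for question in clean_questions:
    # Keep only words with 6 or more characters
    split_question = [word for word in question.split() if len(word) >= 6]
    match_count = 0
    for word in split_question:
        if word in terms_used:
            match_count += 1
        # add() mutates the set in place and returns None, so do not assign it
        terms_used.add(word)
    if len(split_question) > 0:
        match_count /= len(split_question)
    question_overlap.append(match_count)

mean_overlap = sum(question_overlap) / len(question_overlap)
print(question_overlap)
print(mean_overlap)
```

In the notebook, the same loop body would run over `sorted_jeopardy.iterrows()`, and the list would need to be assigned back to `sorted_jeopardy` so that the row order matches.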