word_counts_per_sms = {unique_word: [0] * len(training_set['SMS']) for unique_word in vocabulary}
for index, sms in enumerate(training_set['SMS']):
    for word in sms:
        word_counts_per_sms[word][index] += 1
Everything was fine until this line: word_counts_per_sms[word][index] += 1
KeyError Traceback (most recent call last)
in ()
      8 for index, sms in enumerate(train_df['SMS']):
      9     for word in sms:
---> 10         word_counts_per_sms[word][index] += 1
I can’t say what is causing this issue for all of you based on the information you’ve provided. I don’t get the KeyError when I run that code, so there’s a possibility that an error was introduced earlier on that causes a problem when you get to that line. I can have a look at it and try to figure it out if you upload a copy of your .ipynb file.
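In the meantime, a KeyError at that line means some `word` is not a key of word_counts_per_sms, i.e. it is not in the vocabulary. Here is a minimal sketch for finding the offending tokens. The data and the helper name `find_missing_tokens` are hypothetical stand-ins, not taken from the notebook. It assumes each message should already be a list of words; the third message is deliberately left as an unsplit string to show one way the mismatch could arise.

```python
# Hypothetical stand-ins for training_set['SMS'] and vocabulary.
messages = [
    ['free', 'prize', 'now'],
    ['see', 'you', 'now'],
    'call me',  # never split into words, so iterating yields single characters
]
vocabulary = ['free', 'prize', 'now', 'see', 'you', 'call', 'me']


def find_missing_tokens(messages, vocabulary):
    """Return {message index: tokens that are not in the vocabulary}."""
    vocab = set(vocabulary)
    missing = {}
    for index, sms in enumerate(messages):
        unknown = [token for token in sms if token not in vocab]
        if unknown:
            missing[index] = unknown
    return missing


missing = find_missing_tokens(messages, vocabulary)
print(missing)
```

If the result is empty, every token is a valid key and the problem lies elsewhere. If it lists single characters, the messages were probably not split into words; if it lists whole words, the vocabulary may have been built from a different or differently cleaned set of messages.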
Hi @DngNguyn,
The word_counts_per_sms dictionary works like a matrix: each key is a vocabulary word (a column), and its value is a list of counters with one entry per message (the rows). The code you mention increments the counter for a word each time that word appears in a message.
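To make that structure concrete, here is a small sketch on toy data. The three messages are made up for illustration, and the vocabulary is built from those same messages, so every word is guaranteed to be a key.

```python
# Hypothetical toy messages, already split into words.
messages = [['free', 'prize', 'free'], ['see', 'you'], ['free', 'you']]

# Build the vocabulary from the same messages that will be counted.
vocabulary = sorted({word for sms in messages for word in sms})

# One list of counters per vocabulary word, one counter per message.
word_counts_per_sms = {word: [0] * len(messages) for word in vocabulary}
for index, sms in enumerate(messages):
    for word in sms:
        word_counts_per_sms[word][index] += 1

for word, counts in word_counts_per_sms.items():
    print(word, counts)
```

Position i in each list holds the count for message i, so reading across the lists at the same position gives the word counts for one message.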
In combination with the information contained in train_set you have to create something like: