I don't quite understand this line of code in this loop (Build Spam filter mission)

Screen Link:
https://app.dataquest.io/m/433/guided-project%3A-building-a-spam-filter-with-naive-bayes/5/the-final-training-set

DQ Code:

word_counts_per_sms = {unique_word: [0] * len(training_set['SMS']) for unique_word in vocabulary}

for index, sms in enumerate(training_set['SMS']):
    for word in sms:
        word_counts_per_sms[word][index] += 1

Everything was fine until this line:
word_counts_per_sms[word][index] += 1

Can someone explain this to me? Thanks.

I am having the same problem.
I get this error:

KeyError                                  Traceback (most recent call last)
in ()
      8 for index, sms in enumerate(train_df['SMS']):
      9     for word in sms:
---> 10         word_counts_per_sms[word][index] += 1

KeyError: ' '

Same here. I’m getting KeyError: ' ' too and I don’t know why.

I can’t say what is causing this issue for all of you based on the information you’ve provided. I don’t get the KeyError when I run that code, so there’s a possibility that an error was introduced earlier on that causes a problem when you get to that line. I can have a look at it and try to figure it out if you upload a copy of your .ipynb file.

Hey @DngNguyn,
this post might be helpful for you.

For @wdzarif and @arturvieirasousa:

You need to call the Series.str.split() method, as given in instruction 4, before you start working on your loop, i.e.

df['SMS'] = df['SMS'].str.split()  # each message becomes a list of words

for sms in df['SMS']:
    ...  # your counting code

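Otherwise each message is still a plain string, and looping over a string gives single characters, spaces included. The loop then looks up ' ' in the dictionary, finds no such key, and raises the KeyError.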
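To see where the space comes from, here is a minimal sketch that uses plain Python lists in place of the pandas column. The two messages and the names `messages`, `vocabulary` and `counts` are made up for illustration. The first character that is not itself a vocabulary word raises the error, and in this toy data that character is the space:

```python
# Hypothetical toy data, not the project's dataset.
messages = ["u r free now", "call me now"]

# Vocabulary built from the words of every message.
vocabulary = sorted({word for sms in messages for word in sms.split()})
counts = {word: [0] * len(messages) for word in vocabulary}

# Without splitting: iterating over a str yields single characters.
unsplit_error = None
try:
    for index, sms in enumerate(messages):
        for word in sms:  # 'u', then ' ', ...
            counts[word][index] += 1
except KeyError as err:
    unsplit_error = err
print("KeyError:", unsplit_error)

# With splitting: each item is a whole word that exists in the vocabulary.
counts = {word: [0] * len(messages) for word in vocabulary}
for index, sms in enumerate(messages):
    for word in sms.split():
        counts[word][index] += 1
print(counts["now"])
```

With the real DataFrame, the split would happen once up front via `.str.split()` rather than inside the loop.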

Hope this helps.

Hi @DngNguyn,
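word_counts_per_sms is a dictionary that works like a matrix: each key is a vocabulary word (a column), and its value is a list of counters with one slot per message (the rows). In word_counts_per_sms[word][index] += 1, word selects the column and index selects the message, so the line adds 1 to that word's count for that message each time the word appears.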
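The line in question can be traced on a small example. This is only a sketch: the two tokenized messages and the four-word vocabulary are invented, and plain lists stand in for the `training_set['SMS']` column.

```python
# Hypothetical toy data: each message is already a list of words.
messages = [["secret", "prize", "secret"], ["call", "me"]]
vocabulary = ["call", "me", "prize", "secret"]

# One list of zeros per vocabulary word, one slot per message.
word_counts_per_sms = {word: [0] * len(messages) for word in vocabulary}

for index, sms in enumerate(messages):
    for word in sms:
        # word picks the list; index picks that message's slot in it.
        word_counts_per_sms[word][index] += 1

print(word_counts_per_sms)
```

Because "secret" appears twice in the first message, its list ends up with a 2 in slot 0 and a 0 in slot 1.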

Combined with the information in the training set, you end up with a table that has one row per message and one column per vocabulary word.
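The shape of that combined table can be sketched in plain Python. The data and the name `table` are hypothetical. In the project itself this step would presumably be done with pandas, for example by passing the dictionary to `pd.DataFrame`; the sketch avoids pandas only to stay self-contained.

```python
# Hypothetical toy data: two tokenized messages and their finished counts.
messages = [["secret", "prize", "secret"], ["call", "me"]]
word_counts_per_sms = {
    "call": [0, 1],
    "me": [0, 1],
    "prize": [1, 0],
    "secret": [2, 0],
}

# Row i takes the i-th entry of every word's list.
table = [
    {"SMS": sms, **{word: counts[i] for word, counts in word_counts_per_sms.items()}}
    for i, sms in enumerate(messages)
]

for row in table:
    print(row)
```

Each printed row holds one message followed by its count for every vocabulary word.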

Thank you for your response. It worked.