Build a spam filter with Naïve Bayes

Can someone please help me with this screen - https://app.dataquest.io/m/433/guided-project%3A-building-a-spam-filter-with-naive-bayes/5/the-final-training-set

I could not understand how the data is extracted using enumerate in this code:

word_counts_per_sms = {unique_word: [0] * len(training_set['SMS']) for unique_word in vocabulary}

for index, sms in enumerate(training_set['SMS']):
    for word in sms:
        word_counts_per_sms[word][index] += 1

What particularly is confusing you? Have you looked up the documentation for enumerate()?
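In case it helps, here is a minimal sketch of what enumerate() does. The `messages` list is a made-up stand-in for training_set['SMS'], assuming each SMS has already been split into a list of words; it is not the actual dataset.

```python
# Hypothetical toy data standing in for training_set['SMS'].
messages = [["hello", "world"], ["free", "prize"], ["hello", "again"]]

# enumerate() yields (index, item) pairs, counting from 0 by default.
pairs = list(enumerate(messages))

for index, sms in pairs:
    print(index, sms)
```

So in the loop you posted, index is the position of the message and sms is the message itself.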

Yes, word_counts_per_sms = {unique_word: [0] * len(training_set['SMS']) for unique_word in vocabulary}
This line is confusing me

In Python, there is a concept called “comprehensions”.

For example, a list comprehension is something like -

a = [i for i in range(10)]

The above will result in a being a list filled with numbers from 0 to 9. The above is equivalent to -

a = []
for i in range(10):
    a.append(i)

Similar to list comprehensions, we can create dictionaries with dictionary comprehensions.

In the code you shared, that’s what the curly braces {} indicate, just as square brackets [] indicate a list comprehension.

So, that line of code is creating a dictionary for you.
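As a small illustration (the names `squares` and `squares_loop` are made up for this example), a dictionary comprehension and its equivalent loop could look like this:

```python
# Dictionary comprehension: {key_expression: value_expression for item in iterable}
squares = {i: i * i for i in range(4)}

# The same dictionary built with an explicit loop.
squares_loop = {}
for i in range(4):
    squares_loop[i] = i * i

print(squares)
print(squares == squares_loop)
```

The part before the colon becomes the key and the part after it becomes the value, once for every item the for clause produces.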

How that dictionary is populated depends on the individual parts of that line, which you can start to decipher -

  • What is the for loop part doing there?
  • What is len(training_set['SMS'])?
  • What does [0] * len(training_set['SMS']) return?
  • What does the final dictionary look like based on the above?

Break it down and print things out individually if needed, and you should be able to make sense of it. If you get stuck, feel free to ask more questions.
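If it is easier to experiment on something small first, here is a rough sketch of that approach. It assumes a tiny made-up `messages` list in place of training_set['SMS'] (each SMS already split into words) and builds `vocabulary` from it; the real project data will of course be much larger.

```python
# Hypothetical toy data standing in for training_set['SMS'].
messages = [["win", "a", "prize"], ["call", "me"], ["win", "win", "now"]]

# Assumed here: vocabulary is the set of unique words across all messages.
vocabulary = sorted({word for sms in messages for word in sms})

# Print the individual pieces of the comprehension.
print(len(messages))        # how many messages there are
print([0] * len(messages))  # a list with one zero per message

# One key per unique word; each value is its own fresh list of zeros.
word_counts_per_sms = {unique_word: [0] * len(messages) for unique_word in vocabulary}

# For every message, bump the counter at that message's position
# in the list belonging to each word it contains.
for index, sms in enumerate(messages):
    for word in sms:
        word_counts_per_sms[word][index] += 1

print(word_counts_per_sms["win"])
```

Each list has one slot per message, so the list for a word tells you how many times that word appears in message 0, message 1, and so on.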