Dictionary Comprehension, creating a dictionary with zeroes

Screen Link: https://app.dataquest.io/m/433/guided-project%3A-building-a-spam-filter-with-naive-bayes/5/the-final-training-set

word_counts_per_sms = {unique_word: [0] * len(training_set['SMS']) for unique_word in vocabulary}

for index, sms in enumerate(training_set['SMS']):
    for word in sms:
        word_counts_per_sms[word][index] += 1

Hello! I'm having a hard time understanding the following part of the code above:

word_counts_per_sms = {unique_word: [0] * len(training_set['SMS']) for unique_word in vocabulary}

I do understand how the indexing works, but the [0] * len(training_set['SMS']) part is a bit confusing to me. Does it mean that it generates a list of zeroes whose length equals the number of unique_word keys in vocabulary, so that I have a corresponding value ([0]) for each of those indexes?

Thank you!

You almost had it!

It generates a list of zeros whose length is the number of rows in the DataFrame, that is, one zero per SMS rather than one per vocabulary word. Notice that [0] is being multiplied by len(training_set['SMS']), which never changes, so all the lists are the same length.
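To see the list repetition on its own, here is a minimal sketch. The name n_messages and the value 3 are made up for illustration; they stand in for len(training_set['SMS']).

```python
# Hypothetical stand-in for len(training_set['SMS'])
n_messages = 3

# Multiplying a list by an integer repeats its elements that many times
zeros = [0] * n_messages

print(zeros)  # [0, 0, 0]
```

So with 3 messages you get a list with 3 zeros, one slot per message.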

It generates such a list for every unique word in the vocabulary, or, writing it as code, for unique_word in vocabulary. Each unique word becomes a dictionary key whose value is a list of zeros with one entry per row of the DataFrame.
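If it helps, here is a small sketch of the whole thing on toy data. The messages and vocabulary below are invented for illustration and are not the project's data; messages plays the role of training_set['SMS'], assuming each SMS has already been split into a list of words.

```python
# Toy stand-ins (made up): two already-tokenized messages and their vocabulary
messages = [['secret', 'prize', 'claim'], ['secret', 'secret', 'party']]
vocabulary = ['secret', 'prize', 'claim', 'party']

# One key per unique word; each value is a fresh list with one zero per message
word_counts_per_sms = {unique_word: [0] * len(messages) for unique_word in vocabulary}

# Count how many times each word appears in each message
for index, sms in enumerate(messages):
    for word in sms:
        word_counts_per_sms[word][index] += 1

print(word_counts_per_sms)
# {'secret': [1, 2], 'prize': [1, 0], 'claim': [1, 0], 'party': [0, 1]}
```

Every list has length 2 because there are 2 messages, no matter how many words are in the vocabulary. The comprehension evaluates [0] * len(messages) once per key, so each word gets its own separate list.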

I hope this helps you.
