Building a Spam Filter: How does the

Hi DQ Community,

Can anyone help me better understand the code below?

Screen Link:

The Code in Question:

word_counts_per_sms = {unique_word: [0] * len(training_set['SMS']) for unique_word in vocabulary}

for index, sms in enumerate(training_set['SMS']):
    for word in sms:
        word_counts_per_sms[word][index] += 1

Can someone please explain what is happening in this for-loop? For some reason it's just not clicking for me. I understand what the output should be (a dictionary that counts the occurrences of each word in each row) and I understand what the input is (a dictionary where each word is a key and the value is a list of zeros, one per row in the dataset), but what exactly is going on in this loop? What is the index doing?

This seems like it could be a common practice and I want to make sure I understand what’s going on under the hood.


I think I understand after dwelling on it some more. The index tells the loop which position in a word's list to increment by 1. Because enumerate yields each message's position alongside the message itself, position i in every word's list lines up with row i of the dataframe.
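
To make that concrete, here is a minimal sketch of the same loop on a made-up two-message dataset. The messages and vocabulary are invented for illustration, and a plain list named sms_messages stands in for training_set['SMS']. It also assumes each SMS has already been split into a list of words; if the entries were plain strings, the inner loop would walk over characters instead.

```python
# Hypothetical stand-in for training_set['SMS']: each message is assumed
# to be a list of words already (not a raw string).
sms_messages = [
    ['win', 'money', 'now'],        # row 0
    ['call', 'me', 'now', 'now'],   # row 1
]
vocabulary = ['win', 'money', 'now', 'call', 'me']

# One list of zeros per word, with one slot per message (row).
word_counts_per_sms = {unique_word: [0] * len(sms_messages) for unique_word in vocabulary}

# enumerate yields (0, first message), (1, second message), ...
for index, sms in enumerate(sms_messages):
    for word in sms:
        # Pick this word's list, then bump the slot that belongs to this row.
        word_counts_per_sms[word][index] += 1

print(word_counts_per_sms['now'])  # [1, 2]
print(word_counts_per_sms['win'])  # [1, 0]
```

Reading the result for 'now': it appears once in row 0 and twice in row 1, so its list ends up as [1, 2]. Every other word's list works the same way, one slot per row.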
