An alternate way to calculate the constants for our Naive Bayes filter

Powerful Python functionality: itertools
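As a quick illustration of what `chain` does with a list of token lists: unpacking with `*` hands each inner list to `chain` as a separate argument, and `chain` then yields their elements one after another. `chain.from_iterable` gives the same result without the unpacking step, which avoids building one very large argument tuple when there are thousands of messages. The token lists below are made up for illustration.

```python
from itertools import chain

# Made-up token lists standing in for split SMS messages
messages = [["win", "cash", "now"], ["see", "you", "soon"], ["cash", "prize"]]

# chain(*messages) is the same call as chain(messages[0], messages[1], messages[2])
flat_unpacked = list(chain(*messages))

# chain.from_iterable takes the outer list directly, no * unpacking needed
flat_lazy = list(chain.from_iterable(messages))

print(flat_unpacked)  # → ['win', 'cash', 'now', 'see', 'you', 'soon', 'cash', 'prize']
print(flat_unpacked == flat_lazy)  # → True
```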

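I learned the `chain` function from itertools. `chain(*messages)` is also a nice use case of argument unpacking: the `*` passes each message's word list to `chain` as a separate argument.

```python
import pandas as pd
from itertools import chain

# `train` is the training DataFrame with an `SMS` column and a `Label` column ('spam' / 'ham')
messages = train.SMS.str.split()        # Series of word lists, one per message
words = list(chain(*messages))          # flatten all word lists into a single list
vocabulary = pd.Series(words).unique()  # unique words in the training set
len(vocabulary)

alpha = 1                                              # Laplace smoothing parameter
label_share = train.Label.value_counts(normalize=True)
P_spam = label_share['spam']                           # P(Spam)
P_ham = label_share['ham']                             # P(Ham)
N_spam = len(list(chain(*messages[train.Label == 'spam'])))  # total words across spam messages
N_ham = len(list(chain(*messages[train.Label == 'ham'])))    # total words across ham messages
N_vocab = len(vocabulary)                              # vocabulary size
```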
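These constants are the ones that appear in the standard Laplace-smoothed estimate P(w | Spam) = (N_w|Spam + alpha) / (N_spam + alpha * N_vocab), and likewise for ham. Below is a minimal sketch of how they might feed a classifier, assuming a tiny made-up dataset held in plain Python lists instead of the `train` DataFrame. The names `train_data`, `p_w_spam`, `p_w_ham` and `classify` are hypothetical, and ignoring words outside the vocabulary and breaking ties toward ham are simplifying choices for this sketch.

```python
from collections import Counter
from itertools import chain

# Hypothetical toy training data: (label, tokenized message) pairs
train_data = [
    ("spam", ["win", "cash", "now"]),
    ("ham", ["see", "you", "soon"]),
    ("spam", ["cash", "prize"]),
    ("ham", ["call", "you", "later"]),
]
labels = [label for label, _ in train_data]
messages = [words for _, words in train_data]

# Same constants as in the pandas version, computed with plain lists
alpha = 1
vocabulary = set(chain.from_iterable(messages))
P_spam = labels.count("spam") / len(labels)
P_ham = labels.count("ham") / len(labels)
spam_words = list(chain.from_iterable(m for l, m in train_data if l == "spam"))
ham_words = list(chain.from_iterable(m for l, m in train_data if l == "ham"))
N_spam, N_ham, N_vocab = len(spam_words), len(ham_words), len(vocabulary)

# Laplace-smoothed P(w | Spam) and P(w | Ham) for every vocabulary word
spam_counts, ham_counts = Counter(spam_words), Counter(ham_words)
p_w_spam = {w: (spam_counts[w] + alpha) / (N_spam + alpha * N_vocab) for w in vocabulary}
p_w_ham = {w: (ham_counts[w] + alpha) / (N_ham + alpha * N_vocab) for w in vocabulary}

def classify(words):
    """Compare P(Spam) * prod P(w|Spam) with the ham score; unseen words are skipped."""
    score_spam, score_ham = P_spam, P_ham
    for w in words:
        if w in vocabulary:
            score_spam *= p_w_spam[w]
            score_ham *= p_w_ham[w]
    return "spam" if score_spam > score_ham else "ham"  # ties go to ham in this sketch

print(classify(["win", "cash"]))  # → spam
print(classify(["see", "you"]))   # → ham
```

Because the smoothed estimates add alpha to every word's count and alpha * N_vocab to the denominator, the values in `p_w_spam` (and `p_w_ham`) still sum to 1 over the vocabulary.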


References:
- rajtulluri, Spam-Filter-using-Naive-Bayes
