Guided project: Naive Bayes spam filter

Screen Link:

My Code:

present_probs <- word_counts %>% 
    filter(word %in% words) %>% 
    mutate(
      # Calculate the probabilities from the counts
      spam_prob = (spam_count + alpha) / (n_spam + alpha * n_vocabulary),
      ham_prob = (ham_count + alpha) / (n_ham + alpha * n_vocabulary)
    )

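This code calculates the probabilities using an n_spam that is based on unique words. Why does it have to be unique? I thought it should be the count of all words in the spam messages, duplicates included. Am I understanding this correctly?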

I know that the code below uses the unique words (spam_vocab) to count the numerator, spam_count, in spam_prob = (spam_count + alpha) / (n_spam + alpha * n_vocabulary):

spam_counts <- tibble(
  word = spam_vocab
) %>% 
  mutate(
    # Calculate the number of times a word appears in spam
    spam_count = map_int(word, function(w) {
      
      # Count how many times each word appears in all spam messages, then sum
      map_int(spam_messages, function(sm) {
        (str_split(sm, " ")[[1]] == w) %>% sum # for a single message
      }) %>% 
        sum # then summing over all messages
      
    })
  )

What I am really curious about is the denominator in spam_prob = (spam_count + alpha) / (n_spam + alpha * n_vocabulary). Why is n_spam still computed from unique words? Shouldn't it count all words?


The lesson teaches that n_spam counts all words, so why does the project use unique()?
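To make the question concrete, here is a minimal sketch on made-up data comparing the two ways of computing n_spam. The two toy messages, the value of alpha, and the names n_spam_total and n_spam_unique are my own assumptions for illustration, not taken from the project. As far as I understand the smoothing formula, the probabilities over the whole vocabulary only sum to 1 when n_spam counts every word, duplicates included.

```r
# Toy data (hypothetical, not from the project)
spam_messages <- c("win money now", "win a prize now")

# Split every message on spaces and pool all words together
spam_words <- unlist(strsplit(spam_messages, " "))

# Two candidate definitions of n_spam
n_spam_total  <- length(spam_words)          # all words, duplicates included
n_spam_unique <- length(unique(spam_words))  # distinct words only

alpha <- 1
# Assumption: in this toy example the vocabulary is just the distinct spam words
n_vocabulary <- length(unique(spam_words))

# Count of each vocabulary word across all spam messages
spam_count <- as.vector(table(spam_words))

# Smoothed probabilities under each definition of n_spam
probs_total  <- (spam_count + alpha) / (n_spam_total  + alpha * n_vocabulary)
probs_unique <- (spam_count + alpha) / (n_spam_unique + alpha * n_vocabulary)

# Compare how the two versions add up over the vocabulary
sum_total  <- sum(probs_total)
sum_unique <- sum(probs_unique)
print(c(total = sum_total, unique = sum_unique))
```

In this toy case the version with all words sums to 1, while the version with unique words sums to more than 1, which is why the unique count in the denominator looks wrong to me.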

@casey

Please look into this.