Curious about solution notebook

Screen Link:

My Code:

present_probs <- word_counts %>% 
    filter(word %in% words) %>% 
    mutate(
      # Calculate the probabilities from the counts
      spam_prob = (spam_count + alpha) / (n_spam + alpha * n_vocabulary),
      ham_prob = (ham_count + alpha) / (n_ham + alpha * n_vocabulary)
    )

What I expected to happen:
For n_spam and n_vocabulary, why does the solution call unique() on the words? According to the formula, the probability should be computed over all the words, so why is it computed from the unique words only?

What actually happened:
Please answer, thanks.

I think the program uses this code to get the set of unique words in spam, since you do not want to calculate the probability of the same word more than once.

spam_vocab <- spam_vocab %>% unique

The program then uses the code below to count how often each of these unique words occurs across all the spam messages. You pick a word in spam, say "offer", and count how many times it occurs in the spam messages.

spam_counts <- tibble(
  word = spam_vocab
) %>% 
  mutate(
    # Calculate the number of times a word appears in spam
    spam_count = map_int(word, function(w) {
      
      # Count how many times each word appears in all spam messages, then sum
      map_int(spam_messages, function(sm) {
        (str_split(sm, " ")[[1]] == w) %>% sum # for a single message
      }) %>% 
        sum # then summing over all messages
      
    })
  )
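To make the counting concrete, here is a minimal sketch on made-up data. The three messages are hypothetical and not from the course dataset, and the sketch skips the tibble wrapper, but the nested map_int() logic is the same as above.

```r
library(purrr)
library(stringr)

# Hypothetical toy messages (an assumption, not the course data)
spam_messages <- c("free offer now", "offer ends now", "free free prize")

# Every word with repeats kept, then the unique vocabulary
all_spam_words <- unlist(str_split(spam_messages, " "))
spam_vocab <- unique(all_spam_words)

# For each unique word, count its occurrences in every message and sum them
spam_count <- map_int(spam_vocab, function(w) {
  sum(map_int(spam_messages, function(sm) {
    sum(str_split(sm, " ")[[1]] == w)  # occurrences in a single message
  }))
})
names(spam_count) <- spam_vocab

length(all_spam_words)  # 9 words in total, repeats included
length(spam_vocab)      # 5 unique words
spam_count              # free = 3, offer = 2, now = 2, ends = 1, prize = 1
```

So unique() only decides which words get a row. The counts themselves still include every repeat, and they add up to the total number of words.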

present_probs <- word_counts %>% 
    filter(word %in% words) %>% 
    mutate(
      # Calculate the probabilities from the counts
      spam_prob = (spam_count + alpha) / (n_spam + alpha * n_vocabulary),
      ham_prob = (ham_count + alpha) / (n_ham + alpha * n_vocabulary)
    )
But when we calculate the probability, the denominator should be based on “all words”, not just the unique ones. Am I getting that right?
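One way to check this is a small sketch that assumes the usual multinomial Naive Bayes formula with additive smoothing. In that formula n_spam is the total number of words in the spam messages, repeats included, and n_vocabulary is the number of distinct words across all messages. The messages and the names n_spam_unique, total_prob and total_prob_unique are made up for illustration. How the solution notebook actually defines n_spam is not shown in this thread.

```r
# Hypothetical toy messages (an assumption, not the course data)
spam_messages <- c("free offer now", "offer ends now", "free free prize")
ham_messages  <- c("see you now", "lunch offer today")

spam_words <- unlist(strsplit(spam_messages, " "))  # repeats kept
ham_words  <- unlist(strsplit(ham_messages, " "))
vocabulary <- unique(c(spam_words, ham_words))      # distinct words only

alpha <- 1
n_spam <- length(spam_words)        # total words in spam, repeats included
n_vocabulary <- length(vocabulary)  # distinct words across all messages

# Spam count for every vocabulary word (0 for words seen only in ham)
spam_count <- sapply(vocabulary, function(w) sum(spam_words == w))

# Smoothed P(word | spam) with the total word count in the denominator
spam_prob <- (spam_count + alpha) / (n_spam + alpha * n_vocabulary)
total_prob <- sum(spam_prob)

# Same formula, but with the number of unique spam words in the denominator
n_spam_unique <- length(unique(spam_words))
spam_prob_unique <- (spam_count + alpha) / (n_spam_unique + alpha * n_vocabulary)
total_prob_unique <- sum(spam_prob_unique)

total_prob         # 1: a proper probability distribution over the vocabulary
total_prob_unique  # greater than 1, so these are no longer probabilities
```

With the total count the smoothed probabilities sum to 1 over the vocabulary. With the unique count they do not, which supports reading n_spam as "all words" while n_vocabulary stays the number of unique words.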