Screen Link:

https://app.dataquest.io/m/432/the-naive-bayes-algorithm/6/multiple-words

What I expected to happen:

I don’t quite get the rationale behind how the probabilities of w2, w3 and w4 given “spam” are calculated. Following the same logic used to get the probability of w1 given “spam”, I would expect the probability of w2 given “spam” to be 4/7, not 1/7.

Can someone help me out?

First, try to explain in your post or as a reply why you think it should be `4/7`. That can help narrow down where your thought process might be diverging from theirs.

Please note that I updated your question title so it better describes what you are asking, based on the details in your post. If you wish to rephrase it, please go ahead, as long as it’s descriptive of your actual question (so that other students can find and refer to it easily).

Thank you for the update.

The reason I think P(w2|Spam) = 4/7 is pretty much the same as how P(w1|Spam) = 4/7 is reached in the course: the second word, w2, is “secret” in the second spam message, and “secret” occurs four times across all spam messages.

I’d appreciate it if someone could point out my misunderstanding.

No, that’s incorrect. w_2 corresponds to the 2nd word of the **5th message** (index = 4).

We are trying to find out whether the 5th SMS is spam or not. This is also depicted by the graphic: w_2 is `place`.

From the labeled data (the first 4 messages), there are 2 spam messages. In those 2 spam messages, `place` (w_2) appears only **once** out of a total of 7 words, so P(w_2|Spam) = 1/7.
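A minimal Python sketch of the calculation described above, assuming the lesson uses raw counts at this step (no smoothing). The `spam_words` list is hypothetical: only the totals (7 spam words, `secret` four times, `place` once) and the 2-of-4 spam prior come from this thread, and the two `other` tokens are placeholders, not the lesson's actual words. The helper names `p_word_given_spam` and `spam_score` are illustrative, not the course's code.

```python
from collections import Counter
from fractions import Fraction

# HYPOTHETICAL word list for the 2 labeled spam messages. Only these facts come
# from the thread: 7 words in total, "secret" appears 4 times, "place" once.
# The two "other" tokens are placeholders, NOT the lesson's real words.
spam_words = ["secret"] * 4 + ["place"] + ["other"] * 2

spam_counts = Counter(spam_words)
total_spam_words = len(spam_words)  # 7


def p_word_given_spam(word):
    """P(word | Spam): occurrences of `word` in all spam messages / total spam words."""
    # Counter returns 0 for unseen words, so an unseen word gets probability 0.
    return Fraction(spam_counts[word], total_spam_words)


def spam_score(message_words, p_spam=Fraction(2, 4)):
    """Unnormalized naive Bayes score: P(Spam) * P(w1|Spam) * ... * P(wn|Spam).

    Each w_i is the i-th word of the message being classified (the 5th SMS),
    not a word taken from a labeled spam message. p_spam defaults to
    2 spam messages out of 4 labeled messages.
    """
    score = p_spam
    for word in message_words:
        score *= p_word_given_spam(word)
    return score


print(p_word_given_spam("secret"))  # 4/7
print(p_word_given_spam("place"))   # 1/7
```

Because each w_i is looked up by its position in the message being classified, w_2 = `place` gives 1/7 regardless of where `secret` appears in the labeled spam messages. Assuming w_1 is `secret` (as the 4/7 in the question suggests), `spam_score(["secret", "place"])` returns 2/49 for those first two words; the remaining words of the 5th message would be appended to the list in order.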
