We're not actually calculating the true probability here, right?

We’re being asked to calculate P(Spam^c | w1,w2,w3,w4). This should be
(P(Spam^c) * P(w1,w2,w3,w4 | Spam^c)) / P(w1,w2,w3,w4), right? We only calculate the numerator to make the computation easier, since the denominator isn’t needed for the algorithm taught in this mission.

I’m just double-checking here, since it’s a little confusing when Q1 asks for P(Spam^c | w1,w2,w3,w4). I read that as “probability of not spam, given words 1, 2, 3 and 4”. The answer that is accepted, however, isn’t actually P(Spam^c | w1,w2,w3,w4).

Follow-up Q:
If my reasoning here is correct, if I did want the true probability, I would divide the result by P(w1,w2,w3,w4). I imagine P(w1,w2,w3,w4) = P(w1) * P(w2) * P(w3) * P(w4), but I am not 100% sure about that…

That’s correct. You can go through Step 4 of the Mission if you are still feeling stuck or confused. It states:

It’s true the probability values are not accurate anymore. However, this is not important with respect to the goal of the algorithm: correctly classifying new messages (not to accurately estimate probabilities).

The classification itself remains completely unaffected because we ignore the division in both equations (not just in one). The probability values change, but they change in direct proportion to one another, so the result of the comparison doesn’t change.
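
To make the quoted point concrete, here is a minimal sketch with made-up numbers. The priors and per-word probabilities below are assumptions for illustration, not values from the mission, and the names are hypothetical. It compares the numerator-only scores with the normalized probabilities. On the follow-up question: the denominator P(w1,w2,w3,w4) is usually obtained from the law of total probability, i.e. as the sum of the two numerators, rather than as P(w1) * P(w2) * P(w3) * P(w4), since Naive Bayes only assumes the words are independent given the class.

```python
from math import prod

# Hypothetical priors and per-word conditional probabilities (illustration only).
p_spam = 0.4
p_ham = 0.6  # P(Spam^c)
p_words_given_spam = [0.20, 0.10, 0.30, 0.05]  # P(w_i | Spam)
p_words_given_ham = [0.05, 0.20, 0.10, 0.10]   # P(w_i | Spam^c)


def numerator(prior, word_probs):
    """Numerator of Bayes' theorem under the naive independence assumption."""
    return prior * prod(word_probs)


# Scores the classifier actually compares (no division).
numerator_spam = numerator(p_spam, p_words_given_spam)
numerator_ham = numerator(p_ham, p_words_given_ham)

# Shared denominator P(w1,w2,w3,w4) via the law of total probability.
evidence = numerator_spam + numerator_ham

# Normalized probabilities: both scores are divided by the same number.
posterior_spam = numerator_spam / evidence
posterior_ham = numerator_ham / evidence

label_from_numerators = "spam" if numerator_spam > numerator_ham else "non-spam"
label_from_posteriors = "spam" if posterior_spam > posterior_ham else "non-spam"

print(numerator_spam, numerator_ham)
print(posterior_spam, posterior_ham)
print(label_from_numerators, label_from_posteriors)
```

Because both numerators are divided by the same positive number, the larger one stays larger, so the label is the same either way. Only the normalized values add up to 1.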

Thank you, I was just confused since it seemed a little weird to ask for P(A) when what was really being asked for was more like “P(A)”. Thanks for clearing up what was making me uneasy.