[HELP] Unable to determine whether events are statistically dependent or independent

Screen Link:
https://app.dataquest.io/m/431/bayes-theorem/2/example-walk-through

If we wish to find P(Spam ∩ “secret”), can’t we find it by using the conditional probability formula, i.e. P(A | B) = P(A ∩ B) / P(B)?

That is:
P(A ∩ B) = P(B) · P(A | B) → under statistical dependence
P(A ∩ B) = P(A) · P(B) → under statistical independence

But to do so, we need to determine whether events A and B are statistically dependent or independent, for which all of the conditions below must hold:
> P(A)=P(A|B)
> P(B)=P(B|A)
> P(A∩B)=P(A)⋅P(B)

On the linked screen, the solution seems to rely only on determining whether events A and B are mutually exclusive. If so, don’t we need to check for dependence or independence in such a case?

I am not entirely sure if I understood your question here.

No, not all of them. It’s enough for any one of those to be true.

As per the content -

In mathematical terms, we’ve seen A and B are independent if any of the conditions below are true

We are finding P(spam \cap secret) by rearranging that conditional probability formula:

P(spam \cap secret) = P(spam) * P(secret | spam)

The above becomes, in code -

```python
p_spam_and_secret = p_spam * p_secret_given_spam
```

We are using the above because events A and B are dependent. As stated -

We can find the word “secret” in many spam emails.

The word “secret” in an email influences the probability of it being spam. That makes them dependent. As per the content -

two events A and B are independent if the occurrence of one doesn’t change the probability of the other

The occurrence of one does change the probability of the other, so they are dependent.
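To make this concrete, here is a minimal sketch using made-up email counts (the numbers are hypothetical, not the ones from the Dataquest screen). It computes `p_spam_and_secret` with the multiplication rule, confirms it matches the joint probability counted directly, and then checks dependence by comparing P(spam ∩ secret) with P(spam) * P(secret).

```python
from fractions import Fraction

# Hypothetical counts, for illustration only (not the Dataquest data)
total_emails = 100
spam_emails = 20
spam_with_secret = 10      # spam emails containing "secret"
non_spam_with_secret = 5   # non-spam emails containing "secret"

p_spam = Fraction(spam_emails, total_emails)
p_secret = Fraction(spam_with_secret + non_spam_with_secret, total_emails)
p_secret_given_spam = Fraction(spam_with_secret, spam_emails)

# Multiplication rule: P(spam ∩ secret) = P(spam) * P(secret | spam)
p_spam_and_secret = p_spam * p_secret_given_spam

# The same joint probability, counted directly from the table
assert p_spam_and_secret == Fraction(spam_with_secret, total_emails)

# Independence would require P(spam ∩ secret) == P(spam) * P(secret)
is_independent = p_spam_and_secret == p_spam * p_secret
print(p_spam_and_secret, p_spam * p_secret, is_independent)  # 1/10 3/100 False
```

`Fraction` keeps the arithmetic exact, so the `==` comparisons are safe; with floats you would compare using a tolerance instead.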

https://app.dataquest.io/m/430/conditional-probability%3A-intermediate/6/statistical-dependence

The link above mentions that if any of the conditions is false, then the events are statistically dependent. So, for statistical independence, all three conditions below must hold, right?

> P(A)=P(A|B)
> P(B)=P(B|A)
> P(A∩B)=P(A)⋅P(B)

OK, I see the source of the confusion.

Usually, when checking for independence or dependence, most texts (from what I know) only check whether P(A \cap B) equals P(A)*P(B).

That’s the only thing needed to be checked.

DQ listing all three sort of extends that, but it can be confusing because of how they phrased things.

P(A \cap B) = P(A)*P(B)

If the above is true (and P(A) and P(B) are nonzero, so the conditional probabilities are defined), then the following have to be true as well.

P(A) = P(A|B)
P(B) = P(B|A)

The above comes from the multiplication rule. The Mission Step just before the one you linked shows this:

So, we could say that either

P(A) = P(A|B)
P(B) = P(B|A)

is True or we could say that

P(A \cap B) = P(A)*P(B)

is True or we could say all three are True for them to be independent.

Now, let’s consider the case where one of them is not True. Let’s say

P(A) \neq P(A|B)

Looking back at the image I shared above, the multiplication rule always gives -

P(A \cap B) = P(B)*P(A|B) = P(A)*P(B|A)

Or

P(B)*P(A|B) = P(A)*P(B|A)

If P(A) \neq P(A|B), then from the above we must also have P(B) \neq P(B|A). It’s a simple enough proof from the equation above, so I’m sure you can figure out why.
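For completeness, here is one way to write that proof, assuming P(A) > 0 and P(B) > 0 so both conditional probabilities are defined. Start from the identity above and suppose P(B|A) = P(B):

\begin{align}
P(B)\,P(A|B) &= P(A)\,P(B|A) && \text{multiplication rule, both ways} \\
P(B)\,P(A|B) &= P(A)\,P(B) && \text{if } P(B|A) = P(B) \\
P(A|B) &= P(A) && \text{divide both sides by } P(B) > 0
\end{align}

So P(B|A) = P(B) forces P(A|B) = P(A). Taking the contrapositive, P(A) \neq P(A|B) forces P(B) \neq P(B|A), and swapping the roles of A and B gives the same result in the other direction.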

We cannot have one True while the other is False. Either both are True or both are False.

So, for independence, we can say either that one of the statements is True or that all of them are True, because the statements are equivalent. Likewise, for dependence, we can say either that one of the statements is False or that all of them are False.

That’s why it’s often fine to only look at P(A \cap B) = P(A)*P(B) and see if that’s True or not.
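A small sketch of that point, using exact fractions and two die-roll examples chosen for illustration (the function name `independence_conditions` is hypothetical, not from the course). As long as P(A) and P(B) are nonzero, the three checks always come out the same.

```python
from fractions import Fraction

def independence_conditions(p_a, p_b, p_a_and_b):
    """Hypothetical helper: return the three independence checks for A and B.

    Assumes P(A) > 0 and P(B) > 0 so both conditional probabilities exist.
    """
    p_a_given_b = p_a_and_b / p_b   # P(A|B) = P(A ∩ B) / P(B)
    p_b_given_a = p_a_and_b / p_a   # P(B|A) = P(A ∩ B) / P(A)
    return (
        p_a == p_a_given_b,         # P(A) = P(A|B)
        p_b == p_b_given_a,         # P(B) = P(B|A)
        p_a_and_b == p_a * p_b,     # P(A ∩ B) = P(A) * P(B)
    )

# One roll of a fair die: A = "even" {2, 4, 6}, B = "at most 4" {1, 2, 3, 4}
# A ∩ B = {2, 4}
print(independence_conditions(Fraction(1, 2), Fraction(2, 3), Fraction(1, 3)))
# → (True, True, True)

# A = "even" {2, 4, 6}, B = "at most 3" {1, 2, 3}; A ∩ B = {2}
print(independence_conditions(Fraction(1, 2), Fraction(1, 2), Fraction(1, 6)))
# → (False, False, False)
```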

Hopefully, this clears your confusion.

I couldn’t find any mention of “mutually exclusive” on that screen. How is this related to independence?

By definition, A and B are mutually exclusive if P(A\cap B) = 0.

A and B are independent if P(A\cap B) = P(A)\cdot P(B), and dependent otherwise.

This means that mutual exclusivity automatically implies dependence, since P(A\cap B) = 0 but P(A)\cdot P(B) \neq 0 (provided P(A) and P(B) are nonzero).

So, mutually exclusive \implies dependent. This also makes sense informally; if both A and B can’t happen simultaneously, then knowing that A has happened makes B impossible, changing its probability. The likelihood of each happening depends on the other.

But if the events are not mutually exclusive, they can still be either dependent or independent, so you would still need to check whether P(A\cap B) = P(A)\cdot P(B).
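Here is a minimal sketch of those three cases on one roll of a fair six-sided die, with events written as sets of faces. The helpers `prob` and `describe` are illustrative names made up for this example.

```python
from fractions import Fraction

# One roll of a fair six-sided die (equally likely outcomes)
OUTCOMES = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Illustrative helper: probability of a set of faces."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

def describe(a, b):
    """Illustrative helper: (mutually exclusive?, independent?) for events a, b."""
    p_joint = prob(a & b)
    mutually_exclusive = p_joint == 0
    independent = p_joint == prob(a) * prob(b)
    return mutually_exclusive, independent

print(describe({1, 2}, {3, 4}))     # → (True, False): mutually exclusive, so dependent
print(describe({1, 2}, {2, 4, 6}))  # → (False, True): overlapping and independent
print(describe({1, 2}, {2, 3}))     # → (False, False): overlapping but dependent
```

The last two lines show why "not mutually exclusive" settles nothing by itself: both pairs overlap, yet only one of them is independent.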

Hope that helps!

On the same note, in the previous screen, we had P(HIV | T^+) = 0.03 and P(HIV) = 0.00014.

How can we determine whether statement_3 is True or False, i.e. whether HIV^C and T^+ are dependent?


Here’s what I understood so far…

We can generalise the following by saying that if one statement is False, everything will be False and vice-versa:

Based on the limited information we have, we can verify one condition: P(HIV^C) = P(HIV^C|T^+). If that statement is False, the other statements won’t hold either, so the two events are dependent.

But since:

\begin{align}
& P(HIV^C) = 1 - P(HIV) = 0.99986\\
& P(HIV^C|T^+) = 1 - P(HIV|T^+) = 0.9698642\\
& P(HIV^C) \neq P(HIV^C|T^+)
\end{align}

The two events are thus dependent.
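The same check can be sketched in a few lines of Python, using the numbers from this thread. The value 0.0301358 for P(HIV|T^+) is the unrounded figure implied by 0.9698642 above; the thread quotes it rounded as 0.03.

```python
import math

# Values from this thread
p_hiv = 0.00014              # P(HIV)
p_hiv_given_pos = 0.0301358  # P(HIV | T+), implied by 0.9698642 (quoted as 0.03)

# Complement rule, which also holds for conditional probabilities
p_not_hiv = 1 - p_hiv                      # P(HIV^C)
p_not_hiv_given_pos = 1 - p_hiv_given_pos  # P(HIV^C | T+)

# Floats: compare with a tolerance instead of ==
dependent = not math.isclose(p_not_hiv, p_not_hiv_given_pos)

print(f"P(HIV^C)      = {p_not_hiv:.5f}")
print(f"P(HIV^C | T+) = {p_not_hiv_given_pos:.7f}")
print("dependent:", dependent)
```

Since P(HIV^C|T^+) = 1 - P(HIV|T^+), comparing the complements gives the same answer as comparing P(HIV) = 0.00014 with P(HIV|T^+) directly.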


Thank you, @wanzulfikri!

It makes sense the way you put it. If P(HIV^C) = P(HIV^C|T^+) turns out to be False, then the idea of them being independent is also False. Thus, they are dependent.
