Thinking about Statistical Dependence


I’m trying to understand statistical independence.
It says here that if any of the 3 conditions does not hold --> statistically dependent.
I want to prove that if any of the 3 doesn’t hold, it won’t be only 1 of them that doesn’t hold.
Currently I can see that if 1 is not true, 3 will not be true either, and if 2 is not true, 3 will not be true either. So at least 2 of these 3 will be False.
My question is about the relationship between 1 and 2. If 1 is not true, does that mean 2 is not true also? And vice versa? Any examples to show this?
Common sense says 1 and 2 do not have to be both False together, e.g. A depends on B, but B does not depend on A --> 1 is False, 2 is True. However, another intuition is that if A depends on B, A’s values will be somehow related to B, making it look like B depends on A too.
Finding it hard to link common sense to math.

Using Bayes’ Theorem, Scenario 1 being untrue also means Scenario 2 is untrue.

Bear with the bad hand-writing…
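
In symbols, the idea is roughly this (a quick sketch, assuming P(A) > 0 and P(B) > 0):

\begin{align}
P(A\vert B) &= \dfrac{P(B\vert A)P(A)}{P(B)} \tag {Bayes}\\
P(A\vert B) \neq P(A) &\iff \dfrac{P(B\vert A)P(A)}{P(B)} \neq P(A) \tag {Substitution}\\
&\iff P(B\vert A) \neq P(B) \tag {Divide by $P(A)$, multiply by $P(B)$}
\end{align}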

Independence is a symmetric relationship, and a corollary is that dependence is symmetric too. In this case, I think the second of your two intuitions is the right one:

We can make a stronger statement that is easier to prove:

Proposition: Given events A and B with P(A) > 0 and P(B) > 0, the following are equivalent:

\begin{align}
P(A) &= P(A\vert B) \tag 1\\
P(B) &= P(B\vert A) \tag 2\\
P(A\cap B) &= P(A)P(B) \tag 3
\end{align}

Proof:

Begin by noting that by definition, we have P(A\vert B) = \dfrac{P(A\cap B)}{P(B)} and P(B\vert A) = \dfrac{P(B\cap A)}{P(A)} .

Now

\begin{align}
P(A) = P(A\vert B) &\iff P(A) = \dfrac{P(A\cap B)}{P(B)} \tag {Definition}\\
&\iff P(A)P(B) = P(A\cap B) \tag {Arithmetic}\\
&\iff P(A)P(B) = P(B\cap A) \tag {$\cap$-commutativity}\\
&\iff P(B) = \dfrac{P(B\cap A)}{P(A)} \tag {Arithmetic}\\
&\iff P(B) = P(B\vert A) \tag {Definition}
\end{align}

So we started from (1), proved it is equivalent to (3), and in turn proved that (3) is equivalent to (2), which shows that all three are equivalent. \square
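
To tie this back to your request for examples, here is a minimal numeric sketch (Python, with made-up joint distributions for two binary events) showing that the three equalities stand or fall together:

```python
# A quick numeric check of the proposition (made-up joint distributions).
# joint maps (a, b) in {0,1}x{0,1} to a probability; A is "a == 1", B is "b == 1".

def check(joint):
    p_a = sum(p for (a, b), p in joint.items() if a == 1)   # P(A)
    p_b = sum(p for (a, b), p in joint.items() if b == 1)   # P(B)
    p_ab = joint[(1, 1)]                                    # P(A ∩ B)
    print(f"(1) P(A)   = {p_a:.2f}  vs  P(A|B)   = {p_ab / p_b:.2f}")
    print(f"(2) P(B)   = {p_b:.2f}  vs  P(B|A)   = {p_ab / p_a:.2f}")
    print(f"(3) P(A∩B) = {p_ab:.2f}  vs  P(A)P(B) = {p_a * p_b:.2f}")

# Independent case: two fair coin flips, A = "first is heads", B = "second is heads".
check({(1, 1): 0.25, (1, 0): 0.25, (0, 1): 0.25, (0, 0): 0.25})   # all three agree

# Dependent case: same marginals, but positively associated outcomes.
check({(1, 1): 0.40, (1, 0): 0.10, (0, 1): 0.10, (0, 0): 0.40})   # all three disagree
```

In the first call all three equalities hold; in the second all three fail, which is exactly the equivalence above.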

Thanks for pointing out the symmetry of independence and the equivalence of all 3 statements. Now I see how 1 being False causes 2 to be False too, and vice versa.

Not a math student, so a random question here: since many equations turn out to be equivalent, does this mean mathematicians could have discovered such equivalences from either end (or beginning at any point in the equivalence chain)? Or is there an asymmetry in difficulty in going from one end and deriving the other, or are there other constraints that force such knowledge growth to go in one direction (such as support from existing knowledge)? Were there examples of independent (not in the probability sense) discoveries coming from both ends / different starting points of an equivalence chain?

A p-value can be statistically significant without being practically significant; does independence have a notion of practical independence too? I thought of this while thinking about how people use probability theory in practice.

What is the use of knowing that 2 events are statistically independent? I can’t see the practical use of it from a business point of view (the predictive value to guide decision making and change outcomes). I assume statistical independence can only be confirmed after the events have occurred and the data has been collected, calculated and compared, by which time it’s too late to do anything to affect the events or their results. Do people only use it to study cause and effect, such as in the biological/physical sciences?

The research process is hardly ever linear. I’d say that most of the time what happens is that two “things” exist in their own right, until someone suspects they’re connected somehow and tries to prove they are equivalent, equal, or whatever. Which end to start from probably depends on many factors, to the point where I suspect we can say it’s random.

There definitely is asymmetry in difficulty sometimes. A familiar example is the quadratic formula:

Given real numbers a, b and c with a\neq 0, for all real numbers x it holds that
ax^2+bx+c = 0\iff x=\dfrac{-b \pm \sqrt{b^2-4ac}}{2a}.

Going from x=\dfrac{-b \pm \sqrt{b^2-4ac}}{2a} to the starting equation is relatively straightforward, but going in the other direction is more complicated.
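
For instance, the “straightforward” direction for the + root is just substitution and algebra (a quick sketch, writing D = b^2-4ac for brevity):

\begin{align}
a\left(\dfrac{-b+\sqrt{D}}{2a}\right)^2 + b\left(\dfrac{-b+\sqrt{D}}{2a}\right) + c
&= \dfrac{b^2 - 2b\sqrt{D} + D}{4a} + \dfrac{-2b^2 + 2b\sqrt{D}}{4a} + c\\
&= \dfrac{D - b^2}{4a} + c = \dfrac{-4ac}{4a} + c = 0,
\end{align}

whereas the other direction needs an actual idea, such as completing the square.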

Usually, big discoveries in mathematics aren’t one single statement but rather a whole theory, so your question might be hard to answer, but you can try asking on Stack Exchange’s History of Science and Mathematics site.

An example of a whole theory that was independently developed by two people is Calculus (Newton and Leibniz). You can see other examples on the List of multiple discoveries Wikipedia article.

To exemplify with a business use case in the context of association rules and market basket analysis (a data mining technique that allows us to answer questions like “What do people purchase together with bread?”), say you have noticed that a lot of the baskets that have bread also have toilet paper.

Consider the two events

  • A: Toilet paper is in the basket;
  • B: Bread is in the basket;

“Lots of baskets that have bread also have toilet paper” is another way of saying that P(A\vert B) is high. But you also happen to know from your domain knowledge in retail that a lot of people purchase bread and a lot of people purchase toilet paper anyway, so P(A\vert B) is high simply because P(A) is high. The rule “Buying bread leads to buying toilet paper” isn’t really all that useful: the events are (roughly) independent, and there isn’t much to gain from creating a promo bundle that contains bread and toilet paper.

There is even a metric that measures how independent two events are. It’s called lift and it’s defined as \dfrac{P(A\cap B)}{P(A)P(B)}. The closer this ratio is to 1, the more independent A and B are (compare with (3) in my previous reply). You’ll want to create bundles where lift is as high as possible.
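
As a rough sketch (Python, with a made-up list of baskets purely for illustration), lift can be estimated straight from transaction data:

```python
# Minimal sketch: estimate P(A), P(B), P(A ∩ B) and lift from (made-up) baskets.
baskets = [
    {"bread", "toilet paper", "milk"},
    {"bread", "eggs"},
    {"toilet paper", "soap"},
    {"bread", "toilet paper"},
    {"milk", "eggs"},
]

n = len(baskets)
p_a = sum("toilet paper" in b for b in baskets) / n              # P(A)
p_b = sum("bread" in b for b in baskets) / n                     # P(B)
p_ab = sum({"toilet paper", "bread"} <= b for b in baskets) / n  # P(A ∩ B)

lift = p_ab / (p_a * p_b)
print(f"P(A)={p_a:.2f}  P(B)={p_b:.2f}  P(A∩B)={p_ab:.2f}  lift={lift:.2f}")
# lift ≈ 1 suggests (approximate) independence; lift well above 1 suggests the
# items tend to be bought together, which is when a bundle might make sense.
```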


Thanks for the quadratic idea! Didn’t know so many methods exist (I’ve only seen completing the square).
Interesting to see how independence can guide decisions about what to avoid. I had been thinking of independence as something good to seek out (probably influenced by learning Naive Bayes).
