Hi Dataquest team,
I wanted to bring to your attention what I think is an error in the theory part of the first course on conditional probability.
In introducing the concept of P(A|B), this dataset is used.
After the intro on cardinals, the equivalence with probabilities is introduced in an awkward way.
P(HIV | T+) is calculated as P(HIV ⋂ T+) / P(T+)
the values of these two variables are expressed as
P(T+) = 0.12 (?)
P(HIV ⋂ T+) = 0.000015 (??)
Thus leading to P(HIV|T+) = 0.000125
While I think the two values should be
P(T+) = 45/53 (success / possible)
P(HIV ⋂ T+) = 21/53
Thus leading to P(HIV|T+) = 21/45 ≈ 0.467
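For anyone who wants to check this arithmetic, here is a minimal Python sketch. It assumes the counts quoted in this post (53 patients in total, 45 positive tests, 21 patients who both have HIV and tested positive); the variable and function names are made up for illustration and do not come from the course.

```python
from fractions import Fraction


def conditional_from_counts(n_both: int, n_condition: int) -> Fraction:
    """Return P(A | B) as card(A and B) / card(B)."""
    if n_condition == 0:
        raise ValueError("the conditioning event has no outcomes")
    return Fraction(n_both, n_condition)


# Counts assumed from the post above (hypothetical names).
total_patients = 53
n_t_pos = 45          # patients who tested positive
n_hiv_and_t_pos = 21  # patients who have HIV and tested positive

# Route 1: divide the two probabilities.
p_t_pos = Fraction(n_t_pos, total_patients)
p_hiv_and_t_pos = Fraction(n_hiv_and_t_pos, total_patients)
p_via_probabilities = p_hiv_and_t_pos / p_t_pos

# Route 2: divide the two counts directly.
p_via_counts = conditional_from_counts(n_hiv_and_t_pos, n_t_pos)

# Both routes agree because the total (53) cancels out.
assert p_via_probabilities == p_via_counts
print(p_via_counts, round(float(p_via_counts), 3))  # 7/15 0.467
```

The total cancels in the division, which is why working with cardinals or with probabilities gives the same answer.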
Maybe the dataset was changed or maybe I have not understood a thing about conditional probability?
Looks like the 0.12 and 0.000015 values are plucked out of thin air?
I agree that using the provided data as you demonstrated would make a more relatable explanation.
You have understood conditional probability correctly, but nevertheless there’s nothing theoretically wrong with 0.000015/0.12=0.000125.
Introducing the great Allen Downey collection to satisfy all your Bayes needs: https://colab.research.google.com/github/AllenDowney/BiteSizeBayes
Hi @nlong, the table you showed above describes the results obtained using a certain HIV test. Before introducing the new probabilities (the ones which confused you), we mentioned using a different HIV test (you might have missed that paragraph, hence the confusion):
This formula is useful when we only know probabilities. For instance, let’s say a different test is used to diagnose a patient. The patient tests positive for HIV, and we want to find P(HIV | T+) — the probability that the patient actually has HIV, given that the test was positive.
This time, however, all we know is P(T+) = 0.12 and P(HIV ∩ T+) = 0.000015. We can no longer find cardinals, but using the formula above, we have:
P(HIV | T+) = P(HIV ∩ T+) / P(T+) = 0.000015 / 0.12 = 0.000125
So the probabilities come from a different test, not from that initial table, and this aspect is already mentioned. Let me know if this is still confusing.
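As a rough illustration of the probabilities-only case, the sketch below applies the same formula to the two values quoted for the second test. The function name and the sanity checks are assumptions added for illustration, not part of the course material; exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def conditional_probability(p_both: Fraction, p_condition: Fraction) -> Fraction:
    """Return P(A | B) = P(A and B) / P(B), given only the two probabilities."""
    if p_condition == 0:
        raise ValueError("P(B) must be greater than zero")
    if p_both > p_condition:
        raise ValueError("P(A and B) cannot exceed P(B)")
    return p_both / p_condition


# Values quoted for the second, different test (no counts available).
p_t_pos = Fraction("0.12")
p_hiv_and_t_pos = Fraction("0.000015")

p_hiv_given_t_pos = conditional_probability(p_hiv_and_t_pos, p_t_pos)
print(p_hiv_given_t_pos, float(p_hiv_given_t_pos))  # 1/8000 0.000125
```

No table of counts is needed here: the formula only requires P(T+) and P(HIV ∩ T+).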
However, from a UX perspective, I found it a bit confusing that the reasoning flow of the existing example is interrupted by a new one introduced abruptly. The text is very dense, so some extra spacing would help separate the two examples and make room for the new data.
I agree with the observation you’re making from a UX perspective, and I've made a change to make this clearer. Hopefully, the change will go live next week. Thanks!