This article is based on an example from the second edition of Think Bayes, forthcoming from O’Reilly Media.
Elvis Presley had a twin brother who died at birth. It’s unknown if they were identical or fraternal twins, but we can use Bayes’s Rule and data from the U.S. Census Bureau to figure the odds.
First, we need some background information about the relative frequencies of identical and fraternal twins.
Then we’ll use Bayes’s Rule to take into account one piece of data, which is that Elvis’s twin was male.
Then we’ll take into account a second piece of data, which is that Elvis’s twin died at birth.
Step 1: Get the Data
For background information, I’ll use data from 1935, the year Elvis was born, from the U.S. Census Bureau, Birth, Stillbirth, and Infant Mortality Statistics for the Continental United States, the Territory of Hawaii, the Virgin Islands 1935.
It includes this table, which shows the total number of plural births in the United States.
The table doesn’t report which twins are identical or fraternal, but we can use the data to estimate it.
With the numbers in the table, we can compute the fraction of twins that are opposite sex, which I’ll call x:
opposite = 8397
same = 8678 + 8122
x = opposite / (opposite + same)
x
But the quantity we want is the fraction of twins who are fraternal, which I’ll call p_f. Let’s see how we can get from x to p_f.
Because identical twins have the same genes, they are almost always the same sex. Fraternal twins do not have the same genes; like other siblings, they are about equally likely to be the same or opposite sex.
So we can write the relationship:
x = p_f / 2 + 0
which says that the opposite sex twins include half of the fraternal twins and none of the identical twins.
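Read as an instance of the law of total probability, the relation can be spelled out term by term. This is only a sketch: it writes p_i = 1 - p_f for the fraction of identical twins, and it relies on the two approximations made in the text (identical twins are never opposite sex; fraternal twins are opposite sex half the time).

```latex
\begin{aligned}
x &= P(\text{opposite} \mid \text{fraternal})\, p_f
   + P(\text{opposite} \mid \text{identical})\, p_i \\
  &= \tfrac{1}{2}\, p_f + 0 \cdot p_i \\
  &= \frac{p_f}{2}
\end{aligned}
```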
And that implies
p_f = 2 * x
p_f
We can also compute the fraction of twins that are identical, p_i:
p_i = 1 - p_f
p_i
In 1935 about 2/3 of twins were fraternal and 1/3 were identical.
So if we know nothing else about Elvis, the probability is about 1/3 that he was an identical twin.
But we have two pieces of information that affect our estimate of this probability:
Elvis’s twin was male, which is more likely if he was identical.
Elvis’s twin died at birth, which is also more likely if he was identical.
Step 2: Apply Bayes’s Rule
To take this information into account, we will use Bayes’s Rule:
odds(H|D) = odds(H) * likelihood_ratio(D)
That is, the posterior odds of the hypothesis H, after seeing data D, are the product of the prior odds of H and the likelihood ratio of D.
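Because the rest of the example moves back and forth between probabilities and odds, it may help to see the conversions written out. The following is a minimal sketch; the helper names `odds`, `prob`, and `bayes_update` are hypothetical, chosen for illustration and not taken from the book's code, and the numbers in the example are arbitrary.

```python
def odds(p):
    """Convert a probability p (0 <= p < 1) to odds in favor."""
    return p / (1 - p)


def prob(o):
    """Convert odds in favor back to a probability."""
    return o / (o + 1)


def bayes_update(prior_odds, likelihood_ratio):
    """Bayes's Rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


# Arbitrary illustration: a prior probability of 0.2 is odds of 1:4.
example_prior = odds(0.2)
# A likelihood ratio of 4 brings the odds to 1:1, a probability of 1/2.
example_posterior = prob(bayes_update(example_prior, 4))
print(example_prior, example_posterior)
```

Working in odds keeps each update to a single multiplication, which is why the article uses this form.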
We can use p_i and p_f to compute the prior odds that Elvis was an identical twin.
prior_odds = p_i / p_f
prior_odds
The prior odds are about 0.5, or 1:2.
Now let’s compute the likelihood ratio of D. The probability that twins are the same sex is nearly 100% if they are identical and about 50% if they are fraternal. So the likelihood ratio is 100 / 50 = 2.
likelihood_ratio = 2
Now we can apply Bayes’s Rule:
posterior_odds = prior_odds * likelihood_ratio
posterior_odds
The posterior odds are close to 1, or, in terms of probabilities:
posterior_prob = posterior_odds / (posterior_odds + 1)
posterior_prob
Taking into account that Elvis’s twin was male, the probability is close to 50% that he was identical.
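As a cross-check, the same update can be done with probabilities rather than odds. This sketch recomputes the priors from the counts used earlier and applies Bayes's Theorem directly. The likelihoods of 1.0 and 0.5 are the approximations stated in the text, and the variable names `like_i`, `like_f`, `unnorm_i`, `unnorm_f`, and `posterior_i` are chosen here for illustration.

```python
# Counts quoted earlier in the article (1935 twin births).
opposite = 8397
same = 8678 + 8122

# Prior probabilities of fraternal and identical twins.
p_f = 2 * opposite / (opposite + same)
p_i = 1 - p_f

# Likelihood of "the twins are the same sex" under each hypothesis
# (approximations from the text: ~100% if identical, ~50% if fraternal).
like_i = 1.0
like_f = 0.5

# Bayes's Theorem: normalize prior * likelihood over both hypotheses.
unnorm_i = p_i * like_i
unnorm_f = p_f * like_f
posterior_i = unnorm_i / (unnorm_i + unnorm_f)

print(round(posterior_i, 4))
```

The result should agree with the probability obtained from the odds form, since the two are algebraically equivalent.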
Step 3: More Data, More Bayes’s Rule
Now let’s take into account the second piece of data: Elvis’s twin died at birth.
It seems likely that there are different risks for fraternal and identical twins, so I’ll define:
r_f: The probability that one twin is stillborn, given that they are fraternal.
r_i: The probability that one twin is stillborn, given that they are identical.
We can’t get those quantities directly from the table, but we can compute:
y: the probability of “1 living”, given that the twins are opposite sex.
z: the probability of “1 living”, given that the twins are the same sex.
y = (258 + 299) / opposite
y
z = (655 + 564) / same
z
Assuming that all opposite sex twins are fraternal, we can infer that the risk for fraternal twins is
r_f = y
r_f
To find r_i, we can write the following relation:
z = q_i * r_i + q_f * r_f
which says that the risk for same sex twins is the weighted sum of the risks for identical and fraternal twins, with weights q_i, the fraction of same sex twins who are identical, and q_f, the fraction who are fraternal.
q_i is the posterior probability we computed in the previous update; q_f is its complement.
q_i = posterior_prob
q_f = 1 - posterior_prob
Solving for r_i, we get
r_i = (z - q_f * r_f) / q_i
r_i
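For readers who want the intermediate algebra, this is the rearrangement behind that expression. It uses only the weighted-sum relation and the symbols already defined:

```latex
\begin{aligned}
z &= q_i\, r_i + q_f\, r_f \\
z - q_f\, r_f &= q_i\, r_i \\
r_i &= \frac{z - q_f\, r_f}{q_i}
\end{aligned}
```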
Now we can compute the likelihood ratio:
likelihood_ratio2 = r_i / r_f
likelihood_ratio2
In this dataset, the probability that one twin dies at birth is about 19% higher if the twins are identical.
Finally, we can apply Bayes’s Rule again to compute the posterior odds after both updates:
posterior_odds2 = posterior_odds * likelihood_ratio2
posterior_odds2
Or, if you prefer probabilities:
posterior_prob2 = posterior_odds2 / (posterior_odds2 + 1)
posterior_prob2
Taking into account both pieces of data, the posterior probability that Elvis was an identical twin is about 54%.
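To see the whole calculation in one place, here is a sketch that wraps both updates in a single function. The function name `elvis_posterior` and its parameter names are hypothetical, introduced only for this illustration. The counts are the ones used above, and the same approximations apply: identical twins are always the same sex, and opposite sex twins are always fraternal.

```python
def elvis_posterior(opposite, same, one_living_opposite, one_living_same):
    """Return the posterior probability of 'identical' after both updates."""
    # Step 1: prior from the fraction of opposite sex twins.
    x = opposite / (opposite + same)
    p_f = 2 * x
    p_i = 1 - p_f
    prior_odds = p_i / p_f

    # Step 2: update on "twin was male" (likelihood ratio of about 2).
    posterior_odds = prior_odds * 2
    posterior_prob = posterior_odds / (posterior_odds + 1)

    # Step 3: update on "twin died at birth".
    y = one_living_opposite / opposite  # risk among opposite sex twins
    z = one_living_same / same          # risk among same sex twins
    r_f = y                             # opposite sex twins assumed fraternal
    q_i = posterior_prob                # share of same sex twins who are identical
    q_f = 1 - posterior_prob            # share of same sex twins who are fraternal
    r_i = (z - q_f * r_f) / q_i
    posterior_odds2 = posterior_odds * (r_i / r_f)
    return posterior_odds2 / (posterior_odds2 + 1)


result = elvis_posterior(
    opposite=8397,
    same=8678 + 8122,
    one_living_opposite=258 + 299,
    one_living_same=655 + 564,
)
print(f"{result:.2f}")
```

With the 1935 counts this comes out to about 0.54, matching the result above. Passing the counts as arguments makes it easy to rerun the calculation with figures from another year.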
This example is from the second edition of Think Bayes, forthcoming from O’Reilly Media. The first four chapters are available now as an early release.
The code in this example is in a Jupyter notebook you can run on Colab.
I learned about this problem from Bayesian Data Analysis.
Their solution takes into account that Elvis’s twin was male, but not the additional evidence that his twin died at birth.
Jonah Spicher, who took my Bayesian Statistics class at Olin College, came up with the idea to use data from 1935 to compute the likelihood of the data.