# Prove 'The mean of the sample means is equal to the population mean'

This is a mission from 305-11. In the Learn section, we had an example with `X = [0, 3, 6]` and calculated that the mean of the sample means equals the population mean. We were told that this holds true for any other distribution of real numbers.

While it’s intuitively true, it felt a bit muddled in my head, so I decided to try to prove it mathematically. I had fun and definitely have more clarity after this. It’s a bit long and I might not have written it down well, but I still want to share it with you guys.

Before we go into any code, imagine an extreme case where every element in the `population` is the same number, which is then necessarily the `mean` of the `population`. All the `samples` will be identical and have the same `mean`, so of course the mean of the sample means equals the population mean.
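A minimal sketch of that extreme case, assuming ordered samples of size 2 drawn without replacement (the same setup as the mission). `constant_population` is just an illustrative name, and `itertools.permutations` picks elements by position, so the repeated values are handled correctly:

```python
from itertools import permutations

# Hypothetical example: every element of the population is the same number.
constant_population = [5, 5, 5, 5]

# All ordered samples of size 2 without replacement (picked by position).
sample_means = [sum(s) / 2 for s in permutations(constant_population, 2)]

population_mean = sum(constant_population) / len(constant_population)
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(set(sample_means))                        # {5.0}
print(mean_of_sample_means == population_mean)  # True
```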

This special case already illustrates the idea, but it doesn’t prove the general claim, so I worked through a more general proof based on my solution below.

I took a different route in this mission: rather than listing all the sample combinations by hand like the answer does, I used `for` loops to generate the samples. I also added a variable, `iteration`, to count the number of samples the loops generate.

My Code:

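```python
import math

population = [3, 7, 2]
means = []
iteration = 0
samples = []

# Loop over positions (indices) rather than values, so a population
# with repeated numbers, e.g. [3, 3, 7], is still sampled correctly.
for i in range(len(population)):
    for j in range(len(population)):
        if i != j:  # without replacement: never pick the same element twice
            iteration += 1
            sample = [population[i], population[j]]
            samples.append(sample)
            means.append(sum(sample) / 2)

sample_mean = sum(means) / len(means)
population_mean = sum(population) / len(population)

# Compare with a tolerance: exact == on floats can fail due to rounding.
unbiased = math.isclose(population_mean, sample_mean)
print(unbiased)
```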
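To back up the claim that this holds for any population, here is a sketch that checks it for several sample sizes, both with and without replacement. It uses `itertools.permutations` (ordered samples, no replacement) and `itertools.product` (ordered samples, with replacement), plus `fractions.Fraction` so the comparison is exact instead of subject to float rounding. The function names and test populations are my own choices for illustration, and the populations are assumed to be integers so `Fraction` stays exact:

```python
from fractions import Fraction
from itertools import permutations, product


def mean_of_sample_means(population, sample_size, replacement=False):
    """Average of the means of every ordered sample of the given size."""
    if replacement:
        # Every ordered tuple of positions, repeats allowed.
        samples = product(population, repeat=sample_size)
    else:
        # Every ordered tuple of distinct positions: no element picked twice.
        samples = permutations(population, sample_size)
    means = [Fraction(sum(s), sample_size) for s in samples]
    return sum(means) / len(means)


def population_mean(population):
    return Fraction(sum(population), len(population))


# A few arbitrary integer populations, including one with repeated values.
for pop in ([0, 3, 6], [3, 7, 2], [3, 3, 7, 10], [1, -4, 2, 8, 5]):
    for size in range(1, len(pop) + 1):
        for repl in (False, True):
            assert mean_of_sample_means(pop, size, repl) == population_mean(pop)

print("all checks passed")
```

Using `Fraction` sidesteps the float equality issue entirely; with float data, `math.isclose` is the safer comparison.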

I experimented by adding more numbers to the `population` list. Here are the steps of my proof that the mean of the sample means is equal to the population mean:

Glossary:
• `population`: the population the samples are drawn from
• `samples`: a list of all possible samples (ordered pairs of distinct elements) drawn from `population`
• `iteration`: the number of samples generated by the loops above; it equals `len(samples)`
• `pop_len`: the population length, `len(population)`
• `sample_size`: the size of each sample; `sample_size = 2` in this mission
• `means`: a list of the sample means
• `element_iter_times`: the number of times each element of the population gets picked across all samples; it’s the same for every element
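For the `[3, 7, 2]` population these quantities can be computed directly. This sketch regenerates the samples with `itertools.permutations` (which, like my nested loops, produces every ordered pair of distinct positions) and counts how often each position is picked with `collections.Counter`:

```python
from collections import Counter
from itertools import permutations

population = [3, 7, 2]
pop_len = len(population)
sample_size = 2

# Sample positions rather than values, so repeated values are counted separately.
index_samples = list(permutations(range(pop_len), sample_size))
samples = [[population[i] for i in s] for s in index_samples]
iteration = len(samples)
means = [sum(s) / sample_size for s in samples]

# How many times each position appears across all samples.
picks = Counter(i for s in index_samples for i in s)
element_iter_times = picks[0]

print(iteration)           # 6
print(dict(picks))         # {0: 4, 1: 4, 2: 4}
print(element_iter_times)  # 4
```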

Steps:

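• Every element in the population gets picked the same number of times, because generating every possible sample treats all positions symmetrically. The `iteration` samples contain `iteration * sample_size` picks in total, shared equally among `pop_len` elements, so:
`element_iter_times` equals `iteration * sample_size / pop_len`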

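• Each sample mean is that sample’s sum divided by `sample_size`, and adding up all the sample sums counts every element `element_iter_times` times. So `sum(means)` is equal to `sum(element_iter_times * x for x in population) / sample_size`, which can also be written as `sum(population) * element_iter_times / sample_size`.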

• Let’s plug the expression for `element_iter_times` into the equation above:
`sum(means) == (sum(population) * (iteration * sample_size / pop_len)) / sample_size`, which simplifies to `sum(population) * iteration / pop_len`

• We already know that `len(means)` equals `iteration`

• So `sum(means) / len(means)` is equal to `(sum(population) * iteration / pop_len) / iteration`.
Voilà! There we go: `sum(means) / len(means) == sum(population) / pop_len`!
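The same steps in math notation, as a compact summary. The symbols are my own shorthand, not from the mission: $N$ is `pop_len`, $k$ is `sample_size`, $M$ is `iteration`, $c$ is `element_iter_times`, $x_1, \dots, x_N$ are the population values, $\bar{x}_s$ is the mean of sample $s$, and $\mu$ is the population mean.

$$
\begin{aligned}
c &= \frac{M k}{N} \\
\sum_{s=1}^{M} \bar{x}_s &= \frac{1}{k}\sum_{i=1}^{N} c\,x_i = \frac{c}{k}\sum_{i=1}^{N} x_i = \frac{M}{N}\sum_{i=1}^{N} x_i \\
\frac{1}{M}\sum_{s=1}^{M} \bar{x}_s &= \frac{1}{N}\sum_{i=1}^{N} x_i = \mu
\end{aligned}
$$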

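On a side note, with `sample_size = 2` you will find `iteration == pop_len ** 2 - pop_len` (that is, `pop_len * (pop_len - 1)`) for sampling without replacement, and `iteration == pop_len ** 2` for sampling with replacement. More generally, sampling with replacement gives `pop_len ** sample_size` samples.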
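These counts can be checked against the general formulas: ordered samples without replacement number $N!/(N-k)!$, which Python 3.8+ exposes as `math.perm(pop_len, sample_size)`, and ordered samples with replacement number $N^k$. A quick sketch with an arbitrary five-element population:

```python
import math
from itertools import permutations, product

population = [3, 7, 2, 9, 4]
pop_len = len(population)

for sample_size in range(1, pop_len + 1):
    without_repl = len(list(permutations(population, sample_size)))
    with_repl = len(list(product(population, repeat=sample_size)))

    # Ordered samples of distinct positions: pop_len! / (pop_len - sample_size)!
    assert without_repl == math.perm(pop_len, sample_size)
    # Ordered samples allowing repeats: pop_len ** sample_size
    assert with_repl == pop_len ** sample_size

# For sample_size == 2, the no-replacement count is pop_len ** 2 - pop_len.
assert math.perm(pop_len, 2) == pop_len ** 2 - pop_len
print(math.perm(pop_len, 2), pop_len ** 2)  # 20 25
```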

I hope this is an interesting read and helps fellow learners like me who find ‘The mean of the sample means is equal to the population mean’ as confusing as it is true.


Really great share, thank you!
