Stats - Sampling - 4. Sampling Error: I disagree with the formula?

In the section it says the equation for sampling error is:
sampling error = parameter(population) - statistic(sample)
I took stats in college and recently took an online stats course, and I was taught the equation:
sampling error = statistic(sample) - parameter(population)

Can someone help me understand what’s the correct formula? Thank you in advance :blush:

Hi @yang

Welcome to our Dataquest Community.

The correct formula for sampling error is the first one, i.e.:

sampling error = parameter(population) - statistic(sample)

You can also think of it this way: the parameter is a value from the population (the actual value) and the statistic is a value from the sample (the expected value). So an error can only be calculated as actual minus expected.

If you have any documentation or a video where the reverse is written, please post the link and we will take a look.

Hope this helps :slightly_smiling_face:

Thanks for the reply! I couldn’t find reputable sources that cite the formula, but a quick Google search does give me these (see the following links)

Both are teaching the formula of sampling error as statistic(sample) - parameter(population)

Actually, I trust the Dataquest content, but I will tag @joshdq and @Bruno. They are data scientists and can help us.

Sir, can you help us with this problem?

The formula statistic - parameter is indeed preferred because it shows both the difference and the direction of the difference.

If parameter = 5 and statistic = 3, then the sample statistic underestimates the population parameter. The underestimation is shown by the - sign:

statistic - parameter = 3 - 5 = -2

parameter - statistic accurately shows only the absolute difference, but not the direction:

parameter - statistic = 5 - 3 = +2
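
To make the two conventions concrete, here is a minimal Python sketch using the same numbers. The function names are hypothetical, chosen only for illustration:

```python
def error_statistic_minus_parameter(statistic, parameter):
    """Signed error: negative means the sample underestimates the parameter."""
    return statistic - parameter


def error_parameter_minus_statistic(statistic, parameter):
    """Same magnitude, opposite sign."""
    return parameter - statistic


parameter = 5  # population value from the example above
statistic = 3  # sample value from the example above

signed_error = error_statistic_minus_parameter(statistic, parameter)
reversed_error = error_parameter_minus_statistic(statistic, parameter)

print(signed_error)    # -2
print(reversed_error)  # 2
```

Both versions agree on the size of the error; they differ only in sign.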

In our first stats mission, the formula parameter - statistic is mostly introduced to give an alternative understanding of what a sampling error is. In practice, we generally don’t use either formula because the population parameter is unknown (sampling error is usually measured using the standard error). However, thanks for bringing this up, @yang, we’ll try to make an adjustment to our content to avoid any confusion in the future.
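
As a rough illustration of the difference between the two quantities, the sketch below builds a made-up population so that, unlike in practice, the parameter is known. It then compares the signed sampling error of the mean with the standard error estimated from the sample alone. The population, its distribution, and the sample size of 100 are assumptions for illustration and are not taken from the mission:

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population; in real work this is what we cannot observe.
population = [random.gauss(50, 10) for _ in range(10_000)]
sample = random.sample(population, 100)  # simple random sample, no replacement

parameter = statistics.mean(population)  # population mean
statistic = statistics.mean(sample)      # sample mean

# Signed sampling error; needs the (normally unknown) parameter.
sampling_error = statistic - parameter

# Standard error of the mean, estimated from the sample alone.
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"sampling error: {sampling_error:.3f}")
print(f"standard error: {standard_error:.3f}")
```

The standard error does not recover the actual error of this particular sample; it only estimates how much sample means typically vary around the population mean.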


The main problem with this task is not the order of the subtraction but the metric that is actually calculated.

The description rightly says you need to take the mean, but the task uses max(), which makes no real sense.
The chance of the population's maximum value landing in the sample is roughly n_sample / N, so comparing the max values gives essentially no information about the sampling error.
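
A quick simulation can check the n_sample / N figure. This is only a sketch under assumptions: the population has a single, unique maximum, samples are drawn without replacement, and the sizes N = 1000 and n_sample = 100 are made up for illustration:

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

N = 1000         # hypothetical population size
n_sample = 100   # hypothetical sample size
trials = 5000

population = range(N)          # values 0..N-1, so the maximum is unique
population_max = N - 1

# Count how often a simple random sample contains the population maximum.
hits = sum(
    population_max in random.sample(population, n_sample)
    for _ in range(trials)
)

estimated = hits / trials
expected = n_sample / N

print(f"estimated probability: {estimated:.3f}")
print(f"n_sample / N:          {expected:.3f}")
```

With these sizes, about nine samples in ten miss the population maximum entirely, so the sample max will usually fall below the population max regardless of how representative the sample is.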