Confused by P Value Page

https://app.dataquest.io/c/6/m/106/significance-testing/8/p-value

Hello,

I was a little confused on the P-Value page. I left a link at the top. I didn’t understand this part

" This probability is called the p value . If this value is high, it means that the difference in the amount of weight both groups lost could have easily happened randomly and the weight loss pills probably didn’t play a role. On the other hand, a low p value implies that there’s an incredibly small probability that the mean difference we observed was because of random chance"

If group B took the weight loss pill and lost more weight than group A, who took the placebo, wouldn’t that be a higher p value, meaning it was not random chance and the pills played a role?

It’s important to understand how we get the p-value.

We run a permutation test where we randomly assign values to either of the groups and then for each iteration we calculate the test statistic on these groups.

Our test statistic was 2.52 from the groups that we had from our actual experiment (where we get the data from).

The test statistic itself might tell us that on average group B lost more weight than group A and since group B used those pills that could mean that those pills had an impact on the weight loss.

But, we don’t know if it was the pills themselves that helped or if it was random chance. That’s why we do the permutation test.

The sampling distribution we get is from when we randomize the values in the two groups. So, we can’t say which value in group A corresponds to the placebo or the pill.

Now, in our sampling distribution we compare all the values we get to our original test statistic value, 2.52. If values at least as extreme as 2.52 show up many times in our distribution, that means that even when we randomized the values in the groups we get a difference that large quite often. Of course, “quite often” is not really a good mathematical/technical way to put it, so we calculate the probability: the fraction of values in the sampling distribution that are at least as extreme as 2.52.

That probability is the p-value.
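
To make the steps concrete, here is a minimal sketch of a permutation test in plain Python. The weight-loss numbers are made up for illustration (they are not the course data, so the observed difference will not be 2.52), and the function name `permutation_p_value` is just a label chosen here.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_iter=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in means (B minus A)."""
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_iter):
        # Randomly reassign every value to one of the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        # Count shuffles that are at least as extreme as what we observed.
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_iter

# Hypothetical weight-loss values, NOT the data from the lesson.
placebo = [3, 5, 2, 4, 6, 3, 5, 4, 2, 6]
pills = [6, 8, 5, 9, 7, 6, 10, 5, 8, 7]

observed, p_value = permutation_p_value(placebo, pills)
print("observed difference:", observed)
print("estimated p-value:", p_value)
```

The returned fraction is the share of random shuffles that produced a difference at least as large as the real one, which is the probability described above.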

If the p-value is high, that means values as extreme as 2.52 appear often in our sampling distribution. And if they appear often, that means that even when we randomly assign our values to the groups we get a difference that large often, which indicates that our original test statistic on our actual data could have happened because of random chance. Otherwise, why would the probability be high even when we randomly sampled?

And if it’s low, that means random chance alone would rarely produce a test statistic as extreme as the one we got.

Let me know if that helped make it clear or not.

Hello,

The concept of p-value is really important to understand clearly, and a lot of people are confused about it (even some professional statisticians!). I think what @the_doctor said is great, and I will just give some technical details.

When you compute a test statistic, this number comes from a random variable. Actually, a statistic is just a random variable. If you are not familiar with the field of probability yet and do not clearly understand what a random variable is, think of it as a box which produces numbers randomly (for example, a die can be seen as a random variable that can produce integers between 1 and 6).

In your example, you have a test statistic (let’s call it T) which corresponds to the difference in mean weight loss between the two groups, and you are asking yourself the following question: given the assumption that “participants who consumed the weight loss pills lost the same amount of weight as those who didn’t take the pill”, what is the probability of observing the event |T| \geq 2.52, i.e. how likely is it to observe this result (or a more extreme one) under the assumption we made before? This assumption is called the null-hypothesis, and let’s denote it as H_0. In terms of conditional probabilities, we have \text{p-value} = \mathbb{P}_{H_0}(|T| \geq 2.52).

To calculate this p-value, we need to know the probability distribution of the statistic T under the null-hypothesis H_0. Since we cannot derive this distribution analytically (because we do not have enough information in our example), we need to estimate this distribution with a histogram.

The null-hypothesis H_0 can be reformulated as follows: “the pills have no effect on weight loss”. Hence, we can estimate the distribution of T under H_0 by randomly splitting our data values into two groups (and repeating this experiment a certain number of times to get a sampling distribution); indeed, when doing so, you are likely to observe no difference in weight loss between the two groups you created, because in each of your groups, you do not know who took the pills and who did not. In other words:

  • if the pills have no effect on weight loss, you will not observe any significant difference between the two groups you formed,
  • if the pills actually have an effect on weight loss, you will not observe any significant difference either, because you created the groups randomly: each group should contain roughly the same proportion of people from group A and group B, so the effect averages out.
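
A small simulation can illustrate the second bullet. This is only a sketch under assumptions made up here: the data are generated with `random.gauss`, and the simulated pill adds 3 units of weight loss on average.

```python
import random

def mean(values):
    return sum(values) / len(values)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulated data (an assumption for illustration): the pill has a real effect.
group_a = [rng.gauss(5, 2) for _ in range(50)]  # placebo
group_b = [rng.gauss(8, 2) for _ in range(50)]  # pills
observed = mean(group_b) - mean(group_a)

# Randomly split the pooled values into two groups, many times.
pooled = group_a + group_b
shuffled_diffs = []
for _ in range(2000):
    rng.shuffle(pooled)
    shuffled_diffs.append(mean(pooled[50:]) - mean(pooled[:50]))

print("observed difference:", round(observed, 2))
print("average shuffled difference:", round(mean(shuffled_diffs), 2))
```

The shuffled differences cluster around zero even though the simulated pill works, so the shuffles approximate what T looks like under H_0 either way.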

Once we do that, we can get a histogram: it is an estimation of the probability distribution of T under the null-hypothesis H_0. We can then read the (estimated) probability of |T| being greater than 2.52, which corresponds to the p-value.
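
Written out, reading the p-value off the histogram amounts to counting. As a sketch in notation introduced here (N for the number of random splits and T_i for the statistic computed on the i-th split are my own labels, not from the lesson):

$$
\hat{p} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{ |T_i| \geq 2.52 \}
$$

so the estimate is simply the share of random splits whose mean difference is at least as extreme as the observed one.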

If this value is “small”, it means that under the null-hypothesis H_0, it is unlikely to observe that |T| \geq 2.52. In other words, under the assumption that the pills have no effect on weight loss, it is unlikely to observe this event, and that is why we reject the null-hypothesis and conclude that there is some evidence that the weight loss pill does affect the amount of weight people lost.

If this value is “big”, it means that under the null-hypothesis, it is quite likely to observe that |T| \geq 2.52, and that is why we fail to reject the null hypothesis that there’s no difference in the mean amount of weight lost by participants in both groups, and conclude that we do not have enough evidence that the weight loss pill helps people lose weight.

Here, the terms “small” and “big” depend on a threshold that you have set for yourself before starting the experiment.

Also, it is important to understand that even if your p-value is small (i.e. under your threshold), it does not necessarily mean that the null-hypothesis is false. Do you see why? Because the p-value is a probability calculated under the assumption that the null-hypothesis is true. And one could ask “okay, but what if the null-hypothesis is in reality false?”. So be careful about what you conclude when using p-values! :wink:

I hope it makes sense to you!

This is a very complicated subject that is impossible to describe in one short module.
See these books:

  1. FUNDAMENTALS OF MATHEMATICAL STATISTICS, ISBN 81-7014-791-3, Tenth Revised Edition: August 2000
    CHAPTER SIXTEEN Statistical Inference-II (Testing of Hypothesis, Non-parametric Methods and Sequential Analysis)
    or
  2. BUSINESS ANALYTICS AND STATISTICS FIRST EDITION - Authorised adaptation of Australasian Business Statistics, 4th edn (ISBN 9780730312932), published by John Wiley & Sons, Brisbane Australia. © 2010, 2013, 2016. All rights reserved.
    CHAPTER 9 Statistical inference: hypothesis testing for single populations

You can find these e-books on the public Internet.

Thank you so much! This was incredibly helpful and now I have a stronger understanding of this.

Thank you so much! I will look into these books

Thank you so much! This was very helpful. I appreciate the long and thorough explanation.

These other folks did a good job of giving a detailed explanation of P value, but I’m going to try to give a dumbed-down version.

For this scenario we have ONE group of people that have lost weight in a given time period; this is our sample population. We could divide this group by hair color, shoe size, favorite color, etc., and we would probably see a difference in average weight loss between those groups. But since those variables aren’t likely to be related to weight loss, we would expect the differences in weight loss between blondes and brunettes to fall within some sort of “normal” range. That is to say, we would expect to see a difference in weight loss between groups any random way that we choose to select the groups; however, if that difference was too large we might suspect that the groups weren’t chosen randomly.

P value is a measure that tells us the likelihood that we would see a given difference in weight loss “randomly”. Or to put it another way, it tells us how “special” the way we chose the groups is. If the diet pill is effective, we would expect that group to be very special, and have low likelihood of occurring due to random chance, and a very low P value. If the diet pill is a placebo then we would expect the difference between the groups to be similar to the difference between cat people vs dog people.

A good way of thinking about what a P value represents is thinking of a 1000X coin toss. We know that a coin toss has a 50/50 chance, but if you flip it 1000X and get heads 510 times you probably wouldn’t be concerned. 520 times, probably no big deal. But the odds of getting heads 600+ times out of 1000 are so low that you could be pretty confident that it is not the result of random chance alone. Getting heads 999 times is even more unlikely. So the lower the odds of an outcome at least that extreme occurring by random chance (even though technically possible!), the lower the P value, and the more likely that there is something “special” or “non-random” about that group.
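
For anyone who wants to check how quickly the odds drop off in the coin toss example, here is a small sketch that computes the exact chance of getting at least k heads in 1000 tosses of a fair coin. The helper name `prob_at_least` and the particular values of k are choices made here for illustration.

```python
from math import comb

def prob_at_least(k, n=1000):
    """Exact P(heads >= k) for n tosses of a fair coin."""
    # Count the outcomes with k or more heads, then divide by all 2**n outcomes.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

for k in (510, 520, 600, 700, 999):
    print(k, prob_at_least(k))
```

Counts only slightly above 500 are quite ordinary, while the probability is already vanishingly small well before you get anywhere near 999 heads.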

Hopefully this helps.

The p-value is a spherical horse in an ideal vacuum. :grinning: It works when we have ideal symmetric parametric statistics with Gaussian dispersion.