For the first time while following the Data Analysis path at Dataquest, I had to turn to an outside source to consolidate new concepts.
It might be that hypothesis testing is genuinely hard to learn, but the fact is that I finished mission 106 feeling a little confused.
I then tried some outside sources (mainly Khan Academy) and went through mission 106 again, and I now feel comfortable enough to share the comments and doubts below:
Everywhere I looked, sources on hypothesis testing state that when the p-value is not significant (larger than the threshold), the only conclusion is that you cannot reject the null hypothesis (and you might keep testing it against new alternative hypotheses). That is different from accepting that the null hypothesis is true, and the latter is what is clearly stated in 5/10 and 8/10 of this mission.
It might be that, strictly speaking, one “cannot reject” the null hypothesis in such a case, but that in the real world people simply treat the null hypothesis as true (a kind of “feet on the street” wisdom from experienced DA/DS professionals). If that is the case, fine, but I suggest making this clearer in the mission. Otherwise, I think the concepts in 5/10 and 8/10 should be corrected.
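To pin down the wording, here is a minimal sketch of the decision rule as the outside sources describe it. The function name and the 0.05 default threshold are my own assumptions for illustration, and treating p <= alpha as significant is one common convention (some texts use a strict <).

```python
def hypothesis_decision(p_value: float, alpha: float = 0.05) -> str:
    """Hypothetical helper: turn a p-value into the textbook conclusion."""
    # A significant result (p-value at or below the threshold) lets us reject H0
    if p_value <= alpha:
        return "reject the null hypothesis"
    # A large p-value is NOT evidence that H0 is true; the data simply
    # do not provide enough evidence against it
    return "fail to reject the null hypothesis"
```

Note that there is no branch that returns "accept the null hypothesis", which is exactly the point above.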
When the permutation test is put to work in 5/10, I just did not get why the weights from the initial groups A and B are joined into the all_values list, which is then randomly sampled to build the sampling distribution. Wouldn’t it make more sense to keep groups A and B separate (50 weights each), sample from each group, and build the sampling distribution from the mean differences computed between those separately randomized A and B groups?
If you pool volunteers from both groups and sample from the combined group for the permutation test, each sample will probably include people who took the pill as well as people who took the placebo, so it seems you lose the control group at this stage of the process… I just don’t get it.
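To make the pooling step concrete, here is a minimal sketch of a two-group permutation test in Python with NumPy. Only the idea of a pooled all_values array comes from 5/10; the function name `permutation_p_value`, the 1000 iterations and the two-sided p-value are my own assumptions and may not match the mission. The reasoning it encodes is the null hypothesis itself: if the pill has no effect, the "pill" and "placebo" labels carry no information, so shuffling the labels across the pooled weights simulates the differences you would see by chance alone.

```python
import numpy as np


def permutation_p_value(group_a, group_b, n_iter=1000, seed=0):
    """Hypothetical sketch of a two-sided permutation test on mean differences."""
    rng = np.random.default_rng(seed)
    group_a = np.asarray(group_a, dtype=float)
    group_b = np.asarray(group_b, dtype=float)
    observed = group_b.mean() - group_a.mean()

    # Under H0 every weight is exchangeable between groups, so pool them
    all_values = np.concatenate([group_a, group_b])
    n_a = len(group_a)

    diffs = np.empty(n_iter)
    for i in range(n_iter):
        # Shuffle the pooled weights, then re-split into groups of the original sizes
        shuffled = rng.permutation(all_values)
        diffs[i] = shuffled[n_a:].mean() - shuffled[:n_a].mean()

    # Fraction of chance differences at least as extreme as the observed one
    p_value = np.mean(np.abs(diffs) >= abs(observed))
    return observed, diffs, p_value


if __name__ == "__main__":
    # Synthetic, made-up weights purely for illustration (not the mission's data)
    demo_rng = np.random.default_rng(42)
    placebo = demo_rng.normal(loc=2.0, scale=1.5, size=50)
    pill = demo_rng.normal(loc=2.5, scale=1.5, size=50)
    obs, _, p = permutation_p_value(placebo, pill)
    print(f"observed difference: {obs:.3f}, p-value: {p:.3f}")
```

The control group is not lost in this design: it is used once, to compute the observed difference, and the shuffled groups only provide the "no effect" reference distribution (centred near zero) to compare it against.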
By the way, I tried doing it this way (randomly sampling with 10 and 5 weights from each of the separate groups of 50) and got a totally different outcome: a very large, non-significant p-value (which would in turn mean not rejecting the null hypothesis).
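For comparison, here is a sketch of the separate-groups approach, again in Python with NumPy. The function name and the `sample_size` parameter are hypothetical, and since the exact procedure above (10 and 5 weights) is not fully spelled out, this version simply draws the same number of weights, without replacement, from each group on its own. Because the group labels are never shuffled, every simulated value is still a pill-minus-placebo difference.

```python
import numpy as np


def within_group_differences(group_a, group_b, sample_size=10, n_iter=1000, seed=0):
    """Hypothetical sketch: sample each group separately and record mean differences.

    Labels are never shuffled, so this does not simulate the null hypothesis;
    it shows how the observed B-minus-A difference varies across subsamples.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)

    diffs = np.empty(n_iter)
    for i in range(n_iter):
        # Draw sample_size weights from each group independently, without replacement
        sample_a = rng.choice(a, size=sample_size, replace=False)
        sample_b = rng.choice(b, size=sample_size, replace=False)
        diffs[i] = sample_b.mean() - sample_a.mean()
    return diffs
```

The resulting distribution is centred on the observed difference rather than on zero, so a p-value computed against it will tend to be large whether or not the pill works, which may be one reason the two procedures disagree so strongly.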
Just to note, I’m very happy with Dataquest, but I thought it would be worthwhile to share my comments and doubts.