
Permutation test questions

I don't understand the process of the permutation test. Do we randomly reassign each data value to either group A or group B, then recalculate the mean difference to see whether a mean difference of 2.52 is common or rare, and use that to decide whether to reject or accept the null hypothesis? From the code below, it seems that we assign a random number to both group A and group B, so I get confused.

I am trying to understand the code below, but I am not sure I get it all… What is all_values, and what is assignment_chance? Why do we need assignment_chance to be between 0 and 1?
Screen Link: https://app.dataquest.io/m/106/significance-testing/5/permutation-test

My Code:

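import numpy as np
import matplotlib.pyplot as plt

# all_values is pre-loaded by the lesson: the weight loss values of both groups combined.
mean_differences = []
for i in range(1000):
    group_a = []
    group_b = []
    # Randomly reassign every value to one of the two groups.
    for value in all_values:
        assignment_chance = np.random.rand()
        if assignment_chance >= 0.5:
            group_a.append(value)
        else:
            group_b.append(value)
    # Mean difference for this random regrouping.
    iteration_mean_difference = np.mean(group_b) - np.mean(group_a)
    mean_differences.append(iteration_mean_difference)

plt.hist(mean_differences)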

all_values is explained in the lesson content:

Since we’ll be randomizing the groups each value belongs to, we created a list named all_values that contains just the weight loss values.

assignment_chance corresponds to the following instruction:

Use the numpy.random.rand() function to generate a value between 0 and 1.

It simulates a probability, much like a coin flip. np.random.rand() returns a value from 0 up to (but not including) 1, so assignment_chance could be, let's say, 0.65.

Then we check if that value is greater than or equal to 0.5.

if assignment_chance >= 0.5:

If it is, we assign the value to group_a. If not, it’s assigned to group_b.
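
To see why the 0.5 cutoff behaves like a fair coin, here is a small sketch (not part of the lesson; the seed and the number of draws are arbitrary choices made only for illustration). It draws many values with np.random.rand() and checks what share of them would be sent to group_a.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the sketch is repeatable

# 10,000 values drawn uniformly from [0, 1)
flips = np.random.rand(10_000)

# Share of draws that pass the same check as in the lesson: >= 0.5 goes to group_a
share_group_a = np.mean(flips >= 0.5)

print(round(share_group_a, 3))  # close to 0.5, though not exactly 0.5
```

Roughly half of the draws land on each side of 0.5, so each value has about an even chance of ending up in either group.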

The process for the permutation test is pretty simple: put all the values for group A and group B together in a hat, then pull the values out at random, assigning them to new teams. Next, compare the 'new' teams' averages to find out by how much the winning team won. Repeat this process over and over, then compare the margins of victory (the mean differences) and see whether the original group B really had something special about it, or whether its win was just a fluke.
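
The hat idea can be sketched roughly like this. This is not the lesson's code: the function name permutation_mean_differences and the two small data sets are made up for illustration, and this version shuffles the pooled values and splits them back into teams of the original sizes instead of flipping a coin per value.

```python
import numpy as np

def permutation_mean_differences(group_a, group_b, n_iterations=1000, seed=0):
    """Hypothetical helper: shuffle the pooled values and re-split them many times."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([group_a, group_b])  # the "hat" with every value in it
    n_a = len(group_a)
    differences = []
    for _ in range(n_iterations):
        shuffled = rng.permutation(pooled)             # pull the values out at random
        new_a, new_b = shuffled[:n_a], shuffled[n_a:]  # assign them to new teams
        differences.append(new_b.mean() - new_a.mean())
    return np.array(differences)

# Made-up values, NOT the lesson's weight loss data
group_a = [3, 5, 2, 4, 6, 1, 3, 4]
group_b = [5, 7, 4, 6, 8, 5, 6, 7]

observed = np.mean(group_b) - np.mean(group_a)
differences = permutation_mean_differences(group_a, group_b)

# Share of shuffles in which the "new" group B wins by at least the observed margin
p_value = np.mean(differences >= observed)
print(observed, p_value)
```

If only a small share of the shuffles match or beat the observed margin, the original result is rare under random regrouping, which is the evidence used against the null hypothesis.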

The coding part gets a little more complex.

all_values is a pre-loaded variable for this slide. It's a list that combines the values from group A and group B. It is our hat, with all the values inside.

The inner loop is the way that DQ decided to randomly pull the values from the hat and assign them to teams. There are different ways to accomplish this, but the main idea is that you want to mix up these values and randomly reassign them to teams. The way this code works is that each time it pulls a value from the hat, it 'flips a coin' to decide which team that value gets assigned to. The coin flip is generated by np.random.rand(). One interesting aspect of picking the teams this way is that you do not end up with even numbers on each team, and you could, in principle, even end up with an empty team (very unlikely unless the data set is tiny). The uneven sizes don't matter much, though, since you are comparing averages and not sums. There might even be some benefit to it.
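
To make the uneven team sizes concrete, here is a small sketch. The helper coin_flip_split and the 20 stand-in values are hypothetical, not the lesson's data. It repeats the coin-flip assignment many times and records how many values land in group A each time.

```python
import numpy as np

def coin_flip_split(values, rng):
    """Hypothetical helper: assign each value to a team with a fair coin flip."""
    group_a, group_b = [], []
    for value in values:
        if rng.random() >= 0.5:
            group_a.append(value)
        else:
            group_b.append(value)
    return group_a, group_b

rng = np.random.default_rng(1)  # arbitrary seed, only for repeatability
all_values = list(range(20))    # stand-in for the lesson's pre-loaded list

sizes_a = []
for _ in range(1000):
    group_a, group_b = coin_flip_split(all_values, rng)
    sizes_a.append(len(group_a))  # group_b always gets the remaining values

# Smallest and largest group A seen across the 1000 regroupings
print(min(sizes_a), max(sizes_a))
```

The sizes vary from run to run around half of the total. If a team ever did come out empty, np.mean would return nan for it (with a warning), so that iteration's mean difference would be nan as well.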
