Hi all! I'm a little confused here. The more I learn, the more confused I get. We use a stratified sampling strategy to take samples proportionally, then calculate the proportional sampling means to compare with the sample means. for i in range(100) means that we take 10 samples 100 times, correct? i means how many times we take the sample. But what does random_state=i mean here?
To which you got the following responses (I added the highlights to point out the core of each response):
We can use random_state to reproduce the same output every time. Here, in the context of sampling, we are using pd.DataFrame.sample, which returns a random sample of items from a DataFrame. (Without setting random_state, it will return a different result, i.e. a different sample, every time.)
But if we want to reproduce the same output each time, say for testing purposes, then we make use of random_state.
With random_state=123 (we can set random_state to any integer):
@DishinGoyani also provided you with code examples on what happens with and without random_state.
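For anyone reading along without those examples, here is a minimal sketch of the same comparison. It assumes a small made-up DataFrame with a hypothetical column named points; the only real API used is pd.DataFrame.sample and its random_state parameter. The same integer seed picks the same rows every time, while leaving random_state at its default of None usually picks different rows on each call.

```python
import pandas as pd

# Hypothetical data set: 20 rows with one numeric column
df = pd.DataFrame({"points": range(100, 120)})

# Same integer seed -> the same rows are selected on every call
a = df.sample(5, random_state=123)
b = df.sample(5, random_state=123)
print(a.index.equals(b.index))  # True

# random_state=None (the default) -> fresh randomness on each call,
# so the selected rows will usually differ between calls
c = df.sample(5)
d = df.sample(5)
print(c.index.tolist())
print(d.index.tolist())
```

Which integer you pick does not matter; 123 is just an arbitrary choice. Any fixed integer gives repeatable results.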
Based on the above, try to think through the following:
For each iteration of your for loop, what would i be?
What exactly happens when you set random_state to a specific value?
Based on that, we can work out where you might be getting stuck conceptually. But first, think through the two points above and share your response.
As I understand it, what matters for random_state is whether it is None or an integer. If we use None, the sample we generate is different every time. If we use the same integer every time, we get the same sample every time, and it does not matter which integer we choose at the very beginning.
For example, say we need to generate samples 100 times. If we use random_state=456, then we have to use 456 every one of those 100 times, and the samples will be exactly the same. We can't use random_state=1 for the first 10 times and then use random_state=456 the 11th time, correct?
For each iteration of your for loop, what would i be? My answer: i means the number of times we take a sample from 100?
What exactly happens when you set random_state to a specific value? My answer: if we set a specific value every time, the sample will be exactly the same as last time, correct?
Not quite. You don't have to use one specific number throughout all 100 iterations; you can choose a different value each time. It depends on what you are trying to do with those samples, and choosing a different value on each iteration is exactly what this exercise does.
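To make that concrete, here is a minimal sketch of the schedule from the question (seed 1 for the first 10 iterations, then seed 456). The data set, the column name points and the helper seed_for are hypothetical. pandas accepts any integer on each call, so mixing seeds works fine. The catch is that every iteration sharing a seed draws exactly the same rows.

```python
import pandas as pd

data = pd.DataFrame({"points": range(40)})  # hypothetical data set

def seed_for(i):
    # Hypothetical schedule from the question: seed 1 for the first
    # 10 iterations (i = 0..9), then seed 456 from the 11th onward
    return 1 if i < 10 else 456

# Record which rows were picked on each of the 100 iterations
samples = [data.sample(10, random_state=seed_for(i)).index.tolist()
           for i in range(100)]

# Mixing seeds is allowed, but iterations that share a seed share a sample
print(all(s == samples[0] for s in samples[:10]))   # True
print(all(s == samples[10] for s in samples[10:]))  # True
```

So this schedule is legal, but it yields only two distinct samples out of 100. That is rarely useful when the goal is to see how sample means vary, which is why a seed that changes with every iteration makes more sense here.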
No. range(100) means we sample our data 100 times, and i is simply the index of the current iteration (0, 1, 2, ..., 99). We are not taking a sample from 100; we are taking a sample from our data set 100 times.
Yes, that’s correct.
Now, on each iteration of our for loop, we set random_state to i. So each time we sample from our data set, we use the following:
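A minimal sketch of what such a loop might look like. The DataFrame data, its column points and the sample size of 10 are assumptions for illustration, not the exercise's exact code. The comments show what i actually is on each pass.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's data set
data = pd.DataFrame({"points": range(40)})

for i in range(100):        # i takes the values 0, 1, 2, ..., 99
    # Each iteration draws 10 rows from the whole data set, seeded
    # with the loop index, so each iteration has its own fixed seed
    sample = data.sample(10, random_state=i)

print(i)            # 99: the index of the last iteration
print(len(sample))  # 10: only the last iteration's sample is left in `sample`
```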
The above gets us a sample from our data. Because random_state takes a different value on each iteration, each value of i will generally give us a different sample.
If we run our code twice, the sample chosen in, say, the 15th iteration will be the same in both runs, because random_state for the 15th iteration is the same in both runs and therefore generates the same sample.
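Here is a minimal sketch of that check, again using a hypothetical data set and a hypothetical helper run_loop. It runs the whole sampling loop twice and compares which rows were picked. Since counting starts at 0, the 15th iteration is i == 14.

```python
import pandas as pd

data = pd.DataFrame({"points": range(40)})  # hypothetical data set

def run_loop():
    """Run the sampling loop once and return the rows picked on each iteration."""
    picked = []
    for i in range(100):
        sample = data.sample(10, random_state=i)
        picked.append(sample.index.tolist())
    return picked

run_1 = run_loop()
run_2 = run_loop()

# The 15th iteration has i == 14 (counting starts at 0)
print(run_1[14] == run_2[14])  # True: same seed, same rows
print(run_1 == run_2)          # True: every iteration matches across runs
```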
So, if I were to take your code and run it myself, I would get exactly the same results (within the DQ platform). This makes the results reproducible (and also lets DQ check our work through their grader).
You are sampling 100 times, but you are storing the sample for each iteration in the same variable. So after each iteration, the value stored in the variable sample_under_12 gets replaced by the new sample.
If you want to view those 100 samples, you need to either print each one inside the loop or find a way to save each one as you go (which is what the instructions ask you to do).
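A minimal sketch of one way to save them. The data set, the column points and the list names are hypothetical, and what you actually need to keep depends on the exercise's instructions. The sketch assumes you want each sample's mean, and optionally the sample itself, stored in a list rather than overwritten in a single variable.

```python
import pandas as pd

data = pd.DataFrame({"points": range(40)})  # hypothetical data set

sample_means = []   # one entry per iteration instead of one overwritten variable
samples = []        # optional: keep the sampled rows themselves too

for i in range(100):
    sample = data.sample(10, random_state=i)
    samples.append(sample)                        # keep every sample
    sample_means.append(sample["points"].mean())  # and/or just its mean

print(len(samples), len(sample_means))  # 100 100
```

Storing only the means is usually enough when the goal is to compare them against the population mean. Keeping the full samples costs more memory but lets you inspect any single iteration afterwards.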