When to use random_state=0

When do we use random_state=0 and when do we use random_state=1?

Thank you!
Screen Link: https://app.dataquest.io/m/283/sampling/7/stratified-sampling

We can use random_state to reproduce the same output every time. Here, in the context of sampling, we are using pd.DataFrame.sample, which returns a random sample of rows from a DataFrame. (Without setting random_state, it will generally return a different sample each time it is called.)

But if we want to reproduce the same output each time, say for testing purposes, then we make use of random_state.


For example, using stratum_F from the mission:
Without random_state:

stratum_F = wnba[wnba["Pos"] == "F"]

# Without setting `random_state`
print(stratum_F.sample(2))
print(stratum_F.sample(2))

              Name Team Pos  Height  Weight        BMI Birth_Place      ...
55  Evelyn Akhator  DAL   F     191    82.0  22.477454          NG      ...
76  Kayla Pedersen  CON   F     193    86.0  23.087868          US      ...
[2 rows x 33 columns]

                Name Team Pos  Height  Weight        BMI Birth_Place    ...
17   Bashaara Graves  CHI   F     188    91.0  25.746944          US    ...
108   Ramu Tokashiki  SEA   F     193    80.0  21.477087          JP    ...
[2 rows x 33 columns]

The first call returns rows [55, 76] and the second call returns rows [17, 108].


With random_state=123 (we can set random_state to any integer in the valid seed range):

print(stratum_F.sample(2, random_state=123))
print(stratum_F.sample(2, random_state=123))

               Name Team Pos  Height  Weight        BMI Birth_Place    ...
29   Candice Dupree  IND   F     188    81.0  22.917610          US    ...
103  Nneka Ogwumike   LA   F     188    79.0  22.351743          US    ...
[2 rows x 33 columns]

               Name Team Pos  Height  Weight        BMI Birth_Place    ...
29   Candice Dupree  IND   F     188    81.0  22.917610          US    ...
103  Nneka Ogwumike   LA   F     188    79.0  22.351743          US    ...
[2 rows x 33 columns]

Both calls return the same rows, [29, 103].
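If you want to try this without the mission's dataset, here is a minimal sketch. The `toy` DataFrame and its values are made up for illustration and only stand in for stratum_F; the point is just that two calls with the same random_state on the same DataFrame pick the same rows.

```python
import pandas as pd

# Hypothetical stand-in for stratum_F (not the real wnba data).
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Pos": ["F"] * 6,
    "Height": [188, 191, 193, 185, 190, 187],
})

# Two independent calls with the same seed.
first = toy.sample(2, random_state=123)
second = toy.sample(2, random_state=123)

# Same seed and same DataFrame, so both samples hold identical rows.
print(first.equals(second))  # True
```

Which two rows you get depends on the seed and on the data, so the indices here will not match the ones from the mission.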

Hope that helps! @candiceliu93

Thank you!! I see, it is for testing purposes, so we can return the same sample. I thought we had to use either random_state=0 or random_state=1, but you said we can set it to 123. Does that mean the integer has no meaning? Thanks again!

It does have meaning. It is used to set the seed for the random number generator algorithm. Pandas uses NumPy behind the scenes; here is more information about the seed parameter from the NumPy docs:

Parameters: seed {None, int, array_like, BitGenerator}, optional

Random seed used to initialize the pseudo-random number generator or an instantized BitGenerator. If an integer or array, used as a seed for the MT19937 BitGenerator. Values can be any integer between 0 and 2**32 - 1 inclusive, an array (or other sequence) of such integers, or None (the default). If seed is None, then the MT19937 BitGenerator is initialized by reading data from /dev/urandom (or the Windows analogue) if available or seed from the clock otherwise.

Doc for more
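To make the seed idea a bit more concrete, here is a rough sketch. It assumes that an integer random_state in pandas ends up seeding a NumPy RandomState, which matches my reading of the doc quoted above; the `toy` DataFrame is made up for illustration. It shows that 0 and 1 are not special values: each seed is simply the starting point of its own repeatable sequence.

```python
import numpy as np
import pandas as pd

# Two generators created with the same seed produce the same numbers.
a = np.random.RandomState(42).randint(0, 100, size=5)
b = np.random.RandomState(42).randint(0, 100, size=5)
print((a == b).all())  # True

# Hypothetical data, only for illustration.
toy = pd.DataFrame({"value": range(20)})

# Each seed is reproducible on its own; none of them is special.
repeatable = {}
for seed in (0, 1, 123):
    s1 = toy.sample(3, random_state=seed)
    s2 = toy.sample(3, random_state=seed)
    repeatable[seed] = s1.equals(s2)
    print(seed, list(s1.index), repeatable[seed])
```

Different seeds will usually select different rows, but that is a side effect of starting the generator at a different point, not something the number itself "means".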

It seems more complicated than I imagined. I might need some time to do some research. Thank you anyway!