Estimating Bias and Variance of a model by repeated sampling

Screen Link: https://app.dataquest.io/m/132/overfitting/3/bias-variance-tradeoff

I wanted to write code to estimate the bias and variance of a model by repeatedly resampling the training data, as described here: http://scott.fortmann-roe.com/docs/BiasVariance.html

I have written the steps for doing this below. Please let me know if my understanding of the concepts is correct:

  1. Resample the training data 5 times and fit one model to each of the 5 resamples.

  2. Use each of these 5 models to predict values for the training data. Call these predictions columns P1 to P5, so the dataset looks something like this:
    [Image: example dataset with prediction columns P1 to P5]

  3. For each row in the training set, E[f_hat(x)] = mean(P1, ..., P5)

  4. Bias for each row: B = E[f_hat(x)] - y, where y is the actual value in the training data (this treats the actual training value as f(x), i.e. the true function).

  5. Variance for each row: var = variance(P1, ..., P5)
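A minimal plain-Python sketch of steps 1 to 5, under a few assumptions not in the post: the resampling is done as a bootstrap (rows drawn with replacement), a straight-line least-squares fit stands in for "the model", and the helper names `fit_line` and `per_row_bias_variance` are hypothetical. Variance is computed as population variance (ddof=0); sample variance would scale it by k/(k-1).

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in model, assumption)."""
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # guard: a resample may contain a single distinct x
    a = y_mean - b * x_mean
    return lambda x: a + b * x


def per_row_bias_variance(xs, ys, n_models=5, seed=0):
    """Hypothetical helper: returns per-row predictions, bias B and variance."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    # Step 1: bootstrap-resample the training rows and fit one model per resample
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    # Step 2: predictions P1..Pk for every training row (one list per row)
    preds = [[m(x) for m in models] for x in xs]
    # Step 3: E[f_hat(x)] for each row = mean of that row's predictions
    expected = [statistics.fmean(row) for row in preds]
    # Step 4: B = E[f_hat(x)] - actual value (actual value treated as f(x))
    bias = [e - y for e, y in zip(expected, ys)]
    # Step 5: variance of P1..Pk for each row (population variance, ddof=0)
    var = [statistics.pvariance(row) for row in preds]
    return preds, bias, var


# Toy data (made up): a quadratic truth, so a straight line is biased
xs = [i / 10 for i in range(20)]
ys = [x ** 2 for x in xs]
preds, bias, var = per_row_bias_variance(xs, ys)
```

Swapping `fit_line` for any other fit function (and the bootstrap for another resampling scheme) leaves steps 2 to 5 unchanged.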

Overall Bias² (squared bias) = mean(all B**2 values computed in step 4)
Overall Variance = mean(all var values computed in step 5)
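A small sketch of the two overall averages, assuming the predictions are stored as one list of P1..Pk per row (`overall_bias_variance` is a hypothetical name, and the numbers are made up). It also checks a useful sanity property: with population variance, the models' average squared error on these rows equals Bias² + Variance exactly. No separate noise term appears because the actual values are treated as f(x), so any noise is absorbed into the bias.

```python
import statistics


def overall_bias_variance(preds, actual):
    """preds: one list of model predictions (P1..Pk) per row; actual: one value per row."""
    bias_sq, variances, sq_errors = [], [], []
    for row, y in zip(preds, actual):
        mean_pred = statistics.fmean(row)            # E[f_hat(x)] for this row
        bias_sq.append((mean_pred - y) ** 2)         # B**2 from step 4
        variances.append(statistics.pvariance(row))  # var from step 5 (ddof=0)
        # average squared error of the individual models on this row
        sq_errors.append(statistics.fmean((p - y) ** 2 for p in row))
    return (statistics.fmean(bias_sq),
            statistics.fmean(variances),
            statistics.fmean(sq_errors))


# Toy numbers (made up for illustration): 3 rows, 5 models each
preds = [[1.0, 1.2, 0.9, 1.1, 0.8],
         [2.1, 1.9, 2.0, 2.3, 1.7],
         [2.8, 3.1, 3.3, 2.9, 3.0]]
actual = [1.0, 2.5, 3.0]
bias_sq, variance, mse = overall_bias_variance(preds, actual)
# mse equals bias_sq + variance, up to floating-point rounding
print(bias_sq, variance, mse)
```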

Please let me know if the above steps are correct.


That is how I understand it too. I've added some detail in the first half of my reply here: Machine Learning in Python: Intermediate - Course 5/8
