Dear all,

In the mission about overfitting, we learn that:
error = bias² + variance, and this is demonstrated by running several linear regressions in which we measure the variance of all the predicted data points [e.g. np.var(linear_model.predict(features))].

While the demonstration works, it contradicts the blog post in the mission reference ([http://scott.fortmann-roe.com/docs/BiasVariance.html]):

Error due to Variance: The error due to variance is taken as the variability of a model prediction for a given data point. Again, imagine you can repeat the entire model building process multiple times. The variance is how much the predictions for a given point vary between different realizations of the model.
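
In symbols, one common way to write the quoted definition is the following. The notation is my own assumption, not taken from the blog post: x_0 is a fixed data point, D is a randomly drawn training set, and \hat{f}_D is the model fitted on D.

```latex
\operatorname{Var}\big[\hat{f}(x_0)\big]
  = \mathbb{E}_D\Big[\big(\hat{f}_D(x_0) - \mathbb{E}_D[\hat{f}_D(x_0)]\big)^2\Big]
```

The expectation runs over training sets while x_0 stays fixed, so this is a spread across refitted models, not across data points.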

I’m confused. Maybe the underlying math works out the same, but to me those two explanations don’t match each other.

Maxime

Just to illustrate, I tried the same principle used in the DQ mission on another data set (iris.csv) and once again, while still being apparently ‘contradictory’ with the blog post, it works: the variance of all the predicted values increases with the complexity of the model.

I have the same question – so glad to hear someone else asking it, although a bummer not to have gotten an answer.

It seems to me (from outside reading and in agreement with the blog post you mentioned) that variance should refer to the variance in predictions from the same model trained on different training sets. It’s a measure of how much the model gets overfit to the random noise in whatever training set is picked.

Instead, the DQ mission focuses on the variance in predictions for different data points within ONE training set. Isn’t that variance just a reflection of the underlying variance in the data itself, with nothing to do with the model?
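
To make the two quantities concrete, here is a minimal sketch that computes both on synthetic data. It is not the mission's code. The data-generating function (y = 2x plus Gaussian noise), the 1-nearest-neighbour model standing in for a "complex" model, and all function names are assumptions made for illustration. It uses only the Python standard library, and statistics.pvariance matches the default behaviour of np.var.

```python
import random
import statistics


def make_training_set(rng, n=30):
    """Draw a noisy sample from a hypothetical true function y = 2x + noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 3.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns a predict function."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_neighbour(xs, ys):
    """1-nearest-neighbour regressor: a deliberately flexible model."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def variance_across_points(fit, rng):
    """Mission-style quantity: spread of predictions over the rows of ONE training set."""
    xs, ys = make_training_set(rng)
    predict = fit(xs, ys)
    return statistics.pvariance([predict(x) for x in xs])


def variance_across_training_sets(fit, rng, x0=5.0, repeats=200):
    """Blog-post-style quantity: spread of the prediction at ONE point over many refits."""
    preds = []
    for _ in range(repeats):
        xs, ys = make_training_set(rng)
        preds.append(fit(xs, ys)(x0))
    return statistics.pvariance(preds)


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for name, fit in [("line", fit_line), ("1-NN", fit_nearest_neighbour)]:
    results[name] = (
        variance_across_points(fit, rng),
        variance_across_training_sets(fit, rng),
    )
    print(name, "across points:", results[name][0], "across training sets:", results[name][1])
```

In this toy setup, the first number is dominated by how much the target varies over the range of x, while the second isolates how much the fitted model moves when the training set changes. Whether the same holds for the mission's data is something to check rather than assume.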

Not sure what we’re missing – would really appreciate an answer if anyone knows!

Hello,

I have the same question. Can someone explain this, please?

Hi,