Just want to validate what I've understood

Hi,
I have a few questions and points I’d like to validate based on the following statement on this page. I’ve completed the exercises but wanted to make sure I was on the right path with my understanding.

https://app.dataquest.io/c/11/m/132/overfitting/7/conclusion

While the higher order multivariate models overfit in relation to the lower order multivariate models, the in-sample error and out-of-sample didn’t deviate by much. The best model was around 50% more accurate than the simplest model. On the other hand, the overall variance increased around 25% as we increased the model complexity.

  • Why does it say “…the higher order multivariate models overfit in relation to the lower order multivariate models”, considering that in exercise 5 we used cross validation and found that the model works well with testing data?
    Overfitting, as I understand it, is the situation where a model predicts with minimal error on the training data but fails to do the same on test data. Since we used test data by way of cross validation and the MSE is still low, shouldn’t that be enough? I’m lost as to why we are considering the order of the multivariate model.

  • How did we come to the conclusion from the graph that “…the best model was around 50% more accurate than the simplest model”? (The graph is attached below and was generated in exercise 6.)
    As I understand we just used Linear Regression with Cross Validation. I don’t see a “simple” model or a “best” model.

  • Finally, can I know how DQ came up with the last statistic… the overall variance increased around 25% as we increased the model complexity.
    If you look at the graph below, the blue scatter shows the variance growing from roughly 40 to about 50 as the number of variables goes from 2 to 7. By my calculation this is just a 20% increase ((50-40)*100/50). I’d like to know whether the 25% was an approximation or whether it was calculated some other way.
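
To make the arithmetic in the last question concrete, here is a small Python sketch. It assumes my eyeballed readings of roughly 40 and 50 from the graph are right, and the helper name `percent_change` is my own, not something from the lesson. It computes the change relative to the final value (what I did above) and relative to the starting value (the usual definition of a percentage increase), which may be where the 25% comes from.

```python
def percent_change(old, new, relative_to):
    """Difference between two values as a percentage of a chosen base."""
    return (new - old) * 100 / relative_to

# Approximate variance readings taken by eye from the graph (assumption).
low, high = 40, 50

vs_end = percent_change(low, high, relative_to=high)   # base = final value
vs_start = percent_change(low, high, relative_to=low)  # base = starting value

print(f"relative to the final value:    {vs_end:.0f}%")    # 20%
print(f"relative to the starting value: {vs_start:.0f}%")  # 25%
```

So the same two readings give 20% or 25% depending on which value is used as the base.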

[image: graph generated in exercise 6]

Cheers

I’m just replying to this so that it comes to the top of the discussions. Please let me know if there is a lack of clarity in the question. I’ve read it a couple of times and it seems to be clear to me.

Thanks