Hi,
I have a few questions and points I'd like to validate based on the following statement on this page. I've completed the exercises, but I just wanted to make sure I was on the right path with regard to my understanding.
https://app.dataquest.io/c/11/m/132/overfitting/7/conclusion
"While the higher order multivariate models overfit in relation to the lower order multivariate models, the in-sample error and out-of-sample error didn't deviate by much. The best model was around 50% more accurate than the simplest model. On the other hand, the overall variance increased around 25% as we increased the model complexity."

Why does it say "…the higher order multivariate models overfit in relation to the lower order multivariate models" considering that in exercise 5 we used cross-validation and found that the model works well with the testing data?
Overfitting, as I understand it, is the situation where a model can predict with minimal error on the training data but fails to do the same on test data. Since we used test data by way of cross-validation and the MSE was low, shouldn't that be enough? I'm lost as to why we are considering the order of the multivariate model.
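To make my understanding concrete, here is a rough sketch (not the DQ mission code; the data is made up) of what I think the lesson is getting at: comparing in-sample MSE against cross-validated MSE as the model order grows, where a widening gap would be one sign of overfitting even when the CV error itself still looks acceptable.

```python
# Hypothetical illustration: in-sample vs cross-validated MSE by model order.
# The data here is synthetic (quadratic signal + noise), not the mission data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 100).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 1, 100)

results = {}
for order in (1, 2, 5, 10):
    # Expand the single feature into polynomial terms up to `order`.
    X = PolynomialFeatures(order).fit_transform(x)
    model = LinearRegression().fit(X, y)
    in_sample = mean_squared_error(y, model.predict(X))
    # 10-fold cross-validated MSE (sklearn reports it negated).
    cv = -cross_val_score(LinearRegression(), X, y,
                          scoring="neg_mean_squared_error", cv=10).mean()
    results[order] = (in_sample, cv)
    print(f"order {order}: in-sample MSE {in_sample:.2f}, CV MSE {cv:.2f}")
```

In-sample error can only shrink as the order grows (the higher-order models nest the lower-order ones), so the interesting quantity is how far the CV error drifts away from it.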
How did we come to the conclusion from the graph that "…the best model was around 50% more accurate than the simplest model"? (The graph is attached below and was generated from exercise 6.)
As I understand it, we just used linear regression with cross-validation. I don't see a "simple" model or a "best" model.
Finally, can I ask how DQ came up with the last statistic, "…the overall variance increased around 25% as we increased the model complexity"?
If you look at the graph below, the blue scatter shows variance growing from roughly 40 to about 50 while the number of variables goes from 2 to 7. That is just a 20% increase ((50−40)*100/50). I just want to know whether this was based on an approximation or whether it was calculated in some other way.
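For reference, here is the arithmetic both ways, assuming the values of roughly 40 and 50 that I eyeballed from the plot. A percent increase is conventionally taken relative to the starting value, which gives 25%; dividing by the ending value gives the 20% I computed.

```python
# Percent increase from an (assumed, eyeballed) variance of 40 up to 50,
# measured against each of the two possible baselines.
low, high = 40, 50
increase_vs_start = (high - low) / low * 100   # relative to starting value -> 25.0
increase_vs_end = (high - low) / high * 100    # relative to ending value   -> 20.0
print(increase_vs_start, increase_vs_end)
```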
Cheers