I'm lost again: overfit and underfit definitions

Hi guys, I thought I had understood this, but apparently I didn't.

Overfit: according to my notes, overfitting happens when training error > test error.

However, according to this image, an overfit model will have an extremely low training error but a high testing error.

I don't even know which one is true anymore. I've been reading a lot about it, but I'm definitely lost. The only thing I'm (fairly) sure of is that a model is underfit when both training and test errors are really high.

Thanks for your help guys!


Image reference: https://images.app.goo.gl/VWxvCQUBFTpX1dJy5

  • Overfitting - The model is only good at predicting the training dataset (low training error) but not good at predicting the validation dataset (high validation error).

  • Underfitting - The model is not good at predicting either the training dataset or the validation dataset (high training error and high validation error).
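To make the two definitions concrete, here's a minimal NumPy sketch (the data and the polynomial degrees are made up purely for illustration): fitting polynomials of different degrees to the same noisy data reproduces both regimes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of one period of a sine curve.
f = lambda x: np.sin(2 * np.pi * x)
x_train = np.sort(rng.uniform(0, 1, 15))
y_train = f(x_train) + rng.normal(0, 0.2, 15)
x_test = np.sort(rng.uniform(0, 1, 15))
y_test = f(x_test) + rng.normal(0, 0.2, 15)

def train_test_mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    poly = np.poly1d(np.polyfit(x_train, y_train, degree))
    return (np.mean((poly(x_train) - y_train) ** 2),
            np.mean((poly(x_test) - y_test) ** 2))

under_tr, under_te = train_test_mse(0)   # underfit: both errors high
good_tr, good_te = train_test_mse(3)     # decent fit: both errors moderate
over_tr, over_te = train_test_mse(14)    # overfit: tiny train error, large test error
```

The degree-0 model (a constant) can't capture the sine shape at all, so both errors stay high; the degree-14 model nearly interpolates the 15 training points, so its training error collapses while its test error blows up between the points.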


There are, indeed, somewhat conflicting interpretations of this, and the more resources you check out, the more confusing it can get.

Broadly speaking -

  • Underfitting: As you said, it’s when both the training and validation/test errors are high.
  • Overfitting: When the training error is low, but the test/validation error is larger than the training error.

There are some nuances to this at times.

  • Generally, there will be a small difference between them when the model is trained well (good generalization capability).

  • If both of them are decreasing, that's fine too.

  • If the validation error starts to stagnate or plateau while the training error keeps decreasing, it might be a sign of overfitting.

  • If the validation error starts to increase, it's definitely overfitting.

The last point is what quite a few experts consider the deciding factor: if the validation error is increasing, the model is overfitting.
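That "validation error starts to increase" signal is easy to turn into code, in the spirit of early stopping. A minimal sketch (overfitting_onset and patience are made-up names for illustration, not from any library):

```python
def overfitting_onset(val_errors, patience=3):
    """Scan a sequence of validation errors (one per epoch/iteration).
    Return the index where the error began rising for `patience`
    consecutive checks, or None if it never does."""
    rising = 0
    for i in range(1, len(val_errors)):
        if val_errors[i] > val_errors[i - 1]:
            rising += 1
            if rising == patience:
                return i - patience + 1  # first index of the sustained rise
        else:
            rising = 0  # a single dip resets the streak
    return None
```

For example, `overfitting_onset([5, 4, 3, 3.1, 3.3, 3.6])` returns 3, the point where the validation error turned upward, while a monotonically decreasing curve returns None.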

So, the cases that arise from this -

  1. Validation error >>> Training error; validation error is increasing
  2. Validation error > Training error; both are decreasing
  3. Training error > Validation error

Case 1 indicates overfitting. Case 2 indicates everything is fine.

Case 3, where the training error is higher than the validation error, is not a sign of overfitting. This is also where many people seem to get it wrong.

This can happen for several reasons -

  • Regularization techniques are usually applied only at training time and not at validation/testing time. This affects the final error values.
  • The likely culprit when such a pattern appears: the distribution of the validation/test set is different from that of the training set. This can happen if the validation set is too small, if it wasn't sampled properly, or if there is data leakage, etc.
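The first bullet (regularization active only during training) can be shown with a toy example. This is a hand-rolled sketch of inverted dropout applied to a fixed linear model, not any particular framework's API:

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy "model": fixed weights applied to random features,
# with a noiseless target so the model itself is perfect.
w = rng.normal(size=20)
X = rng.normal(size=(1000, 20))
y = X @ w

def forward(X, w, drop_rate=0.0, rng=None):
    """Linear model with inverted dropout on the inputs.
    Dropout is active only when drop_rate > 0, i.e. at training time."""
    if drop_rate > 0:
        mask = rng.random(X.shape) >= drop_rate
        X = X * mask / (1 - drop_rate)  # inverted-dropout rescaling
    return X @ w

train_pred = forward(X, w, drop_rate=0.5, rng=rng)  # training-time pass
val_pred = forward(X, w)                            # evaluation-time pass

train_mse = np.mean((train_pred - y) ** 2)
val_mse = np.mean((val_pred - y) ** 2)
# train_mse > val_mse here purely because dropout is on during training
```

The model and data are identical in both passes; the training error is higher only because the regularizer is switched on, which is exactly the case-3 pattern described above.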

Some resources I recommend that help take away this confusion, and what I also based my answer on -


@the_doctor So that's why we have to do cross-validation with every model: we get different errors to plot, so we can analyze the values properly and conclude whether the model is overfitted or underfitted, right?
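For my own understanding, here's a minimal sketch of what the k-fold splitting behind cross-validation does under the hood (kfold_indices is a made-up helper, not scikit-learn's API; sklearn.model_selection.KFold does this for you):

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation:
    shuffle the indices once, cut them into k folds, and let each
    fold take one turn as the validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx
```

Each sample lands in the validation set exactly once, so you get k train/validation error pairs to average or plot instead of a single, possibly unlucky, split.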

Thank you so much for that explanation! I read it over and over and now it's much clearer :smiley:

Edit: I'm not sure if I'm right. I'm training a decision tree regressor, and we all know that this model tends to overfit. I used a for loop to change the value of the min_samples_leaf parameter and got these values and this plot as a result.

Is it correct to say that with 9 samples the model works well because the difference between train and test is just 10 units, and that it becomes overfitted at 10 samples? Or is this model just overfitted from the beginning?
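For reference, this is a minimal version of the kind of loop I mean, assuming scikit-learn (the data here is synthetic, not my actual dataset):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for leaf in range(1, 15):
    tree = DecisionTreeRegressor(min_samples_leaf=leaf, random_state=0)
    tree.fit(X_tr, y_tr)
    results[leaf] = (
        mean_squared_error(y_tr, tree.predict(X_tr)),
        mean_squared_error(y_te, tree.predict(X_te)),
    )
# min_samples_leaf=1 memorizes the training data (train error ~0),
# and the train/test gap shrinks as the leaves are forced to grow.
```

Plotting the two error curves in `results` against min_samples_leaf gives exactly the kind of picture described above: a huge gap on the left (overfitting) that narrows as the regularization gets stronger.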


Thank you for asking this question, and I appreciate @the_doctor's reply.
I am not sure if I am right, but I would try to make a trade-off among these three metrics:

  • Errors (lower better)
  • The difference between training and validation errors (lower better)
  • R squared (high, but not too high)

I would not choose 9, because the training error is very high. I would choose 5 or 6, since their validation errors are almost the same and the gap between the errors is shrinking more slowly at that point. I would also look at R squared to choose the best min_samples_leaf.
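One way to encode that trade-off in code (pick_leaf and max_gap are hypothetical names, and the numbers below are made up): among candidates whose train/validation gap is within a tolerance, pick the one with the lowest validation error.

```python
def pick_leaf(results, max_gap):
    """results maps min_samples_leaf -> (train_error, val_error).
    Keep candidates whose val/train gap is within max_gap, then
    return the candidate with the lowest validation error."""
    candidates = {k: v for k, v in results.items() if v[1] - v[0] <= max_gap}
    if not candidates:
        return None
    return min(candidates, key=lambda k: candidates[k][1])

# Made-up errors roughly matching the pattern discussed above:
# leaf=1 overfits (huge gap), leaf=9 underfits (both errors high).
results = {1: (0.0, 50.0), 5: (20.0, 30.0), 6: (21.0, 29.0), 9: (40.0, 50.0)}
best = pick_leaf(results, max_gap=15)  # picks 6: smallest val error among small-gap candidates
```

You could extend the same filter with an R-squared threshold as a third condition.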

I have used this approach in my project. I would appreciate it if you could look at it and let me know your thoughts.