I have a question concerning model variance, which keeps me puzzled. How should this variance be properly calculated? Throughout the DQ courses the variance of predictions was used, but is that correct? Why don't we compare error metrics between the train and test datasets to estimate variance - i.e., the larger the difference between the two, the more overfitting is present? Or perhaps there are other measures that might be better and directly comparable with bias (i.e., underfitting)?
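For concreteness, the train-vs-test comparison described above could be sketched like this. This is a minimal illustration, not a DQ-endorsed recipe: the dataset, model, and metric here are placeholders (scikit-learn's synthetic regression data and an unconstrained decision tree, which tends to memorize the training set).

```python
# Sketch: using the gap between train and test error as a rough overfitting
# signal. Assumes scikit-learn is installed; model and data are placeholders.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=200, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree fits the training set almost perfectly.
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))

# A large gap (test error much bigger than train error) suggests
# high variance, i.e., overfitting.
print(train_mse, test_mse)
```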

Hi @erzinrost nice to meet you.

Statisticians use variance to see how individual numbers in a data set relate to each other, rather than relying on broader techniques such as sorting the numbers into quartiles. The advantage of variance is that all deviations from the mean are treated the same regardless of direction. The sum of the squared deviations is 0 only when the data has no variability at all.
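As a quick illustration, here is the population variance written out in plain Python (no libraries), showing that squaring treats deviations the same in both directions and that the result is 0 only when every value equals the mean:

```python
# Population variance: mean of the squared deviations from the mean.
def variance(data):
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

print(variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
print(variance([5, 5, 5]))                 # 0.0 -- no variability at all
```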

However, the disadvantage of variance is that it puts extra weight on outliers, i.e., numbers that are far from the mean. Squaring these numbers can skew the result. Another pitfall of variance is that it is not easy to interpret, since it is expressed in squared units. Users often take its square root instead, which gives the standard deviation of the data; investors, for example, can use the standard deviation to assess how consistent returns are over time.
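Both points can be seen in a small sketch (the `returns` numbers below are made up for illustration): a single outlier inflates the variance dramatically because squaring amplifies large deviations, while the square root of the variance is back in the data's original units and easier to interpret.

```python
import math

# Same population-variance helper as above.
def variance(data):
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

returns = [0.05, 0.04, 0.06, 0.05]      # hypothetical, fairly consistent returns
with_outlier = returns + [0.50]         # one extreme value added

print(variance(returns))                # small
print(variance(with_outlier))           # far larger -- the outlier dominates
print(math.sqrt(variance(returns)))     # standard deviation, in the data's units
```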