Roughly speaking,
\epsilon = Xa - y
The above is already provided to us.
By definition, the mean squared error is -
\frac{1}{n} \sum(\hat{y} - y)^2
We know that the prediction is \hat{y} = Xa, so
\epsilon = \hat{y} - y
So, the mean squared error is -
\frac{1}{n}\sum\epsilon^2
Substituting \epsilon = Xa - y from the equation at the top gives us -
\frac{1}{n} \sum(Xa - y)^2
Now, X is a matrix, and a and y are vectors. So, Xa - y is going to be a vector as well.
Consider an example vector -
m = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
What is the dot product of m with itself?
m \cdot m = 1*1 + 2*2 + 3*3 = 14
Do you notice a pattern here? What if we squared each entry of m and added up the results -
\sum(m)^2
We would get -
1^2 + 2^2 + 3^2 = 14
They are the same operations. There is one more way to represent a dot product -
m^Tm
where m^T is the transpose of m. So, we can say -
\sum(m)^2 = m^Tm
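If you want to check this identity numerically, here is a minimal sketch in plain Python. The variable names `dot_mm` and `sum_sq` are made up for illustration, and the vector is just the example m from above.

```python
# Example vector m from the text, stored as a plain Python list.
m = [1, 2, 3]

# Dot product of m with itself: multiply matching entries, then add them up.
dot_mm = sum(a * b for a, b in zip(m, m))

# Sum of squared entries: square each entry, then add them up.
sum_sq = sum(v ** 2 for v in m)

print(dot_mm, sum_sq)  # 14 14
```

Both expressions do the same arithmetic, which is why the sum of squares can be swapped for m^Tm.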
Coming back to our equation, \frac{1}{n} \sum(Xa - y)^2, we can rewrite it as -
\frac{1}{n} (Xa - y)^T(Xa - y)
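To see that the two forms agree, here is a rough sketch in plain Python that computes the mean squared error both ways. The helper names (`mat_vec`, `mse_sum_of_squares`, `mse_dot_product`) and the numbers in X, a and y are invented for illustration, and it assumes n is the number of rows of X (one row per observation).

```python
def mat_vec(X, a):
    """Multiply matrix X (a list of rows) by vector a (a list)."""
    return [sum(x_ij * a_j for x_ij, a_j in zip(row, a)) for row in X]


def residual(X, a, y):
    """Return the vector epsilon = Xa - y."""
    return [pred - target for pred, target in zip(mat_vec(X, a), y)]


def mse_sum_of_squares(X, a, y):
    """(1/n) * sum of squared entries of (Xa - y)."""
    eps = residual(X, a, y)
    return sum(e ** 2 for e in eps) / len(y)


def mse_dot_product(X, a, y):
    """(1/n) * (Xa - y)^T (Xa - y), written as a dot product."""
    eps = residual(X, a, y)
    return sum(e1 * e2 for e1, e2 in zip(eps, eps)) / len(y)


# Made-up data: 3 observations, 2 coefficients (assumed, not from the question).
X = [[1, 1], [1, 2], [1, 3]]
a = [0, 1]
y = [1, 2, 5]

mse_a = mse_sum_of_squares(X, a, y)
mse_b = mse_dot_product(X, a, y)
print(mse_a == mse_b)  # True
```

Here Xa - y works out to [0, 0, -2], so both functions return 4/3. With a library like NumPy the second form would typically be a one-liner, but the plain-list version keeps every step visible.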