How is the cost function in matrix form derived?

J(a) = \frac{1}{n} (Xa - y)^T (Xa - y)

I can't understand how this cost function is derived from the earlier equation y = Xa − ϵ. From there we get the error term ϵ = Xa − y, but how is the cost function (the mean squared error here) derived from it? Could anyone please clarify this? Thanks.

Roughly speaking,

\epsilon = Xa - y

The above is already provided to us.

By definition, the mean squared error is -

\frac{1}{n} \sum(\hat{y} - y)^2

We know that the prediction is \hat{y} = Xa, so

\epsilon = \hat{y} - y

So, the mean squared error is -

\frac{1}{n} \sum \epsilon^2

And from the equation at the top, that would give us -

\frac{1}{n} \sum(Xa - y)^2

Now, X is a matrix, and a and y are vectors. So, Xa - y is going to be a vector as well.

Consider an example vector -

m = \begin{bmatrix} 1\\2\\3 \end{bmatrix}

What is the dot product of m with itself?

m \cdot m = 1*1 + 2*2 + 3*3 = 14

Do you notice a pattern here? What if we did the following -

\sum(m)^2

We would get -

1^2 + 2^2 + 3^2 = 14

They are the same operations. There is one more way to represent a dot product -

m^Tm

where m^T is the transpose of m. So, we can say -

\sum(m)^2 = m^Tm
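
A quick numerical check of this identity, as a minimal sketch in plain Python. The helper name `dot` is made up for illustration; it is not from the lesson.

```python
# Example vector m = [1, 2, 3] from the explanation
m = [1, 2, 3]

def dot(u, v):
    """Dot product of two equal-length vectors (hypothetical helper)."""
    return sum(ui * vi for ui, vi in zip(u, v))

sum_of_squares = sum(x ** 2 for x in m)  # 1^2 + 2^2 + 3^2
m_dot_m = dot(m, m)                      # m^T m

print(sum_of_squares, m_dot_m)  # both are 14
```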

Coming back to our equation, \frac{1}{n} \sum(Xa - y)^2, we can rewrite it as -

\frac{1}{n} (Xa - y)^T(Xa - y)
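
To see that the summation form and the matrix form agree numerically, here is a small sketch in plain Python. The values of `X`, `a`, and `y` are made up purely for illustration, and `matvec` is a hypothetical helper.

```python
# Made-up data: 3 observations, 2 columns (illustrative values only)
X = [[1.0, 2.0],
     [1.0, 3.0],
     [1.0, 5.0]]
a = [0.5, 1.0]        # coefficient vector (assumed)
y = [2.0, 4.0, 5.0]   # observed targets (assumed)
n = len(y)

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

# Residual vector Xa - y
residual = [pred - obs for pred, obs in zip(matvec(X, a), y)]

# Form 1: (1/n) * sum of squared residuals
mse_sum = sum(r ** 2 for r in residual) / n

# Form 2: (1/n) * (Xa - y)^T (Xa - y), i.e. the residual dotted with itself
mse_matrix = sum(r1 * r2 for r1, r2 in zip(residual, residual)) / n

print(mse_sum, mse_matrix)
```

With NumPy arrays, the second form would typically be written as `(X @ a - y).T @ (X @ a - y) / n`, which mirrors the matrix notation directly.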


Wow, what a lucid and great explanation, thanks a lot doctor! Cheers!!
Just a small suggestion: if you include this in the lesson, it will really help many learners. Thanks.

Glad I could help!

I don’t work for Dataquest, but you can provide feedback to them using the Contact Us button in the top-right of this page.

Also, please note that I modified your question title a little bit so that it’s helpful for any students who come across it in the future.