238-1 Ordinary Least Squares, Screen 1


I am a bit confused about the example given for the matrix representation of the linear regression model. As stated, the matrix representation is Xa = \hat{y} and the example given shows the X matrix with m rows and n+1 columns (where the first column is all 1s). My first question is: are there m linear regression equations we are working with? If so, is that coming from using an m-fold technique?

Next, the vector a is shown as a row vector, but then the matrix multiplication would not work out properly. If a is a row vector, rather than a column vector, we would just end up with the coefficients a_0, a_1, \ldots, a_n.

If a is a column vector, that would make more sense to me. In this case we would end up with a_0 + 99a_1 + 50a_2 + \ldots + 50a_n = 100 for the first product, and so on. At least, that would make sense if there were m equations we are working with. So my second question is: should the matrix representation be Xa^T = \hat{y}?
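To check the shapes concretely, here is a small NumPy sketch. The values 99 and 50 echo the example from the screen; everything else (m = 2, n = 2, the coefficient values) is made up purely for illustration:

```python
import numpy as np

# Hypothetical X with m = 2 rows and n + 1 = 3 columns.
# The first column is all 1s (the intercept column); 99 and 50 come
# from the screen's example, the second row is invented.
X = np.array([
    [1.0, 99.0, 50.0],
    [1.0, 25.0, 75.0],
])

# a as a COLUMN vector: shape (n + 1, 1). Coefficients chosen
# arbitrarily so the first prediction works out to 100.
a = np.array([[1.0], [1.0], [0.0]])

# (m, n + 1) @ (n + 1, 1) -> (m, 1): one prediction per dataset row.
y_hat = X @ a
print(y_hat.shape)   # (2, 1)

# First entry is a_0 + 99*a_1 + 50*a_2, matching the expanded equation above.
print(y_hat[0, 0])   # 100.0
```

With a as a column vector, each row of X produces one prediction, so \hat{y} has exactly m entries.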


Hey, Scott.

Yes! Basically, m represents the number of rows in the dataset, and n represents the number of features from the dataset that we will use in the model.

Sorry, I don’t understand what this means. Can you rephrase it?

I’m not sure what you mean. If a is a row vector, then the product simply doesn’t exist; we don’t end up with anything. It’s like claiming we end up with something when we divide by zero: we don’t, it’s undefined. But you seem to understand that there’s something wrong with what’s on that screen, and you’re right.
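A quick sketch of why the row-vector product is undefined, using the same made-up 2×3 matrix as an illustration (the 99 and 50 come from the screen's example; the rest is hypothetical):

```python
import numpy as np

X = np.array([[1.0, 99.0, 50.0],
              [1.0, 25.0, 75.0]])    # shape (m, n + 1) = (2, 3)

a_row = np.array([[1.0, 1.0, 0.0]])  # a as a ROW vector, shape (1, 3)

# (2, 3) @ (1, 3) is undefined: the inner dimensions (3 and 1) don't match,
# so NumPy refuses the multiplication entirely.
try:
    X @ a_row
except ValueError:
    print("X @ a_row is undefined")

# Transposing a gives a column vector of shape (3, 1), so X @ a^T works
# and yields one prediction per row, shape (m, 1).
y_hat = X @ a_row.T
print(y_hat.shape)   # (2, 1)
```

So the multiplication isn’t wrong in the sense of giving a different answer; it just has no answer at all until a is written as a column vector.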

Yes! You got it.