Ordinary least squares estimation - what do these numbers mean?

I was able to figure out an accepted answer with the help of the hint and the official answer.
But I’m getting lost in the details here.
What do these numbers mean?

How should I interpret them?
How exactly are they related to the features?

Can someone lead me back to the bigger picture of what we are doing here?
Thanks!

Link to the mission step: https://app.dataquest.io/m/238/ordinary-least-squares/1/introduction

Please make sure to include the link to the Mission/Mission Step you are referring to so that others can help you out with the appropriate context.

Sorry!
I added the link.
I thought there was some automatic link to the mission through a tag system, but apparently not…

The values in ols_estimation form the coefficient vector \textbf{a} such that

\begin{eqnarray}
\textbf{a} &=& \left[\begin{matrix} a_0 & a_1 & \dots & a_n \end{matrix}\right] \\
&=& \left[\begin{matrix} w_0 & w_1 & \dots & w_n \end{matrix}\right] \;=\; \textbf{w}
\end{eqnarray}

The coefficient vector \textbf{a} is also the weight vector \textbf{w}.

The definition of a Linear Regression

\begin{eqnarray}
y &=& b + mx \qquad \text{(Line slope equation)} \\
&=& a_0 + a_1x_1 + \dots + a_nx_n \qquad \text{(Mathematical definition)} \\
&=& b + w_1x_1 + \dots + w_nx_n \qquad \text{(Machine Learning definition)} \\
&=& y\prime \\
y\prime &=& b + w_1x_1 + \dots + w_nx_n
\end{eqnarray}

For the Line Slope Equation, where

  • y is the value we are trying to predict
  • m is the slope of the line
  • x is the input feature
  • b is the y-intercept

For the Machine Learning and Mathematical definition of Linear Regression, where

  • y\prime is the predicted label (a desired output)
  • w_0 or a_0 is the bias (y-intercept)
  • x_1, x_2, ..., x_n are the features (the known inputs)
  • w_1, w_2, ..., w_n (or a_1, a_2, ..., a_n) are the weights of the n features
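
As a minimal sketch of that definition (the names predict, bias_and_weights, and x below are illustrative, not from the mission), the prediction is just the bias plus a weighted sum of the feature values:

import numpy as np

def predict(bias_and_weights, x):
    """Return y' = w_0 + w_1*x_1 + ... + w_n*x_n.

    bias_and_weights: array-like [w_0, w_1, ..., w_n]
    x:                array-like [x_1, ..., x_n] of feature values
    """
    # Prepend a constant 1 so the bias w_0 is handled by the same dot product.
    return np.dot(bias_and_weights, np.concatenate(([1.0], x)))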

From the mission, given the following,

features = ['Wood Deck SF', 'Fireplaces', 'Full Bath', '1st Flr SF', 'Garage Area',
       'Gr Liv Area', 'Overall Qual']
target = "SalePrice"

The Linear Regression of our model is given as

\begin{eqnarray}
y\prime &=& a_0 + a_1x_1 + a_2x_2 + a_3x_3 + a_4x_4 + a_5x_5 + a_6x_6 + a_7x_7 \\
&=& w_0 + w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + w_5x_5 + w_6x_6 + w_7x_7
\end{eqnarray}

There are 7 features, so n = 7, but there are 8 values in the vector \textbf{a} (i.e. ols_estimation), because the bias a_0 (or w_0) is included.

\begin{eqnarray}
\textbf{a} &=& \left[\begin{matrix} a_0 & a_1 & a_2 & a_3 & a_4 & a_5 & a_6 & a_7 \end{matrix}\right] \\
&=& \left[\begin{matrix} w_0 & w_1 & w_2 & w_3 & w_4 & w_5 & w_6 & w_7 \end{matrix}\right] \;=\; \textbf{w}
\end{eqnarray}
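
A quick way to see that count (assuming the mission's features list and ols_estimation array are in scope):

# features and ols_estimation are assumed to be the variables from the mission step.
print(len(features))         # 7   -> number of features, n
print(ols_estimation.shape)  # (8,) -> the bias w_0 plus 7 weights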

From the values of ols_estimation,

ols_estimation ndarray (<class 'numpy.ndarray'>)
array([-1.12764871e+05,  3.78815268e+01,  7.08698430e+03, -2.22197281e+03,
        4.31853639e+01,  6.48808564e+01,  3.87112549e+01,  2.45531837e+04])
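
For context, this is roughly how such an estimate is produced with the closed-form OLS solution. This is only a sketch: it assumes the training rows are already loaded in a pandas DataFrame named train with the columns listed above, which may not match the mission's exact code.

import numpy as np

# train: DataFrame assumed to already hold the training rows (hypothetical name).
features = ['Wood Deck SF', 'Fireplaces', 'Full Bath', '1st Flr SF',
            'Garage Area', 'Gr Liv Area', 'Overall Qual']
target = 'SalePrice'

# Design matrix X: a column of ones (for the bias w_0) followed by the 7 feature columns.
X = np.column_stack([np.ones(len(train)), train[features].values])
y = train[target].values

# Closed-form OLS estimate: a = (X^T X)^(-1) X^T y
ols_estimation = np.linalg.inv(X.T @ X) @ X.T @ y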

The weight of each feature is given as

\begin{eqnarray}
w_0 &=& -1.12764871e+05 \qquad \text{(bias)} \\
w_1 &=& 3.78815268e+01 \qquad \text{(feature: Wood Deck SF)} \\
w_2 &=& 7.08698430e+03 \qquad \text{(feature: Fireplaces)} \\
w_3 &=& -2.22197281e+03 \qquad \text{(feature: Full Bath)} \\
w_4 &=& 4.31853639e+01 \qquad \text{(feature: 1st Flr SF)} \\
w_5 &=& 6.48808564e+01 \qquad \text{(feature: Garage Area)} \\
w_6 &=& 3.87112549e+01 \qquad \text{(feature: Gr Liv Area)} \\
w_7 &=& 2.45531837e+04 \qquad \text{(feature: Overall Qual)}
\end{eqnarray}
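
A small sketch to line each value up with its feature name (again assuming features and ols_estimation from the mission are in scope):

# features and ols_estimation are assumed to be the variables from the mission step.
print('bias (w_0):', ols_estimation[0])
for name, weight in zip(features, ols_estimation[1:]):
    print(name + ':', weight)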

The goal of the Ordinary Least Squares estimator is to minimize the sum of squared differences between the observed dependent variable (the values of the variable being predicted) in the given dataset and the values predicted by the linear function of the independent variables.

As a result of minimizing that error, the Ordinary Least Squares estimator gives the optimal Linear Regression weights for the n features. In other words, in the mission example, the vector \textbf{a} (ols_estimation) gives us the optimal bias and weights for the 7 features.
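
For reference, in the same notation (treating \textbf{a} as a column vector here), minimizing that squared error has the standard closed-form solution, where X is the design matrix (a column of ones followed by the feature columns) and \textbf{y} is the vector of observed SalePrice values:

\begin{eqnarray}
\textbf{a} &=& \underset{\textbf{a}}{\arg\min} \left\| \textbf{y} - X\textbf{a} \right\|^2 \\
&=& \left( X^T X \right)^{-1} X^T \textbf{y}
\end{eqnarray}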


Hi @alvinctk
Thank you soooo much for your long explanation!!!
This is making it much clearer!

I had completely missed the Bias thing!

Just to be absolutely sure I’m getting this right…
The bigger picture is: we have a dataset, of which we take 1460 rows to train a model.
Then, based on that training data and the 7 features, Ordinary Least Squares figured out that the optimal way to make a prediction for SalePrice is this Linear Regression formula:

y’ = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6 + w7x7

And say we have a new record for which we want to make a SalePrice prediction with these made up numbers:

'Wood Deck SF' = 1.1
'Fireplaces' = 2.2
'Full Bath' = 3.3
'1st Flr SF' = 4.4
'Garage Area' = 5.5
'Gr Liv Area' = 6.6
'Overall Qual' = 7.7

Then y’, the predicted SalePrice for that record would be this?

y’ = -1.12764871e+05
+ (1.1 * 3.78815268e+01)
+ (2.2 * 7.08698430e+03)
+ (3.3 * -2.22197281e+03)
+ (4.4 * 4.31853639e+01)
+ (5.5 * 6.48808564e+01)
+ (6.6 * 3.87112549e+01)
+ (7.7 * 2.45531837e+04)

Yes, you can compute the predicted value y\prime.

Yes, you are absolutely on point in answering your own question.
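
One way to double-check the arithmetic (assuming ols_estimation is still in scope) is to let NumPy take the dot product:

import numpy as np

# ols_estimation is assumed to be the array from the mission step.
new_record = np.array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7])
# Prepend 1 so the bias w_0 is picked up by the same dot product.
y_prime = np.dot(np.concatenate(([1.0], new_record)), ols_estimation)
print(y_prime)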

Hi @alvinctk,

Thank you for your explanation and confirming my reasoning! :+1: :smile:
