My Code:

```
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(train[['Gr Liv Area']], train['SalePrice'])
print(lr.coef_)
print(lr.intercept_)
a0 = lr.intercept_
a1 = lr.coef_
```

What I expected to happen:

I don’t understand what `lr.fit` does. Also, why do we have two brackets after the first `train` and not the second? I get an error if I don’t include the two brackets in the first argument.

Hello,

The `fit` method is where you train your model so it can make predictions on new data. You’ll find this method on every supervised estimator in scikit-learn, whichever algorithm you choose to create your model. Of course, what `fit` does under the hood changes from algorithm to algorithm. The process in a Linear Regression is completely different from the one in a Decision Tree, for instance, but the result is the same: your model is then ready to make predictions.

Also, no matter which supervised algorithm you’re using, `fit` receives two arguments: a two-dimensional object (here a DataFrame) containing the independent variables, also called features, and a Series containing the labels you want to train your model to predict (the target variable).
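
As a minimal sketch of that pattern, the snippet below fits a linear regression on a tiny made-up DataFrame and then predicts for a new row. The column names mirror the ones in the question, but the values and the `new_house` name are invented for illustration and are not the real dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only; not the real housing dataset.
train = pd.DataFrame({
    "Gr Liv Area": [1000, 1500, 2000, 2500],
    "SalePrice": [100000, 150000, 200000, 250000],
})

X = train[["Gr Liv Area"]]  # features: two-dimensional DataFrame
y = train["SalePrice"]      # target: one-dimensional Series

lr = LinearRegression()
lr.fit(X, y)                # learns the slope and intercept from the data

# Predict the price for a new, unseen living area (hypothetical value).
new_house = pd.DataFrame({"Gr Liv Area": [1800]})
prediction = lr.predict(new_house)
print(lr.coef_, lr.intercept_, prediction)
```

Because the made-up prices lie exactly on the line price = 100 × area, the prediction comes out at about 180000. Note that `predict` also expects a two-dimensional input, just like the first argument of `fit`.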

When you select a column from a DataFrame using `df['col']`, the outcome is a Series, which is what you need for the second argument of `fit`. But as mentioned, the first argument needs a DataFrame, not a Series. Therefore, you use double brackets, `df[['col']]`, to get a DataFrame as the outcome.
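
A quick way to see the difference is to check the type of each selection. This is a small sketch on a made-up DataFrame (`df`, `col`, and `other` are placeholder names): the outer brackets are the indexing operator, and the inner brackets build an ordinary Python list of column names.

```python
import pandas as pd

# Placeholder data, only to show what each selection returns.
df = pd.DataFrame({"col": [1, 2, 3], "other": [4, 5, 6]})

single = df["col"]    # one label        -> Series
double = df[["col"]]  # a list of labels -> DataFrame with one column

print(type(single).__name__)
print(type(double).__name__)
```

The list form is the same one you would use to pick several columns, for example `df[["col", "other"]]`. With a single name in the list you simply get a DataFrame that has one column.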

In this case, as you’re using only one independent variable, it may seem like it doesn’t make any difference whether you use `df['col']` or `df[['col']]`, but that’s not true. Series are one-dimensional objects, which means that if you run `train['Gr Liv Area'].shape` the output is `(n,)`, where `n` is the number of rows. There’s only `n` in the tuple because it’s the only dimension in a Series. But if you run `train[['Gr Liv Area']].shape` the output is `(n, 1)`, where 1 is the number of columns in the resulting DataFrame. You see two numbers in the tuple because the DataFrame has two dimensions, and the first argument of `fit` expects a two-dimensional object.
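
To check this yourself, here is a small sketch that prints both shapes and then reproduces the error you get with single brackets. The data is made up (four rows, so `n` is 4), and the exact wording of the error message may differ between scikit-learn versions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only; not the real housing dataset.
train = pd.DataFrame({
    "Gr Liv Area": [1000, 1500, 2000, 2500],
    "SalePrice": [100000, 150000, 200000, 250000],
})

print(train["Gr Liv Area"].shape)    # (4,)   one dimension: rows only
print(train[["Gr Liv Area"]].shape)  # (4, 1) two dimensions: rows and columns

lr = LinearRegression()
try:
    # Single brackets pass a one-dimensional Series as the features.
    lr.fit(train["Gr Liv Area"], train["SalePrice"])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

The `ValueError` is scikit-learn telling you that it expected a two-dimensional input for the features. Switching to `train[['Gr Liv Area']]` in the first argument makes the call work.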

I hope this helps you.
