from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(train[['Gr Liv Area']], train['SalePrice'])
a0 = lr.intercept_
a1 = lr.coef_
What I expected to happen:
I don’t understand the lr.fit thing. Also, why do we have two brackets after the first train and not the second? I get an error if I don’t include the two brackets in the beginning.
The fit method is where you train your model so it can work with new data and make predictions. You’ll find this method on any machine learning algorithm you choose to build your model with. Of course, what fit does under the hood changes from algorithm to algorithm. The process in a Linear Regression is completely different from a Decision Tree, for instance, but the result is the same: your model is ready to make predictions.
Also, for any supervised algorithm, fit receives two arguments: a DataFrame containing the features (the independent variables) and a Series containing the labels you want to train your model to predict (the target variable).
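To make that concrete, here is a minimal sketch with a tiny made-up dataset (the numbers are invented just so the code runs end to end; they are not from the Ames housing data):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy training data: prices here are exactly 100 * living area,
# so the fitted line should recover intercept ~0 and coefficient ~100.
train = pd.DataFrame({
    'Gr Liv Area': [1500, 2000, 2500, 3000],
    'SalePrice': [150000, 200000, 250000, 300000],
})

lr = LinearRegression()
# First argument: a DataFrame of features (double brackets).
# Second argument: a Series of targets (single brackets).
lr.fit(train[['Gr Liv Area']], train['SalePrice'])

print(lr.intercept_)  # a scalar
print(lr.coef_)       # an array with one coefficient per feature column
```

Note that lr.coef_ is an array (one entry per feature column), so with a single feature the slope itself is lr.coef_[0].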
When you select a column from a DataFrame using df['col'], the outcome is a Series, which is exactly what you need for the second argument of fit. But as mentioned, the first argument needs a DataFrame, not a Series. Therefore, you use double brackets to get a DataFrame as the outcome.
In this case, since you’re using only one feature, it may seem like it makes no difference whether you write df['col'] or df[['col']], but that’s not true. Series are one-dimensional objects: if you run train['Gr Liv Area'].shape, the output is (n,), where n is the number of rows. There’s only n in the tuple because that’s the only dimension a Series has. But if you run train[['Gr Liv Area']].shape, the output is (n, 1), where 1 is the number of columns in the resulting DataFrame. You see two numbers in the tuple because a DataFrame has two dimensions, and the first argument of fit expects a two-dimensional object.
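You can check the shape difference yourself with a small throwaway DataFrame (the values below are made up just for the demonstration):

```python
import pandas as pd

train = pd.DataFrame({'Gr Liv Area': [1500, 2000, 2500]})

# Single brackets -> one-dimensional Series
print(train['Gr Liv Area'].shape)    # (3,)

# Double brackets -> two-dimensional DataFrame with one column
print(train[['Gr Liv Area']].shape)  # (3, 1)
```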
I hope this helps you.