Screen Link: https://app.dataquest.io/m/132/overfitting/5/cross-validation
Your Code:
```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, cross_val_predict
from sklearn.metrics import mean_squared_error
import numpy as np

# filtered_cars is the DataFrame already loaded on the DQ screen.
def train_and_cross_val(cols):
    features = cols
    target = 'mpg'
    kf = KFold(n_splits=10, shuffle=True, random_state=3)
    lr = LinearRegression()
    # One MSE per fold (sklearn returns them negated, hence the absolute value).
    cv_score = np.absolute(cross_val_score(lr, filtered_cars[features], filtered_cars[target],
                                           scoring='neg_mean_squared_error', cv=kf))
    # Out-of-fold predictions for every row, returned as a single array.
    predictions = cross_val_predict(lr, filtered_cars[features], filtered_cars[target],
                                    cv=kf)
    avg_mse = np.mean(cv_score)
    # Variance over all predictions pooled together, not averaged per fold.
    avg_var = np.var(predictions)
    return (avg_mse, avg_var)

two_mse, two_var = train_and_cross_val(["cylinders", "displacement"])
three_mse, three_var = train_and_cross_val(["cylinders", "displacement", "horsepower"])
four_mse, four_var = train_and_cross_val(["cylinders", "displacement", "horsepower", "weight"])
five_mse, five_var = train_and_cross_val(["cylinders", "displacement", "horsepower", "weight", "acceleration"])
six_mse, six_var = train_and_cross_val(["cylinders", "displacement", "horsepower", "weight", "acceleration", "model year"])
seven_mse, seven_var = train_and_cross_val(["cylinders", "displacement", "horsepower", "weight", "acceleration", "model year", "origin"])
```
What I expected to happen:
I am using `cross_val_predict` to generate the predictions, unlike the answer given by DQ.
What actually happened: The variance my function returns does not match the DQ answer, although the MSE does match. Why is this happening?
Other details:
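My best guess at the cause, as a minimal sketch rather than a confirmed explanation: I am assuming the DQ answer loops over the folds, takes `np.var` of each fold's predictions, and averages those ten values, whereas my code takes a single `np.var` over all out-of-fold predictions at once. The data below is synthetic (a hypothetical stand-in for `filtered_cars`), so the numbers will not match the screen; it only shows that the two calculations can disagree on the same folds.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic stand-in for filtered_cars (hypothetical data, not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

kf = KFold(n_splits=10, shuffle=True, random_state=3)
lr = LinearRegression()

# What my code does: one variance over all out-of-fold predictions pooled together.
pooled_predictions = cross_val_predict(lr, X, y, cv=kf)
pooled_var = np.var(pooled_predictions)

# What I assume the DQ answer does: variance inside each fold, then the mean.
fold_variances = []
for train_idx, test_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_predictions = model.predict(X[test_idx])
    fold_variances.append(np.var(fold_predictions))
avg_fold_var = np.mean(fold_variances)

print("pooled variance:       ", pooled_var)
print("mean of fold variances:", avg_fold_var)
```

If that assumption holds, it would also fit with the MSE matching: `cross_val_score` already returns one score per fold, so `np.mean(cv_score)` is a per-fold average, while the pooled variance additionally picks up the spread between the fold means.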