Using cross_val_predict to get predicted values and finally getting variance values

Screen Link: https://app.dataquest.io/m/132/overfitting/5/cross-validation

My Code:

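import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, cross_val_predict

def train_and_cross_val(cols):
    lr = LinearRegression()
    features = filtered_cars[cols]
    target = filtered_cars['mpg']
    variance_values = []
    mse_values = []
    kf = KFold(n_splits=10, shuffle=True, random_state=3)
    # One negated MSE per fold; abs() below turns them back into MSE values
    mse = cross_val_score(lr, features, target, scoring='neg_mean_squared_error', cv=kf)
    # Out-of-fold predictions for every row, returned as a single array
    predictions = cross_val_predict(lr, features, target, cv=kf)
    # Variance taken over all predictions pooled together
    avg_var = np.var(predictions)
    avg_mse = np.mean(abs(mse))
    return avg_mse, avg_var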

What actually happened:

I am getting the correct MSE values but the wrong variance values. Could you please help me get the correct variance values using functions from sklearn.model_selection?

Hi @nitishkumarhardworke,

This question is answered here

Best,
Sahil

Thank you for the answer. I still don’t understand why it doesn’t work, though. In one case the variances go into the list all at once, while in the other case (Dataquest’s implementation) they go into the list one by one. The final list should be the same in both cases, so np.var() should give the same result.
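A possible explanation, sketched below under some assumptions: cross_val_predict returns one array of predictions for all rows, so a single np.var call on it gives the variance of the pooled predictions. If the reference solution instead calls np.var on each fold's predictions and then averages those values (an assumption here, since the linked answer is not reproduced in this thread), the two numbers will generally differ. For equal-sized folds, the pooled variance equals the average per-fold variance plus the variance of the fold means. The data below is synthetic and only stands in for filtered_cars; the names X, y, pooled_var, avg_fold_var, and between_var are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

# Synthetic stand-in for filtered_cars (hypothetical data, not the course dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

kf = KFold(n_splits=10, shuffle=True, random_state=3)
lr = LinearRegression()

# Out-of-fold predictions, aligned with the original row order.
predictions = cross_val_predict(lr, X, y, cv=kf)

# Pooled variance: one np.var call over all predictions (what the posted code does).
pooled_var = np.var(predictions)

# Per-fold variance: np.var on each test fold's predictions, then the mean.
# kf.split yields the same folds again because random_state is a fixed integer.
fold_vars = [np.var(predictions[test_idx]) for _, test_idx in kf.split(X)]
avg_fold_var = np.mean(fold_vars)

# Variance of the per-fold mean predictions; this is the gap between the two
# numbers above when every fold has the same size (200 rows / 10 folds here).
fold_means = [np.mean(predictions[test_idx]) for _, test_idx in kf.split(X)]
between_var = np.var(fold_means)

print(pooled_var, avg_fold_var, between_var)
```

So cross_val_predict can still be used: index its output with the test indices from kf.split(features) and average the per-fold np.var values, rather than taking np.var over the whole array.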