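import numpy as np
from sklearn.model_selection import cross_val_score, cross_val_predict

features = filtered_cars[cols]
target = filtered_cars['mpg']
# lr (the estimator) and kf (the KFold splitter) are assumed to be defined earlier
mse = cross_val_score(lr, features, target, scoring='neg_mean_squared_error', cv=kf)
predictions = cross_val_predict(lr, features, target, cv=kf)
avg_var = np.var(predictions)  # variance over all predictions pooled together
avg_mse = np.mean(np.abs(mse))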
What actually happened:
I am getting correct MSE values but wrong variance values. Could you please help me compute the variance correctly using functions from sklearn.model_selection?
This question is answered here
In your implementation of train_and_cross_val, the predictions list contains all of the predictions at once.
In Dataquest’s implementation, that array contains only the predictions for the specific fold being handled on each iteration. The MSE and the variance are then computed for that fold and appended to mse_values and variance_values respectively.
After iterating over all folds, Dataquest’s implementation computes the mean of both mse_values and variance_values and returns these values…
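A minimal sketch of the per-fold approach described above, still using cross_val_predict from sklearn.model_selection. It assumes kf produces the same splits every time it is iterated (shuffle=False, or shuffle=True with an integer random_state), so the predictions can be sliced back into folds with kf.split. The helper name fold_mse_and_variance and the synthetic data are made up for illustration and stand in for the cars data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score


def fold_mse_and_variance(features, target, kf):
    """Return (mean of per-fold MSEs, mean of per-fold prediction variances).

    Hypothetical helper: assumes NumPy arrays and a splitter whose folds
    are reproducible across calls to kf.split().
    """
    lr = LinearRegression()
    # Out-of-fold predictions for every row, stored at that row's position.
    predictions = cross_val_predict(lr, features, target, cv=kf)

    mse_values, variance_values = [], []
    for _, test_idx in kf.split(features):
        fold_pred = predictions[test_idx]
        mse_values.append(np.mean((target[test_idx] - fold_pred) ** 2))
        variance_values.append(np.var(fold_pred))
    return np.mean(mse_values), np.mean(variance_values)


# Synthetic stand-in data (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=3)
avg_mse, avg_var = fold_mse_and_variance(X, y, kf)

# The per-fold MSE average should agree with cross_val_score.
scores = cross_val_score(LinearRegression(), X, y,
                         scoring='neg_mean_squared_error', cv=kf)
print(avg_mse, np.mean(-scores), avg_var)
```

With a pandas DataFrame, as in the original post, the row selection would presumably need .iloc[test_idx] on the target instead of plain indexing.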
Thank you for the answer. I still don’t understand why it doesn’t work, though. In one case the variances go into the list all at once, while in the other case (Dataquest’s implementation) they go in one by one. The final list should be the same in both cases, and np.var() should give the same result.
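One possible way to see the difference: the two approaches do not build the same list and then call np.var() on it. One calls np.var() once on all predictions pooled together; the other calls np.var() on each fold's predictions and then averages those variances. These are different quantities whenever the fold means differ. The toy numbers below are invented for illustration and are not taken from the cars data.

```python
import numpy as np

# Two hypothetical folds of predictions with different means.
fold_a = np.array([10.0, 12.0, 14.0])
fold_b = np.array([20.0, 22.0, 24.0])

# Per-fold approach: variance inside each fold, then the mean of those.
mean_of_fold_variances = np.mean([np.var(fold_a), np.var(fold_b)])  # about 2.67

# Pooled approach: one variance over all predictions at once.
pooled_variance = np.var(np.concatenate([fold_a, fold_b]))  # about 27.67

# For equal-sized folds, the gap is the variance of the fold means.
variance_of_fold_means = np.var([fold_a.mean(), fold_b.mean()])  # 25.0

print(mean_of_fold_variances, pooled_variance, variance_of_fold_means)
```

In real cross-validation the fold means are usually much closer together than in this example, so the gap is smaller, but the pooled variance generally will not match the averaged per-fold variance exactly.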