Cross validation 154-2

Hey there fellow learners!

Just a quick question here.

Screen Link:
https://app.dataquest.io/m/154/cross-validation/2/holdout-validation

My Code:

import numpy as np  # needed for np.mean below
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

train_one = split_one
test_one = split_two
train_two = split_two
test_two = split_one

knn = KNeighborsRegressor()
knn.fit(train_one[['accommodates']],train_one['price'])
prediction_one = knn.predict(test_one[['accommodates']])
msq_one = mean_squared_error(test_one['price'],prediction_one)
iteration_one_rmse = msq_one**0.5

knn_two = KNeighborsRegressor()
knn_two.fit(train_two[['accommodates']],train_two['price'])
prediction_two = knn_two.predict(test_two[['accommodates']])
msq_two = mean_squared_error(test_two['price'],prediction_two)
iteration_two_rmse = msq_two**0.5

avg_rmse = np.mean([iteration_two_rmse,iteration_one_rmse])

What I expected to happen:

avg_rmse = 128.96254732948216

What actually happened:

avg_rmse = 123.7207888486061

So I seem to be slightly off here. I checked against the provided answer, and it turns out I should not have created knn_two. That seems a bit counter-intuitive to me: the original knn was already fit on the data that becomes test_two, right? So reusing it would unfairly improve its performance on that test set. That is why I created a new knn.

I am a bit confused about why I should not create another knn.

Cheers!
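For reference, the two nearly identical iterations in the code above can be folded into a small helper. This is only a sketch: the synthetic split_one / split_two DataFrames below are made-up stand-ins for the mission's real data, so the resulting numbers won't match the course's expected RMSE.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

def holdout_rmse(train, test, features=('accommodates',), target='price'):
    """Fit a fresh KNN regressor on train and return its RMSE on test."""
    knn = KNeighborsRegressor()
    knn.fit(train[list(features)], train[target])
    predictions = knn.predict(test[list(features)])
    return mean_squared_error(test[target], predictions) ** 0.5

# Tiny synthetic stand-ins for the mission's split_one / split_two
rng = np.random.default_rng(1)
split_one = pd.DataFrame({'accommodates': rng.integers(1, 8, 30),
                          'price': rng.random(30) * 200})
split_two = pd.DataFrame({'accommodates': rng.integers(1, 8, 30),
                          'price': rng.random(30) * 200})

# Each iteration swaps the roles of the two splits
avg_rmse = np.mean([holdout_rmse(split_one, split_two),
                    holdout_rmse(split_two, split_one)])
print(avg_rmse)
```

Because the helper creates a fresh KNeighborsRegressor on every call, there is no possibility of one iteration's training data leaking into the other.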


Check the red text above. It happens all the time: a slight copy-paste error can lead to wrong results :smiley:

Oops @fedepereira!

I have corrected the mistake. It's not that I was stuck on the code; it was more a question about the difference. I reverse engineered my mistake ;).

But I would guess training two separate models should be better? Even though the RMSE was further off.

Hi @DavidMiedema,
sorry for my delay here, I was busy these days!
I did it just like you, i.e. two different models, training each of them separately, and I got a pass without problems. If you replace that stray knn with knn_two, you should get a pass as well.

I think both answers are valid, since the model doesn't keep any memory of previous fits: calling fit again discards the old training data. That's why both approaches give the same result. I just tried re-using the same model instance in both steps, running the fit and predict methods, and I got a pass as well.
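To make the point above concrete: scikit-learn estimators keep no state from earlier fit calls, so a re-used instance behaves exactly like a fresh one after refitting. A minimal sketch with synthetic data (the variable names here are made up, not from the course):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_a, y_a = rng.random((20, 1)), rng.random(20)  # synthetic "split A"
X_b, y_b = rng.random((20, 1)), rng.random(20)  # synthetic "split B"

# Fresh model trained only on split B
fresh = KNeighborsRegressor()
fresh.fit(X_b, y_b)

# Re-used model: fit on split A first, then refit on split B
reused = KNeighborsRegressor()
reused.fit(X_a, y_a)
reused.fit(X_b, y_b)  # this second call discards everything learned from A

X_test = rng.random((5, 1))
print(np.allclose(fresh.predict(X_test), reused.predict(X_test)))  # True
```

So there is no "unfair" leakage from the first iteration: the second fit call fully replaces the stored training data, which is why both the one-model and two-model solutions pass.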