Hi all,
I’m doing the guided project on Predicting Car Prices, but I’ve got a weird result! I’ve probably done something wrong, but I don’t know what it is!
Here you can take a look at the whole notebook file:
Predicting Car Prices.ipynb (278.3 KB)
My problem is in the last section, 3.3) K-fold validation model. There, you’ll find that the average RMSE values for the k-fold model grow as the number of neighbors increases, which is not what I expected at all!
Here’s the exact piece of code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Run 10-fold cross-validation for k = 1..25 neighbors and append each average RMSE to a list
kfold_avg_rmses = []
for k in range(1, 26):
    kf = KFold(10, shuffle=True, random_state=1)
    knn = KNeighborsRegressor(n_neighbors=k)
    # cross_val_score returns one negated MSE per fold
    mses = cross_val_score(knn, clean_cars[all_columns], clean_cars["price"], scoring="neg_mean_squared_error", cv=kf)
    rmses = np.sqrt(np.abs(mses))
    avg_rmse = np.mean(rmses)
    kfold_avg_rmses.append(avg_rmse)
# Show the results using a scatter plot
plt.style.use("fivethirtyeight")  # set the style before creating the figure so it applies fully
plt.figure(figsize=(14, 6))
plt.scatter(range(1, 26), kfold_avg_rmses)
plt.scatter(range(1, 26), all_columns_rmses)
plt.title("RMSE values for k-fold and train-test models using all predictors")
plt.ylabel("RMSE")
plt.xlabel("k values")
plt.xticks(rotation = 45)
plt.legend(["K-fold", "Train-test"])
plt.show()
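To make clear what I understand this loop to compute (the average of the per-fold RMSEs for each number of neighbors), here is a dependency-free sketch in plain Python. It is only an illustration, not what scikit-learn does internally: the helper names `knn_predict` and `kfold_avg_rmse`, the synthetic data, and the way rows are assigned to folds are all my own assumptions, so the numbers won’t match the notebook.

```python
import math
import random

def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training rows closest to x (Euclidean distance)."""
    by_distance = sorted((math.dist(row, x), target) for row, target in zip(train_X, train_y))
    nearest = by_distance[:k]  # if k exceeds the training size, all rows are used
    return sum(target for _, target in nearest) / len(nearest)

def kfold_avg_rmse(X, y, k, n_splits=10, seed=1):
    """Shuffle the rows, split them into n_splits folds, and average the per-fold RMSEs."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    # Hypothetical fold assignment: every n_splits-th shuffled row goes to the same fold
    folds = [idx[i::n_splits] for i in range(n_splits)]
    rmses = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        sq_errors = [(knn_predict(train_X, train_y, X[i], k) - y[i]) ** 2 for i in test_idx]
        rmses.append(math.sqrt(sum(sq_errors) / len(sq_errors)))
    return sum(rmses) / len(rmses)

# Synthetic stand-in for clean_cars (an assumption, not the real data)
rng = random.Random(0)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(100)]
y = [3 * a + 2 * b + rng.gauss(0, 1) for a, b in X]

kfold_avg_rmses = [kfold_avg_rmse(X, y, k) for k in range(1, 26)]
for k, rmse in zip(range(1, 26), kfold_avg_rmses):
    print(f"k={k:2d}  avg RMSE={rmse:.3f}")
```

Swapping the real feature rows and prices in for `X` and `y` should give a rough cross-check against the `cross_val_score` results, though the fold assignment differs from `KFold`, so small differences are expected.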
I would really appreciate it if someone could shed some light on this issue!
Thanks!