I am new to scikit-learn and machine learning and have only just started the data scientist path. I usually work on the DataQuest screen as well as in a local Jupyter notebook. I’m getting different predictions and MSE / RMSE values between the two, and I wanted to confirm whether this is to be expected or whether I have done something wrong.
DataQuest - first few predictions:
My first few predictions:
DataQuest MSE and RMSE
My MSE and RMSE
The code I run on DataQuest and in my local environment is exactly the same.
If differences are to be expected, is there any way in which I could set a random seed to avoid confusion moving forward?
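For context on the seed question: `KNeighborsRegressor` has no `random_state` parameter, so any randomness would have to come from an earlier step, such as shuffling the rows before the train/test split. The sketch below is only an illustration under that assumption. The `listings` DataFrame and the `shuffle_rows` helper are hypothetical and not part of the lesson code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the listings data (not the real dataset).
listings = pd.DataFrame({
    "accommodates": [2, 4, 1, 6, 3, 2, 5, 4],
    "bathrooms": [1.0, 1.5, 1.0, 2.0, 1.0, 1.0, 2.5, 1.0],
    "price": [80.0, 150.0, 60.0, 300.0, 110.0, 95.0, 240.0, 130.0],
})

def shuffle_rows(df, seed):
    """Return df with its rows in a seeded, reproducible random order."""
    rng = np.random.default_rng(seed)
    shuffled_index = rng.permutation(df.index.to_numpy())
    return df.loc[shuffled_index]

# The same seed gives the same row order on every run.
first = shuffle_rows(listings, seed=1)
second = shuffle_rows(listings, seed=1)
same_order = first.index.equals(second.index)
print(same_order)
```

If the rows are shuffled with a different seed (or no seed) in one environment, the `iloc` split would select different rows, which alone could change the predictions and the MSE / RMSE.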
I also noticed that the first predictions are run using the default metric ‘minkowski’, whereas in the MSE / RMSE calculation screen, the code switches to ‘euclidean’ metric. Could anyone explain the difference?
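For reference, scikit-learn's default for `KNeighborsRegressor` is `metric="minkowski"` with `p=2`, and the Minkowski distance with `p=2` is the Euclidean distance. A minimal sketch of how the two settings could be compared, using synthetic data rather than the listings dataset (the array names here are made up for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data, only for comparing the two metric settings.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = rng.normal(size=200)
X_test = rng.normal(size=(50, 2))

# Default settings: metric="minkowski", p=2.
knn_default = KNeighborsRegressor(algorithm="brute")
# Explicit Euclidean metric.
knn_euclid = KNeighborsRegressor(algorithm="brute", metric="euclidean")

pred_default = knn_default.fit(X_train, y_train).predict(X_test)
pred_euclid = knn_euclid.fit(X_train, y_train).predict(X_test)

# Minkowski distance with p=2 computed by hand for one pair of points.
a, b = X_train[0], X_train[1]
minkowski_p2 = (np.abs(a - b) ** 2).sum() ** (1 / 2)
euclidean = np.sqrt(((a - b) ** 2).sum())

same_predictions = np.allclose(pred_default, pred_euclid)
same_distance = np.isclose(minkowski_p2, euclidean)
print(same_predictions, same_distance)
```

If both checks hold, the switch from `'minkowski'` to `'euclidean'` would not by itself explain a difference in results.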
```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

# normalised_listings is the normalised listings DataFrame prepared earlier.
train_df = normalised_listings.iloc[0:2792].copy()
test_df = normalised_listings.iloc[2792:].copy()

knn = KNeighborsRegressor(algorithm="brute")
train_features = train_df[["accommodates", "bathrooms"]]  # training data - feature columns
train_target = train_df["price"]  # training data - target column
knn.fit(train_features, train_target)
predictions = knn.predict(test_df[["accommodates", "bathrooms"]])

two_features_mse = mean_squared_error(test_df["price"], predictions)
two_features_rmse = np.sqrt(two_features_mse)
print(two_features_mse, two_features_rmse)
```
Thank you for your help!