Screen Link:
My Code:
# My Code
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

train_df = normalized_listings.iloc[0:2792]
test_df = normalized_listings.iloc[2792:]

knn = KNeighborsRegressor(n_neighbors=5, algorithm='brute')
# Note: train_df still contains the 'price' column here, so the
# target itself is passed in as one of the features.
knn.fit(train_df, train_df['price'])
all_features_predictions = knn.predict(test_df)
all_features_mse = mean_squared_error(test_df['price'], all_features_predictions)
all_features_rmse = all_features_mse ** (1/2)
print(all_features_mse)
print(all_features_rmse)  # ~25.9, the lowest RMSE so far
What actually happened:
671.8200682593855
25.919492052495656
What I expected to happen:
I expected "features = train_df.columns.tolist()" to select all of the features, but it evidently differs from my code above: my version's RMSE was about 26, while the Dataquest suggested code below had an RMSE of about 124.
My question is: why would you not use my code above to include all of the features? Its predictions seem closer, at least.
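One detail worth noting while comparing the two: the fit in my code, knn.fit(train_df, train_df['price']), passes every column of train_df, including price itself, as a feature. A minimal sketch of what that implies for a genuinely new listing, using the knn fitted in my code above (new_listing is a hypothetical name):

# A genuinely new listing has every feature except 'price',
# because price is exactly what the model is supposed to predict.
new_listing = test_df.drop('price', axis=1).iloc[[0]]

# The model was fit on a matrix that included 'price', so recent
# scikit-learn versions raise a feature-mismatch ValueError here.
knn.predict(new_listing)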
# DataQuest Answer
knn = KNeighborsRegressor(n_neighbors=5, algorithm='brute')
# Use every column except the target as features.
features = train_df.columns.tolist()
features.remove('price')
knn.fit(train_df[features], train_df['price'])
all_features_predictions = knn.predict(test_df[features])
all_features_mse = mean_squared_error(test_df['price'], all_features_predictions)
all_features_rmse = all_features_mse ** (1/2)
print(all_features_mse)
print(all_features_rmse)  # RMSE ~124
Output:
15455.275631399316
124.31924883701363
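One way to probe which RMSE to trust: repeat both fits on pure noise, where no model should be able to predict anything. The sketch below is self-contained (synthetic data standing in for normalized_listings, not the real listings); the fit that keeps 'price' among the features still reports a much lower RMSE, which suggests that number reflects target leakage rather than predictive skill.

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in: three noise features and a noise target,
# so there is nothing real to learn.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(1000, 4)),
                  columns=['a', 'b', 'c', 'price'])
train, test = df.iloc[:800], df.iloc[800:]

# Fit 1: every column, 'price' included among the features.
leaky = KNeighborsRegressor(n_neighbors=5).fit(train, train['price'])
leaky_rmse = mean_squared_error(test['price'], leaky.predict(test)) ** 0.5

# Fit 2: 'price' removed from the features.
feats = ['a', 'b', 'c']
clean = KNeighborsRegressor(n_neighbors=5).fit(train[feats], train['price'])
clean_rmse = mean_squared_error(test['price'], clean.predict(test[feats])) ** 0.5

# The leaky fit scores noticeably better even on pure noise, because
# its neighbors are partly matched on the target value itself.
print(leaky_rmse, clean_rmse)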