Multivariate KNN not selecting all features to test m/140

Screen Link:

My Code:

# My code
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

train_df = normalized_listings.iloc[0:2792] 
test_df = normalized_listings.iloc[2792:]

knn = KNeighborsRegressor(n_neighbors=5,algorithm='brute')
knn.fit(train_df, train_df['price'])

all_features_predictions = knn.predict(test_df)
all_features_mse = mean_squared_error(test_df['price'], all_features_predictions)
all_features_rmse = all_features_mse ** (1/2)

print(all_features_mse)
print(all_features_rmse)  # about 25.9, the lower of the two RMSEs

What actually happened:

671.8200682593855
25.919492052495656

What I expected to happen:
I expected “features = train_df.columns.tolist()” to list all of the features, but my code above gives a different result:
its RMSE was about 25.9, while the Dataquest ‘suggested’ code below had an RMSE of about 124.3.
My question is, why would you not use the above code to include all of the features? Its predictions seem closer, at least.

# DataQuest Answer
knn = KNeighborsRegressor(n_neighbors=5, algorithm='brute')
features = train_df.columns.tolist()
features.remove('price')

knn.fit(train_df[features], train_df['price'])
all_features_predictions = knn.predict(test_df[features])
all_features_mse = mean_squared_error(test_df['price'], all_features_predictions)
all_features_rmse = all_features_mse ** (1/2)

print(all_features_mse)
print(all_features_rmse)  # RMSE of about 124.3

Output:

15455.275631399316
124.31924883701363

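It looks like you are including the target column ('price') among the features used to train your model. That leaks the answer into the model: test_df still contains price at prediction time, so the neighbors are picked partly by how close their price is to the true one, and the low RMSE is misleading.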

Notice that the DQ code includes the line:
features.remove('price')
and then fits the model using the line:
knn.fit(train_df[features], train_df['price'])

which lets the model be fitted on all features BESIDES the price.

Basically, when it says fit your model on ALL features, it means all features BESIDES the price.

Dropping the price from your feature set should resolve your issue.
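
To see why leaving price in the features makes the error look so small, here is a minimal sketch with a hand-rolled brute-force KNN in plain Python. It is not the scikit-learn implementation, and the toy rows, the knn_predict, rmse and evaluate helpers, and k=2 are all made up for illustration. The two non-price features are deliberately uninformative, so any accuracy in the leaky run comes from price itself.

```python
import math

def knn_predict(train_rows, train_targets, query, k=2):
    """Average the targets of the k training rows nearest to `query` (Euclidean distance)."""
    ranked = sorted(range(len(train_rows)),
                    key=lambda i: math.dist(train_rows[i], query))
    return sum(train_targets[i] for i in ranked[:k]) / k

def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical normalized listings: (feature_1, feature_2, price).
train = [(0.1, 0.9, 1.0), (0.9, 0.1, 1.1), (0.2, 0.8, 5.0), (0.8, 0.2, 5.2)]
test = [(0.15, 0.85, 5.1), (0.85, 0.15, 1.05)]

train_prices = [row[-1] for row in train]
test_prices = [row[-1] for row in test]

def evaluate(drop_price):
    """Return the test RMSE, with or without price among the feature columns."""
    cut = -1 if drop_price else None  # row[:None] keeps every column
    train_x = [row[:cut] for row in train]
    preds = [knn_predict(train_x, train_prices, row[:cut]) for row in test]
    return rmse(test_prices, preds)

leaky_rmse = evaluate(drop_price=False)  # price used as a feature
clean_rmse = evaluate(drop_price=True)   # price removed, as in the DQ answer

print(leaky_rmse, clean_rmse)
```

The leaky RMSE comes out far lower, but only because each test row's own price steers the neighbor search. For a new listing whose price you don't know yet, that model couldn't make a prediction at all, so the higher RMSE is the honest one.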


Aha yes, I must have forgotten to remove price. Makes a lot more sense now. Thanks!