30-4 KNN problem - Exploring Topics in Data Science - Machine Learning with K-Nearest Neighbors


In 7.4, Exploring Topics in Data Science - Machine Learning with KNN

On page 7 we have the following sklearn example, when running on my local machine I get the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I have confirmed that the dataset contains some NaN values, and I'd like to know what's going on behind the curtain: did Dataquest drop or change the NaN values without mentioning it?

# The columns that we'll be using to make predictions
x_columns = ['age', 'g', 'gs', 'mp', 'fg', 'fga', 'fg.', 'x3p', 'x3pa', 'x3p.', 'x2p', 'x2pa', 'x2p.', 'efg.', 'ft', 'fta', 'ft.', 'orb', 'drb', 'trb', 'ast', 'stl', 'blk', 'tov', 'pf']
# The column we want to predict
y_column = ["pts"]

from sklearn.neighbors import KNeighborsRegressor
# Create the kNN model
knn = KNeighborsRegressor(n_neighbors=5)
# Fit the model on the training data
knn.fit(train[x_columns], train[y_column])
# Make predictions on the test set using the fit model
predictions = knn.predict(test[x_columns])

Hey, Otto.

Yes, we’re running the following code snippet:

import pandas as pd
nba = pd.read_csv("nba_2013.csv")
nba.fillna(0, inplace=True)
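For anyone who wants to see the effect of that missing step locally, here is a minimal sketch. The small DataFrame below is a synthetic stand-in for `nba_2013.csv` (column names borrowed from the mission, values made up), just to show that `fillna(0)` is what keeps `knn.fit()` from raising the `ValueError`:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for nba_2013.csv: a tiny frame with one NaN,
# since the real file may not be available locally.
nba = pd.DataFrame({
    "age": [22, 25, 28, 30, 24, 27],
    "mp":  [500, np.nan, 1200, 900, 300, 1100],
    "pts": [200, 450, 700, 520, 150, 640],
})

# Without this step, knn.fit() raises:
# ValueError: Input contains NaN, infinity or a value too large ...
nba.fillna(0, inplace=True)

x_columns = ["age", "mp"]
y_column = ["pts"]

# Simple positional split for illustration only
train = nba.iloc[:4]
test = nba.iloc[4:]

knn = KNeighborsRegressor(n_neighbors=3)
knn.fit(train[x_columns], train[y_column])
predictions = knn.predict(test[x_columns])
print(predictions.shape)  # one prediction per test row
```

Note that filling NaN with 0 is the choice the mission makes; depending on the column, dropping those rows or imputing a mean might be more appropriate for your own analyses.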

I’ll bring this up with the team so we can discuss it. This mission wasn’t created with the intent of having students reproduce it perfectly on their local systems — we have guided projects for that. Sometimes we hide certain details to keep the focus on the intent of the mission.

On the other hand, I can definitely understand why this would be frustrating and I agree that in an ideal world this wouldn’t happen.

Thank you for bringing this up!

Hi Bruno,

Thanks for your prompt response, and I appreciate the information.

As I’ve mentioned to other members of the Dataquest team, I really like to run the code on my local machine because it reinforces my learning; I find that I remember the concepts more easily that way.