Why are values ricocheting in line plot?

My Code:

import numpy as np
from math import sqrt
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

def updated_knn_train_test(train_col, target_col, dataframe):
    # Fix the seed so the shuffle (and the train/test split) is reproducible
    np.random.seed(1)
    
    #Shuffle the rows of the dataframe
    shuffled_rows = np.random.permutation(dataframe.index)
    randomized_df = dataframe.reindex(shuffled_rows)
    length = int(len(randomized_df) / 2)
    train_df = randomized_df.iloc[0:length]
    test_df = randomized_df.iloc[length:]
    
    k_vals = [1,3,5,7,9]
    k_rmses = {}
    for k_vl in k_vals:
        knn = KNeighborsRegressor(n_neighbors=k_vl)
        train_features = train_df[[train_col]]
        train_target = train_df[target_col]
        knn.fit(train_features, train_target)
        predictions = knn.predict(test_df[[train_col]])
        k_mse = mean_squared_error(test_df[target_col], predictions)
        k_rmse = sqrt(k_mse)
        k_rmses[k_vl] = k_rmse
    return k_rmses

#Use function above to calculate rmses. First drop price from
#the training dataset since it is our target.
train_col = normalized_cars_to_numeric.columns.drop('price')
#calc rmses for all train columns
rmses = {}
for col in train_col:
    rmse_val = updated_knn_train_test(col, 'price', normalized_cars_to_numeric)
    rmses[col] = rmse_val
#rmses


import matplotlib.pyplot as plt
%matplotlib inline

for a, b in rmses.items():
    x = list(b.keys())
    y = list(b.values())
    
    plt.plot(x, y)
    plt.ylabel("RMSE (Price, USD)")
    plt.xlabel("k-value, number of similar prices")
    plt.show()

What I expected to happen:
A single graph containing all of the RMSE line plots.

What actually happened:
Multiple RMSE plots, one per column. The line plot values rebound (ricochet) up and down within each plot.

There is no error message. Why do the lines ricochet up and down, and why do they end up on separate plots? Is something out of order in my code?
Predicting+Car+Prices (1).ipynb (295.4 KB)


Hi!

Your plotting calls, including plt.show(), are inside the loop, so a separate figure is drawn on every iteration until the dictionary runs out of items. Calling plt.show() once, after the loop, should put all of the lines on a single graph. I don't think you need that loop to plot the values at all, but I might be wrong. Personally, I would save the RMSE values in a list and plot them against your k_vals.
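To make that concrete, here is a minimal sketch of one way to get everything onto a single figure. It is not taken from the notebook: the column names and RMSE numbers are made up for illustration and only mimic the shape of the rmses dictionary in the question ({column: {k: rmse}}). The loop still adds one line per column, but the labels, legend, and plt.show() come after it.

```python
import matplotlib.pyplot as plt

# Hypothetical stand-in for the rmses dict from the question:
# {column name: {k: rmse}}. These numbers are invented, not real results.
rmses = {
    "horsepower": {1: 4100.0, 3: 4000.0, 5: 4050.0, 7: 4300.0, 9: 4500.0},
    "curb-weight": {1: 5200.0, 3: 5000.0, 5: 4400.0, 7: 4350.0, 9: 4600.0},
}

fig, ax = plt.subplots()
for col, k_rmses in rmses.items():
    # Sort by k so each line runs left to right along the x-axis
    ks = sorted(k_rmses)
    ax.plot(ks, [k_rmses[k] for k in ks], label=col)

# Axis labels and legend are set once, outside the loop
ax.set_xlabel("k-value")
ax.set_ylabel("RMSE (Price, USD)")
ax.legend()

# show() is called once, after the loop, so all lines share one figure
plt.show()
```

With the real rmses dictionary in place of the made-up one, the same loop should draw one labelled line per training column on a shared set of axes.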

Anyway, good luck!
