Matplotlib Line Plot Doubling Back in Machine Learning Guided Project: Help Debugging?

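```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

def knn_train_test(train, target, df):
    # train: name of one feature column; target: name of the column to predict
    np.random.seed(1)
    # Shuffle the rows, then split them 50/50 into training and test sets
    shuffled_index = np.random.permutation(df.index)
    rand_df = df.reindex(shuffled_index)
    train_index = int(len(rand_df) / 2)
    train_set = rand_df.iloc[0:train_index]
    test_set = rand_df.iloc[train_index:]

    # Fit one model per k and store its test RMSE under that k
    k_rmses = {}
    k_values = [1, 3, 5, 7, 9]
    for k in k_values:
        knn = KNeighborsRegressor(n_neighbors=k)
        knn.fit(train_set[[train]], train_set[target])
        predictions = knn.predict(test_set[[train]])
        mse = mean_squared_error(test_set[target], predictions)
        rmse = np.sqrt(mse)
        k_rmses[k] = rmse
    return k_rmses

# norm_cars: assumed to be the normalized cars DataFrame built earlier in the project
mult_rmses = {}

train_cols = norm_cars.columns.drop('price')
for col in train_cols:
    rmses = knn_train_test(col, 'price', norm_cars)
    mult_rmses[col] = rmses

mult_rmses

import matplotlib.pyplot as plt
%matplotlib inline

# Here k is a column name and v is that column's {k: rmse} dictionary
for k, v in mult_rmses.items():
    x = list(v.keys())
    y = list(v.values())

    plt.plot(x, y)
    plt.xlabel('k value')
    plt.ylabel('RMSE')
```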

I expected this chart to match the one produced by the guided project's solution code: one line per variable, showing the RMSE for each k value.

What actually happened:
My matplotlib plot doubles back right in the middle, even though none of the dictionaries has duplicate keys. So from halfway across the chart onward, I have more lines than I should.

My code is almost identical to the solution notebook's, so I’m having a hard time finding the source of this error. Thanks for any help!


I’m having the exact same problem. I copied and pasted the code from the solution and got the exact same plot.

@ninasweeney18 @srauten
I too had the same issue.

Not sure what the underlying cause of this is, but I tried changing

```python
k_values = [1,3,5,7,9]
```

to something else, like

```python
k_values = [k for k in range(1,10)]
```

This got rid of the issue for me.
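Changing `k_values` changes which keys go into each inner dictionary, but by itself it doesn't guarantee the order those keys come back out in. A sketch that sidesteps dictionary order entirely is to read the RMSEs back in the order of a fixed `k_values` list. `ordered_xy` is a hypothetical helper, and the numbers below are placeholders, not real project results.

```python
def ordered_xy(rmses, k_values):
    """Return (x, y) lists with x in the same order as k_values.

    rmses is one inner dict of mult_rmses ({k: rmse}). Looking each
    value up by key means the dict's own iteration order never matters.
    """
    x = list(k_values)
    y = [rmses[k] for k in x]
    return x, y


k_values = [1, 3, 5, 7, 9]

# Placeholder inner dict whose insertion order puts 9 before 7
# (hypothetical numbers, not real project results)
example_rmses = {1: 4.0, 3: 3.5, 5: 3.2, 9: 3.0, 7: 3.1}

x, y = ordered_xy(example_rmses, k_values)
print(x)  # [1, 3, 5, 7, 9]
print(y)  # [4.0, 3.5, 3.2, 3.1, 3.0]
```

In the question's plotting loop, `x, y = ordered_xy(v, k_values)` would replace the two `list(...)` lines, assuming `k_values` is also defined outside `knn_train_test` (in the question it is local to that function).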

Hope this helps!

Hi! I just plugged in your code and the graph came out normal, so I couldn't reproduce the error. Since the graph only seems to double back along the x axis, one thing to try is breaking your code up a bit and printing out the values of x and y. My guess is that the x values somehow got out of order. Perhaps you could sort them at some point before graphing?
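A sketch of that debugging step, assuming `mult_rmses` has the shape built in the question (each column name mapped to a `{k: rmse}` dict). The numbers are placeholders and `is_sorted` is a hypothetical helper. It prints what each `plt.plot` call would receive and flags any column whose x values are not in increasing order.

```python
# Placeholder nested results in the same shape as mult_rmses
# (not real project output); col_a was filled with 9 before 7
mult_rmses = {
    "col_a": {1: 4.0, 3: 3.5, 5: 3.2, 9: 3.0, 7: 3.1},
    "col_b": {1: 5.0, 3: 4.1, 5: 3.9, 7: 3.8, 9: 3.7},
}


def is_sorted(values):
    """True if each value is <= the one after it."""
    return all(a <= b for a, b in zip(values, values[1:]))


unsorted_cols = []
for col, rmses in mult_rmses.items():
    x = list(rmses.keys())
    y = list(rmses.values())
    # Print exactly what would be handed to plt.plot
    print(col, "x:", x, "y:", y)
    if not is_sorted(x):
        unsorted_cols.append(col)

print("columns with out-of-order x:", unsorted_cols)  # columns with out-of-order x: ['col_a']
```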


Hi, I tried your suggestion on my code (I had the same issue as the one raised in the question) and it worked. I’m still wondering why, though.


The issue is that the `list(v.keys())` and `list(v.values())` lines in the plotting loop produce out-of-order lists (i.e., 9 and its corresponding RMSE are not at the end of the lists). This results in an incorrect plot.
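A minimal reproduction of that effect, as a sketch: `plt.plot` connects the points in the order they appear in the lists and does not sort them by x, so a list where 9 comes before 7 draws a segment out to 9 and then back to 7. The x and y values are hypothetical placeholders. As for why the keys could come out of order at all, one plausible explanation (not confirmed in this thread) is an older Python kernel: dictionaries are only guaranteed to preserve insertion order from Python 3.7 on.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs outside Jupyter too
import matplotlib.pyplot as plt

# Out-of-order x values: 9 comes before 7, as described above
x = [1, 3, 5, 9, 7]
y = [4.0, 3.5, 3.2, 3.0, 3.1]  # placeholder RMSEs

fig, ax = plt.subplots()
(line,) = ax.plot(x, y)

# The drawn line keeps the list order: 5 -> 9, then back to 7
drawn_x = [float(v) for v in line.get_xdata()]
print(drawn_x)  # [1.0, 3.0, 5.0, 9.0, 7.0]

# Count segments that move leftwards (the "doubling back")
backward_steps = sum(1 for a, b in zip(drawn_x, drawn_x[1:]) if b < a)
print(backward_steps)  # 1

plt.close(fig)
```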

The solution I found was to order the nested dictionary and then unpack the list as shown in this post.
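One way to do that, sketched under assumptions: sort each inner dictionary's `(k, rmse)` pairs by k with `sorted()`, then unpack them into x and y with `zip(*...)`. `mult_rmses` has the shape built in the question; the values below are placeholders, and `plotted` is a hypothetical name used only to keep the sorted lists around for inspection.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a Jupyter notebook
import matplotlib.pyplot as plt

# Placeholder nested results (column -> {k: rmse}); not real project output
mult_rmses = {
    "col_a": {1: 4.0, 3: 3.5, 5: 3.2, 9: 3.0, 7: 3.1},
    "col_b": {9: 3.7, 1: 5.0, 3: 4.1, 7: 3.8, 5: 3.9},
}

plotted = {}
for col, rmses in mult_rmses.items():
    # sorted() on (k, rmse) pairs orders them by k, whatever the dict order;
    # zip(*...) then splits the pairs into an x tuple and a y tuple
    x, y = zip(*sorted(rmses.items()))
    plotted[col] = (list(x), list(y))
    plt.plot(x, y, label=col)

plt.xlabel('k value')
plt.ylabel('RMSE')
plt.legend()
```

Because the sort happens at plotting time, it works no matter how the dictionaries were filled; the `label=col` and `plt.legend()` additions just make it easier to tell which line belongs to which column.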



@ninasweeney18 @srauten @idowumichael49

There is your why 🙂

@ncarvey

Thanks for your suggestion! Coming from a non-programming background, troubleshooting it this way wasn’t readily apparent to me, so I just coded it differently instead. I’m still working on building up a programmer’s logical mindset, so thanks again!

Troubleshooting tip: “Try breaking the code into bits and printing the values.” Noted.

@peter.dushku

Thanks for the solution, Peter. Other folks who stumble onto this thread with the same issue will definitely find it useful.
Cheers.


I’m just circling back to this and really appreciate all your responses! I’m excited to jump back in and fix this issue. Love the virtual community at work 😀