# Guided Project: Predicting Car Prices

Hi everyone,

Here is my guided project on predicting car prices using the KNN algorithm.

Instead of iteratively modifying the functions, I added all the potentially useful parameters to them from the beginning and then tuned them. I used both train/test validation and k-fold cross-validation, and for each one I built several univariate and multivariate models and estimated their errors. The spaghetti plots, which usually look quite scary, were rather insightful this time and made it easy to find the model with the minimum error.
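For readers who want to see the two validation schemes side by side, here is a minimal sketch, assuming scikit-learn's `KNeighborsRegressor` and a small synthetic stand-in for the normalized car data. The feature names, the helper names `knn_train_test` and `knn_kfold`, and the split settings are hypothetical illustrations, not the code from my notebook.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the normalized car data (hypothetical columns;
# the real project uses the car dataset from the lesson).
rng = np.random.default_rng(0)
FEATURES = ["engine-size", "horsepower", "curb-weight"]
X_all = rng.normal(size=(200, len(FEATURES)))
y = 15000 + 4000 * X_all[:, 0] + 2500 * X_all[:, 1] + rng.normal(scale=1500, size=200)


def knn_train_test(features, k_values, train_frac=0.5, seed=1):
    """Train/test validation: return {k: RMSE} for one shuffled split."""
    cols = [FEATURES.index(f) for f in features]
    X = X_all[:, cols]
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(train_frac * len(y))
    train, test = idx[:cut], idx[cut:]
    rmses = {}
    for k in k_values:
        model = KNeighborsRegressor(n_neighbors=k)
        model.fit(X[train], y[train])
        pred = model.predict(X[test])
        rmses[k] = np.sqrt(mean_squared_error(y[test], pred))
    return rmses


def knn_kfold(features, k_values, n_splits=5, seed=1):
    """K-fold cross-validation: return {k: mean RMSE over the folds}."""
    cols = [FEATURES.index(f) for f in features]
    X = X_all[:, cols]
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rmses = {}
    for k in k_values:
        # cross_val_score returns negative MSE per fold; flip the sign,
        # take the square root per fold, then average the fold RMSEs.
        scores = cross_val_score(KNeighborsRegressor(n_neighbors=k), X, y,
                                 scoring="neg_mean_squared_error", cv=kf)
        rmses[k] = np.mean(np.sqrt(-scores))
    return rmses


k_values = range(1, 26, 2)
# One univariate model per feature, validated with a single train/test split.
univariate = {f: knn_train_test([f], k_values) for f in FEATURES}
# One multivariate model, validated with 5-fold cross-validation.
multivariate = knn_kfold(["engine-size", "horsepower"], k_values)
best_k = min(multivariate, key=multivariate.get)
print("best k (multivariate, 5-fold):", best_k)
```

Each returned `{k: RMSE}` dictionary is exactly one line of a spaghetti plot, which is why plotting them together makes the minimum easy to spot.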

Looking forward to your feedback. Please let me know what can be improved in my project: code efficiency, storytelling flow, correctness of conclusions, any errors or typos. Anything you suggest will be of great use to me.

P.S. I took the cover picture of my project myself while traveling in Chile. Hope you'll like it!

https://app.dataquest.io/c/36/m/155/guided-project%3A-predicting-car-prices/3/univariate-model


Looking primo! I've got 2 humble remarks:

1. Doing this project, I found that every parameter/hyperparameter depends on the others… so say you deduced in step… 4 that the best number of neighbours k is 6, and then you moved on to step 5 to fiddle with the k-fold ■■■■ (duuuh, k n o b, what is it with this word?!). I wouldn't leave the k-neighbours k n o b fixed at 6; I would keep testing it while also testing the number of folds. In the lesson we're sort of told to leave it fixed and move on (which is easier and computationally cheaper). In future projects, GridSearchCV does all of that for us: it checks every combination of values from the parameter grid, like I'm describing. …hope that's clear
   - So in your last code cell, I improved (lowered) the RMSE just by lowering the number of neighbours k.
1. Remember when I was flooding my notebook with lines of code for styling plots and looking for a way to cut down all that repetitive plotting code?
   - And you told me to use functions for that, so the same tip applies to your plots.
1. Actually, a 3rd very small one: everyone uses the same default colors for plots. I know the visuals are far from the most important part of a machine learning project, but change a few colors and suddenly your notebook looks different from the 100 other ones the recruiter/client/boss saw today.
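To make the first remark concrete, here is a rough sketch, assuming scikit-learn and synthetic stand-in data, in which GridSearchCV re-searches every k for each candidate fold count instead of freezing k at an earlier step. One caveat: the number of folds belongs to the validation scheme, not to `KNeighborsRegressor`, so it goes in an outer loop rather than in the grid. The fold counts and the k range below are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (hypothetical; not the real car dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 15000 + 4000 * X[:, 0] + 2500 * X[:, 1] + rng.normal(scale=1500, size=200)

# Every k from 1 to 25 is re-tested for each fold count below.
param_grid = {"n_neighbors": list(range(1, 26))}

results = {}
for n_splits in (3, 5, 7, 10):
    search = GridSearchCV(
        KNeighborsRegressor(),
        param_grid,
        scoring="neg_root_mean_squared_error",  # higher is better, so RMSE is negated
        cv=KFold(n_splits=n_splits, shuffle=True, random_state=1),
    )
    search.fit(X, y)
    # Store the best k and its mean cross-validated RMSE (sign flipped back).
    results[n_splits] = (search.best_params_["n_neighbors"], -search.best_score_)

for n_splits, (best_k, rmse) in results.items():
    print(f"{n_splits:>2} folds: best k = {best_k:>2}, mean RMSE = {rmse:,.0f}")
```

Comparing the best k across fold counts shows whether the chosen k is stable or shifts with the validation setup, which is exactly the dependency described above.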