
Predicting Car Prices (m155)

Hi all!
I've uploaded my new project.
I look forward to your comments and criticism.
01_Predicting_Car_Prices_m155.ipynb (674.7 KB)


Hi Vadim,

Congratulations on completing another cool project! I liked a lot of things: your curiosity about the data and especially its details, the in-depth data analysis, the extra steps you took, the well-structured and informative storytelling, the links to additional resources throughout the project, the use of both r2-score and RMSE, the clear, highly readable, and well-commented code, and the cool cover picture.

Some minor comments for your consideration:

  • You can import several functions from the same module at once, instead of spreading them over several lines. For example:
from sklearn.model_selection import train_test_split, KFold, cross_val_score
  • You forgot to add the link to the dataset documentation in the introduction (there is only the direct download link for the dataset).
  • Avoid obvious code comments, like # Import required modules, # call KNeighborsRegressor object, # Fit model. Otherwise, your code commenting is perfect.
  • Avoid numbering the subheadings. In some cases it can be confusing, especially when sections have subdivisions.
  • For feature selection, I’d suggest trying all the numerical columns (and leaving out the others, like make and engine-type). For example, the feature engine-size turns out to be unexpectedly strongly correlated with a car’s price. I totally agree with you, and my common sense suggests the same, that the make influences the price much more than the engine size :slightly_smiling_face: However, the non-numerical columns can’t be fed to the model as they are, so technically you can use the engine size as-is, while the make you cannot.
  • For all the code cells with several outputs ([3], [5], etc.): it’s always a good idea to add internal subheadings for each output.
  • As for the information on mileage: you can find it in the columns city-mpg and highway-mpg.
  • Visualizations: consider making the plot titles and labels bigger. Also, the legends for [6], [11], and [13] look a bit overwhelming; consider making them shorter.
  • The code cells [15] and [16]: the outputs here also look a bit overwhelming. Consider plotting them as graphs instead, or alternatively rounding the output values. Also, it’s better to output, for example, “3 folds…” instead of “3 k-folds…”.
  • When you quote a source directly, you may find this markdown technique useful.
  • In the conclusion, I’d add the following: which feature (or a feature combination) is the most helpful for the car price prediction in the ML algorithm?
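
A minimal sketch of how a few of the suggestions above could fit together: one import line per module, numerical columns only, rounded output, and "folds" wording. The DataFrame here is synthetic and made up purely for illustration, and `rmse_per_feature` and `N_SPLITS` are hypothetical names, not taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

N_SPLITS = 3  # hypothetical setting, chosen only for this sketch

# Synthetic stand-in for the cars dataset (all values are made up).
rng = np.random.default_rng(0)
n = 60
engine_size = rng.uniform(60, 300, n)
cars = pd.DataFrame({
    "make": rng.choice(["audi", "bmw", "toyota"], n),  # non-numerical column
    "engine-size": engine_size,
    "city-mpg": rng.uniform(15, 45, n),
    "price": 100 * engine_size + rng.normal(0, 1500, n),
})

# Keep only the numerical columns, then set the target aside.
numeric = cars.select_dtypes(include="number")
features = numeric.columns.drop("price")


def rmse_per_feature(df, feature_names, target="price", n_splits=N_SPLITS):
    """Mean cross-validated RMSE of a univariate k-NN model, per feature."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=1)
    results = {}
    for name in feature_names:
        scores = cross_val_score(
            KNeighborsRegressor(n_neighbors=5),
            df[[name]],
            df[target],
            scoring="neg_mean_squared_error",
            cv=kf,
        )
        # scores are negated MSE values, one per fold
        results[name] = float(np.sqrt(-scores).mean())
    return results


results = rmse_per_feature(numeric, features)

# Rounded output, best feature first.
for name, rmse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{N_SPLITS} folds, {name}: RMSE = {rmse:.0f}")
```

Sorting the results by RMSE also gives a direct answer to the question of which single feature is most helpful, at least for this univariate setup.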

Great job, Vadim, as usual. Good luck with your future projects and keep this fast and efficient pace of learning!


Elena, thank you very much.
I will work on addressing your comments in the future.
N.B. It seems to me that city-mpg and highway-mpg give the average fuel consumption in miles per gallon for city and highway driving, not the mileage.
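
To illustrate the point, here is a small sketch that converts a miles-per-gallon value into liters per 100 km. It assumes US gallons, which the thread does not confirm, and `mpg_to_l_per_100km` is a hypothetical helper name.

```python
LITERS_PER_US_GALLON = 3.785411784  # assumption: the dataset uses US gallons
KM_PER_MILE = 1.609344


def mpg_to_l_per_100km(mpg: float) -> float:
    """Convert fuel economy in miles per gallon to liters per 100 km."""
    if mpg <= 0:
        raise ValueError("mpg must be positive")
    return 100 * LITERS_PER_US_GALLON / (mpg * KM_PER_MILE)


print(round(mpg_to_l_per_100km(30), 2))  # 7.84
```

A higher mpg value means lower consumption, which is the opposite of what an odometer reading would describe.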


Ah, yes, sorry, Vadim, you’re right about the mileage; those columns are not about it. I see now that I’m making the same mistake in my own project. I was so sure that I didn’t even check the actual values in those columns :sweat_smile: Thanks for pointing it out!
