Hello, I am sharing my guided project on predicting car prices using KNN. Some things that make this project different from the original solution are:
- It discusses stratifying the train-test splits by labeling prices as “normal” or “high” outliers.
- It uses SelectKBest from scikit-learn to select features.
- It combines feature selection and hyperparameter optimization in a nested for-loop, then finds the best model.
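
To give a rough idea of these three points without opening the project, here is a minimal sketch on synthetic data. It is not the project's actual code: the 1.5 × IQR rule for labeling "high" outliers, the f_regression score function, the StandardScaler step, the 5-fold cross-validation, and the search ranges are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the car data (assumption: not the real dataset).
X, y = make_regression(n_samples=200, n_features=8, n_informative=4,
                       noise=10.0, random_state=0)
y = np.exp(y / y.std())  # skew the target so it has high outliers, like prices

# Label "high" outliers with the 1.5 * IQR rule (assumed threshold).
q1, q3 = np.percentile(y, [25, 75])
is_high = y > q3 + 1.5 * (q3 - q1)

# Stratify on the outlier label so both splits get a similar share of outliers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=is_high, random_state=0)

# Nested loop: number of selected features (outer) and k neighbors (inner).
results = []
for n_features in range(1, X.shape[1] + 1):
    for k in range(1, 11):
        model = make_pipeline(
            StandardScaler(),
            SelectKBest(f_regression, k=n_features),
            KNeighborsRegressor(n_neighbors=k),
        )
        scores = cross_val_score(model, X_train, y_train, cv=5,
                                 scoring="neg_root_mean_squared_error")
        results.append((-scores.mean(), n_features, k))

# The best combination is the one with the lowest mean cross-validated RMSE.
best_rmse, best_n_features, best_k = min(results)
print(best_n_features, best_k, round(best_rmse, 3))
```

Putting SelectKBest inside the pipeline means the features are re-selected on each training fold, which avoids leaking information from the validation fold into the selection step.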
I would like feedback on the clarity and accuracy of my explanations. I would also like to know whether you agree or disagree with the final set of features and the k-value that I chose for my final model.
If possible, I would also like to know if there is a convenient way to do stratified k-fold cross-validation for regression problems. I read about sklearn’s StratifiedKFold class, but it only seems to work on classification problems.
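
To make the question concrete, one possible workaround is to bin the continuous target and stratify on the bin labels while still fitting the regressor on the original values. The sketch below is only an illustration under assumptions: the data is randomly generated, and the choice of quartile bins and 5 folds is arbitrary.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical skewed regression data (not the car dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.exp(X[:, 0] + 0.1 * rng.normal(size=200))

# Bin the continuous target into quartiles; the bins act as pseudo-classes.
edges = np.quantile(y, [0.25, 0.5, 0.75])
y_bins = np.digitize(y, edges)  # integer labels 0..3

# Stratify the folds on the bins, but score the model on the real target.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = list(skf.split(X, y_bins))
scores = cross_val_score(KNeighborsRegressor(n_neighbors=5), X, y,
                         cv=folds, scoring="neg_root_mean_squared_error")
print(-scores.mean())
```

Each fold then contains a similar mix of low, medium, and high target values. I am not sure whether this counts as standard practice, so corrections are welcome.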
By the way, since the project is on my personal website, the code blocks are hidden and only the outputs are visible. You can open the code blocks by clicking the “Show Code” buttons.
Last mission screen URL:
https://app.dataquest.io/c/36/m/155/guided-project%3A-predicting-car-prices/6/next-steps
Link to my project: Predicting Car Prices using the K Nearest Neighbors Algorithm | MG Data Science