My first ML project, aka how I almost burned my laptop predicting car prices

This was a long one. It started with a simple trick to stand out from the crowd:

  • check how different models perform on 3 different versions of the dataset

Then I started experimenting with the random seed, and that opened a Pandora’s box:

  • I ran all models on 100 random seeds to check how each model performs in 100 different cases rather than in a single one (I’d be really grateful for feedback on this). Instead of looking for the lowest result in those 100 runs, I was looking for the model that performed best on average across the 100 runs (100 different random seeds).

  • I checked all column combinations (just picking the top columns from the single-column model results is not a great solution).
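
For anyone who wants to see the seed idea in code, here is a minimal sketch, not the notebook's actual code. It assumes the model is a k-nearest-neighbors regressor, uses a tiny hand-written k-NN and synthetic data instead of the car dataset, and `knn_rmse` is a made-up helper name. The point is only the loop: one train/test split per seed, then a summary of all the scores rather than the single best one.

```python
import numpy as np

def knn_rmse(X, y, k, seed):
    """Shuffle with the given seed, split 75/25, return the test RMSE of a k-NN regressor."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = int(len(X) * 0.75)
    train, test = order[:cut], order[cut:]
    # Euclidean distance from every test row to every train row
    dists = np.linalg.norm(X[test][:, None, :] - X[train][None, :, :], axis=2)
    # Indices of the k closest train rows for each test row
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Prediction is the mean target of those k neighbors
    preds = y[train][nearest].mean(axis=1)
    return float(np.sqrt(np.mean((preds - y[test]) ** 2)))

# Synthetic stand-in data (an assumption, not the car dataset)
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + data_rng.normal(scale=0.5, size=200)

# One score per seed, i.e. one score per different train/test split
scores = np.array([knn_rmse(X, y, k=5, seed=s) for s in range(100)])

summary = {
    "mean": scores.mean(),      # what gets compared between models
    "std": scores.std(ddof=1),  # how much the score depends on the split
    "min": scores.min(),        # the "lucky seed" result
    "max": scores.max(),
}
print(summary)
```

Looking at the spread next to the mean shows how much a single-seed result could be down to a lucky split.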

Apart from that, the usual: k-values, number of columns, cross-validation. Check it out. I’d be curious whether the approach with 100 random seeds makes any sense or was a total waste of my laptop’s cooling fans (in one case the results dataframe went over 1.4 million rows).
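
A rough sketch of how the column-combination and k-value search can be wired together with cross-validation. Everything here is an assumption for illustration: the column names `a` to `d`, the synthetic data, the hand-written k-NN and the `cv_rmse` helper are not taken from the notebook.

```python
from itertools import combinations
import numpy as np

def cv_rmse(X, y, k, folds=5, seed=0):
    """Mean RMSE of a k-NN regressor over a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    errors = []
    for test in np.array_split(order, folds):
        train = np.setdiff1d(order, test)  # every row not in the test fold
        dists = np.linalg.norm(X[test][:, None, :] - X[train][None, :, :], axis=2)
        nearest = np.argsort(dists, axis=1)[:, :k]
        preds = y[train][nearest].mean(axis=1)
        errors.append(np.sqrt(np.mean((preds - y[test]) ** 2)))
    return float(np.mean(errors))

# Hypothetical columns: only "a" and "b" drive the target, "c" and "d" are noise
rng = np.random.default_rng(0)
columns = ["a", "b", "c", "d"]
X = rng.normal(size=(120, 4))
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.3, size=120)

results = []
for size in range(1, len(columns) + 1):
    for combo in combinations(range(len(columns)), size):
        for k in (1, 3, 5, 7):
            score = cv_rmse(X[:, list(combo)], y, k)
            results.append((score, [columns[i] for i in combo], k))

best = min(results, key=lambda r: r[0])
print(len(results), best)
```

With n columns there are 2^n - 1 non-empty combinations, so the results table grows very quickly once k-values and seeds are multiplied in.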

cars_ml_small.ipynb (2.8 MB)

Project on GitHub

