
Guided Project: Predicting Car Prices - how to measure our results?

In the project we can tune the k value, the number of columns, etc. to pursue the best results. I’ve also started experimenting with the random seed: reshuffling the whole dataframe can give us very different results (like RMSE down to 1700)! It made me wonder:

  1. Is it fair to improve the model results just by changing the random seed number? We’re reshuffling the index of the dataframe and getting better (or worse) results on our existing dataframe, but if we’d like to test our model in the real world (on a different dataset), it may not be so great.
    1.a Should we use that random seed trick to tune the other variables in our model (which columns, number of columns, k value, etc.) and then just use the configuration that gave us the best result? Wait, but that only worked for one particular random shuffle of the index! That leads to another question:

  2. What is the best result? Is it the single best result (a certain configuration of parameters and one random seed number), or should we create a few candidate configurations, loop them through multiple random seeds (like 100), and check which configuration had the BEST AVERAGE result, not the single best result? Would that work better in the real world on future datasets?
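To make question 2 concrete, here is a rough sketch of the "best average over many seeds" idea. It is not the project's code. The data is synthetic (a made-up `make_data` helper standing in for the cars dataframe), the k-NN is a tiny hand-written one instead of scikit-learn's, and the three configurations and the 20 seeds are arbitrary choices just for illustration.

```python
import math
import random
import statistics

def make_data(n=120, seed=0):
    """Hypothetical stand-in for the cars data: two features and a noisy price."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        price = 10000 + 20000 * x1 + 5000 * x2 + rng.gauss(0, 1500)
        rows.append(((x1, x2), price))
    return rows

def knn_rmse(rows, k, cols, seed):
    """Shuffle the rows with `seed`, split 50/50, return the test RMSE of k-NN."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    train, test = shuffled[:half], shuffled[half:]
    squared_errors = []
    for feats, price in test:
        # k nearest training rows by squared Euclidean distance on the chosen columns
        nearest = sorted(
            train,
            key=lambda r: sum((r[0][c] - feats[c]) ** 2 for c in cols),
        )[:k]
        prediction = sum(p for _, p in nearest) / k
        squared_errors.append((prediction - price) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Candidate configurations: name -> (k, feature columns). Arbitrary examples.
configs = {
    "k=1, x1 only": (1, (0,)),
    "k=5, x1 only": (5, (0,)),
    "k=5, x1+x2": (5, (0, 1)),
}
seeds = range(20)
data = make_data()

summary = {}
for name, (k, cols) in configs.items():
    scores = [knn_rmse(data, k, cols, s) for s in seeds]
    # Keep the mean, the spread, and the luckiest single seed for comparison.
    summary[name] = (statistics.mean(scores), statistics.stdev(scores), min(scores))

# Rank configurations by mean RMSE across seeds, not by their luckiest seed.
for name, (mean, std, best) in sorted(summary.items(), key=lambda kv: kv[1][0]):
    print(f"{name:14s} mean RMSE={mean:8.1f}  std={std:7.1f}  best seed={best:8.1f}")
```

The "best seed" column is the single lucky result from question 1, while the mean and standard deviation show how a configuration behaves across many shuffles. Comparing the two columns shows how much of a good score comes from the configuration and how much from the shuffle.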

chart for fun