ML Guided Project help


I am on the Ames Housing guided project and I am training my model using only numeric columns for now. I used train_test_split to split my data before running LinearRegression().

In train_test_split I varied the random_state from 1 to 100. The lowest RMSE I got was around 31k and the highest was around 57k.
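For anyone who wants to reproduce this kind of check, here is a rough sketch of the experiment described above. It is not the exact project code: the data is a synthetic stand-in built with make_regression (an assumption, since the Ames file is not attached here), and the helper name rmse_by_seed and the test_size of 0.25 are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def rmse_by_seed(X, y, seeds, test_size=0.25):
    """Fit LinearRegression once per seed and return the test RMSE of each split."""
    scores = []
    for seed in seeds:
        # A different random_state gives a different train/test partition.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        model = LinearRegression().fit(X_train, y_train)
        predictions = model.predict(X_test)
        scores.append(np.sqrt(mean_squared_error(y_test, predictions)))
    return np.array(scores)


# Synthetic stand-in data (assumption): replace with the numeric Ames columns
# as X and SalePrice as y.
X, y = make_regression(n_samples=500, n_features=10, noise=25.0, random_state=0)

rmses = rmse_by_seed(X, y, seeds=range(1, 101))
print(f"min={rmses.min():.1f}  max={rmses.max():.1f}  "
      f"mean={rmses.mean():.1f}  std={rmses.std():.1f}")
```

Looking at the mean and standard deviation alongside the minimum and maximum shows whether the spread comes from a few unusual splits or from most of them.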

Because the variation in RMSE is so large, does that mean the model's accuracy is poor? Would a well-trained model give RMSE values that are close to each other across different random states?


Commenting so that it goes on the top! @Sahil

Hi @Sandesh,

Not at all! The model's RMSE is expected to be high because house prices depend on many factors beyond the features of the house itself. It is always possible to improve your model's accuracy. However, to get a very low RMSE, we may need more data than what's given in the dataset.

As you can see, the top score is 23k.
