Gargantuan values in the Ames Housing GP: where do they come from!?

After two days of struggling with this project I am ready to give up. I’ve come a long way, but I really do not understand these extreme values. How does anyone end up with an RMSE on the order of 10^15, haha?

I hope there will be a great lesson here, and I hope someone out there can make sense of my slightly messy code :smiley:

Predicting house sale prices.ipynb (762.2 KB)

By the way, I found a small mistake in the solutions.
At the end they do this:

shuffled_df = df.sample(frac=1, )
train = df[:1460]
test = df[1460:]

shuffled_df is never used: train and test are sliced from the original df, so the shuffle has no effect :slight_smile:
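
For anyone hitting the same thing, here is a rough sketch of what I think the split was meant to do. The helper name `shuffled_split`, the `seed` argument and the tiny demo frame are my own inventions for illustration, not something from the solutions; the only real change is slicing `shuffled_df` instead of `df`.

```python
import pandas as pd


def shuffled_split(df, n_train=1460, seed=1):
    """Shuffle the rows, then slice train/test from the SHUFFLED frame.

    Hypothetical helper: the name and the seed argument are assumptions.
    """
    # frac=1 returns every row in random order; random_state makes it repeatable.
    shuffled_df = df.sample(frac=1, random_state=seed)
    # .iloc slices by position, so the shuffled order is what gets split.
    train = shuffled_df.iloc[:n_train]
    test = shuffled_df.iloc[n_train:]
    return train, test


# Tiny made-up frame just to show the behaviour.
demo = pd.DataFrame({"SalePrice": range(10)})
train, test = shuffled_split(demo, n_train=7)
print(len(train), len(test))
```

Every row ends up in exactly one of the two pieces, but which rows land in train now depends on the shuffle rather than on the file order.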


UPDATE: I seem to have found the fix. The culprits were a few categorical columns where a single value made up more than 80% of the rows. Especially the ones where some values appeared only 3 times produced dummy columns that were able to mess things up a lot.
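
In case it helps someone, this is roughly how such columns can be spotted before creating dummies. It is a minimal sketch, not the exact code from my notebook: the function name `dominated_categoricals`, the demo data and the 0.8 default are assumptions for illustration.

```python
import pandas as pd


def dominated_categoricals(df, max_share=0.8):
    """Return {column: (top_share, rarest_count)} for non-numeric columns
    whose most frequent value covers more than max_share of the rows.

    Hypothetical helper; the name and threshold are assumptions.
    """
    flagged = {}
    # Treat every non-numeric column as categorical for this check.
    for col in df.select_dtypes(exclude="number").columns:
        counts = df[col].value_counts(dropna=False)  # sorted, most frequent first
        if counts.empty:
            continue
        top_share = float(counts.iloc[0]) / len(df)
        if top_share > max_share:
            flagged[col] = (top_share, int(counts.min()))
    return flagged


# Made-up data: "Street" is 97% one value, "Style" is balanced.
demo = pd.DataFrame({
    "Street": ["Pave"] * 97 + ["Grvl"] * 3,
    "Style": ["1Story"] * 50 + ["2Story"] * 50,
    "Area": range(100),
})
flagged = dominated_categoricals(demo)
cleaned = demo.drop(columns=list(flagged))
print(flagged)
print(list(cleaned.columns))
```

Here only "Street" gets flagged, and its rarest value shows up just 3 times, which is exactly the kind of column that gave me a nearly empty dummy column.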

Now my model scores around 31000, which is okay for me after all this time :smiley: