Guided Project - Predicting House Prices

Hi Guys!
I have never worked as hard on anything as I did on this project, mostly because I struggled to do a lot of things by myself. Playing around with pipeline functions is exciting, but it comes with a price tag :exploding_head: :sweat_smile:
Anyway, I managed to get an RMSE below 25000 and an R2 of 90%.
I tried a lot of things, and none of them is ‘technically’ beyond what we learned in the course.
I hope that this project will invite lots of feedback and also give newbies some more ways to think about ML problems.
How can I further improve it?
I request all seniors to take a look at it and give me your priceless feedback.
Predicting House Sale Data.ipynb (1.5 MB)


Hi Ali!

Sorry for the slow reply; I was very busy investigating a possibly interesting bug in TensorFlow. I opened a ticket with the TensorFlow project and am waiting for an answer on whether it really is a bug.

I looked at your project. On the one hand it is nice work, but on the other hand, if you fill NaN values with the mean, you get a result that looks beautiful but is unreliable.
I read “Feature Engineering and Selection: A Practical Approach for Predictive Models” by Max Kuhn and Kjell Johnson, selected 26 features from the Ames dataset using mutual information analysis, and applied Ridge and Lasso. My model reached only R2 = 0.39 with a single feature, “square”, and then it simply diverged: from the second parameter onward it had a minus sign. I spent one week on it, but the model did not converge.
In my humble opinion, in the future don’t fill missing values with the mean/min/max, because that distorts the real patterns in the data. Either delete them, or use KNN imputation or another method; see the book above.
Do not multiply entities beyond necessity: use Occam’s razor.
Keep in mind that ML and DL are like searching for a black cat in a dark room where there is no cat; it is all solid empiricism.
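
To make the imputation advice above concrete, here is a minimal sketch comparing mean filling with KNN imputation in scikit-learn. The data is synthetic and the column names (`area`, `rooms`) are made up for illustration; it is not taken from the notebook or from the Ames dataset, so treat the numbers as a toy demonstration only.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

rng = np.random.default_rng(0)

# Hypothetical toy data: "rooms" depends strongly on "area".
area = rng.uniform(50, 250, size=200)
rooms = area / 25 + rng.normal(0, 0.5, size=200)
X = np.column_stack([area, rooms])

# Hide roughly 20% of the "rooms" values to simulate NaNs.
X_missing = X.copy()
mask = rng.random(200) < 0.2
X_missing[mask, 1] = np.nan

# Mean filling ignores the relationship between the columns.
mean_filled = SimpleImputer(strategy="mean").fit_transform(X_missing)

# KNN filling borrows values from rows with a similar "area".
knn_filled = KNNImputer(n_neighbors=5).fit_transform(X_missing)

def rmse(filled):
    """Error of the imputed values against the hidden true values."""
    return np.sqrt(np.mean((filled[mask, 1] - X[mask, 1]) ** 2))

mean_err = rmse(mean_filled)
knn_err = rmse(knn_filled)
print(f"mean imputation RMSE: {mean_err:.2f}")
print(f"KNN imputation RMSE:  {knn_err:.2f}")
```

On data like this, where the missing column is strongly related to an observed one, the KNN estimate should land much closer to the hidden values. On real data the gap may be smaller, so it is worth checking with cross-validation rather than assuming.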

Heyy Vadim!!!
I am sorry I couldn’t reply to you earlier because I had to undergo eye treatment. Had developed a severe infection. Now I am doing well.

That’s interesting. I am going to look for this book right away. Other than that, I had a lot of problems with playing around with multiple parameters. It didn’t come easy to me to pick the right param_grid. I mean, GridSearchCV is only supposed to give me the best answer out of the param_grid I pass to it.
The trouble is what to feed my param_grid with to get the best results. I hope I am stating my confusion clearly. :shushing_face:
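
One common way to approach the param_grid question above is to start with a coarse grid on a logarithmic scale and then refine around the best value. The sketch below assumes a `Ridge` model inside a `Pipeline` and uses synthetic data from `make_regression`; the step names (`scale`, `model`) and the grid ranges are illustrative choices, not the setup from the notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real training data.
X, y = make_regression(n_samples=300, n_features=20, noise=15.0, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])

# Step 1: coarse grid, one value per order of magnitude.
# Pipeline parameters are addressed as "<step name>__<parameter>".
coarse_grid = {"model__alpha": np.logspace(-3, 3, 7)}
search = GridSearchCV(pipe, coarse_grid, cv=5,
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)
best_alpha = search.best_params_["model__alpha"]

# Step 2: finer grid spanning one decade on each side of the coarse winner.
center = np.log10(best_alpha)
fine_grid = {"model__alpha": np.logspace(center - 1, center + 1, 9)}
fine_search = GridSearchCV(pipe, fine_grid, cv=5,
                           scoring="neg_root_mean_squared_error")
fine_search.fit(X, y)

print("coarse best alpha:", best_alpha, "RMSE:", -search.best_score_)
print("fine best alpha:  ", fine_search.best_params_["model__alpha"],
      "RMSE:", -fine_search.best_score_)
```

If the best value sits at the edge of the grid, that is usually a hint to extend the range in that direction before refining. For models with many hyperparameters, `RandomizedSearchCV` may be a cheaper alternative to an exhaustive grid.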

My Best Regards Vadim,
You have played the biggest part in my recent improvements.
Thanks a lot for it