
Predicting house prices

Hi, everybody. I’ve just finished the project and would like to get your feedback. To be honest, I was a bit disappointed by the choice of dataset, as it doesn’t make clear how important feature management is.
I’ve read the answer given by the DQ team and several projects from other students, and found just one interesting idea for improving the model’s performance: Adam’s approach to handling outliers. But it seems we can’t drop outliers just to decrease RMSE, because we would lose some important observations (Adam mentioned this in his conclusion).

Another thing I’d like to mention here: I don’t understand why we are advised to create a function for data cleaning. As I understand it, the point of using functions is to avoid repeating the same code several times, but data cleaning is a process we do only once.

house_prices.ipynb (189.2 KB)


Processing data inside a function saves memory: the variables you create are local to the function, so once it returns they can be freed instead of staying in memory. This matters when you’re working with larger datasets. If you’re interested in experimenting:
https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

Try cleaning one month of this dataset in a Kaggle notebook, once outside a function and once inside a function, and compare the RAM usage in the two cases.
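
If you want to see the effect before downloading anything, here is a rough sketch using only the standard library. It is not the taxi dataset and not pandas: `load_rows` is a made-up stand-in that builds fake (id, price) rows, and `tracemalloc` only counts memory allocated by Python itself, so treat the numbers as an illustration rather than what Kaggle’s RAM meter would show.

```python
import tracemalloc

N = 200_000  # number of fake rows; an arbitrary size chosen for illustration


def load_rows(n):
    """Hypothetical stand-in for reading a raw dataset: n (id, price) tuples."""
    return [(i, float(i % 500)) for i in range(n)]


def mean_price_in_function(n):
    """Clean inside a function: `raw` and `kept` are local variables."""
    raw = load_rows(n)
    kept = [row for row in raw if row[1] > 0]  # drop rows with a zero price
    return sum(price for _, price in kept) / len(kept)


tracemalloc.start()

# Case 1: cleaning inside a function. Only the returned number survives.
baseline, _ = tracemalloc.get_traced_memory()
mean_a = mean_price_in_function(N)
held_by_function = tracemalloc.get_traced_memory()[0] - baseline

# Case 2: the same steps at the top level. `raw` and `kept` stay alive
# as global variables after the result has been computed.
baseline, _ = tracemalloc.get_traced_memory()
raw = load_rows(N)
kept = [row for row in raw if row[1] > 0]
mean_b = sum(price for _, price in kept) / len(kept)
held_at_top_level = tracemalloc.get_traced_memory()[0] - baseline

tracemalloc.stop()

print(f"still held after function call: {held_by_function:,} bytes")
print(f"still held after top-level code: {held_at_top_level:,} bytes")
```

Both cases compute the same mean, and both need roughly the same peak memory while the cleaning is running. The difference is what is still held afterwards: the function version lets go of its intermediates when it returns, while the top-level version keeps them until you `del` them or reassign the names.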
