I would like to share my solution to the Predicting Bike Rentals project.
In addition to the basic requirements, I:
- performed a more thorough preliminary analysis
- removed some anomalies from the data set
- clustered weather parameters
- used k-fold cross-validation
- for linear regression, studied whether converting the datetime parameters to dummy variables would improve the predictions. It does: accuracy improves because the data shows seasonal patterns.
- for all regression types, studied the performance of a model that first fits a trendline and then predicts the trend-corrected values. This improves accuracy significantly: for example, a linear regression combined with a simple decision tree has the same error as a non-optimized random forest.
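As a rough illustration of the dummy-variable and weather-clustering steps, here is a minimal sketch. It assumes the column names of the hourly UCI bike-sharing data (`hr`, `weekday`, `mnth`, `temp`, `hum`, `windspeed`), k-means with four clusters, and a hypothetical helper `build_features`. The post does not say which clustering method or how many clusters were used, and the demo data below is random.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def build_features(df: pd.DataFrame, n_weather_clusters: int = 4, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: one-hot datetime fields plus a clustered weather type."""
    weather_cols = ["temp", "hum", "windspeed"]  # assumed column names
    # Group the continuous weather readings into a few discrete "weather types".
    # (In the UCI data these columns are already scaled to 0..1.)
    km = KMeans(n_clusters=n_weather_clusters, n_init=10, random_state=seed)
    clusters = km.fit_predict(df[weather_cols])

    # Treat the periodic datetime fields and the cluster label as categories,
    # so a linear model can learn a separate offset for each hour, weekday, month.
    categorical = df[["hr", "weekday", "mnth"]].copy()
    categorical["weather_cluster"] = clusters
    dummies = pd.get_dummies(categorical.astype("category"), drop_first=True)

    # Keep the raw weather readings as continuous features next to the dummies.
    return pd.concat([df[weather_cols], dummies], axis=1)


# Usage on random demo data (not the real data set).
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "hr": rng.integers(0, 24, n),
    "weekday": rng.integers(0, 7, n),
    "mnth": rng.integers(1, 13, n),
    "temp": rng.random(n),
    "hum": rng.random(n),
    "windspeed": rng.random(n),
})
X = build_features(demo)
print(X.shape)
```

`drop_first=True` removes one level per category to avoid perfectly collinear dummy columns when the linear model also fits an intercept.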
Overall, I found that combining a linear regression with a random forest gives the best result.
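A minimal sketch of the trend-plus-residual idea, assuming a linear regression as the trend model and a random forest fitted on its residuals, with both models seeing the same features. The class name `TrendPlusResidual`, the default hyperparameters, and the synthetic data are hypothetical, not the author's code.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


class TrendPlusResidual(BaseEstimator, RegressorMixin):
    """Fit a trend model first, then a second model on what the trend misses."""

    def __init__(self, trend_model=None, residual_model=None):
        self.trend_model = trend_model
        self.residual_model = residual_model

    def fit(self, X, y):
        # Defaults (assumed): linear trend, random forest on the residuals.
        self.trend_ = clone(self.trend_model) if self.trend_model is not None else LinearRegression()
        self.resid_ = (clone(self.residual_model) if self.residual_model is not None
                       else RandomForestRegressor(n_estimators=100, random_state=0))
        self.trend_.fit(X, y)
        # Trend-corrected target: what remains after removing the linear trend.
        residuals = np.asarray(y) - self.trend_.predict(X)
        self.resid_.fit(X, residuals)
        return self

    def predict(self, X):
        # Final prediction = trend + predicted correction.
        return self.trend_.predict(X) + self.resid_.predict(X)


# Synthetic data: a linear part plus a non-linear part the trend cannot capture.
rng = np.random.default_rng(42)
X = rng.random((400, 3))
y = 5 * X[:, 0] + 3 * np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, 400)

model = TrendPlusResidual().fit(X, y)
pred = model.predict(X)
```

Because the class follows the scikit-learn estimator interface, it can be passed to `cross_val_score`; passing a `DecisionTreeRegressor` as `residual_model` gives the linear-regression-plus-decision-tree variant.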
Please note that because I used k-fold cross-validation without shuffling, the errors might be somewhat higher than with the random train/test split proposed in the guidelines.
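The difference between the two validation schemes comes down to the `shuffle` flag of scikit-learn's `KFold`. This sketch compares them using a hypothetical helper `kfold_rmse` and synthetic data; it assumes the rows are sorted by time, so unshuffled folds are contiguous time blocks.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def kfold_rmse(model, X, y, shuffle, n_splits=5, seed=1):
    """Hypothetical helper: mean RMSE over k folds, with or without shuffling."""
    # KFold rejects a random_state when shuffle=False, so only pass it when shuffling.
    kf = KFold(n_splits=n_splits, shuffle=shuffle, random_state=seed if shuffle else None)
    scores = cross_val_score(model, X, y, cv=kf, scoring="neg_root_mean_squared_error")
    return -scores.mean()


# Synthetic time-ordered data: a slow upward drift the features do not explain.
rng = np.random.default_rng(0)
n = 1000
X = rng.random((n, 3))
y = np.arange(n) / 100 + rng.normal(0, 0.5, n)

ordered = kfold_rmse(LinearRegression(), X, y, shuffle=False)
shuffled = kfold_rmse(LinearRegression(), X, y, shuffle=True)
print(f"unshuffled RMSE: {ordered:.2f}, shuffled RMSE: {shuffled:.2f}")
```

With a drift like this, the unshuffled estimate tends to come out higher, because each test block sits at a different point in time than the bulk of its training data.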