Hi,
I have just finished the Predicting the stock market project.
In this project, I use rolling statistics to calculate the average price over the past n days. I also tried the suggestion from the next steps section and changed the algorithm to make predictions only one day ahead.

I would appreciate it if you could have a look at it and share your ideas. This is the link to my project.

I really like the way you have structured your project. Kudos for thinking of incorporating functions into the creation of new features and especially on the last step - predicting one day ahead.

I have noticed that in the last step the error metric you chose improves greatly. HOWEVER, that is mainly because you use a different method to calculate the metric, and not because of the improvement brought by the day-by-day training!

Earlier you use the root of the mean squared error ($\sqrt{\mathrm{mean}(error^2)}$), while in the last case you use the mean of the roots of the individual squared errors (i.e. $\frac{\sqrt{error_1^2} + \dots + \sqrt{error_n^2}}{n}$), and this contributes to the shrinking of the indicator.

Of course the prediction should be more accurate, but not THIS much better.
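
To make the difference concrete, here is a minimal sketch with made-up error values (the numbers and variable names are assumptions for illustration, not taken from the project). It computes both quantities on the same errors:

```python
import numpy as np

# Made-up prediction errors (prediction minus actual), for illustration only.
errors = np.array([1.0, -2.0, 3.0, -10.0])

# RMSE over the whole test set: square, average, then take the root.
rmse = np.sqrt(np.mean(errors ** 2))

# Mean of the per-row roots: the root of a single squared error is just
# its absolute value, so this is the mean absolute error.
mean_of_roots = np.mean(np.sqrt(errors ** 2))

print("RMSE:", rmse)
print("Mean of per-row roots:", mean_of_roots)
```

The second number can never be larger than the first, and the gap grows when a few errors are much bigger than the rest, because squaring before averaging gives large errors more weight.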

Thank you, @domenyb, for your comment.
I used the same error formula for both algorithms:
mse = mean_squared_error(test[target], predictions)
rmse = np.sqrt(mse)

The difference is because the second algorithm calls the train_test function for each row separately; the result is a list of rmse (rmses) instead of one rmse. To compare it with the previous algorithm, I decided to calculate the mean of rmses. Is there anything wrong with this approach?

Your project was extremely helpful to my effort on this project. It had been a while since I logged on so I was pretty rusty. Thank you so much for sharing!

The next steps I’d like to address are how to eliminate features that have little to no predictive power and possibly search for ‘regimes’ in the data. For example, maybe using all the data to predict the next day’s target is too arbitrary, and weighting more recent data more heavily is more predictive. We’ll have to test and see…

Thank you, @everett.k.perry, for your comment. I am so happy that sharing my project was helpful.
I like your idea about the next steps and would like to see the result.

I really liked your project! I personally learnt a lot from it.

I did notice one thing though. Near the end, when you build a model that predicts one day ahead, you predict for all data points in the dataset, even for dates before the year 2013. The issue is that you can’t compare the error calculated there (which is 5.48) with the one you received earlier (22.2). The latter error was calculated from the predictions for the years 2013-2015, while the former incorporates all predictions from 1951 to 2015.

The issue is that as you keep going back in time from 2013, the stock market price gets lower and lower, which makes the error calculated for each point lower as well. A 10% error for a stock price of $500 is much lower in absolute value than a 10% error for a stock price of $1,500. This explains why you got such a low error. I tried calculating the one-day-ahead error only for the years 2013-2015 to have a fair comparison, and the error was pretty much the same.
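
Here is a rough sketch of that effect on synthetic data (the date range, prices, and column names are assumptions made up for illustration, not the project's dataset). Every prediction is off by the same 1%, yet the average absolute error depends on which period is evaluated:

```python
import numpy as np
import pandas as pd

# Synthetic, made-up series: the price rises steadily from 100 to 2000.
dates = pd.date_range("2000-01-01", "2015-12-31", freq="D")
df = pd.DataFrame({
    "Date": dates,
    "Close": np.linspace(100.0, 2000.0, len(dates)),
})

# Assume every prediction is off by exactly 1% of the true price.
df["prediction"] = df["Close"] * 1.01
df["abs_error"] = (df["prediction"] - df["Close"]).abs()

# Error averaged over the whole history vs. only the recent test period.
mae_all = df["abs_error"].mean()
recent = df[df["Date"] >= pd.Timestamp("2013-01-01")]
mae_recent = recent["abs_error"].mean()

print("All years:", mae_all)
print("2013 onwards:", mae_recent)
```

The relative accuracy is identical in both cases, but the error over the full history looks much smaller simply because the older prices are lower. Filtering to the same test period before computing the metric removes this effect.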

Also, when you are calculating the error in the function that makes a prediction for only one day ahead, I think there is no need to calculate the mean squared error. Since it is only a single prediction, you are squaring and then square rooting a single number, which gives the absolute error. Averaging those values over all rows is therefore the same as the mean absolute error. Try using mean absolute error and mean squared error in the last step and you will get the same result.
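
A small sketch of that check, assuming scikit-learn is available and using made-up prices (the values and variable names are hypothetical, not from the project):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up actual and predicted closing prices, for illustration only.
actual = [2050.0, 2061.0, 2048.5]
predicted = [2043.5, 2065.0, 2050.0]

# One RMSE per row, as when each day is predicted and scored separately.
rmses = [
    np.sqrt(mean_squared_error([a], [p]))
    for a, p in zip(actual, predicted)
]
mean_rmse = np.mean(rmses)

# Mean absolute error over the same rows.
mae = mean_absolute_error(actual, predicted)

print(mean_rmse, mae)
```

Both values should agree up to floating-point rounding, since each per-row RMSE is just the absolute error of that row.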

I benefited a lot from your project, so I thought I would point these out while I was at it.