Screen Link: Learn data science with Python and R projects
What actually happened:
Following the example on page 3 (Rolling mean), I have a question:
Does the value we pass to rolling(value) move through the data set in jumps, and is that the reason why we "lose" the peaks?
The pandas documentation describes this parameter as:
Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.
…so does it work like the step in range(start, stop, step)?
That’s what I think, but I’m not entirely sure.
Hi @Edelberth, not sure if I understand exactly what you’re asking, but higher values of the rolling window eliminate the daily variation in the exchange rate. The line plots become smoother and show a little more clearly the long-term effects. Below, you can see how the plot lines change as we increase the rolling window:
I understand that this is how it is used in the exercise.
However, the only reason I can think of for why it smooths the curves is that it removes data, and if so, the only way to do that would be at regular intervals (right?).
That is why I brought up the example of the steps.
Thank you for trying to help me.
The data is not eliminated, but rather changed. If the rolling window is 30, the value plotted for any given day is the average of the exchange rates for that day and the previous 29 days, so the original value is replaced by that average.
In other words, the visual effect of smoothing the curve comes from averaging the values, not from eliminating the data.
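To make the difference from a range-style step concrete, here is a minimal sketch. The numbers are made up for illustration and are not the exchange-rate data from the exercise. It compares a rolling mean with a window of 3 against slicing with a step of 3:

```python
import pandas as pd

# Hypothetical values, made up for illustration (not the real exchange rates)
rates = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Rolling mean with a window of 3: the window slides forward one row at a time,
# so the result has one entry per input row
rolling = rates.rolling(3).mean()

# Step-based slicing, like range(start, stop, step): keeps every third row
# and drops everything in between
stepped = rates[::3]

print(len(rolling))      # same number of rows as the input
print(len(stepped))      # fewer rows than the input
print(rolling.tolist())  # each non-missing entry is an average of 3 neighbouring rows
```

The rolling result keeps a row for every day and only changes the values, while the stepped version really does throw rows away. That is the sense in which the window size is not a step.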
When you calculate the rolling mean on an entire column, you do lose some data, but that has nothing to do with the visual effect of smoothing the curve. If the rolling window is 30, the first 29 days have no rolling mean (pandas returns NaN for them by default), because the calculation needs at least 30 values in this case.
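If you want to check this yourself, the sketch below uses a made-up series of 100 daily values (an assumption for illustration, not the exercise data) and counts how many rows have no rolling mean with a window of 30:

```python
import pandas as pd

# Hypothetical series of 100 daily values: 0.0, 1.0, ..., 99.0
daily = pd.Series(range(100), dtype="float64")

# Rolling mean with a fixed window of 30 observations
rolling_30 = daily.rolling(30).mean()

# Rows where no rolling mean could be computed (fewer than 30 values available)
missing = int(rolling_30.isna().sum())

# Position of the first row that has a rolling mean (0-based, so this is the 30th day)
first_valid = rolling_30.first_valid_index()

print(missing)
print(first_valid)
print(len(rolling_30) == len(daily))  # the result still has one row per day
```

Only the rows at the start are missing. From the 30th day onward every day still has a value, so the smoothing cannot come from dropped rows.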