I like to think that’s because you like my projects so much!
Good call on the Washington D.C. callout – better to explain the data a little bit for context before giving the results.
If I weren’t studying data science, maybe I’d make it as a politician? Hahaha. Best to cover my butt, because sometimes missing values are weirdly encoded and hard to find, right?
For the title and subtitles, I was testing different ways to display them and ended up going with ax.text because it gave more flexibility in spacing and placement – I must have forgotten to delete the old ones!
Regarding correlations, I’d say you want at least a moderate correlation. You could include weak correlations, but you risk adding noise to the model. I think of it this way: if a weak correlation is anything between 0 and .3, would you find value in a feature whose correlation is 0 just because it still technically counts as weak? There’s definitely nuance. If you’re short on features, maybe you can shift your cutoff down to .2 or so, but more features don’t necessarily lead to a more accurate model (or vice versa), especially with linear regression.
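To make the cutoff idea concrete, here’s a rough sketch of filtering features by the absolute value of their Pearson correlation with the target. The function name, the toy columns, and the default 0.3 threshold are all made up for illustration; this isn’t the code from the notebook.

```python
import numpy as np

def select_by_correlation(features, target, threshold=0.3):
    """Keep features whose |Pearson r| with the target is at least `threshold`."""
    selected = []
    for name, values in features.items():
        r = np.corrcoef(values, target)[0, 1]  # off-diagonal entry is r(feature, target)
        if abs(r) >= threshold:                # abs() so strong negative correlations count too
            selected.append(name)
    return selected

# Hypothetical toy data standing in for the bike share columns
cnt = np.array([10, 20, 30, 40, 50])
features = {
    "temp": np.array([1, 2, 3, 4, 5]),   # perfectly positively correlated
    "hum": np.array([5, 4, 3, 2, 1]),    # perfectly negatively correlated
    "noise": np.array([2, 5, 1, 4, 3]),  # r = 0.1, technically "weak" but basically useless
}

print(select_by_correlation(features, cnt))  # prints ['temp', 'hum']
```

Lowering `threshold` to 0.2 loosens the window, but the `noise` column (r = 0.1) still wouldn’t make the cut, which is the point of not treating everything in the "weak" bucket as equally useful.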
So that’s why there were fewer days haha, I was curious why and now I know!
What range function?
Cell 41 – the cnt variable for a given hour is just registered + casual. In our analysis we saw that registered and casual users behave differently with the bike share program. The hypothesis was that, because of these differences in behavior, we could predict the registered and casual columns more accurately, and that the sum of their errors would be less than the error from predicting cnt directly. What was surprising was that this wasn’t the case: the predicted cnt had a lower error than the sum of the registered and casual errors.
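Here’s a small sketch of the comparison, using MAE as the error metric. The arrays are made-up numbers standing in for test-set values and model predictions (not the actual results), so this only shows the mechanics. It also shows that "sum of the two errors" and "error of the summed predictions" are different quantities: by the triangle inequality the first can never be smaller than the second, so it matters which one you compare against the direct cnt error.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Hypothetical hourly test values
registered = np.array([30, 45, 50, 20])
casual = np.array([5, 10, 12, 3])
cnt = registered + casual  # cnt is defined as registered + casual

# Hypothetical predictions from three separately trained models
pred_registered = np.array([28, 50, 47, 22])
pred_casual = np.array([7, 8, 13, 1])
pred_cnt = np.array([36, 55, 61, 23])

# Option A: predict cnt directly
mae_direct = mae(cnt, pred_cnt)

# Option B: predict each user type, add the predictions, then score against cnt
mae_summed_preds = mae(cnt, pred_registered + pred_casual)

# Adding the two separate MAEs; always >= Option B because |a + b| <= |a| + |b|
mae_sum_of_errors = mae(registered, pred_registered) + mae(casual, pred_casual)

print(mae_direct, mae_summed_preds, mae_sum_of_errors)
```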
Those calculations in cell 38 were so expensive to run (like 12 minutes)! I think the output is ordered by priority, but sorting it would have made the results easier to compare. And totally, I think I was a little lazy in not removing collinear features – the same is probably true for some of the time-of-day features.
What’s wrong with cell 18? Is it just a bit bare, without an immediate insight? I was thinking maybe I could have used k-means or something here, too, instead of eyeballing it.
I was pretty proud of the shooting gallery, but you missed one big thing… The MAE and RMSE labels are flipped! Gotta fix that.
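One cheap guard against this kind of swap, sketched below under my own naming (the helper functions are hypothetical, not from the notebook): compute both metrics into a dict keyed by their names, and use the fact that RMSE is never smaller than MAE on the same residuals. If the value labeled "RMSE" comes out smaller than the one labeled "MAE", the labels are flipped.

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Return MAE and RMSE keyed by their own names so the labels travel with the values."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
    }

def labels_look_flipped(metrics):
    """RMSE >= MAE always holds, so a smaller 'RMSE' than 'MAE' means the labels were swapped."""
    return metrics["RMSE"] < metrics["MAE"] - 1e-12  # small tolerance for float rounding

scores = error_metrics([10, 20, 30], [12, 18, 36])
swapped = {"MAE": scores["RMSE"], "RMSE": scores["MAE"]}

print(labels_look_flipped(scores))   # prints False
print(labels_look_flipped(swapped))  # prints True
```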
Thanks so much for your time, Rucha! Great feedback, and it’s appreciated more than you know.