Finished My First Personal Project Post-R Pathway - Analyzing Gym Memberships

Hey All,

Happy 2021! Hope everyone had an amazing holiday. I realized it’s been a while since I’ve checked in (like Sept 2020), so I figured I would share with you all a small data science project that I have been working on. If you aren’t aware, there was a crazy uptick in COVID cases in Ontario, which led to full shutdowns again back in November. While I was really bummed out about this, I figured I would take what I had learned from the R pathway and put it towards a cool little project to keep me occupied. This sort of came about when I mentioned to my boss that I had been working on picking up a programming language for a few months and wanted to take a crack at doing something with it. Anyways, he was cool with it, and I went ahead with it.

With this project, I wanted to look at membership retention at our studio and see whether certain demographics, purchase options, attendance behaviors, and/or membership service interactions might be playing a role in it. I ended up building a dataset by collecting information from our billing + scheduling software, emails, member files, etc. From there I went through a lot of data cleaning and eventually some data analysis to accomplish my objective.

While this took a while to do (largely because of building the data set and cleaning the data), it was pretty cool to be able to apply some of the things that I learned in the R pathway, namely data visualization and predictive modeling. Obviously, as anyone here would say, things get really complicated when dealing with real-world data (e.g. non-Gaussian distributions, zero inflation, missingness). Plus, due to the nature of my objective, I ended up using more exotic analyses than what was shown here (i.e. survival analysis, random forest, logistic regression). So that obviously had me turning to a number of other resources for reference (which I will share below), and those really helped out. Anyway, I was able to finish it and share it with my boss to get his thoughts on the findings. :triumph:

Overall, my first data science project was daunting and frustrating but also a satisfying experience. Honestly, I was hesitant to even start because of the scope of the project and what I wanted out of it (seriously had some grad school PTSD all over again), plus doubts about even finishing it. However, in the end, taking my time to really understand how to use R from the course left me better prepared and gave me the confidence to knock this out before the end of the year.

So if anyone actually made it to this point and is planning on doing their first data science project post-pathway, here are a few things that I would say:

  • Be clear on what sort of project you want to accomplish. Things can get off track really easily and you’ll just be spinning your wheels trying to do something that is going to be seen as an extra rather than what you are actually trying to accomplish.

  • You’ll definitely find yourself scouring the internet for the answer to a problem and running into one that sort of addresses it (likely on Stack Exchange/Stack Overflow/ResearchGate) but is explained in a way that is over your head, and you’ll have no idea what they are even talking about. That’s cool, don’t freak out. There is likely a Reddit post, YouTube video, or someone’s blog that explains things more simply than an exchange between statisticians or academics.

  • Know your statistics as well as the assumptions of the modeling approach you are using. Nothing is more annoying than coming up with a model only to find that your approach is riddled with bias or fails to meet the model’s assumptions, which means your model is invalid.

  • Check out some blogs or posts from others that have tried to do something similar to your project to get an idea of how they approached it. It might be in a different programming language, but that’s something that can be overcome easily in most cases.

  • Take a few days away from the project for the sake of your mental and physical health. More often than not, after taking 3 days off without even looking at RStudio, I came back with better clarity and motivation to get things done.

  • Don’t get discouraged by the lack of progress. I’ve literally spent a couple of days just redoing plots because I got OCD about how some things looked, or ended up down a rabbit hole about how to do a particular analysis. Just keep in mind that progress is progress, and eventually it will add up over time. Plus, some of those rabbit hole dives will lead you to learn more about best practices or other methodologies that you can use for a future project.

Anyway, here’s a link to my GitHub where you can check this project out along with another link to check out the data set. If you have a chance to, let me know your thoughts.

- Mike


Bang—Membership-History—Jan-2018-to-Oct-2020.pdf (2.2 MB)