I’m working on a model to predict churn. I understand the concept of training and testing, but I’m struggling to apply it to a real-life situation.
Assume I have a dataset for a subscription-based business, with 5K churned customers and 15K active customers. What ML courses generally show is to split the data 80/20, train on one part, and test on the other: we predict the target and compare it with the actual column, which makes sense.
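To make the workflow I mean concrete, here is a minimal sketch of that 80/20 routine on synthetic data. Everything in it is an assumption for illustration only: the single feature `months_inactive`, the numbers used to generate it, and the toy threshold "model" are all made up and are not my real data or a real churn model.

```python
import random

random.seed(0)

# Synthetic stand-in for the dataset: 5,000 churned (label 1), 15,000 active (label 0).
# Each row is (months_inactive, label); the feature is hypothetical.
rows = [(random.gauss(6, 2), 1) for _ in range(5_000)]
rows += [(random.gauss(2, 2), 0) for _ in range(15_000)]

# Shuffle, then split 80/20 into train and test.
random.shuffle(rows)
cut = int(0.8 * len(rows))
train, test = rows[:cut], rows[cut:]


def mean(values):
    return sum(values) / len(values)


# "Train": a toy one-parameter model, the midpoint between the two class means.
churned_mean = mean([x for x, y in train if y == 1])
active_mean = mean([x for x, y in train if y == 0])
threshold = (churned_mean + active_mean) / 2


def predict(months_inactive):
    """Return 1 (churn) if the feature is above the learned threshold, else 0."""
    return 1 if months_inactive > threshold else 0


# "Test": compare predictions with the actual label on the held-out 20%.
correct = sum(predict(x) == y for x, y in test)
accuracy = correct / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.3f}")
```

Here both the training and the test rows already have a known label, which is exactly the part that is clear to me.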
But in my case, if I want to predict how many of these 15K active customers are likely to churn, how should I break my data down into train, test, and predict sets? The prediction has to be on the 15K active users, since those are the people we want to know about, so should I train only on the churned customers, or do something else?
I’m a bit confused.