Churn-rate denominator

start_date occurred before the first day of then given month
end_date is later than the last day of the previous month

I do not understand the logic here
this would mean if given month is Jan , then the start_date is any date of any month in the previous year so jan-dec?
But end_date is later than the last day of the previous month so later than Dec31
but then that is current month?

import datetime as dt

def get_customers(yearmonth):
year = yearmonth//100
month = yearmonth-year*100
date = dt.datetime(year, month, 1)

**return ((subs["start_date"] < date) & (date <= subs["end_date"])).sum()**

churn["total_customers"] = churn["yearmonth"].apply(get_customers)
churn["churn_rate"] = churn["total_churned"] / churn["total_customers"]
churn["yearmonth"] = churn["yearmonth"].astype(str)

arange = __import__("numpy").arange
Ellipse = __import__("matplotlib").patches.Ellipse
ax = churn.plot(x="yearmonth", y="churn_rate", figsize=(12,6), rot=45, marker=".")
start, end = ax.get_xlim()
ax.get_xticks()
ax.set_xticks(arange(2, end, 3))
ax.set_xticklabels(yearmonths[2::3])
circle = Ellipse((35, churn.loc[churn.yearmonth == "201312", "churn_rate"].iloc[0]),
5, 0.065, color='sandybrown', fill=False
)
ax.xaxis.label.set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.get_legend().remove()
• create a function that takes as input an integer representing a month in the format yyyymm and returns the number of rows in subs satisfying all of the following criteria:
• start_date occurred before the first day of then given month
• end_date is later than the last day of the previous month
• Create a column in churn named total_customers that is the result of applying the function created in the previous step to churn["yearmonth"] .
• Calculate the churn rate for each month:
• Divide the total_churned column by total_customers
• Assign the result to a column called churn_rate
• Set yearmonth as a string type.
• Use the commented code to plot the churn rate across time.

Hello @jamesberentsen

If you are concerned about how the function works, here is what I did:

First you have two dataframes: subs and churn. You have your start_date and end_date information in subs. The subs dataframe is already preloaded. It is a global variable and can be accessed in the get_customer function.

You want to compare the datetime information in the subs dataframe columns start_date and end date and the datetime information on the churn dataframe column yearmonth.

What this function does it that it takes a value in yearmonth column of churn, extracts the year and month (year = yearmonth//100 and month = yearmonth - year*100). Convert this time to datetime format (date = dt.datetime(year, month, 1).

It checks if it is greater than the start date (v = subs['start_date'] < date) but less than or equal to the end date (w = date <= subs['end_date']). It returns the sum if both information are true.

Cheers!

def get_customers(yearmonth):
year = yearmonth//100
month = yearmonth - year*100
date = dt.datetime(year, month, 1)
v = subs['start_date'] < date
w = date <= subs['end_date']
return (v & w).sum()

subs['start_date'] < dt.datetime(2011, 1, 1)

Where you compare an entire series to a particular date. You get a boolean series that you can sum.

2 Likes

Itβs a bit ambiguous, however, it is mostly about the fact that not both days are in the same month. So maybe they should change the word previous to latter. But it is a difficult sentence I agree.

So they mean, if someone started the subscription they only count as a sub when they have not quit the subscription before the end of the month. So 5-31 december is not a sub, 5 Dec - 1 Jan is. Hope that clarifies

2 Likes

Ah,
I see it has to span more than one month to count as a subscription,thanks.

Thanks for your explanation it is very complex for me this function.

1 Like

Do the steps outside the function of see how it works.

**

This can be done by counting the number of rows of subs where start_date is smaller than the first day of the month and where end_date is greater

**

would it give wrong result in the case that it would count a subscription /row with these start and end dates
i.e it counts this since result is in a different month β 1 month between β2015-04-30β and β2015-05-02β which is actually just 2 days?

I am examining these start and end date columns in output ,

but not sure how to step into the function to see how it works.

You pick a date in yearmonth on the churn dataframe and you check with churn['total_customers'] = churn['yearmonth'].apply(get_customers) to see how many customers have churned. It will take a lot of time to do some manually.

To try something outside a function means to run these outside get_customer one by one to see how it works: Add the next line when you understand what the other things are doing.

yearmonth = #Specify a date
year = yearmonth//100
month = yearmonth - year*100
date = dt.datetime(year, month, 1)
v = subs['start_date'] < date
w = date <= subs['end_date']
(v & w).sum()
1 Like

Many thanks.
I found that to be an excellent breakdown
by assigning to variables v & w.
It was harder
for me to understand when it was all in one line

def get_customers(yearmonth):
year = yearmonth//100
month = yearmonth-year*100
date = dt.datetime(year, month, 1)

**return ((subs["start_date"] < date) & (date <= subs["end_date"])).sum()**

churn["total_customers"] = churn["yearmonth"].apply(get_customers)

now I see why the total_customers column is cumulative, can relate code output to column output and get how it was created.