# Churn-rate denominator

start_date occurred before the first day of then given month
end_date is later than the last day of the previous month

I do not understand the logic here
this would mean if given month is Jan , then the start_date is any date of any month in the previous year so jan-dec?
But end_date is later than the last day of the previous month so later than Dec31
but then that is current month?

``````import datetime as dt

def get_customers(yearmonth):
year = yearmonth//100
month = yearmonth-year*100
date = dt.datetime(year, month, 1)

**return ((subs["start_date"] < date) & (date <= subs["end_date"])).sum()**

churn["total_customers"] = churn["yearmonth"].apply(get_customers)
churn["churn_rate"] = churn["total_churned"] / churn["total_customers"]
churn["yearmonth"] = churn["yearmonth"].astype(str)

arange = __import__("numpy").arange
Ellipse = __import__("matplotlib").patches.Ellipse
ax = churn.plot(x="yearmonth", y="churn_rate", figsize=(12,6), rot=45, marker=".")
start, end = ax.get_xlim()
ax.get_xticks()
ax.set_xticks(arange(2, end, 3))
ax.set_xticklabels(yearmonths[2::3])
circle = Ellipse((35, churn.loc[churn.yearmonth == "201312", "churn_rate"].iloc),
5, 0.065, color='sandybrown', fill=False
)
ax.xaxis.label.set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.get_legend().remove()
``````
• create a function that takes as input an integer representing a month in the format `yyyymm` and returns the number of rows in `subs` satisfying all of the following criteria:
• `start_date` occurred before the first day of then given month
• `end_date` is later than the last day of the previous month
• Create a column in `churn` named `total_customers` that is the result of applying the function created in the previous step to `churn["yearmonth"]` .
• Calculate the churn rate for each month:
• Divide the `total_churned` column by `total_customers`
• Assign the result to a column called `churn_rate`
• Set `yearmonth` as a string type.
• Use the commented code to plot the churn rate across time.

Hello @jamesberentsen

If you are concerned about how the function works, here is what I did:

First you have two dataframes: `subs and churn`. You have your `start_date and end_date` information in `subs`. The `subs` dataframe is already preloaded. It is a global variable and can be accessed in the `get_customer` function.

You want to compare the datetime information in the `subs` dataframe columns `start_date and end date` and the datetime information on the `churn` dataframe column `yearmonth`.

What this function does it that it takes a value in `yearmonth` column of `churn`, extracts the year and month (`year = yearmonth//100 and month = yearmonth - year*100`). Convert this time to datetime format (`date = dt.datetime(year, month, 1`).

It checks if it is greater than the start date (`v = subs['start_date'] < date`) but less than or equal to the end date (`w = date <= subs['end_date']`). It returns the sum if both information are true.

Cheers!

``````def get_customers(yearmonth):
year = yearmonth//100
month = yearmonth - year*100
date = dt.datetime(year, month, 1)
v = subs['start_date'] < date
w = date <= subs['end_date']
return (v & w).sum()
``````

`subs['start_date'] < dt.datetime(2011, 1, 1)`

Where you compare an entire series to a particular date. You get a boolean series that you can sum.

2 Likes

It’s a bit ambiguous, however, it is mostly about the fact that not both days are in the same month. So maybe they should change the word previous to latter. But it is a difficult sentence I agree.

So they mean, if someone started the subscription they only count as a sub when they have not quit the subscription before the end of the month. So 5-31 december is not a sub, 5 Dec - 1 Jan is. Hope that clarifies 2 Likes

Ah,
I see it has to span more than one month to count as a subscription,thanks.

Thanks for your explanation it is very complex for me this function.

1 Like

Do the steps outside the function of see how it works.

**

This can be done by counting the number of rows of subs where start_date is smaller than the first day of the month and where end_date is greater

**

would it give wrong result in the case that it would count a subscription /row with these start and end dates
i.e it counts this since result is in a different month — 1 month between “2015-04-30” and “2015-05-02” which is actually just 2 days?

I am examining these start and end date columns in output ,

but not sure how to step into the function to see how it works.

You pick a date in `yearmonth` on the `churn` dataframe and you check with `churn['total_customers'] = churn['yearmonth'].apply(get_customers)` to see how many customers have churned. It will take a lot of time to do some manually.

To try something outside a function means to run these outside `get_customer` one by one to see how it works: Add the next line when you understand what the other things are doing.

``````yearmonth = #Specify a date
year = yearmonth//100
month = yearmonth - year*100
date = dt.datetime(year, month, 1)
v = subs['start_date'] < date
w = date <= subs['end_date']
(v & w).sum()
``````
1 Like

Many thanks.
I found that to be an excellent breakdown
by assigning to variables v & w.
It was harder
for me to understand when it was all in one line

``````def get_customers(yearmonth):
year = yearmonth//100
month = yearmonth-year*100
date = dt.datetime(year, month, 1)

**return ((subs["start_date"] < date) & (date <= subs["end_date"])).sum()**

churn["total_customers"] = churn["yearmonth"].apply(get_customers)
``````

now I see why the total_customers column is cumulative, can relate code output to column output and get how it was created.