MIssion 468-8: Churn Rate; Applying Datetime Module

Hello,
I am trying to understand how the datetime module can be applied to the mission and, in the process, understand how it works;
Screen Link:

My Code:

to create a new column with a value equal to first of the start date month

subs[‘start_date’].dt.to_period(‘M’).dt.to_timestamp().head()

to create a dataframe listing all the months for the period under study- from screen 468-7:

date_series = pd.DataFrame(pd.date_range(start=‘2011-01’, end=‘2015-11’))
date_series = date_series.rename(columns={0:‘date’})
date_series.head()

trying to merge the files like in 468-8:

new_data = date_series.copy()
new_data.merge(new_data,monthly_churn,how=‘left’,left_on=‘date’, right_index=True)
What I expected to happen:

I expected a merged file like shown in the solution by DQ

What actually happened:

TypeError                                 Traceback (most recent call last)
Input In [58], in <cell line: 2>()
      1 new_data = date_series.copy()
----> 2 new_data.merge(new_data,monthly_churn,how='left',left_on='date', right_index=True)

TypeError: DataFrame.merge() got multiple values for argument 'how'

I was trying to use the new column with the start of each month as a “check” date for building a query to pull out whether a customer was current for the month under consideration or had been churned e.g. start_date<check_date(first of the month) AND end_date after check_date. But I am stuck as mentioned here.

I think it might have something to do with how datetime actually stores data. Even though the display might be year-month

subs['start_date'].dt.to_period('M')

but the actual data stored is of the format:

subs['start_date'].dt.to_period('M').dt.to_timestamp().head()

which I suspect also incudes the hours etc.(default bing midnight I think).

The problem here was how to perform the groupby and then filter the results for current customers (i.e. those not churned yet).

Hope someone can shed light on how datetime works and how it can be applied to this mission or why it cant be used here;

Thanks

Hello,

I’m not on the business path, so I’m not familiar with your current screen.

But from looking at the screen, the error happened because you passed two positional arguments in mergenew_data, monthly_churn.

What actually happened:
new_data.merge(new_data,monthly_churn,how='left',left_on='date', right_index=True)

is (probably) equivalent to

new_data.merge(new_data,how=monthly_churn,how='left',left_on='date', right_index=True)

Other than that, since you’re calling merge method on new_data directly (instead of using pd’s own merge), you don’t need to pass in new_data again. The only thing you need is the “right” dataframe which is monthly_churn.

So, yeah, I’m not sure if it’s dt related. I could be wrong so please correct me if that’s the case.

Reference:

pandas.DataFrame.merge

1 Like

Thank you for the reply;
Actually, one of the issues I encountered was how to generate a “monthly _churn” churn column using datetime; the monthly_churn as given in the lesson is generated as data type int64. Trying to merge it with datetime columns gives the error

....
ValueError: You are trying to merge on datetime64[ns] and int64 columns. If you wish to proceed you should use pd.concat

Using datetime to generate monthly_churned column:

subs['end_month'] = subs['end_date'].dt.to_period('M')
churn_ea_month = pd.DataFrame(subs.groupby('end_month').size()) #ln  29
type(churn_ea_month)
churn_ea_month = churn_ea_month.rename(columns={0:'churned_month'})
churn_ea_month.head()

gives result:

	churned_month
end_month	
2011-03	8
2011-04	16
2011-05	29
2011-06	27
2011-07	37

Generating a datetime series corresponding to the yearmonth (yyyymm) column but using datetime:

date_series = pd.DataFrame(pd.date_range(start='2011-01', end='2015-11'))
date_series = date_series.rename(columns={0:'date'})
date_series.head()

result is:

	date
0	2011-01-01
1	2011-01-02
2	2011-01-03
3	2011-01-04
4	2011-01-05

but trying to merge, as given in the lesson, fails:

new_data = date_series.copy()
new_data.merge(churn_ea_month,how='left',left_on='date', right_index=True)

result:


date	churned_month
0	2011-01-01 00:00:00	NaN
1	2011-01-02 00:00:00	NaN
2	2011-01-03 00:00:00	NaN
3	2011-01-04 00:00:00	NaN
4	2011-01-05 00:00:00	NaN

It would seem that the groupby function fails when it comes to the merge since I get output for individual days rather than one row per month. When I ran the groupby by itself above, I got what seems to be the correct output e,g, 8 churned for March 2011 (2011-03). But when I tried to merge it with the month-year dataframe to allow for months when there was no churning, the merge failed as shown even though individually they seem to work giving me the number churned each month and a column of month-year from start to end of the data.

1 Like

Hello,

I’m not sure if my solution is correct, but you can read it below:

The above from what I’ve tried can be problematic because you’re using a PeriodIndex instead of the usual Int index. One way to fix it is to use reset_index so you can have an Int index instead.

To get month-by-month instead of day-by-day from the above, you’ll need to pass freq='M' as an argument for date_range.

Since the index has changed, we can’t really merge by using the index of the right dataframe anymore. But, we can use the churned_monthcolumn instead.

Here’s the whole code illustrating everything I’ve mentioned:

import datetime as dt

subs['end_month'] = subs['end_date'].dt.to_period('M')
churn_ea_month = pd.DataFrame( {'churned_month': subs.groupby('end_month').size()}) 
churn_ea_month = churn_ea_month.reset_index()
date_series = pd.DataFrame(pd.date_range(start='2011-01', end='2015-11', freq="M"))

date_series = date_series.rename(columns={0:'date'})
date_series['date'] = date_series['date'].dt.to_period('M')
date_series.head()

new_data = date_series.copy()
new_data = new_data.merge(churn_ea_month,how='left',left_on='date', right_on='end_month')

# alternatively, do the following to prevent duplicates:

# churn_ea_month = churn_ea_month.rename(columns={'end_month':'date'})
# new_data = new_data.merge(churn_ea_month, 'left', left_on='date', right_on='date')
1 Like

Thank you; it is working now.

1 Like