How does ```def get_customers(yearmonth)``` work?

Screen Link: https://app.dataquest.io/m/468/business-metrics/8/churn-rate

DQ Code:

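import datetime as dt  # the function uses the datetime module under this alias

def get_customers(yearmonth):
    # yearmonth is an integer in YYYYMM form, e.g. 201103
    year = yearmonth//100               # 201103 -> 2011
    month = yearmonth-year*100          # 201103 -> 3
    date = dt.datetime(year, month, 1)  # first day of that month

    # Count the rows of subs that started before date and end on or after it
    return ((subs["start_date"] < date) & (date <= subs["end_date"])).sum()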

I couldn’t understand how this function works.

Especially subs["start_date"]: how does the function know which row to compare with date?

What confuses me is that this function is applied like this:

churn["total_customers"] = churn["yearmonth"].apply(get_customers)

I understand that apply iterates over churn["yearmonth"] row by row, but how does the function iterate over subs["start_date"] and subs["end_date"]?


It automatically compares each row in subs["start_date"] with date. Same for the comparison with subs["end_date"].

Exactly how it compares each row with date comes down to how pandas is implemented internally, which you don't need to get into. But you can look up vectorization in NumPy or pandas to understand the idea. Broadly speaking, vectorization means applying an operation to a whole row or column at once, without explicitly looping over each value, and it is much faster than a traditional Python loop.
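To make the idea of vectorization concrete, here is a minimal sketch comparing the vectorized comparison with an explicit loop. The small subs DataFrame below is made-up data standing in for the one from the course, not the real dataset.

```python
import datetime as dt
import pandas as pd

# Hypothetical stand-in for the course's subs DataFrame (made-up dates).
subs = pd.DataFrame({
    "start_date": pd.to_datetime(["2011-01-15", "2011-03-01", "2011-02-10"]),
    "end_date": pd.to_datetime(["2011-04-15", "2011-03-31", "2011-02-20"]),
})
date = dt.datetime(2011, 3, 1)

# Vectorized: one expression compares every row of the column with date.
vectorized = subs["start_date"] < date

# Explicit loop: the same comparison, written out one value at a time.
looped = [start < date for start in subs["start_date"]]

print(vectorized.tolist())  # [True, False, True]
print(looped)               # [True, False, True]
```

Both give the same answer. The vectorized form just lets pandas do the looping for you.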

return ((subs["start_date"] < date) & (date <= subs["end_date"])).sum()

(subs["start_date"] < date) will give a Series where each row will be either True or False depending on the comparison. Same for (date <= subs["end_date"]).

The & operator performs an element-wise and on the two Series, producing a Series that is True only in the rows where both comparisons are True.

And then sum() will find out how many rows in that Series are True.
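The three steps above can be sketched one at a time. This assumes a small made-up subs DataFrame rather than the real course data, and the intermediate variable names are only for illustration.

```python
import datetime as dt
import pandas as pd

# Hypothetical stand-in for the course's subs DataFrame (made-up dates).
subs = pd.DataFrame({
    "start_date": pd.to_datetime(["2011-01-15", "2011-03-01", "2011-02-10"]),
    "end_date": pd.to_datetime(["2011-04-15", "2011-03-31", "2011-02-20"]),
})
date = dt.datetime(2011, 3, 1)

# Step 1: each comparison gives a boolean Series, one value per row of subs.
started_before = subs["start_date"] < date   # [True, False, True]
not_ended_yet = date <= subs["end_date"]     # [True, True, False]

# Step 2: & combines the two Series row by row.
both = started_before & not_ended_yet        # [True, False, False]

# Step 3: sum() counts the True values, because True counts as 1.
total = both.sum()
print(total)  # 1
```

So each call to get_customers looks at the whole subs table for one date, and apply repeats that call once per value in churn["yearmonth"].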


@boufatma.saraa You can learn more about this from Khan Academy’s algebra videos; for this one, watch the video on scalar multiplication. In simple words, the comparison is automatically made with each element in the vector.


What happens if we use ‘and’ instead of ‘&’ in the return statement?

You should try it out and see what happens.
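If you want something quick to experiment with, here is a minimal sketch using two made-up boolean Series in place of the real comparisons. In pandas, `and` needs a single True or False from each side, and a Series with several values cannot be reduced to one, so it raises a ValueError.

```python
import pandas as pd

# Made-up boolean Series standing in for the two comparisons.
started_before = pd.Series([True, False, True])
not_ended_yet = pd.Series([True, True, False])

# & works element by element.
print((started_before & not_ended_yet).tolist())  # [True, False, False]

# and asks for the truth value of a whole Series, which pandas refuses to guess.
try:
    result = started_before and not_ended_yet
    raised = False
except ValueError as err:
    raised = True
    message = str(err)
    print(message)
```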

If you have further questions, please create a separate post for them, since this one is not related to the original question.