df["yearmonth"] = 100*year + month: What does 100*year mean?

Screen Link: https://app.dataquest.io/m/468/business-metrics/4/net-promoter-score
Could someone explain what 100*year means in the code below?

Code:

import pandas as pd

df = pd.read_csv("nps.csv", parse_dates=["event_date"])
year = df["event_date"].dt.year
month = df["event_date"].dt.month
df["yearmonth"] = 100*year + month

df["category"] = df["score"].apply(categorize)

nps = df.pivot_table(index="yearmonth", columns="category", aggfunc="size")
nps["total_responses"] = nps.sum(axis="columns")
nps["nps"] = (nps["Promoter"]-nps["Detractor"])/nps["total_responses"]
nps["nps"] = (100*nps["nps"]).astype(int)

I found a Stack Overflow page about converting a datetime object to an integer, but I cannot fully understand it. The page is here: https://stackoverflow.com/questions/28154066/how-to-convert-datetime-to-integer-in-python

Thank you.

yearmonth stores the year and the month of each row in yyyymm format. Suppose the year is 2020 and the month is 1. To get 202001, which is in yyyymm format, we multiply year by 100 (giving 202000, which shifts the year two digits to the left) and then add month to the result.

Understood?
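
Here is a minimal sketch of that arithmetic on a few made-up dates (the dates and the name `dates` are only for illustration and are not from the course data):

```python
import pandas as pd

# Hypothetical sample dates, just to illustrate the arithmetic
dates = pd.Series(pd.to_datetime(["2020-01-15", "2020-11-03", "2021-09-30"]))

year = dates.dt.year    # integers: 2020, 2020, 2021
month = dates.dt.month  # integers: 1, 11, 9

# Multiplying by 100 shifts the year two digits left (2020 -> 202000),
# leaving two digits free for the month
yearmonth = 100 * year + month
print(yearmonth.tolist())  # [202001, 202011, 202109]

# The encoding is reversible with integer division and remainder
print((yearmonth // 100).tolist())  # [2020, 2020, 2021]
print((yearmonth % 100).tolist())   # [1, 11, 9]
```

Because the month always occupies the last two digits, sorting by yearmonth also sorts the rows chronologically.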

3 Likes

Thank you so much! It totally makes sense.
I did not notice that dt.year actually returns integers.

1 Like

How would someone come to this conclusion? What do I need to work on to think more like this? I was stuck on this problem for many days; I did a lot of research and tried different things. I knew the simplest way would be to join year and month together (year+month), but of course that was missing the padded ‘0’ for the months Jan to Sept… Should I study the output I'm getting and just work from there? Or do I just need to be a better mathematician to be better at this?
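
The padding problem described above can be reproduced with plain Python values. This is only a sketch with made-up numbers; the variable names are hypothetical:

```python
year, month = 2020, 1

# Joining the text directly drops the leading zero for single-digit months
naive = str(year) + str(month)
print(naive)  # 20201

# Padding the month to two characters restores the yyyymm format
padded = str(year) + str(month).zfill(2)
print(padded)  # 202001

# The arithmetic version gives the same value without any string handling
arithmetic = 100 * year + month
print(arithmetic)  # 202001
```

Printing intermediate values like this is often enough to spot what is missing, with no extra maths required.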

1 Like

Disclaimer: I am just a learner of Python and not a professional, so please don’t take what I am going to say too seriously.

I guess you can develop this kind of sense by getting new perspectives from other people’s code or by trying to think of different solutions, like you did. Moreover, this is just one of many possible solutions, so I don’t think you have to worry a lot.
I am sure that you have learned many things from your research. It is fantastic!!

2 Likes

You’re very right anyway! Happy learning to you and thanks for the perspective!

2 Likes

Why not use:

df["yearmonth"] = df["event_date"].dt.strftime("%Y%m").astype(int)

Isn’t it better, faster, or easier to read compared to manipulating numbers?

4 Likes

I would second that. Multiplying year by 100 is a less obvious way to solve the problem, and it takes more lines. A cleaner way would be to convert the date to YYYYMM format using dt.strftime("%Y%m").

1 Like

Yes, this is much better than that. I actually did the same but forgot to include .astype(int).
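
For anyone comparing the two approaches from this thread, here is a small sketch on a made-up DataFrame (the sample dates are hypothetical; only the event_date column name comes from the original code). It also shows why .astype(int) matters:

```python
import pandas as pd

# Hypothetical stand-in for the nps.csv data
df = pd.DataFrame({"event_date": pd.to_datetime(["2020-01-15", "2020-11-03"])})

# strftime returns text, so without .astype(int) the column holds strings
as_text = df["event_date"].dt.strftime("%Y%m")
print(as_text.tolist())  # ['202001', '202011']

# Approach 1: format as text, then convert to integers
via_strftime = as_text.astype(int)

# Approach 2: the arithmetic from the original code
via_arithmetic = 100 * df["event_date"].dt.year + df["event_date"].dt.month

print(via_strftime.tolist())    # [202001, 202011]
print(via_arithmetic.tolist())  # [202001, 202011]
```

Both give the same integers here. I have not timed them, so which one is faster on a large dataset is something you would need to measure yourself.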