pandas.merge on index: why must it be 'right_index=True'?

https://app.dataquest.io/m/468/business-metrics/7/date-wrangling

churn = pd.merge(churn, monthly_churn, "left", left_on="yearmonth", right_index=True)

I do not understand why right_index=True is needed.
I understand the other parts, such as joining the left dataframe, churn, on its column with left_on="yearmonth".

It is apparent to me that churn has an index that numbers its rows (1, 2, 3, 4…), but it is not apparent what the index of the monthly_churn dataframe is.

A merge matches rows from two dataframes on keys that hold the same information. In this case churn, the left dataframe, has a column called yearmonth holding year and month data, while in monthly_churn the same year and month data is the index rather than a column. That is why the left key is given with left_on="yearmonth" and the right key with right_index=True.
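To make the matching concrete, here is a minimal sketch with made-up data. The values and the column name total_churned are assumptions for illustration, not the course's actual tables.

```python
import pandas as pd

# Hypothetical left table: yearmonth is an ordinary column.
churn = pd.DataFrame({"yearmonth": [201101, 201102, 201103]})

# Hypothetical right table: the year-month values live in the index.
monthly_churn = pd.DataFrame(
    {"total_churned": [5, 8]},  # assumed column name
    index=[201101, 201102],
)

# Left key comes from a column, right key comes from the index.
merged = pd.merge(
    churn, monthly_churn, how="left",
    left_on="yearmonth", right_index=True,
)
print(merged)
```

Because this is a left join, every row of churn is kept; the 201103 row has no matching index label in monthly_churn, so its total_churned value is NaN.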

If you check screen 6, the index of the monthly_churn series was the year and month. When you convert the series into a dataframe, it retains this index unless you reset it.
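As a rough sketch of that point, the snippet below converts a small Series to a dataframe and then resets the index. The data and the column name total_churned are assumptions made up for the example. Once the index is reset, the year and month become an ordinary column, so the merge would use right_on instead of right_index=True.

```python
import pandas as pd

# Hypothetical stand-in for the Series built on screen 6.
counts = pd.Series(
    [5, 8],
    index=pd.Index([201101, 201102], name="churn_month"),
)

# Converting to a dataframe keeps churn_month as the index.
as_frame = counts.to_frame("total_churned")  # assumed column name

# Resetting the index turns churn_month into a regular column.
flat = as_frame.reset_index()

churn = pd.DataFrame({"yearmonth": [201101, 201102, 201103]})

# With the key in a column on both sides, right_index is no longer needed.
merged = pd.merge(
    churn, flat, how="left",
    left_on="yearmonth", right_on="churn_month",
)
print(merged)
```

Both routes match the same rows; the difference is that this version leaves an extra churn_month column in the result, which is one reason merging directly on the index can be tidier.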

Finally, see where the magic happened on screen 6. When you use groupby, the column you group by becomes the index of the result.

monthly_churn = subs.groupby('churn_month').size()
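You can check this yourself by inspecting the index of the result. The subs table below is a made-up stand-in with only a churn_month column; the real one from the course has more columns and rows.

```python
import pandas as pd

# Hypothetical subscriptions table; only churn_month matters here.
subs = pd.DataFrame(
    {"churn_month": [201101, 201101, 201102, 201103, 201103, 201103]}
)

monthly_churn = subs.groupby("churn_month").size()

# The grouping column is now the index of the resulting Series.
print(monthly_churn.index.name)
print(list(monthly_churn.index))
print(monthly_churn)
```

The index is named churn_month and holds one label per distinct month, which is exactly what right_index=True matches against.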


Hi monorienaghogho,

I get it now, thanks. That must be the index because it is shown as the leftmost column when the monthly_churn dataframe is displayed.