Pandas.merge on index why must it be 'right_index=True'?

churn = pd.merge(churn, monthly_churn, "left", left_on="yearmonth", right_index=True)

I do not understand why it is needed to set right_index=True
I understand the other parts like join the left dataframe --churn–on its column
left_on=“yearmonth”

it is apparent to me that churn has its index with rows 1,2,3,4…but it is not apparent what the index is in the monthly_churn dataframe

You merge two dataframes on a column that contains the same information. In this case, churn which is the dataframe to the left contains a column called yearmonth with year and month data, while year and month data is the index of the monthly_churn dataframe.

If you check screen 6, the index of the monthly_churn series was year and month. When you convert into a dataframe, it retains this index. Unless you want to reset it.

Finally, see where the magic happened on screen 6. When you use group_by, the column you are grouping with becomes the index.

monthly_churn = subs.groupby('churn_month').size()

4 Likes

Hi monorienaghogho,

I get it now thanks, that must be the index because it is the first column of the monthly_churn dataframe.

Heya,
I have currently completed this practice piece, and while i understood from @monorienaghogho 's great explanation the purpose of right_index=True parameter,

i tried to experiment a little and try to force the merge to have the index of churn df instead of monthly_churn’s. I was buffled that it returned a typeError ‘TypeError: object of type ‘NoneType’ has no len()’!

Why is this happening? Should’t it take the ‘0-1-2…etc’ indexes of churn df?
I have a vague idea that it somehow relates of the grouped dataframe’s index labels being hardcoded somehow or because its a left merge, but i cant reach a concrete logical answer and if it is possible at all without index resetting.

Also secondary question, If it is possible what the code would be?

Thanks in advance
Stavros

1 Like

@smilonas21

The indexes of churn dataframe are numbers. If you want to merge two dataframes, you merge on a column that have the same keys.

You can:

  • set the index of the churn dataframe to yearmonth column
  • make sure to drop this column and use inplace=True
  • merge the churn and the monthly churn dataframes on their indexes. Both now have the same information in the index
  • To pass the test, you have to reset the index. The answer dataframe’s shape is 47 by 2 and the first columns is the yearmonth column.

Find the code below for this part:

churn.set_index('yearmonth', drop=True, inplace=True)

churn = pd.merge(left=churn, right=monthly_churn, how='left', left_index = True, right_index = True)
churn['total_churned'] = (churn.fillna(0))['total_churned'].astype(int)
churn.reset_index(inplace=True)
1 Like

Thank you very much for the clarification and your patience @monorienaghogho !

So if understood correctly, this is happening because i didn’t take into account a basic merge function:

that in order to have a merge, you need the same keys, and by forcing the merge to take the indexes as keys of the left (ie the churn, which has numbers, something different than the monthly_churn has as its indexes) so it forces a merge between incorrect object types, thus the TypeError !! :man_facepalming:

In the alternative of taking the left_index (churns index) as a key column for the merge,you change that index into yearmonth, drop the column and perform the merge on their indexes that now have the same info.

And finally we use both left_index and right_index =True, because we need both of the indexes as keys.
I wonder if we could use right_on to point to the index, doesn’t matter though we got dedicated parameters for these.

Again thanks!

PS. Thank you @Elena_Kosourova for the edit and the correction of tag :smiley:

2 Likes

You can use right_on to point to the left index. Please see the code below:

churn.set_index('yearmonth', drop=True, inplace=True)
monthly_churn.reset_index(inplace=True)


churn = pd.merge(left=churn, right=monthly_churn, how='left', left_index = True, right_on = 'churn_month')
churn['total_churned'] = (churn.fillna(0))['total_churned'].astype(int)
churn.reset_index(drop=True, inplace=True)

churn.rename(columns={'churn_month':'yearmonth'}, inplace=True)
  • Reset the index of the dataframe on the right, monthly_churn
  • Merge the left and right dataframes with left index and right on
1 Like

Awesome!!! Thank you very much for going the extra mile!!

1 Like