pd.merge() shows an error

Hi! I have two questions about pd.merge() on this screen.

Screen Link: https://app.dataquest.io/m/468/business-metrics/7/date-wrangling

1. right_index=True seems to reset the index of a table, but which table does it apply to here, churn or monthly_churn? It looks like it applies to monthly_churn, but we never told Python that explicitly, so how does it know?

2. We learned pd.merge() in the format below, but it raises an error: 'yearmonth'. Why?

Thank you!!
My Code:

churn = pd.merge(left=churn, right=monthly_churn, how="left", on="yearmonth", right_index=True)

Hi @candiceliu93,

We learned pd.merge() in the format below, but it raises an error: 'yearmonth'. Why?

You are getting an error for that line of code because yearmonth is not a column common to churn and monthly_churn. Since the column is present only in churn, you have to use the left_on option instead of on.

The option right_index=True does not reset the index of the table. It simply tells the merge function to use the index of the right dataframe (monthly_churn in this case) as the key to join with the left dataframe (churn in this case). Since we cannot use the on option in this scenario (the two dataframes have no common columns), we have to specify the merge key for each side separately: left_on for the left dataframe (churn) and right_index for the right dataframe (monthly_churn).
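To make this concrete, here is a minimal sketch of the left_on / right_index combination. The two dataframes are small made-up stand-ins (the values and the total_customers and customers_lost columns are hypothetical, not the data from the screen); only the names churn, monthly_churn, yearmonth and churn_month come from this thread.

```python
import pandas as pd

# Hypothetical stand-in for churn: yearmonth is a regular column.
churn = pd.DataFrame({
    "yearmonth": [201101, 201102, 201103],
    "total_customers": [10, 12, 15],
})

# Hypothetical stand-in for monthly_churn: the month lives in the index
# (named churn_month), not in a column.
monthly_churn = pd.DataFrame(
    {"customers_lost": [1, 2]},
    index=pd.Index([201101, 201102], name="churn_month"),
)

# left_on names the key column of the left dataframe;
# right_index=True says the key of the right dataframe is its index.
merged = pd.merge(
    left=churn,
    right=monthly_churn,
    how="left",
    left_on="yearmonth",
    right_index=True,
)

print(merged)
```

Because this is a left join, every row of churn is kept, and months with no match in monthly_churn get NaN in the columns that come from the right dataframe.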

Hope this helps! Let me know if you have any more questions regarding this.


So for the pd.merge() function, if two datasets have columns with the same values but different names, we have to specify which key to use from each dataset. In this case we use left_on for the column of the left dataset, churn, and right_index for the right dataset, whose index holds the same values as that column. Does this mean we have to use left_on and right_index as a pair to merge two datasets that have no common column (I mean the column names differ but the values are the same, like yearmonth in churn and churn_month in monthly_churn)?

Yes, you are correct. You have to specify a key for each dataframe when they don’t share a column name. If churn_month were a regular column in monthly_churn rather than its index (as it is here), you would use the right_on option instead of right_index.
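As a rough illustration of the right_on variant, the sketch below moves churn_month out of the index with reset_index() and then merges on the two differently named columns. As before, the data values and the total_customers and customers_lost columns are made up for the example.

```python
import pandas as pd

# Hypothetical stand-in data; only the dataframe and key names come from the thread.
churn = pd.DataFrame({
    "yearmonth": [201101, 201102, 201103],
    "total_customers": [10, 12, 15],
})
monthly_churn = pd.DataFrame(
    {"customers_lost": [1, 2]},
    index=pd.Index([201101, 201102], name="churn_month"),
)

# Turn the index into an ordinary column called churn_month.
monthly_churn_as_column = monthly_churn.reset_index()

# Both keys are now columns, so left_on and right_on are used as a pair.
merged_on_columns = pd.merge(
    left=churn,
    right=monthly_churn_as_column,
    how="left",
    left_on="yearmonth",
    right_on="churn_month",
)

print(merged_on_columns)
```

One difference from the right_index version is that the result keeps both key columns, yearmonth and churn_month, so you may want to drop one of them afterwards.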


Interesting that both options worked for me when I ran the code locally, but only right_index worked on Dataquest…

That is interesting, @maksym001. Perhaps one of the @moderators can shed some light on it.