Recursive joins (sql)

MISSION REFERENCE : https://app.dataquest.io/m/189/intermediate-joins-in-sql/5/recursive-joins

SELECT
e1.employee_id,
e2.employee_id supervisor_id
FROM employee e1
INNER JOIN employee e2 on e1.reports_to = e2.employee_id
LIMIT 4;

This is a query from the link above that’s supposed to join two columns in the same table. I’ve spent about an hour on this and i still don’t understand how it works. Can somebody please elaborate on what’s happening in the background?

Also, why do we have to use this format? Why can’t we just select the two columns:
SELECT employee_id, reports_to
WHERE reports_to IS NOT NULL
FROM employee

What’s the point of recursive joins?

3 Likes

It’s just coincidentally in this special case that the results look the same. That happens because the question did not ask for more details on the supervisors (the people being reported to).

Without join, you have information from 1 table, with join, you have information from 2 tables.
Imagine A reports to B.
Without join, you know the details (not just id, but demographics, email, number) of A, but only the id of B.
With join you know both the details of A and B.
So if you did not care about other details of B as in this DQ question, your solution probably is much faster and better.

Practically, recursive joins are needed when you need to manipulate information that appears in a single table. An example is in table recording transactions of users with a date time column, and you want to find how many transactions have occured before each transaction in the table, you would have a joining condition of table.datetime > table.datetime, then groupby user and aggregate by the number of timestamps in the right table for every timestamp in the left table.

2 Likes

Hello hanqi! I will use this thread to make another question regarding the same mission.
So far, I’ve had no problems changing the order of the joins comparisons, for instance:

INNER JOIN facts f ON facts.id=c.facts_id

give the same results as

INNER JOIN facts f ON c.facts_id=facts.id

However, in this mission, doing the same changes the result, with
(the correct one):

LEFT JOIN employee e2 ON e1.reports_to=e2.employee_id

to

LEFT JOIN employee e2 ON e2.employee_id=e1.reports_to

Could you please explain the reasoning behind this? Thanks =)

No matter LEFT or INNER, switching the order should not affect the results. I tested both variants in DQ and on https://sqliteonline.com and they are the same.

What exactly did you run and what results did you get? So someone else can reproduce the error.

Yes, i, too, was wondering why the order was different in this particular case only.

Hi
After 1 hour, I think I got the reason why the results are different if we change the order.

The reason is because we need to include the information of supervisor as well.
Therefore, e1 should be used with report_to (for ALL employees) while e2 should be used with employee_id (for supervisor)

2 Likes

can you explain in detail please ?
i did not get the reason why are results different if we change the order