Self Join - Identifying the correct columns

Screen Link: https://app.dataquest.io/m/464/intermediate-joins-in-sql/5/recursive-joins

The query:

select * 
FROM 
employee e1
inner join employee e2
on e1.reports_to  = e2.employee_id

returns the following result:

employee_id|last_name|first_name|title              |reports_to|birthdate          |
-----------|---------|----------|-------------------|----------|-------------------|
          2|Edwards  |Nancy     |Sales Manager      |         1|1958-12-08 00:00:00|
          3|Peacock  |Jane      |Sales Support Agent|         2|1973-08-29 00:00:00|
          4|Park     |Margaret  |Sales Support Agent|         2|1947-09-19 00:00:00|
          5|Johnson  |Steve     |Sales Support Agent|         2|1965-03-03 00:00:00|
          6|Mitchell |Michael   |IT Manager         |         1|1973-07-01 00:00:00|
          7|King     |Robert    |IT Staff           |         6|1970-05-29 00:00:00|
          8|Callahan |Laura     |IT Staff           |         6|1968-01-09 00:00:00|

However, if I swap e1 and e2, that is:

select * 
FROM 
employee e1
inner join employee e2
on e2.reports_to  = e1.employee_id     <---- changed 

I get the following:

employee_id|last_name|first_name|title          |reports_to|birthdate          |
-----------|---------|----------|---------------|----------|-------------------|
          1|Adams    |Andrew    |General Manager|          |1962-02-18 00:00:00|
          2|Edwards  |Nancy     |Sales Manager  |         1|1958-12-08 00:00:00|
          2|Edwards  |Nancy     |Sales Manager  |         1|1958-12-08 00:00:00|
          2|Edwards  |Nancy     |Sales Manager  |         1|1958-12-08 00:00:00|
          1|Adams    |Andrew    |General Manager|          |1962-02-18 00:00:00|
          6|Mitchell |Michael   |IT Manager     |         1|1973-07-01 00:00:00|
          6|Mitchell |Michael   |IT Manager     |         1|1973-07-01 00:00:00|

My question is how to use the correct columns in the ON clause?

@prateek

Kindly see this link.

I copied pasted the query into the linked page. It did not return what you pasted here.
You have truncated an entire table of columns and some columns of the other table too.
SELECT * should return all columns for all tables, which in this case is 2 times of the same set of columns since it’s a self join.

The order of your 2 queries doesn’t matter. It gives the same results, just with columns ordered differently.(controlled by default by table joining order from left to right). If you SELECT e1.*, e2.* in 1st query and SELECT e2.*, e1.* in the 2nd query, the output will be exactly the same.

SELECT e1.reports_to,e2.employee_id,e1.last_name,e2.last_name
FROM 
employee e1
inner join employee e2
on e1.reports_to  = e2.employee_id

SELECT e2.reports_to,e1.employee_id,e2.last_name,e1.last_name
FROM 
employee e1
inner join employee e2
on e2.reports_to  = e1.employee_id  

If you had done these 2 queries, the result will be exactly the same too. It’s as good as swapping aliases of e1 with e2 while the query structure never changed.

It doesn’t matter here because INNER JOIN is used. If an asymmetrical join like LEFT JOIN is used, then given strong enough test data, we can see differences.
I say strong enough test data means the difference can only be seen if there is an employee (CEO) who reports to no one.

Here is where the magic happens:
image

Both your queries will not return the 1st row in the picture above.
To get back the original inner join results, you can filter out NULL with WHERE

SELECT e1.reports_to,e2.employee_id,e1.last_name,e2.last_name
FROM 
employee e1
left join employee e2
on e1.reports_to  = e2.employee_id 
WHERE e1.reports_to IS NOT NULL

Here’s some details on how putting join conditions in WHERE or ON is the same in INNER JOIN, but has differences in non INNER JOIN: https://www.pluralsight.com/guides/using-on-versus-where-clauses-to-combine-and-filter-data-in-postgresql-joins