Screen Link: https://app.dataquest.io/m/191/guided-project%3A-answering-business-questions-using-sql/2/overview-of-the-data
I am working with chinook.db in Guided Project: Answering Business Questions using SQL.
On step 2 I noticed a problem with INNER JOIN clause and can’t understand the solution.
When I write this code:
%%sql
SELECT
c.customer_id,
i.customer_id
FROM genre g
INNER JOIN track t ON t.genre_id = g.genre_id
INNER JOIN invoice_line i_l ON i_l.track_id = t.track_id
INNER JOIN invoice i ON i.invoice_id = i_l.invoice_id
LEFT JOIN customer c ON c.customer_id = i.invoice_id
LIMIT 30;
I expected to happend this:
customer_id customer_id_1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
2 2
2 2
2 2
2 2
2 2
2 2
2 2
2 2
2 2
2 2
2 2
3 3
3 3
3 3
3 3
3 3
3 3
What actually happened:
customer_id customer_id_1
1 18
1 18
1 18
1 18
1 18
1 18
1 18
1 18
1 18
1 18
1 18
1 18
1 18
1 18
1 18
1 18
2 30
2 30
2 30
2 30
2 30
2 30
2 30
2 30
2 30
2 30
3 40
3 40
4 18
4 18
As i think INNER JOIN or any JOING clause is matching the values of customer_id columns (in my case) from both table.
By writing this code everything works right:
%%sql
SELECT
c.customer_id,
i.customer_id
FROM invoice i
LEFT JOIN customer c ON i.customer_id = c.customer_id
LIMIT 30;
Output is:
customer_id customer_id_1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
1 1
2 2
2 2
2 2
2 2
2 2
2 2
2 2
2 2
2 2
2 2
2 2
3 3
3 3
3 3
3 3
3 3
3 3
So we see the matching is correct.
Can someone help to understand?