Screen Link:
Hi, I’m working on the SQL module in the Data Analyst with Python course and I’m struggling on the most complex queries. Something that really bugs me is that, unlike what happens in Python, the code is not commented so if I get stuck on a passage I can’t find the right way to solve the problem. Anyway, this is one of the passages where I can’t wrap my head around it:
I’m working on the guided project and I want to categorize all purchases in purchases of whole albums vs purchases of songs. The guide says:
In order to answer the question, we’re going to have to identify whether each invoice has all the tracks from an album. We can do this by getting the list of tracks from an invoice and comparing it to the list of tracks from an album. We can find the album to compare the purchase to by looking up the album that one of the purchased tracks belongs to.
I tried and tried but could not solve it, so I looked at the solution:
%%sql
WITH invoice_first_track AS
(
SELECT
il.invoice_id invoice_id,
MIN(il.track_id) first_track_id
FROM invoice_line il
GROUP BY 1
)
SELECT
album_purchase,
COUNT(invoice_id) number_of_invoices,
CAST(count(invoice_id) AS FLOAT) / (
SELECT COUNT(*) FROM invoice
) percent
FROM
(
SELECT
ifs.*,
CASE
WHEN
(
SELECT t.track_id FROM track t
WHERE t.album_id = (
SELECT t2.album_id FROM track t2
WHERE t2.track_id = ifs.first_track_id
)
EXCEPT
SELECT il2.track_id FROM invoice_line il2
WHERE il2.invoice_id = ifs.invoice_id
) IS NULL
AND
(
SELECT il2.track_id FROM invoice_line il2
WHERE il2.invoice_id = ifs.invoice_id
EXCEPT
SELECT t.track_id FROM track t
WHERE t.album_id = (
SELECT t2.album_id FROM track t2
WHERE t2.track_id = ifs.first_track_id
)
) IS NULL
THEN "yes"
ELSE "no"
END AS "album_purchase"
FROM invoice_first_track ifs
)
GROUP BY album_purchase;
This is the part I do not understand:
SELECT t.track_id FROM track t
WHERE t.album_id = (
SELECT t2.album_id FROM track t2
WHERE t2.track_id = ifs.first_track_id
)
EXCEPT
SELECT il2.track_id FROM invoice_line il2
WHERE il2.invoice_id = ifs.invoice_id
) IS NULL
I get why we need to combine the two selections with ‘except’, and in my head we are comparing the tracks that appear in the ‘track’ table associated with a given album, vs. the tracks that appear on the invoice, but…could someone please patiently get me through it step by step?
Thanks.
Another thing that I could not solve in the same guided project (should I post it on a separate topic?) is how to categorize all countries with only one customer as ‘Other’.
I looked it up in the solution, and this is the part that troubles me:
WITH country_or_other AS
(
SELECT
CASE
WHEN (
SELECT count(*)
FROM customer
where country = c.country
) = 1 THEN "Other"
ELSE c.country
END AS country,
c.customer_id,
il.*
FROM invoice_line il
INNER JOIN invoice i ON i.invoice_id = il.invoice_id
INNER JOIN customer c ON c.customer_id = i.customer_id
)
specifically, the condition: where country = c.country …what does it mean?
Thank you for your patience!