EDIT:
Okay! I got where my logic was wrong.
My initial script worked wonders except in the scenario where a customer bought tracks from a single album BUT didn’t buy all tracks from said album. Since my algorithm would test if all the tracks from an invoice come from a single album, this would end up as a false positive as it’s NOT an album sale.
So what I did was add a column to the album_tracks table that counted the number of tracks from each album. Then when I did the final query, it would check if all the tracks in the invoice came from a single album AND if the customer got all the tracks from said album.
Then I got the exact same results as Dataquest solution, except I didn’t use the EXCEPT algorithm. I’ll post the new query in a stand-alone comment.
I feel my process follows the steps required by the question:
Classify invoices as album / not album.
Then find amount and percentage.
I’m getting
album_or_not number_invoices percentage
Album 171 27.9
Not Album 443 72.1
and Dataquest says it should be:
album_or_not number_invoices percentage
Album 114 18.6
Not Album 500 81.4
I didn’t use the EXCEPT process, which is likely why my answer differs, but what I want to know is why my process offers a different answer than the EXCEPT process since it appears to me the underlying logic is the same.
q='''
/* Here I make a list of all the
tracks in the invoices */
WITH invoice_tracks AS (
SELECT
il.invoice_id,
il.track_id,
it.track_count
FROM invoice_line il
INNER JOIN (
SELECT
invoice_id,
COUNT(track_id) track_count
FROM invoice_line
GROUP BY invoice_id
) AS it
ON it.invoice_id = il.invoice_id
),
/* Here I make a list of all
the tracks in the albums */
album_tracks AS (
SELECT
al.title,
tr.track_id
FROM album al
JOIN track tr
ON tr.album_id = al.album_id
)
/* Here I select my album/not album classification,
the number of invoices and the percentage */
SELECT
album_or_not,
COUNT(invoice_id) number_invoices,
ROUND(CAST(COUNT(invoice_id) AS FLOAT)/total_invoices,3)*100 percentage
FROM (
/* Here I select the invoice_id, a count of total invoices,
and a case of album classification based on if each invoice
has different tracks coming from different albums or not.
If all the tracks of an invoice come from a single album,
then it gets tagged as "album", if they come from more
than one album, then it gets tagged as "not album" */
SELECT
it.invoice_id,
(SELECT COUNT(DISTINCT invoice_id) FROM invoice_tracks) total_invoices,
CASE
WHEN
COUNT (DISTINCT title) = 1
THEN "Album"
ELSE "Not Album"
END AS album_or_not
FROM invoice_tracks it
INNER JOIN album_tracks at
ON at.track_id = it.track_id
GROUP BY it.invoice_id
)
GROUP BY album_or_not
'''
run_query(q)