Okay! I got where my logic was wrong.
My initial script worked wonders except in the scenario where a customer bought tracks from a single album BUT didn’t buy all tracks from said album. Since my algorithm would test if all the tracks from an invoice come from a single album, this would end up as a false positive as it’s NOT an album sale.
So what I did was add a column to the album_tracks table that counted the number of tracks from each album. Then when I did the final query, it would check if all the tracks in the invoice came from a single album AND if the customer got all the tracks from said album.
Then I got the exact same results as Dataquest solution, except I didn’t use the EXCEPT algorithm. I’ll post the new query in a stand-alone comment.
I feel my process follows the steps required by the question:
Classify invoices as album / not album.
Then find amount and percentage.
album_or_not number_invoices percentage Album 171 27.9 Not Album 443 72.1
and Dataquest says it should be:
album_or_not number_invoices percentage Album 114 18.6 Not Album 500 81.4
I didn’t use the EXCEPT process, which is likely why my answer differs, but what I want to know is why my process offers a different answer than the EXCEPT process since it appears to me the underlying logic is the same.
q=''' /* Here I make a list of all the tracks in the invoices */ WITH invoice_tracks AS ( SELECT il.invoice_id, il.track_id, it.track_count FROM invoice_line il INNER JOIN ( SELECT invoice_id, COUNT(track_id) track_count FROM invoice_line GROUP BY invoice_id ) AS it ON it.invoice_id = il.invoice_id ), /* Here I make a list of all the tracks in the albums */ album_tracks AS ( SELECT al.title, tr.track_id FROM album al JOIN track tr ON tr.album_id = al.album_id ) /* Here I select my album/not album classification, the number of invoices and the percentage */ SELECT album_or_not, COUNT(invoice_id) number_invoices, ROUND(CAST(COUNT(invoice_id) AS FLOAT)/total_invoices,3)*100 percentage FROM ( /* Here I select the invoice_id, a count of total invoices, and a case of album classification based on if each invoice has different tracks coming from different albums or not. If all the tracks of an invoice come from a single album, then it gets tagged as "album", if they come from more than one album, then it gets tagged as "not album" */ SELECT it.invoice_id, (SELECT COUNT(DISTINCT invoice_id) FROM invoice_tracks) total_invoices, CASE WHEN COUNT (DISTINCT title) = 1 THEN "Album" ELSE "Not Album" END AS album_or_not FROM invoice_tracks it INNER JOIN album_tracks at ON at.track_id = it.track_id GROUP BY it.invoice_id ) GROUP BY album_or_not ''' run_query(q)