191-7 | I'm getting a different answer, but I also feel mine is correct?

EDIT:

Okay! I got where my logic was wrong.
My initial script worked wonders except in the scenario where a customer bought tracks from a single album BUT didn’t buy all tracks from said album. Since my algorithm would test if all the tracks from an invoice come from a single album, this would end up as a false positive as it’s NOT an album sale.

So what I did was add a column to the album_tracks table that counted the number of tracks from each album. Then when I did the final query, it would check if all the tracks in the invoice came from a single album AND if the customer got all the tracks from said album.

Then I got the exact same results as Dataquest solution, except I didn’t use the EXCEPT algorithm. I’ll post the new query in a stand-alone comment.


I feel my process follows the steps required by the question:
Classify invoices as album / not album.
Then find amount and percentage.

I’m getting

album_or_not	number_invoices	percentage
Album	        171	            27.9
Not Album	    443	            72.1

and Dataquest says it should be:

album_or_not	number_invoices	percentage
Album	        114	            18.6
Not Album	    500	            81.4

I didn’t use the EXCEPT process, which is likely why my answer differs, but what I want to know is why my process offers a different answer than the EXCEPT process since it appears to me the underlying logic is the same.

q='''
/* Here I make a list of all the
tracks in the invoices */
WITH invoice_tracks AS (
            SELECT
            il.invoice_id,
            il.track_id,
            it.track_count
            FROM invoice_line il
            INNER JOIN (
                    SELECT
                    invoice_id,
                    COUNT(track_id) track_count
                    FROM invoice_line
                    GROUP BY invoice_id
                    ) AS it
                    ON it.invoice_id = il.invoice_id
            ),
/* Here I make a list of all
 the tracks in the albums */     
    album_tracks AS (
            SELECT
            al.title,
            tr.track_id
            FROM album al
            JOIN track tr
                ON tr.album_id = al.album_id
            )


/* Here I select my album/not album classification,
the number of invoices and the percentage */
SELECT
album_or_not,
COUNT(invoice_id) number_invoices,
ROUND(CAST(COUNT(invoice_id) AS FLOAT)/total_invoices,3)*100 percentage
FROM (

/* Here I select the invoice_id, a count of total invoices,
and a case of album classification based on if each invoice
 has different tracks coming from different albums or not.

If all the tracks of an invoice come from a single album,
then it gets tagged as "album", if they come from more
than one album, then it gets tagged as "not album" */
        SELECT
        it.invoice_id,
        (SELECT COUNT(DISTINCT invoice_id) FROM invoice_tracks) total_invoices,
        CASE
            WHEN
                COUNT (DISTINCT title) = 1
                THEN "Album"
            ELSE "Not Album"
        END AS album_or_not
        FROM invoice_tracks it
        INNER JOIN album_tracks at
                ON at.track_id = it.track_id
        GROUP BY it.invoice_id
    )
GROUP BY album_or_not
'''
run_query(q)
q='''

WITH invoice_tracks AS (
            SELECT
            il.invoice_id,
            il.track_id,
            it.track_count
            FROM invoice_line il
            INNER JOIN (
                    SELECT
                    invoice_id,
                    COUNT(track_id) track_count
                    FROM invoice_line
                    GROUP BY invoice_id
                    ) AS it
                    ON it.invoice_id = il.invoice_id
            ),

    
    album_tracks AS (
            SELECT
            al.title,
            tr.track_id,
            amount_tracks
            FROM album al
            JOIN track tr
                ON tr.album_id = al.album_id
/* Here's the new part where I add the amount of tracks from each album to the album_tracks table */
            JOIN (SELECT COUNT(tr.track_id) amount_tracks, al.title FROM album al
                JOIN track tr ON tr.album_id = al.album_id GROUP BY al.album_id) tracks
                ON tracks.title = al.title
            )



SELECT
album_or_not,
COUNT(invoice_id) number_invoices,
ROUND(CAST(COUNT(invoice_id) AS FLOAT)/total_invoices,3)*100 percentage
FROM (

        SELECT
        it.invoice_id,
        (SELECT COUNT(DISTINCT invoice_id) FROM invoice_tracks) total_invoices,
        CASE
            WHEN
/* And here's the new part where I added the new logic for album or not */
                COUNT (DISTINCT title) > 1 OR amount_tracks > COUNT(it.track_id)
                THEN "Not Album"
            ELSE "Album"
        END AS album_or_not
        FROM invoice_tracks it
        INNER JOIN album_tracks at
                ON at.track_id = it.track_id
        GROUP BY it.invoice_id
    )
GROUP BY album_or_not
'''
run_query(q)