SQL Guided project question

Hi,

I am working on the last question of this the SQL project. In particular I can not understand that why call out the album_id for identifying an album track instead of only condition on t.track_id = il.track_id

I include the code from the suggested answer

albums_vs_tracks = '''
WITH invoice_first_track AS
    (
     SELECT
         il.invoice_id invoice_id,
         MIN(il.track_id) first_track_id
     FROM invoice_line il
     GROUP BY 1
    )

SELECT
    album_purchase,
    COUNT(invoice_id) number_of_invoices,
    CAST(count(invoice_id) AS FLOAT) / (
                                         SELECT COUNT(*) FROM invoice
                                      ) percent
FROM
    (
    SELECT
        ifs.*,
        CASE
            WHEN
                 (
                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 

                  EXCEPT 

                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id
                 ) IS NULL
             AND
                 (
                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id

                  EXCEPT 

                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 
                 ) IS NULL
             THEN "yes"
             ELSE "no"
         END AS "album_purchase"
     FROM invoice_first_track ifs
    )
GROUP BY album_purchase;
'''

run_query(albums_vs_tracks)

For clarity, the sinppet bugs me is this part

                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id

However, I check that all t.track_id is matched with an t.album_id, in other words, as long as the t.track_id = ifs.first_track_id, it is an album track? so , I wonder why I can not put it like the following,

SELECT t.track_id from track t
WHERE t.track_id = ifs.first_track_id

Running with the above sippnet does not match with the suggested answer result but I just want to know what wrong with my logic. Anyone may shed some light? thank you.

1 Like

Monster question. I remember having to debug this solution before it is in its current form, a nightmare.
That screen’s instructions need another rewrite too because it’s still contradictory.

Would you clarify what you mean by “matched with an”? Do you mean every track_id integer has an album_id integer rather than 1 of them being NULL, so you conclude every track belongs to an album?

This question was not asking whether a track(s) that someone bought comes from an album or is sold as a single.(and therefore having NULL value in album_id) You have correctly discovered that every track has an associated album.
It is asking whether all the tracks bought by a customer CONSIDERED TOGETHER make up an entire album. So if the customer bought 2 tracks, the question is are those 2 from the same album. Also, the A except B and B except A pattern is to ensure the customer not only bought the full album, but nothing more beyond that. A here represents all albums in the track, B represents all the tracks he bought.
What is unbearably confusing is the instructions contradicting/making parts of the solution redundant, but let’s focus on the solution code for now assuming it’s the correct way to solve a contrived scenario.

You suggestion will not answer the question because it is not looking for all the tracks from an album. Which makes the “A B” test i described above impossible.

I’ll describe the solution step by step

  1. Look at what the customer bought. He has an invoice with numerous tracks (invoice_line_id ) on it.

  2. Take any track (solution used MIN, but MAX gives same answer) and identify which album it comes from, then extract all tracks from that same album (we are assuming he is buying an entire album here, and proof by contradiction later, this is also like hypothesis testing where you assume the null hypothesis is true until you find evidence to reject it)

  3. Check whether all tracks coming from that album are included in the tracks in the invoice. If null, it is. (A EXCEPT B)

  4. Check whether all tracks in the invoice are included in the album, If null, it is. (B EXCEPT A)

  5. Label the invoice of the customer Yes, No depending on above analysis

  6. Count number of invoices that were full album (and nothing more) purchases and number that are not

  7. Divide each of both numbers from 6. by a constant (total number of invoices) This total would be the total number of purchases (defined as an invoice) no matter that invoice is Yes or No album purchase.

3 Likes