Guided Project: Answering Business Questions using SQL - Album Vs. Individual Tracks

Hi :slight_smile:,

Just got stuck in the last step of the guided project bellow (my fault, went on vacations, and spent more than a month without practicing/studying :smiley:) :

I understood the concept of the EXCEPT, what I didn’t understand is how to apply that to a single album_id Vs. a single invoice_id in order to compare a single table of the album XWZ to a single table of invoice XPTO.

So far I got here with this piece of code:


FROM track 'a'
WHERE album_id = 91
FROM invoice_line 'il'
INNER JOIN track 't' ON 'il'.track_id = t.track_id
WHERE invoice_id = 1

But this will only ‘compare’ the album_id 91 to the invoice_id 1. How can I compare each unique album with each unique invoice?? Can anyone shed a light or give me a hint?

Many thanks :slight_smile:

You’ll need a few more steps to get to the answer and use fewer fields, composer information isn’t requested in the question. The question brings together the different concepts you’ve learned: WITH (named subqueries), CASE statements, JOINs, and EXCEPT.

The clues are in the paragraph - I hope it helps:
“In order to answer the question, we’re going to have to identify whether each invoice has all the tracks from an album. We can do this by getting the list of tracks from an invoice and comparing it to the list of tracks from an album. We can find the album to compare the purchase to by looking up the album that one of the purchased tracks belongs to. It doesn’t matter which track we pick, since if it’s an album purchase, that album will be the same for all tracks.”

I added a few extra cells to the bottom of the notebook to test and build my SQL statements a bit at a time.

In the earlier missions I think we just talked about the concept of Nested Subqueries but to answer this guided project question, we need to use the correlated subqueries. Hope DataQuest adds a screen about Correlated Subqueries.

“With a normal nested subquery, the inner SELECT query runs first and executes once, returning values to be used by the main query. A correlated subquery, however, executes once for each candidate row considered by the outer query. In other words, the inner query is driven by the outer query.”