Stuck with SQL queries on Guided Project: Answering Business Questions Using SQL

Screen Link:

Hi, I’m working on the SQL module in the Data Analyst with Python course and I’m struggling on the most complex queries. Something that really bugs me is that, unlike what happens in Python, the code is not commented so if I get stuck on a passage I can’t find the right way to solve the problem. Anyway, this is one of the passages where I can’t wrap my head around it:

I’m working on the guided project and I want to categorize all purchases in purchases of whole albums vs purchases of songs. The guide says:

In order to answer the question, we’re going to have to identify whether each invoice has all the tracks from an album. We can do this by getting the list of tracks from an invoice and comparing it to the list of tracks from an album. We can find the album to compare the purchase to by looking up the album that one of the purchased tracks belongs to.

I tried and tried but could not solve it, so I looked at the solution:

%%sql

WITH invoice_first_track AS
    (
     SELECT
         il.invoice_id invoice_id,
         MIN(il.track_id) first_track_id
     FROM invoice_line il
     GROUP BY 1
    )

SELECT
    album_purchase,
    COUNT(invoice_id) number_of_invoices,
    CAST(count(invoice_id) AS FLOAT) / (
                                         SELECT COUNT(*) FROM invoice
                                      ) percent
FROM
    (
    SELECT
        ifs.*,
        CASE
            WHEN
                 (
                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 

                  EXCEPT 

                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id
                 ) IS NULL
             AND
                 (
                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id

                  EXCEPT 

                  SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 
                 ) IS NULL
             THEN "yes"
             ELSE "no"
         END AS "album_purchase"
     FROM invoice_first_track ifs
    )
GROUP BY album_purchase;

This is the part I do not understand:

     SELECT t.track_id FROM track t
                  WHERE t.album_id = (
                                      SELECT t2.album_id FROM track t2
                                      WHERE t2.track_id = ifs.first_track_id
                                     ) 

                  EXCEPT 

                  SELECT il2.track_id FROM invoice_line il2
                  WHERE il2.invoice_id = ifs.invoice_id
                 ) IS NULL

I get why we need to combine the two selections with ‘except’, and in my head we are comparing the tracks that appear in the ‘track’ table associated with a given album, vs. the tracks that appear on the invoice, but…could someone please patiently get me through it step by step?

Thanks.

Another thing that I could not solve in the same guided project (should I post it on a separate topic?) is how to categorize all countries with only one customer as ‘Other’.
I looked it up in the solution, and this is the part that troubles me:

WITH country_or_other AS
    (
     SELECT
       CASE
           WHEN (
                 SELECT count(*)
                 FROM customer
                 where country = c.country
                ) = 1 THEN "Other"
           ELSE c.country
       END AS country,
       c.customer_id,
       il.*
     FROM invoice_line il
     INNER JOIN invoice i ON i.invoice_id = il.invoice_id
     INNER JOIN customer c ON c.customer_id = i.customer_id
    )

specifically, the condition: where country = c.country …what does it mean?

Thank you for your patience!

1 Like

Hello,

I completed the section maybe a week after you submitted your question, and I barely was able to solve it without looking up the solution.

Do you understand how the solution works now?

hello! thanks for reaching out!
not really, I’m still stuck there (actually moved on with the course but could not clarify the doubts) so if you have the patience to explain I’d be very grateful. thanks

I somewhat understand the concept, but not the SQL implementation. Considering my lack of understanding, maybe it’s better that we think of this as two people trying to figure out how it works rather than a top-down explanation.

Maybe I can clarify this section of your question first:

What I understand here is it’s just to retrieve the current row’s country, and then assign it as “Other” if the number of customers from the country is 1.

What happens first is we take the country value from customer (aliased as c) table which is part of the main query. Next, we’ll run a separate subquery that also refers to customer and select all rows that have the samecountry value as the one we retrieved from the main query. If after counting the selected rows and the value is 1, assign the value in the country column in main query as “Other”, or else keep it intact for other than 1.

Consider a hypothetical row that results from all the joins:

| customer_id | ... | ... | country     |
|-------------|-----|-----|-------------|
| 1           | ... | ... | DataCountry |

Pseudocode (not an exact rendition of how SQL actually works behind the scene):

1. Get the next row. # row 1
2. Get the country value for the row. # value is "DataCountry"
3. Create a subquery separate from the main query
    3.1 Query from the customer table
    3.2 Only select rows which has the same country value as the one in the current row of the main query. 
    # only select rows with country = "DataCountry"
    3.3 Count the number of rows
    3.4  Return the result to the main query
4. If the result from the subquery is 1, assign the current row country as "Other", and if not, keep the country value intact
5. Repeat from step 1

I’m not sure if you’ve already understood the second part of your question, but I’ll leave my explanation nonetheless in case it’s helpful for someone else.

1 Like