280-3 Selecting Albums to Purchase (Guided Project: Answering Business Questions using SQL)

Below is my code for the first problem in the project.

I’m having issues with the “pct_tracks_usa” subquery.

All other queries run separately (“pct_tracks_usa” subquery uses the other subqueries).

I’m getting an empty dataframe. Not sure why or how to troubleshoot. Clues?

q = '''

WITH 
n_genre_usa AS
    (
    SELECT
        il.track_id,
        g.name genre,
        COUNT(il.track_id) n_tracks_sold_usa
    FROM customer c
    INNER JOIN invoice i
        ON c.customer_id = i.customer_id
    INNER JOIN invoice_line il
        ON i.invoice_id = il.invoice_id
    INNER JOIN track t
        ON il.track_id = t.track_id
    INNER JOIN genre g
        ON t.genre_id = g.genre_id
    WHERE genre IN ('Hip Hop/Rap','Alternative & Punk','Pop','Blues')
    AND c.country = 'USA'
    GROUP BY t.genre_id
),
total_usa AS
    (
    SELECT
        il.track_id,
        CAST(COUNT(il.track_id) AS FLOAT) n_total
    FROM
    invoice_line il
    INNER JOIN invoice i 
        ON il.invoice_id = i.invoice_id
    INNER JOIN customer c
        ON i.customer_id = c.customer_id
    WHERE c.country = 'USA'
    ),
pct_tracks_usa AS       
    (
    SELECT
        ngu.track_id,
        (ngu.n_tracks_sold_usa/tu.n_total) n_tracks_sold_usa_pct
    FROM
    n_genre_usa ngu
    INNER JOIN total_usa tu
        ON ngu.track_id = tu.track_id
    ) 

SELECT
    ngu.genre,
    ngu.n_tracks_sold_usa,
    ptu.n_tracks_sold_usa_pct
FROM n_genre_usa ngu
LEFT JOIN pct_tracks_usa ptu
    ON ngu.track_id = ptu.track_id
'''

I attempted to do the calculation in the main sql query, instead.

Added “tu.n_total” to the columns, just to see if anything appears on it’s own without the calculation in “n_tracks_sold_usa_pct2”. Both are “None”. I’m really perplexed.

SELECT
    ngu.genre,
    ngu.n_tracks_sold_usa,
    (ngu.n_tracks_sold_usa / tu.n_total) n_tracks_sold_usa_pct2,
    tu.n_total
FROM n_genre_usa ngu
LEFT JOIN total_usa tu
    ON ngu.track_id = tu.track_id

This is just running the “tu.n_total” column from “total_usa tu” table, individually without the join.

It does output out the total.

SELECT
    tu.n_total
FROM total_usa tu

Hey there,

Not sure if this will help, but there are a few issues with your queries.

  • It’s not a good idea to select il.track_id and count it too, since that will just be a count of 1. Since there will still be one row per track_id, then the count per row will only include that track_id in the tally. If you want to get a count per genre, then only include fields specific to genre so it tallies correctly.
  • Whenever you include a count / sum / average, the GROUP BY clause needs to include all other non-aggregates. Not including those would lead to all sorts of issues, if not an error.
  • Also not sure why you used customer and invoice tables when you’re not using any of their fields on your subquery. Use as few tables (and as few fields from those tables) as possible, especially when you’re aggregating. Each included field and table can be a source of error, so streamline.
  • Test each subquery first, before moving on to another subquery or to your main query. The more complex your query is, the higher the probability for error, so make sure the independent units are solid before combining all of them.

Hope this helps!

2 Likes

Hey,

Thanks for the feedback.

So I did find that the query was structurally sound. And the problem came when I originally tested each individual query and forgot to re-add the GROUP BY in the “total_usa” subqery lol. So thanks for mentioning that.

So originally I had invoice and customer tables to get to the “country” column “c.country”, but upon checking “billing_country” in “invoice” table vs “country” in “customer” table, I realized it’s the same count (so I removed “customer” table and just kept “invoice”). But same results, just cleaner.

Now the assignment doesn’t specify what they mean by percent, so I presume percent-to-total (total being the count of all track_id’s purchased for those 4 genres for only USA).

That was the only thing off and realized I was joining on “track_id = track_id”. Of course, that was too low of a level and joined on “country”, instead (for “pct_tracks_usa” subquery).

Then finally, the main query duplicated the “n_tracks_sold_usa_pct” column 4 times per genre (realized it needed to aggregated or filtered on a high level attribute).

Added “genre_id” in the WHERE clause of the main query. All the existing and added columns were necessary and I actually added more to get it filter properly when joining (luckily GROUP BY aggregates at whichever designated level).

Finally, got the correct answer.

correct_query = '''

WITH 
n_genre_usa AS
    (
    SELECT
        i.billing_country country,
        g.name genre,
        t.genre_id,
        CAST(COUNT(il.track_id) AS FLOAT) n_tracks_sold_usa
    FROM invoice i
    INNER JOIN invoice_line il
        ON i.invoice_id = il.invoice_id
    INNER JOIN track t
        ON il.track_id = t.track_id
    INNER JOIN genre g
        ON t.genre_id = g.genre_id
    WHERE genre IN ('Hip Hop/Rap','Alternative & Punk','Pop','Blues')
    AND i.billing_country = 'USA'
    GROUP BY t.genre_id
),
total_usa AS
    (
    SELECT
        i.billing_country country,
        t.genre_id,
        CAST(COUNT(il.track_id) AS FLOAT) n_total
    FROM
    invoice_line il
    INNER JOIN invoice i 
        ON il.invoice_id = i.invoice_id
    INNER JOIN track t
        ON il.track_id = t.track_id
    WHERE i.billing_country = 'USA'
    GROUP BY country
    ),
pct_tracks_usa AS       
    (
    SELECT
        ngu.country,
        ngu.genre_id,
        ROUND(ngu.n_tracks_sold_usa/tu.n_total, 4) * 100 n_tracks_sold_usa_pct
    FROM
    n_genre_usa ngu
    INNER JOIN total_usa tu
        ON ngu.country = tu.country
    ) 

SELECT
    ngu.genre,
    ngu.n_tracks_sold_usa,
    ptu.n_tracks_sold_usa_pct
FROM n_genre_usa ngu
LEFT JOIN pct_tracks_usa ptu
    ON ngu.country= ptu.country
WHERE ngu.genre_id = ptu.genre_id

'''
print(run_query(correct_query))