Run result doesn't match solution key(SQL intermediate guided project)

Screen Link:

My Code:

%%sql
    
SELECT g.name genre, 
       COUNT(invoice_line_id) abs_total,
       CAST(COUNT(invoice_line_id) AS FLOAT)/
        CAST((SELECT COUNT(*) FROM invoice_line) AS FLOAT) count_ratio
       FROM customer c 
       INNER JOIN invoice i ON c.customer_id = i.customer_id
       INNER JOIN invoice_line il ON i.invoice_id = il.invoice_id
       INNER JOIN track t ON il.track_id = t.track_id
       INNER JOIN genre g ON g.genre_id = t.genre_id
       GROUP BY genre
       HAVING c.country = 'USA'
       ORDER BY 3 DESC;

What I expected to happen: The same result as the solution key.

What actually happened: The results are different

genre abs_total count_ratio
Metal 619 0.130124
R&B/Soul 159 0.033424
Blues 124 0.026067
Jazz 121 0.025436
Pop 63 0.013244
Hip Hop/Rap 33 0.006937
Heavy Metal 8 0.001682
TV Shows 2 0.000420

Soution key:

genre tracks_sold percentage_sold
Rock 561 0.5337773549000951
Alternative & Punk 130 0.12369172216936251
Metal 124 0.11798287345385347
R&B/Soul 53 0.05042816365366318
Blues 36 0.03425309229305423
Alternative 35 0.03330161750713606
Latin 22 0.02093244529019981
Pop 22 0.02093244529019981
Hip Hop/Rap 20 0.019029495718363463
Jazz 14 0.013320647002854425
1 Like

Hey.

With those multiple joins, you’re creating a large table with all the information you need. When you use aggregate functions, you’re counting on the whole table instead of just counting for those whose country is USA.

You don’t want HAVING c.country = 'USA', you want WHERE c.country = 'USA' in the appropriate place.

1 Like

Hello! Thank you for the reply. Though I am still confused because if I just change

HAVING c.country = 'USA'

to

WHERE c.country ='USA'

I get an syntax error. I suppose I can’t use WHERE after GROUP BY?
Is it the reason that the solution key creates a WITH clause to first restrain data to USA, and then join the rest of the tables and do aggregation using GROUP BY?

And why does HAVING c.country = ‘USA’ not work in this scenario?

Solution key:

%%sql

WITH usa_tracks_sold AS
   (
    SELECT il.* FROM invoice_line il
    INNER JOIN invoice i on il.invoice_id = i.invoice_id
    INNER JOIN customer c on i.customer_id = c.customer_id
    WHERE c.country = "USA"
   )

SELECT
    g.name genre,
    count(uts.invoice_line_id) tracks_sold,
    cast(count(uts.invoice_line_id) AS FLOAT) / (
        SELECT COUNT(*) from usa_tracks_sold
    ) percentage_sold
FROM usa_tracks_sold uts
INNER JOIN track t on t.track_id = uts.track_id
INNER JOIN genre g on g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
1 Like

Exactly, not in the way you want to; WHERE is the regular table filter, while HAVING is the filter for the immediate result of a GROUP BY.

I wouldn’t say that’s the reason, but that’s one of the advantadges.

The answer is what I said before:

When you use aggregate functions, you’re counting on the whole table instead of just counting for those whose country is USA .

To add to this, HAVING only executes after the GROUP BY calculation is complete, this means that it ends up not filtering way rows whose country isn’t USA.

1 Like

Thank you again!
As you said, I think " while HAVING is the filter for the immediate result of a GROUP BY ." is the key. In my code when HAVING is running the aggregation is already finished so this line is of no use, if what I think is right.

1 Like