%%sql
SELECT g.name genre,
COUNT(invoice_line_id) abs_total,
CAST(COUNT(invoice_line_id) AS FLOAT)/
CAST((SELECT COUNT(*) FROM invoice_line) AS FLOAT) count_ratio
FROM customer c
INNER JOIN invoice i ON c.customer_id = i.customer_id
INNER JOIN invoice_line il ON i.invoice_id = il.invoice_id
INNER JOIN track t ON il.track_id = t.track_id
INNER JOIN genre g ON g.genre_id = t.genre_id
GROUP BY genre
HAVING c.country = 'USA'
ORDER BY 3 DESC;
What I expected to happen: The same result as the solution key.
With those multiple joins, you’re creating a large table with all the information you need. When you use aggregate functions, you’re counting on the whole table instead of just counting for those whose country is USA.
You don’t want HAVING c.country = 'USA', you want WHERE c.country = 'USA' in the appropriate place.
Hello! Thank you for the reply. Though I am still confused because if I just change
HAVING c.country = 'USA'
to
WHERE c.country ='USA'
I get an syntax error. I suppose I can’t use WHERE after GROUP BY?
Is it the reason that the solution key creates a WITH clause to first restrain data to USA, and then join the rest of the tables and do aggregation using GROUP BY?
And why does HAVING c.country = ‘USA’ not work in this scenario?
Solution key:
%%sql
WITH usa_tracks_sold AS
(
SELECT il.* FROM invoice_line il
INNER JOIN invoice i on il.invoice_id = i.invoice_id
INNER JOIN customer c on i.customer_id = c.customer_id
WHERE c.country = "USA"
)
SELECT
g.name genre,
count(uts.invoice_line_id) tracks_sold,
cast(count(uts.invoice_line_id) AS FLOAT) / (
SELECT COUNT(*) from usa_tracks_sold
) percentage_sold
FROM usa_tracks_sold uts
INNER JOIN track t on t.track_id = uts.track_id
INNER JOIN genre g on g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
Exactly, not in the way you want to; WHERE is the regular table filter, while HAVING is the filter for the immediate result of a GROUP BY.
I wouldn’t say that’s the reason, but that’s one of the advantadges.
The answer is what I said before:
When you use aggregate functions, you’re counting on the whole table instead of just counting for those whose country is USA .
To add to this, HAVING only executes after the GROUP BY calculation is complete, this means that it ends up not filtering way rows whose country isn’t USA.
Thank you again!
As you said, I think " while HAVING is the filter for the immediate result of a GROUP BY ." is the key. In my code when HAVING is running the aggregation is already finished so this line is of no use, if what I think is right.