Hi all. I’d like to see if there is any opinion in terms of the difference between these two queries. One is the one I turned in, and was marked correct, and the system’s proper answer. Thanks!
select alb.title album, art.name artist, sum(il.quantity) tracks_purchased
from invoice_line il
inner join track t on il.track_id = t.track_id
inner join album alb on t.album_id = alb.album_id
inner join artist art on art.artist_id = alb.artist_id
group by album
order by tracks_purchased desc
FROM invoice_line il
INNER JOIN (
FROM track t
INNER JOIN album al ON al.album_id = t.album_id
INNER JOIN artist ar ON ar.artist_id = al.artist_id
ON ta.track_id = il.track_id
GROUP BY 1, 2
ORDER BY 3 DESC LIMIT 5;
Your answer did not group by artist, what happens when there is an album with 2 artists? Which artist is shown? (logically not an issue in this example since album artist usually 1:1 relationship, but may not be so lucky in other cases, also some sql dialects will not allow you to have columns in select that neither got aggregated nor grouped.)
Some other formatting issues making it a little harder to read are that your first two inner joins and the third inner join have the tables in the on condition in different orders (matter of readability only).
Just a note: sum() you used and count() answer used both give same answer because the sum is summing all 1. Good to check if really all 1 before using sum().
I never had any issue with “albums with the same name”, what I was pointing out in my 1st reply is the artist field, the 2nd field in SELECT. It is not in a GROUP BY and not in any aggregation function. That is the issue I raised and explained through an article linked.
I don’t understand how would “albums with the same name” become an issue. Doesn’t same name imply same album, and different name imply different album? I mean a normal person differentiates albums by their names. If there is indeed the problem of same album having different name or different album having same name, the album name field should be cleaned (preprocessing). If GROUPBY turned out too granular for “same album having different name”, then just post-process it. Never bad to be too granular, but you can’t go in reverse from more aggregated to less aggregated.
For performance, practically nobody sits around reasoning with theory except in interviews, the time spent reasoning is better spent just running both versions and timing them. I’m not familiar enough with SQL but one way is to print the execution plan which an optimised preview of how the engine will run the query. However, same SQL you wrote can become different execution plans on different servers or different versions of the same sql engine and likely among different sql engines too. Whether different SQL can become same execution plan I’ve yet to find.
Some heuristics in tools like pandas that do not have the “rewriting your code to make it faster” capabilities:
Groupby 1 col is faster than groupby 2 or more columns
Row filter before joining (analogous to constraining row growth as much as possible with subqueries/CTE in SQL) better than joining before row filter