Intermediate Joins in SQL: Step 4

Hi everyone,

Can someone explain step 4 in this mission to me? I am trying to understand the example that is explained, for calculating the number of tracks purchased by each artist. I don’t understand the lines:

SELECT
ta.artist_name artist,
COUNT(*) tracks_purchased

How does this calculate the number of tracks purchased, when the * wildcard selects all of the columns? I am also confused why we are joining the subquery (ta) with the invoice line table, and how this helps us. I am having a hard time conceptualizing this. Thanks!

I do not see the wildcard in the image.

Sorry I put the wrong screenshot up. Here is the correct picture.

It is just the answer to step #4 in the mission.

1 Like

COUNT(*) simply counts all the rows in the table that results from that query. In this particular case, since you are grouping by column 1 (denoted by the GROUP BY 1 line), the COUNT(*) line will count the number of rows (i.e. tracks) that each unique album has!

From the code it also looks like the ta table that is being joined is used to provide the title and name columns. I believe this answers your other question.

On a side-note, do keep in mind that generally it’s far better to copy-paste your code using the appropriate code formatting (all you have to do is add 3 back-ticks before and after your code), instead of using screenshots.

Hi thanks so much for your answer! So I have a couple of follow up questions:

  1. In the subquery, why do we need to select the column “t.track_id”? Is it because that is what we are using to join the subquery to the invoice_line table?

  2. In the answer, it actually groups by the first and second columns (GROUP BY 1, 2) and I don’t understand the reason for that. Shouldn’t just grouping by the first, “album” column be enough, since we are looking for the top album sales? We aren’t looking for the top sales according to artist, so I don’t understand why we group both columns.

Here is the code again pasted below. Thanks so much! (PS. I am not sure how you want me to format it for you. But I copy and pasted it so hopefully it is readable.

SELECT
    ta.album_title album, 
    ta.artist_name artist, 
    COUNT(*) tracks_purchased

FROM invoice_line il

INNER JOIN(
           SELECT
                t.track_id,
                al.title album_title,
                ar.name artist_name
           FROM track t
           INNER JOIN album al ON al.album_id = t.album_id
           INNER JOIN artist ar ON ar.artist_id = al.artist_id
           ) ta
           ON ta.track_id = il.track_id
           
GROUP BY 1, 2
ORDER BY 3 DESC LIMIT 5;

No worries! And yeah, you formatted it perfect.

Yep, that would be it. The sub-query, aliased “ta”, uses ta.track_id to join to the other table on il.track_id.

You’re also right here. The answer works fine in the mission with only GROUP BY 1 as you saw.

Something like GROUP BY 1, 2 would ordinarily be used to introduce more granularity into the resulting table, because there would be more unique groups to produce a result for.

In this case, however, since every unique album_title only has 1 unique artist_name, it doesn’t actually introduce any further granularity.