SQL left join Building and Organizing Complex Queries 3/10

https://app.dataquest.io/m/190/building-and-organizing-complex-queries/3/the-with-clause

This is the sub-query supplied in the answer:

WITH playlist_info AS
    (
     SELECT
         p.playlist_id,
         p.name playlist_name,
         t.name track_name,
         (t.milliseconds / 1000) length_seconds
     FROM playlist p
     LEFT JOIN playlist_track pt ON pt.playlist_id = p.playlist_id
     LEFT JOIN track t ON t.track_id = pt.track_id
    )

I have no idea how t.name is there. Left join should include all records from the left (first) database and only matching records from the second.

Left Join track (as t) to playlist (as pt) using track_id.

The only thing track and playlist_track have in common is track_id. How is t.name accessible? I think it should be dropped.

Hey there!

I think there’s been a bit of a misunderstanding here about how JOINS work — JOINS drop rows not columns. All of the columns of the table being joined to the original table can still be selected from in the query.

So, in this query, you take all the track_id values from the playlist_track table, then JOIN the entire row from track if its track_id value matches. Only rows where the track_id value in the track table doesn’t match something from the playlist_track.track_id column will be dropped.

Thank you! I appreciate you taking the time to explain that.

Hi,
I don’t want to open a new discussion once there is one already existing.
I have a suggestion:
In my opinion the instruction to this exercise is not clear enough, it says:

  • length_seconds - The sum of the length of the playlist in seconds.

And the correct answer is:
(t.milliseconds / 1000) length_seconds and then:
SUM(length_seconds) length_seconds

There is a disadvantage of this approach because we lose accuracy due to the approximation to seconds by cutting the last three digits.
image
There are two different approaches which come to mind:

  1. Start with making a sum and then cut three last digits:
    SUM(t.milliseconds)/1000 as length_seconds
    The advantage is that we can first sum all milliseconds and at the end cut last 3 digits while in your approach we are losing accuracy for every track.
    image
  2. The most accurate approach is to cast milliseconds to floats and at the end round them to seconds if required.
    ROUND(SUM(CAST(t.milliseconds as Float)/1000),0) as length_seconds
    image
    I think it would be good to have more precise instruction in terms of what should be the results. Because naturally I took the last approach and could pass the exercise.

Please let me know what do you think.

2 Likes

Hi @pawel.wisniewicz

I just stumbled on your post while looking for an explanation on this exercise, and I can tell you have indept understanding on why / how this should work. If it is not too much trouble, would you be kind enough to walk me through the steps?

For one, I don’t understand why t.milliseconds needs to be divided by 1000 and then summed. I simply went ahead and did this SUM(t.milliseconds) length_seconds.

I’m aware this is a fairly old post and you must be miles ahead of this stage in your learning. Hopefully, you’d see this and respond.

Thank you in advance.

Hello @dataquest

I need help unerstanding the answer to this exercise.

  • Why is it necessary to do this (t.milliseconds / 1000) length_seconds and then this SUM(length_seconds) in the outer query

  • Why GROUP BY p.playlist_id and p.playlist_name ? I only grouped by 1

  • I also do not get why we need t.name track_name in the subquery. If the intention is to COUNT number_of_tracks, I think this can be achieved by COUNTing the playlist_id ?

Please, can someone shed more light on this

Thank You

Hey! I’ll answer your questions in order below:

The track table only gives you the length of each track in milliseconds — the CTE we created using the WITH statement converts these to seconds, which we can then sum up in our outer query. There are other ways of doing this, of course, but it’s best practice to do most of your complicated joins in your CTE and/or subqueries.

Some flavors of SQL allow you to GROUP BY a single column — others, like PostgreSQL, will require that you GROUP BY all columns that aren’t being aggregated, so it’s a good idea to get into the practice of doing that.

And, for your last question, you are correct! Again, there are multiple ways of doing this, but we felt this was a more explicit way of showing how elements from a CTE can be utilized in your main query.

Thank you @dustindq for taking time to explain this.