Query Explanation
Summary
We created a subquery invoice_first_track
that finds the first_track_id
from the invoice_line
which we use to determine whether an invoice was album purchase or individual track purchases. Then we performed aggregations on the result to find out the number of invoices and the percentage of them separately for album purchase as well as individual track purchases.
Granular View
SELECT
*
FROM
invoice_line
(4757 Rows)
SELECT
il.invoice_id invoice_id, -- 23 (First Value)
MIN(il.track_id) first_track_id -- 1 (First Value)
FROM
invoice_line il
(1 Row)
SELECT
il.invoice_id invoice_id,
MIN(il.track_id) first_track_id
FROM
invoice_line il
GROUP BY
1
(614 Rows)
The above query finds the track_id
with the lowest number in the invoice_line
table for each invoice_id
. However since an invoice can either be an album purchase or individual track purchases, we cannot say that every track_id we have is the first track of the album. It is only first, if the invoice was an album purchase. However, this track_id
will help us to identify whether it was an album purchase or individual track purchases. And we can do so by getting all the tracks in an album which contains our first track_id
and comparing it with all the tracks in the invoice, if both are returning the same set of tracks then it is an album purchase, otherwise it is individual track purchases. So let’s save this as a subquery with the name invoice_first_track
.
WITH invoice_first_track AS (
SELECT
il.invoice_id invoice_id,
MIN(il.track_id) first_track_id
FROM
invoice_line il
GROUP BY
1
)
Since this is bit complicated query, we will separate the complicated part and use some abstraction to make it appear simple.
SELECT
ifs.*,
CASE WHEN (
SELECT
t.track_id
FROM
track t
WHERE
t.album_id = (
SELECT
t2.album_id
FROM
track t2
WHERE
t2.track_id = ifs.first_track_id
)
EXCEPT
SELECT
il2.track_id
FROM
invoice_line il2
WHERE
il2.invoice_id = ifs.invoice_id
) IS NULL
AND (
SELECT
il2.track_id
FROM
invoice_line il2
WHERE
il2.invoice_id = ifs.invoice_id
EXCEPT
SELECT
t.track_id
FROM
track t
WHERE
t.album_id = (
SELECT
t2.album_id
FROM
track t2
WHERE
t2.track_id = ifs.first_track_id
)
) IS NULL THEN "yes" ELSE "no" END AS "album_purchase"
FROM
invoice_first_track ifs
(614 Rows)
If we replace the query that identifies whether the tracks in the invoice and tracks in the album were same or not with an abstract name called remaining_tracks
then the above query is very easy to understand and it immediately becomes too short.
SELECT
ifs.*,
CASE WHEN remaining_tracks IS NULL THEN "yes" ELSE "no" END AS "album_purchase"
FROM
invoice_first_track ifs
(614 Rows)
Now let’s leave the remaining_tracks query explanation there until we check it’s outer query.
SELECT
*
FROM
invoice
(614 Rows)
SELECT
album_purchase, -- yes
COUNT(invoice_id) number_of_invoices, -- 614
CAST(
count(invoice_id) AS FLOAT
) / (
SELECT
COUNT(*)
FROM
invoice
) percent -- 1.0
FROM
(
SELECT
ifs.*,
CASE WHEN remaining_tracks IS NULL THEN "yes" ELSE "no" END AS "album_purchase"
FROM
invoice_first_track ifs
)
(1 Row)
SELECT
album_purchase,
COUNT(invoice_id) number_of_invoices,
CAST(
count(invoice_id) AS FLOAT
) / (
SELECT
COUNT(*)
FROM
invoice
) percent
FROM
(
SELECT
ifs.*,
CASE WHEN remaining_tracks IS NULL THEN "yes" ELSE "no" END AS "album_purchase"
FROM
invoice_first_track ifs
)
GROUP BY
album_purchase;
(2 Rows)
Select all rows from our invoice_first_track
subquery and if the result of remaining_tracks
is NULL
then assign "yes"
to album_purchase
else assign "no"
to album_purchase
. Find the number of invoices and there percentage of invoices grouped by album_purchase
(yes and no). Yes, the entire query was as simple as that. So let’s move on to the query that appears to be a complicated one. But before that, we need to make sure that we understand how EXCEPT
works. EXCEPT
is basically used to subtract one set of rows from another. Let’s say a query A
generated 5 rows (1, 2, 3, 4, 5) and query B
generated 4 rows (1, 2, 3, 4). If we subtract B
from A
, we will get (5). That is, we will get rows that are in A
but not in B
. Now let’s subtract A
from B
. We will get NULL
because there are no rows in B
that are not in a A
. So let’s say query A
is album_tracks
and B
is invoice_tracks
. So to be considered as album_purchase
, (A - B) AND (B - A)
should evaluate to True. However, as there is difference between the results of A
and B
, (A - B) AND (B - A)
will become 5 AND NULL
. And this will evaluate to False. Now that we understand the concept, let’s dive into understand remaining_tracks
. We will explore from outer query to inner query by using abstract names.
SELECT
ifs.*,
CASE WHEN remaining_tracks IS NULL THEN "yes" ELSE "no" END AS "album_purchase"
FROM
invoice_first_track ifs
SELECT
ifs.*,
CASE WHEN (
album_tracks -- A
EXCEPT -- -
invoice_tracks -- B
) IS NULL
AND (
invoice_tracks -- B
EXCEPT -- -
album_tracks -- A
) IS NULL THEN "yes" ELSE "no" END AS "album_purchase"
FROM
invoice_first_track ifs
SELECT
ifs.*,
CASE WHEN (
SELECT
t.track_id
FROM
track t
WHERE
t.album_id = album_id_first_track
EXCEPT
SELECT
il2.track_id
FROM
invoice_line il2
WHERE
il2.invoice_id = ifs.invoice_id
) IS NULL
AND (
SELECT
il2.track_id
FROM
invoice_line il2
WHERE
il2.invoice_id = ifs.invoice_id
EXCEPT
SELECT
t.track_id
FROM
track t
WHERE
t.album_id = album_id_first_track
) IS NULL THEN "yes" ELSE "no" END AS "album_purchase"
FROM
invoice_first_track ifs
SELECT
ifs.*,
CASE WHEN (
SELECT
t.track_id
FROM
track t
WHERE
t.album_id = (
SELECT
t2.album_id
FROM
track t2
WHERE
t2.track_id = ifs.first_track_id
)
EXCEPT
SELECT
il2.track_id
FROM
invoice_line il2
WHERE
il2.invoice_id = ifs.invoice_id
) IS NULL
AND (
SELECT
il2.track_id
FROM
invoice_line il2
WHERE
il2.invoice_id = ifs.invoice_id
EXCEPT
SELECT
t.track_id
FROM
track t
WHERE
t.album_id = (
SELECT
t2.album_id
FROM
track t2
WHERE
t2.track_id = ifs.first_track_id
)
) IS NULL THEN "yes" ELSE "no" END AS "album_purchase"
FROM
invoice_first_track ifs
Now that we have successfully understood, how that query works, let’s do some number crunching. One thing to note is that, we won’t be able to understand the output of the current query because the query does the same process for all the invoices. So let’s pick a first_track_id
to make it easier.
SELECT
t2.album_id
FROM
track t2
WHERE
t2.track_id = 1158 -- 91
(1 Row)
SELECT
t.track_id
FROM
track t
WHERE
t.album_id = (
SELECT
t2.album_id
FROM
track t2
WHERE
t2.track_id = 1158
)
(16 Rows)
SELECT
il2.track_id
FROM
invoice_line il2
WHERE
il2.invoice_id = 1
(16 Rows)
SELECT
t.track_id
FROM
track t
WHERE
t.album_id = (
SELECT
t2.album_id
FROM
track t2
WHERE
t2.track_id = 1158
)
EXCEPT
SELECT
il2.track_id
FROM
invoice_line il2
WHERE
il2.invoice_id = 1
(0 Rows) - NULL
Now let’s do the reverse (B - A).
SELECT
il2.track_id
FROM
invoice_line il2
WHERE
il2.invoice_id = 1
EXCEPT
SELECT
t.track_id
FROM
track t
WHERE
t.album_id = (
SELECT
t2.album_id
FROM
track t2
WHERE
t2.track_id = 1158
)
(0 Rows) - NULL
This is definitely an album purchase. However, let’s proceed and verify that.
SELECT
ifs.*,
CASE WHEN (
SELECT
t.track_id
FROM
track t
WHERE
t.album_id = (
SELECT
t2.album_id
FROM
track t2
WHERE
t2.track_id = ifs.first_track_id
)
EXCEPT
SELECT
il2.track_id
FROM
invoice_line il2
WHERE
il2.invoice_id = ifs.invoice_id
) IS NULL
AND (
SELECT
il2.track_id
FROM
invoice_line il2
WHERE
il2.invoice_id = ifs.invoice_id
EXCEPT
SELECT
t.track_id
FROM
track t
WHERE
t.album_id = (
SELECT
t2.album_id
FROM
track t2
WHERE
t2.track_id = ifs.first_track_id
)
) IS NULL THEN "yes" ELSE "no" END AS "album_purchase" -- yes
FROM
invoice_first_track ifs
WHERE
ifs.invoice_id = 1
(1 Row)
And the above process happens with every single invoices so finding the rows of each tiny query is not easy for this one.