Screen Link:
https://app.dataquest.io/m/191/guided-project%3A-answering-business-questions-using-sql/7/next-steps
My Code:
%%sql
WITH invoices_album_tracks_sold AS
(
    SELECT
        il.invoice_id,
        t.album_id,
        COUNT(t.track_id) AS number_tracks_sold
    FROM invoice_line il
    INNER JOIN track t ON t.track_id = il.track_id
    GROUP BY il.invoice_id, t.album_id
),
albums_tracks_info AS
(
    SELECT
        album_id,
        COUNT(track_id) AS track_on_cd
    FROM track
    GROUP BY album_id
),
invoices_with_full_album AS
(
    SELECT
        iats.invoice_id,
        iats.album_id,
        CASE
            WHEN (iats.number_tracks_sold = ati.track_on_cd AND ati.track_on_cd > 2) THEN 1
            ELSE 0
        END AS full_album_sold
    FROM invoices_album_tracks_sold iats
    LEFT JOIN albums_tracks_info ati ON ati.album_id = iats.album_id
),
unique_invoice_id_albums_sold AS
(
    SELECT
        invoice_id,
        SUM(full_album_sold) AS album_sold
    FROM invoices_with_full_album
    GROUP BY invoice_id
)
SELECT
    COUNT(invoice_id) AS number_invoices,
    SUM(album_sold) AS invoice_with_album_sold,
    SUM(album_sold) / CAST(COUNT(invoice_id) AS float) AS per_invoices_with_album
FROM unique_invoice_id_albums_sold
What I expect to happen:
My approach to the full-album vs. individual-tracks question differs from the proposed solution, but I get a similar percentage of invoices with at least one full album sold (18% vs. 18.6% in the solution).
But is this approach correct?
Here is the logic behind it:
- one subquery: group the invoice_line table by invoice_id and album_id, and count the tracks bought from each album on each invoice
- one subquery: group the track table by album_id and count the tracks on each album
- left join the per-album track counts to the per-invoice track counts on album_id
- create a binary flag for whether each album on an invoice was bought in full, by comparing the number of tracks bought from that album on the invoice with the album's total number of tracks (albums with 2 tracks or fewer are never flagged)
- group again by invoice_id and sum the number of full albums per invoice
- conclude: divide the summed full-album counts by the number of invoices
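One way to sanity-check the steps above is to run them on a tiny hand-built database where the answer is known in advance. The sketch below is only that: it uses Python's built-in sqlite3 module with made-up data (not the Chinook database), the two tables are trimmed to the columns the query touches, and the output columns full_albums_sold and invoices_with_album are names chosen for this illustration. Invoice 3 deliberately contains two full albums, so the sum of flags can be compared with a count of invoices that have at least one flag.

```python
import sqlite3

# Hypothetical toy data, not the Chinook database: three albums, four invoices.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE track (track_id INTEGER PRIMARY KEY, album_id INTEGER);
CREATE TABLE invoice_line (invoice_id INTEGER, track_id INTEGER);
-- Albums 1 and 2 have three tracks each; album 3 has only two.
INSERT INTO track VALUES (1,1),(2,1),(3,1),(4,2),(5,2),(6,2),(7,3),(8,3);
INSERT INTO invoice_line VALUES
    (1,1),(1,2),(1,3),                    -- invoice 1: all of album 1
    (2,1),(2,4),                          -- invoice 2: single tracks only
    (3,1),(3,2),(3,3),(3,4),(3,5),(3,6),  -- invoice 3: albums 1 and 2 in full
    (4,7),(4,8);                          -- invoice 4: all of two-track album 3
""")

# Same CTE chain as in the post; only the final SELECT adds one extra column.
QUERY = """
WITH invoices_album_tracks_sold AS (
    SELECT il.invoice_id, t.album_id, COUNT(t.track_id) AS number_tracks_sold
    FROM invoice_line il
    INNER JOIN track t ON t.track_id = il.track_id
    GROUP BY il.invoice_id, t.album_id
),
albums_tracks_info AS (
    SELECT album_id, COUNT(track_id) AS track_on_cd
    FROM track
    GROUP BY album_id
),
invoices_with_full_album AS (
    SELECT
        iats.invoice_id,
        CASE
            WHEN iats.number_tracks_sold = ati.track_on_cd
                 AND ati.track_on_cd > 2 THEN 1
            ELSE 0
        END AS full_album_sold
    FROM invoices_album_tracks_sold iats
    LEFT JOIN albums_tracks_info ati ON ati.album_id = iats.album_id
),
unique_invoice_id_albums_sold AS (
    SELECT invoice_id, SUM(full_album_sold) AS album_sold
    FROM invoices_with_full_album
    GROUP BY invoice_id
)
SELECT
    COUNT(invoice_id) AS number_invoices,
    SUM(album_sold) AS full_albums_sold,
    SUM(CASE WHEN album_sold > 0 THEN 1 ELSE 0 END) AS invoices_with_album
FROM unique_invoice_id_albums_sold
"""

number_invoices, full_albums_sold, invoices_with_album = conn.execute(QUERY).fetchone()
print(number_invoices, full_albums_sold, invoices_with_album)
```

On this toy data the script prints `4 3 2`: four invoices, three full albums in total, but only two invoices containing at least one full album. So SUM(album_sold) appears to count albums rather than invoices, and the two figures only coincide when no invoice holds more than one full album, which may account for part of the small gap with the proposed solution.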
Thank you for your input!