Going fast! #DataquestChallenge Premium Annual Offer:
500 get 50% & the next 1000 get 40% off.
GET OFFER CODE

SQL Business Analysis Project: Album vs. individual tracks - alternative solution

Screen Link:
https://app.dataquest.io/m/191/guided-project%3A-answering-business-questions-using-sql/6/albums-vs-individual-tracks

Hello,

I couldn’t find a way to solve this task with the EXCEPT-Operator so I came up with another solution. Instead of the EXCEPT I counted the DISTINCT track_id’s from the album_ids.

My Code:

WITH cat_purchase AS
(
     SELECT
        il.invoice_id,
        CASE
            WHEN
                COUNT(DISTINCT t.album_id) = 1
                AND 
                COUNT(DISTINCT t.track_id) = c.count_album_tracks
                THEN 'album'
                ELSE 'single'
                END AS purchase_type,
        c.count_album_tracks
    FROM track t
    JOIN invoice_line il
        ON il.track_id = t.track_id
    JOIN (SELECT COUNT(*) AS count_album_tracks, album_id
          FROM track
          GROUP BY album_id) c
        ON c.album_id = t.album_id
    GROUP BY invoice_id
)

SELECT
    purchase_type,
    COUNT(*) AS number_of_invoices,
    ROUND(CAST(COUNT(*) AS float) / CAST((SELECT COUNT(*)
               FROM invoice) as float), 2) AS '%_of_invoices'
FROM cat_purchase
GROUP BY purchase_type

Yeah, it’s quite messy but it worked and for me it was way easier than using EXCEPT.

What’s your opinion about the code above?
I’m open to improvement suggestions.

Thanks in advance,
Jonas

3 Likes

I do not understand the EXCEPT correction on Dataquest to be honest, so I will go with your solution!

Sadly I cannot say if yours is correct or not as I am a begginer.

Anyway, well done!

Looks good!

I used the same approach:

  • check if invoice has only 1 album id
  • check if number of tracks in the invoice with 1 album id is equal to the number of tracks in the respective album
with all_tables as (
        SELECT *
        FROM invoice as i
        LEFT JOIN invoice_line as il on il.invoice_id = i.invoice_id
        LEFT JOIN track as tr on tr.track_id = il.track_id
    ),
    invoice_albums as (
        SELECT invoice_id,
               album_id as album_ids,
               track_id as track_ids
        FROM all_tables
        GROUP BY 1,2,3
    ),
    albums as (
        SELECT  album_id,
                track_id
        FROM track
        GROUP BY 1,2
    ),
    album_purchase as (
        SELECT
            ia.invoice_id,
            CASE
                WHEN COUNT(distinct ia.album_ids) = 1
                AND COUNT(distinct ia.track_ids) = COUNT(DISTINCT a.track_id)
                THEN 'album_purchase'
                ELSE 'non-album_purchase'
            END as album_purchase
        FROM invoice_albums as ia
        LEFT JOIN albums as a ON ia.album_ids = a.album_id
        GROUP BY 1        
    )

SELECT
    album_purchase,
    COUNT(invoice_id) as invoices,
    COUNT(invoice_id) / (SELECT COUNT(*) FROM album_purchase) * 100
FROM album_purchase
GROUP BY 1

Still trying to figure out how it was meant to be solved using EXCEPT

The idea behind your concept is sound. Whiles your concept uses general information on the albums as a whole, Dataquest is focusing on the content of the album_ID for comparison hence the use of the except clause**(BOTH CONCEPTS ARE RIGHT)**

DATAQUEST CONCEPT

Focused on comparing the contents using the except clause(a set of tracks ids within a specific invoice_id to a set of track ids within the album_id associated to the same invoice_id)

for each invoice_id select a represntative track_id and find its corresponding album_id to be a representative album_id for the invoice(nb.for invoice to be an album purcahse all the tracks_id associated to the invoice_id must have a distinct album_id/only one album_id and also the tracks ids should be the same as what is contained in the original distinct album_id)

YOUR CONCEPT
croseecheck using the COUNT aggregate function

  1. the album_id associated with the tracks for a particular invoice is distinct/only one,(if so then there’s a probability is an album purchase)
    note: if the album id is not distinct for an invoice automatically it cannot be an album purchase

  2. even if point one is true , we have to confirm if all the tracks within the distinct album are indeed represented for that particular invoice_id.

if the two conditions hold then it can be classified as an album purchase hence it was a single purchase

YOUR CONCEPT

%%sql

WITH invoice_d AS (SELECT * FROM invoice_line il LEFT JOIN track t ON t.track_id=il.track_id ),

    album_d AS (select count (track_id) tracks_in_album, album_id from track group by 2),
    
    final_table AS (SELECT  ind.invoice_id,count(ind.track_id) count_track_id_in_invoice,MIN(ind.track_id) as rep_track_id,count(distinct ind.album_id) albums__id_in_invoice_id,ind.album_id as rep_album_id ,ad.tracks_in_album from invoice_d ind  LEFT JOIN album_d ad ON ad.album_id=ind.album_id group by 1),

    catergorised_table AS (SELECT ft.*, CASE WHEN (ft. albums__id_in_invoice_id =1) AND (ft.count_track_id_in_invoice=ft.tracks_in_album) THEN "album" ELSE "single" END as type_of_purchase FROM final_table ft)
    
    
    SELECT type_of_purchase, count(invoice_id) number_of_invoices,ROUND((CAST(count(invoice_id)AS FLOAT)/(SELECT COUNT(invoice_id) FROM catergorised_table))*100,2) as "percentage invoice"  FROM catergorised_table ct GROUP BY 1

DATA QUEST OPTION

%%sql
WITH invoice_d AS (SELECT * FROM invoice_line il LEFT JOIN track t ON t.track_id=il.track_id ),

    album_d AS (select count (track_id) tracks_in_album, album_id from track group by 2),
    
    final_table AS (SELECT  ind.invoice_id,count(ind.track_id) count_track_id_in_invoice,MIN(ind.track_id) as rep_track_id,count(distinct ind.album_id) albums__id_in_invoice_id,ind.album_id as rep_album_id ,ad.tracks_in_album from invoice_d ind  LEFT JOIN album_d ad ON ad.album_id=ind.album_id group by 1)

    
SELECT  count(invoice_id) "number of invoices",ROUND((CAST(count(invoice_id)AS FLOAT)/(SELECT COUNT(invoice_id) FROM final_table))*100,2) "invoice in %", 

CASE WHEN

(SELECT il.track_id FROM invoice_line il WHERE il.invoice_id=ft.invoice_id EXCEPT SELECT t.track_id FROM track t WHERE t.album_id=ft.rep_album_id)IS NULL

and 

(SELECT t.track_id FROM track t WHERE t.album_id=ft.rep_album_id EXCEPT SELECT il.track_id FROM invoice_line il WHERE il.invoice_id=ft.invoice_id)IS NULL 

THEN "ALBUM"

ELSE "SINGLE" 

END AS purchase_type

FROM final_table ft

GROUP BY purchase_type