Alternative solution with different results

Screen Link:
https://app.dataquest.io/m/374/guided-project%3A-answering-business-questions-using-sql/7/albums-vs-individual-tracks

My Code:

albums_or_tracks <- "
WITH invoice_tracks AS
  (
    SELECT 
      il.invoice_id,
      COUNT(DISTINCT(il.track_id)) AS unique_tracks,
      t.album_id
    FROM invoice_line il
    INNER JOIN track t ON t.track_id = il.track_id
    GROUP BY invoice_id, album_id
  ),
  
  album_tracks AS
  (
    SELECT
      album_id,
      COUNT(DISTINCT(track_id)) AS number_of_tracks
    FROM track
    GROUP BY album_id
  ),
  
  album_purchases AS
  (
    SELECT
      it.invoice_id,
      it.unique_tracks,
      at.*,
      CASE
        WHEN unique_tracks = number_of_tracks THEN 'yes'
        ELSE 'no'
      END AS album_purchase
    FROM invoice_tracks it
    INNER JOIN album_tracks at ON at.album_id = it.album_id
  )
SELECT 
  album_purchase,
  COUNT(DISTINCT(invoice_id)) AS invoices,
  ROUND(CAST(COUNT(DISTINCT(invoice_id)) as float) / (SELECT COUNT(DISTINCT(invoice_id)) FROM album_purchases) * 100, 2) AS percentage
FROM album_purchases
GROUP BY album_purchase; "

What I expected to happen:
To get the same values in the output as in the solution notebook.

What actually happened:
I did not get the same values.

Replace this line with the output/error

album_purchase invoices percentage
1 no 500 81.43
2 yes 201 32.74

I’m not sure I understood whether or not we were supposed to handle the “edge cases”, and I’m wondering if the flow and code are still okay in light of this likely misunderstanding.

Thanks for any help!

hi @khmidah

I am not sure what “edge cases” truly mean here. The solution has not taken into account, the albums which have only 1 song, which you have considered, hence the percentages are different.

I double checked on my project (but it’s with Python); I have added a “having” clause.
Having tracks > 2 so, if I remove that, the output almost matches the result here.

1 Like

Thank you, Rucha! This makes sense.

Just to clarify – for this part of the project the following edge cases are brought up, but it was unclear whether to ignore them so that they are excluded from the results or to just not worry about addressing them when developing the solution:

  • Albums that have only one or two tracks are likely to be purchased by customers as part of a collection of individual tracks.
  • Customers may decide to manually select every track from an album, and then add a few individual tracks from other albums to their purchase.

Anyway, I was having issue incorporating HAVING into my code but I did use WHERE on the outer query, which got the results much closer. This was the output:

album_purchase invoices percentage
1 no 500 81.83
2 yes 111 18.17

Thank you again for the help!

1 Like