GP: Answering Business Questions using SQL

Even after you fix your solution, it’s going to end up being different because the proposed solution is using a different definition of what an album purchase is.

For example the following invoice ids are purchases that include full albums with only one track that the given solution ignores: {4, 517, 7, 11, 15, 527, 20, 27, 548, 37, 38, 39, 41, 565, 59, 577, 579, 67, 586, 78, 597, 87, 91, 94, 97, 101, 103, 106, 111, 121, 127, 134, 137, 164, 168, 176, 180, 189, 190, 195, 198, 210, 216, 218, 227, 242, 250, 256, 260, 265, 270, 272, 275, 285, 288, 306, 311, 313, 316, 324, 325, 338, 345, 360, 365, 368, 369, 384, 395, 402, 405, 415, 421, 423, 424, 428, 434, 437, 441, 445, 450, 452, 456, 462, 467, 486, 509}.

You can take a look at, for instance, invoice id 597. The track id 2093 is part of an album with only one track. To see Dataquest’s invoice classification, you can run the following query:

WITH invoice_first_track AS
    (
     SELECT
         il.invoice_id invoice_id,
         MIN(il.track_id) first_track_id
     FROM invoice_line il
     GROUP BY 1
    )

SELECT
    ifs.*,
    CASE
        WHEN
             (
              SELECT t.track_id FROM track t
              WHERE t.album_id = (
                                  SELECT t2.album_id FROM track t2
                                  WHERE t2.track_id = ifs.first_track_id
                                 ) 

              EXCEPT 

              SELECT il2.track_id FROM invoice_line il2
              WHERE il2.invoice_id = ifs.invoice_id
             ) IS NULL
         AND
             (
              SELECT il2.track_id FROM invoice_line il2
              WHERE il2.invoice_id = ifs.invoice_id

              EXCEPT 

              SELECT t.track_id FROM track t
              WHERE t.album_id = (
                                  SELECT t2.album_id FROM track t2
                                  WHERE t2.track_id = ifs.first_track_id
                                 ) 
             ) IS NULL
         THEN "yes"
         ELSE "no"
     END AS "album_purchase"
 FROM invoice_first_track ifs
;

Problems with your solution are:

  • Filtering for albums with more than one track. There are a ton of albums with only one track.

  • SELECT count(*) FROM tracks_purchased is a table comprised of tracks, basically. You’re supposed to count invoices, hence the huge number 2966, despite the fact that the invoice table only has 614 rows.

2 Likes

Hey @saidakbarp,

We have a solved feature that allows you the ability to mark something as the “correct” answer, which helps future students with the same question quickly find the solution they’re looking for.

Here’s an article on how to mark posts as solved - I don’t want to do this for you until I know that solution/explanation works.

Best,
Alvin.

2 Likes

@saidakbarp I don’t know if this is still relevant to you. But a few solutions to this were given here.

Hi @Bruno ,

I’m not sure why we have to find the invoice_first_track in order to find if the customer puchased an album or just tracks.

please enlighten me

Hey. @ngokieuphu

We don’t need to find invoice_first_track. That was used in Dataquest’s solution, but it’s not necessary. I even suggested a strategy that doesn’t use that table.

When i run my code my answer differs from the solution. Did i mess up something logically because I don’t see what I did wrong. My logic being that if there are any invoices with more than one distinct album_id than it is not an album and if the album has less than a few tracks than it does not count as an album. The solution from this code is 132 albums and 482 individual purchases.

WITH album_indiv AS
    (
    SELECT
        CASE
            WHEN COUNT(DISTINCT album_id)=1 and COUNT(il.track_id)>4 THEN "album"
            ELSE "individual"
        END purchase_type,
        t.album_id,
        t.name,
        il.invoice_id
    FROM invoice_line il
    INNER JOIN track t ON t.track_id=il.track_id
    GROUP BY invoice_id
    )

SELECT
    CASE
        WHEN purchase_type="album" THEN "yes"
        WHEN purchase_type="individual" THEN "no"
        END album_purchase,
    COUNT(*) total_type
    
FROM album_indiv ai
GROUP BY purchase_type

Hi Bruno. I tried using the method you mentioned and I get a different answer than the solution. I was wondering if you know what mistake I made. Thanks!

WITH 
    album_title AS
        (
        SELECT
            a.title,
            a.album_id,
            COUNT(t.name) total_tracks
        FROM album a
        INNER JOIN track t ON t.album_id=a.album_id
        GROUP BY a.title
        ),
    
    invoice_table AS
        (
        SELECT
            invoice_id,
            album_id,
            COUNT(il.track_id) tracks_purchased
        FROM invoice_line il
        INNER JOIN track t ON t.track_id=il.track_id
        GROUP BY il.invoice_id
        )
SELECT 
    CASE 
        WHEN total_tracks=tracks_purchased THEN "YES"
        ELSE "NO"
    END album_purchase,
    COUNT(*) total_type,
    CAST(COUNT(*) as float)/CAST(MAX(invoice_id) as float) percentage
FROM album_title at
INNER JOIN invoice_table it ON it.album_id=at.album_id
GROUP BY album_purchase

Hey, Ja.

What I mentioned was in another answer, so please ask it there (or ask in a separate question with the appropriate links to the sources).

In any case, in that answer you mention, I provided a file which classifies invoices as to whether they are album purchases or not. You can use that to investigate the differences and try to figure it out yourself — it’s a good exercise.

Hi @Bruno,

I am also having trouble with this exercise. I want to understand Dataquest’s solution since it shows something I have not seen in past exercises.

“…SELECT t2.album_id FROM track t2
WHERE t2.track_id = ifs.first_track_id…”

In past execises, I remember seeing a number or string in the WHERE clause, for instance:
WHERE t2.track_id = 2
or a subquery that results in a number or string. In this exercise, we are comparing two columns in the WHERE clause, similarly to when we use the JOIN ON clause.

I do not understand how WHERE in this situation works, specially considering that the ifs table was not joined in this subquery.

Can you help me to understand this, please?

I do not want to advance with an alternate solution because I fear I might need to use this WHERE condition again in future exercises.

Thank you very much for your help :slight_smile:

Hey, Rodrigo.

Since this is an entirely different question, I won’t go into too much detail. Feel free to ask your questions in a new post if this doesn’t suffice.

This is great feedback, thanks. @rodrigoc.lomonaco

A clause like WHERE a_column = another_column is filtering for rows in which the values of the columns match.

For instance, say you have a table called a_table that looks like this:

a_column another_column yet_another_column
1 1 “Whateva!”
2 5 “I do what I want.”
3 3 “— Eric Cartman”

The query SELECT * FROM a_table WHERE a_column = another_column; yields

a_column another_column yet_another_column
1 1 “Whateva!”
3 3 “— Eric Cartman”
1 Like

Hi @Bruno, thank you for your reply. It was very enlightening.

When you meant that I should ask my questions in a new post, I am not sure if you meant I should create a new topic or if I should not reply the post I have previously replied. If you meant the former, I am sorry :confused:

I still have one question. In the same subquery:

a column of “ifs” table is used but “ifs” table is not in a FROM clause nor in a JOIN clause. I thought that if you are going to use a column from a table, this table had to be in a FROM clause or in a JOIN clause. Was that assumption incorrect?

Thanks again and sorry if I did a mistake.

Sorry, I used the wrong term, I meant a new topic. The one we’re using isn’t meant to encompass all questions regarding 191-7. The 191-7 tag only serves to inform what the topic pertains to.

It is:

image

1 Like

I think you are asking about “within the same SELECT statement, how did a new table appear?”

This was the concept where i remember the lessons missed totally.
Here is a clear example.

2 Likes

Thank you, @Bruno and @hanqi!

Oh, my mistake, Bruno. I will keep this in mind in future questions.

That is exactly what I meant, hanqi! I didn’t know I could use an information from an outer query in an inner query. I always thought the subqueries had to be sufficient on their own. I will try to find some exercises regarding correlated subqueries before tackling this 191-7 project.

Use the search function here on Discourse to search for 191-7, there are other solutions around that do not use the this technique.

Hey, Bruno. I had run through these topics regarding 191-7. I had found another solution in:

https://community.dataquest.io/t/191-7-sql-business-analysis-project/7228/

The proposed solution counts the number of distinct tracks in an album and the number of distinct tracks in an invoice, compares both and considers it an album purchase if the numbers are equal, isn’t it?

However, can’t we have some false positives in doing so? I mean, cases when the number of distinct tracks in an invoice are the same number of tracks in an album but the tracks themselves are different? In this situation, the invoice will be called an album purchase but it should not be.

I am not sure if I misinterpreted the code, though.

Not quite. It’s the number of distinct tracks in each combination of invoice and album.

If you have further questions concerning that solution, please reply to the post where that solution is posted or create a new topic including the solution you wish to ask about.

Thanks, Bruno! :grinning:

1 Like

2 posts were split to a new topic: Answering Business Questions Using SQL

I am certainly no expert (yet - hopefully one day :slight_smile:), but this approach seems a lot more simple and intuitive to me than the solution given. I started going down this route, got stuck on some details, got very confused by the proposed solution and started thinking I was way off base… I went back and did it this way in the end. Thanks for your help!

1 Like