Query results don't match - Answering Business Questions using SQL

https://app.dataquest.io/m/191/guided-project%3A-answering-business-questions-using-sql/3/selecting-albums-to-purchase

By “the number of tracks sold in the USA”, should we consider the country column in the customer table or the billing-country from the invoice table?
My query looks like this :

QUERY 1:

%%sql
WITH usa AS 
(SELECT il.*
FROM  invoice_line il 
LEFT JOIN invoice i ON  i.invoice_id = il.invoice_id
LEFT JOIN customer c ON c.customer_id = i.invoice_id
WHERE i.billing_country = 'USA')
SELECT g.name "genre",
SUM(usa.quantity)"tracks_sold" ,
CAST(COUNT(usa.invoice_line_id) AS FLOAT) / (
        SELECT COUNT(*) from usa
    ) "percentage_sold"
FROM usa usa
LEFT JOIN  track t ON t.track_id = usa.track_id
LEFT JOIN genre g ON g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;

and the following is the result set I received.

genre tracks_sold percentage_sold
Rock 561 0.5337773549000951
Alternative & Punk 130 0.12369172216936251
Metal 124 0.11798287345385347
R&B/Soul 53 0.05042816365366318
Blues 36 0.03425309229305423
Alternative 35 0.03330161750713606
Latin 22 0.02093244529019981
Pop 22 0.02093244529019981
Hip Hop/Rap 20 0.019029495718363463
Jazz 14 0.013320647002854425

I see that, the result is similar to the one in the solution. However, when I tried the query in the solution, this is what I got.

QUERY 2:

%%sql
WITH usa AS 
(SELECT il.*
FROM  invoice_line il 
INNER JOIN invoice i ON i.invoice_id = il.invoice_id
INNER JOIN customer c ON c.customer_id = i.invoice_id
WHERE c.country = 'USA')
SELECT g.name "genre",
COUNT(usa.invoice_line_id)"tracks_sold" ,
CAST(count(usa.invoice_line_id) AS FLOAT) / (
        SELECT COUNT(*) from usa
    ) "percentage_sold"
FROM usa usa
INNER JOIN  track t ON t.track_id = usa.track_id
INNER JOIN genre g ON g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;

Result:

genre tracks_sold percentage_sold
Rock 65 0.6770833333333334
Metal 8 0.08333333333333333
Alternative & Punk 7 0.07291666666666667
R&B/Soul 6 0.0625
Latin 3 0.03125
Alternative 1 0.010416666666666666
Blues 1 0.010416666666666666
Classical 1 0.010416666666666666
Electronica/Dance 1 0.010416666666666666
Heavy Metal 1 0.010416666666666666

I am a bit lost, could someone please help me here? Is Query 1 above correct and what am I missing in Query 2, which is close to the one given in the solution?

Thanks.

1 Like

Hey, Menon.

From a business and data point of view, this is something to think about, fortunately it doesn’t matter because the billing country always matches the customer’s country (as per the customer table).

This can be seen by running the following query and noting it returns zero rows.

SELECT *
  FROM customer as c
 INNER JOIN invoice as i
    ON c.customer_id = i.customer_id
 WHERE billing_country <> country;

Other than some style issues (like like of blank space), the first query looks good.

The join condition c.customer_id = i.invoice_id is wrong, you want to match customer with customer.

1 Like

The join condition c.customer_id = i.invoice_id is wrong, you want to match customer with customer.

Thanks @Bruno. Sorry my bad, such a silly mistake from my part. Thanks for pointing it out.

Also, can you please provide (or where can I find) the solution for the practice questions mentioned here, so that I can check the correctness of my queries. Thanks

Those are meant as extra work. We don’t provide solutions for those.

You’re welcome to give it a try and post your solutions for the community to review!

   %%sql
WITH genre_count_track as (
SELECT 
    g.genre_id,
    g.name genre_name,
    t.track_id,
    t.name track_name
FROM genre g INNER JOIN track t
ON t.genre_id=g.genre_id),
customer_usa as (
SELECT 
    i.customer_id,
    i.billing_country country,
    il.*
FROM customer c INNER JOIN invoice i
ON i.customer_id=c.customer_id
INNER JOIN invoice_line il
ON i.invoice_id=il.invoice_id
WHERE i.billing_country='USA')
SELECT
    gct.genre_name, 
    count(cu.invoice_line_id) track_count
FROM genre_count_track gct INNER JOIN customer_usa cu
ON cu.track_id=gct.track_id
GROUP BY 1
ORDER BY 2 DESC
;

I did not calculate the percentage. But if I run the query I see a couple of extra genre in USA. What is wrong in here.

hi @hmosharrof.ifoam,

I am limiting the results by 10, if you check my query. I tried running your query and comparing it with mine, removing the LIMIT clause. Both results are the one and the same.

1 Like

hi @menon.aj thanks
what do you think @Bruno. Is it right or wrong

Judging by the results, assuming Menon’s comment above is correct (and I trust it to be correct), and looking at your code, it looks correct.

2 Likes

Hi. I am also working through this right now, but I have gotten different results for the totals per genre. I basically did what you did, except I created two views, one that joins invoices to invoice_line (where invoice = USA), and another that joins tracks and genres - both INNER. I then joined these two w/ INNER JOIN, and GROUPED BY like you did. Why are you using LEFT JOIN? What does the customers table do for you?