SQL How to get Percentages of Total Amount of Sales

Screen Link: https://app.dataquest.io/m/191/guided-project%3A-answering-business-questions-using-sql/3/selecting-albums-to-purchase

Your Code:

%%sql

WITH usa_tracks_sold AS
   (
    SELECT il.* FROM invoice_line il
    INNER JOIN invoice i on il.invoice_id = i.invoice_id
    INNER JOIN customer c on i.customer_id = c.customer_id
    WHERE c.country = "USA"
   )

SELECT
    g.name genre,
    count(uts.invoice_line_id) tracks_sold,
    **cast(count(uts.invoice_line_id) AS FLOAT) / (**
**        SELECT COUNT(*) from usa_tracks_sold**
    ) percentage_sold
FROM usa_tracks_sold uts
INNER JOIN track t on t.track_id = uts.track_id
INNER JOIN genre g on g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;

The code above calculates Percentages of Number of Tracks Sold. But what if i want to calculate Percentages of Amounts of Tracks Sold?

I tried

cast((uts.unit_price*uts.quantity) AS FLOAT) / (
        SELECT SUM((uts.unit_price*uts.quantity)) from usa_tracks_sold

but it added up more than 1

Let’s think about this with a sample. If you remove the GROUP BY clause and the aggregate functions from the query, the resulting table kind of looks like this:

invoice_id track_id unit_price quantity genre_name
5 2001 0.99 1 Rock
5 2002 0.99 1 Rock
5 206 0.99 1 Latin

Eyeballing we can tell that for this sample the answer would be approximately 66.6% for Rock and 33.3% for Latin.

The subquery (SELECT SUM((uts.unit_price*uts.quantity)) results in 2.97. So what you’re doing is basically the following:

SELECT genre_name,
       CAST(unit_price*quantity AS FLOAT)/2.97
  FROM small_table_above
 GROUP BY genre_name;

Note that in this simplified example, you’re using GROUP BY, but not actually using any aggregate functions, this is a conceptual mistake and in many databases this would actually yield an error. See this answer for more details on this.

You need to sum the numerator, it is the analogous of the COUNT you used before.

A couple of final observations:

  • Most tracks cost 0.99, this will make it so that counting and summing will lead to approximately the same results, so this might end up not being particularly insightful; and that’s fine, it’s part of the process.
  • The quantity is always 1, so there’s no need to multiply by the quantity, you can just sum the price directly. Having said this, I think it’s good practice to do this because it matches the conceptual idea, because:
    • It makes it easier to read and understand the query.
    • Enables you to catch data errors if they ever occurr. For example, in this business customers shouldn’t be allowed to purchase the same song twice, because purchasing it just one will give them exactly the same thing. If you multiply by the quantity and one day the quantity isn’t 1, you’ll be in a better position to notice this.
1 Like

Hi @Bruno
am trying this out, but am afraid still stuck hehe…

%%sql

WITH usa_tracks_sold AS
   (
    SELECT il.* FROM invoice_line il,
    SUM (unit_price * quantity) grd_total
    INNER JOIN invoice i on il.invoice_id = i.invoice_id
    INNER JOIN customer c on i.customer_id = c.customer_id
    WHERE c.country = "USA"
   )

SELECT
    g.name genre,
    SUM(uts.unit_price*uts.quantity) amt_tracks_sold,
    ((SUM(uts.unit_price) / uts.grd_total) amt_percentage_sold
FROM usa_tracks_sold uts
INNER JOIN track t on t.track_id = uts.track_id
INNER JOIN genre g on g.genre_id = t.genre_id
ORDER BY 3 DESC
LIMIT 10;
* sqlite:///chinook.db
(sqlite3.OperationalError) near "amt_percentage_sold": syntax error [SQL: 'WITH usa_tracks_sold AS\n   (\n    SELECT il.* FROM invoice_line il,\n    SUM (unit_price * quantity) grd_total\n    \n    INNER JOIN invoice i on il.invoice_id = i.invoice_id\n    INNER JOIN customer c on i.customer_id = c.customer_id\n    WHERE c.country = "USA"\n   )\n\nSELECT\n    g.name genre,\n    SUM(uts.unit_price*uts.quantity) amt_tracks_sold,\n    ((SUM(uts.unit_price) / uts.grd_total) amt_percentage_sold\nFROM usa_tracks_sold uts\nINNER JOIN track t on t.track_id = uts.track_id\nINNER JOIN genre g on g.genre_id = t.genre_id\n/*ORDER BY 3 DESC*/\n/*LIMIT 10*/;'] (Background on this error at: http://sqlalche.me/e/e3q8)

Any other hints of where i got it wrong this time? Also how can i learn to debug on SQL based on the msgs?

In my experience SQL error messages aren’t very helpful.

The error is in usa_tracks_sold. You’re selecting a column after the FROM clause.

Hi
i tried this,

%%sql

WITH usa_tracks_sold AS
   (
    SELECT il.*,
    SUM (il.unit_price * il.quantity) grd_total 
    FROM invoice_line il
    INNER JOIN invoice i on il.invoice_id = i.invoice_id
    INNER JOIN customer c on i.customer_id = c.customer_id
    WHERE c.country = "USA"
   )

SELECT
    g.name genre,
    SUM(uts.unit_price*uts.quantity) amt_tracks_sold,
    ((SUM(uts.unit_price) / uts.grd_total)*100) amt_percentage_sold
FROM usa_tracks_sold uts
INNER JOIN track t on t.track_id = uts.track_id
INNER JOIN genre g on g.genre_id = t.genre_id;

but i am getting this as a result here

genre 	amt_tracks_sold 	amt_percentage_sold
Rock 	0.99 	0.09514747859181659

any pointers of where i get it wrong? :grinning:

Hey.

You now did the opposite mistake of your starting question. You’re using an aggregate function without using GROUP BY.

It’s possible to use aggregate functions without groups, but you can’t select any additional columns in this case.

oh finally managed to muddle it out with some googling. this probably going down the books as the longest time taken to solve for SQL :grinning:

%%sql

WITH usa_tracks_sold AS
   (
    SELECT il.* FROM invoice_line il
    INNER JOIN invoice i on il.invoice_id = i.invoice_id
    INNER JOIN customer c on i.customer_id = c.customer_id
    WHERE c.country = "USA"
   )
    
SELECT
    g.name genre,
    SUM(uts.unit_price*uts.quantity) amt_tracks_sold,
    SUM(uts.unit_price*uts.quantity) / (SELECT SUM(unit_price*quantity) FROM usa_tracks_sold)*100 amt_percentage_sold
FROM usa_tracks_sold uts
INNER JOIN track t on t.track_id = uts.track_id
INNER JOIN genre g on g.genre_id = t.genre_id
GROUP BY 1
ORDER BY 2 DESC;

Just a few questions

  1. Why doesn’t (SELECT SUM(unit_price*quantity) FROM usa_tracks_sold) gets affected by the GROUP BY as in the denominator is able to get the grand total? is it because it is referring directly to the subquery?
  2. I am trying to understand your advice on '“Note that in this simplified example, you’re using GROUP BY , but not actually using any aggregate functions, this is a conceptual mistake and in many databases this would actually yield an error. See this answer for more details on this.” What you are saying is that for any columns that there is no aggregation, we should use Group By and vice versa?
1 Like

Yes!

1 Like