7. Generating Columns With The Case Statement

I get the solution and my answer was somewhat similar, however mine was different in a few ways and I had questions on it. I’ll post my code and the solution code below.

So my curiosity/question stems from:

  1. When I query the “invoice” table separately, I see the “total” column is the sum of purchases for ALL invoices
  2. Because of knowledge of #1, it didn’t seem to appear that I need to aggregate “I.TOTAL” because it already is.
  3. So because of this I did a COUNT(*) because we’re GROUP BY at the customer level (all invoice_line and invoice_ids should roll into it) > the total purchases look correct
  4. But of course the issue is with the I.TOTAL because I assumed two things 1) it is already the sum of sums of all purchases by “CUSTOMER_ID” and 2) even without, we still GROUP BY customer name.

Questions:
Basically, just curious why it is TOTAL column is sum of sums in INVOICE table, but without sum() the results in our query is the sum of only ONE arbitrary invoice (after the join)?

Also, does the GROUP BY not apply to everything in select statement (if so, I’d imagine it would group the I.TOTAL by all purchases?

My code:
SELECT
c.first_name || " " || c.last_name customer_name,
COUNT(*) number_of_purchases,
i.total total_spent,
CASE
WHEN i.total < 40 THEN “small spender”
WHEN i.total > 100 THEN “big spender”
ELSE “regular”
END
AS customer_category

FROM customer c
INNER JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY customer_name
ORDER BY customer_name

Solution:
SELECT
c.first_name || " " || c.last_name customer_name,
COUNT(i.invoice_id) number_of_purchases,
SUM(i.total) total_spent,
CASE
WHEN SUM(i.total) < 40 THEN “small spender”
WHEN SUM(i.total) > 100 THEN “big spender”
ELSE “regular”
END
AS customer_category

FROM customer c
INNER JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY customer_name
ORDER BY customer_name

Hey there!

The interesting thing here is that in most flavors of SQL, your code would have failed as you’d either need to include all non-aggregated columns into your GROUP BY statement or aggregate the column — in our code interpreter, we just select the last known value for total when you group. In order to get the total amount spent, you’d need to aggregate the total_spent column using an aggregation method — that would be SUM in this case.

The number_of_purchases column looks correct because you’ve aggregated using COUNT — SQL has no way of knowing what to do with the values in a column if no aggregator is specified. For example, Aaron Mitchell has made 8 purchases — there were 8 invoices in the table in his name. Therefore, there are 8 different total values for Aaron — SQL needs to know what to do with those values when you group.

Make sense?

1 Like

Hey there,

It makes sense. But basically, regardless of the (i.e. “total”) column and how it’s aggregated on it’s individual table, I should always use an aggregate function when joining with other tables, to be on the safe side (i.e. invoice and customer tables).

And I’m not sure what I checked before I posted the original question, but yeah, the “total” column in the “invoice” table shows each individual transaction for “total”, so I get why I have to aggregate first.

And just to clarify, GROUP BY occurs AFTER any other aggregations? What exactly the GROUP BY doing (i.e. does it aggregate each column to by the GROUP BY)?

Thanks again.