Revisiting CASES & GROUP BY

Screen link Learn data science with Python and R projects

SELECT
    c.first_name || " " || c.last_name as customer_name,
    COUNT(i.invoice_id)  number_of_purchases,
    SUM(i.total) total_spent,
    CASE
        WHEN sum(i.total) < 40 THEN "small spender"
        WHEN sum(i.total) > 100 THEN "big spender"
        ELSE "regular"
        END
        AS customer_category 

FROM invoice i
INNER JOIN customer c ON i.customer_id = c.customer_id
GROUP BY 1 ORDER BY 1;

Can someone explain to me why omitting the GROUP BY at the end causes this query to return only one result, instead of a list of ‘un-grouped’ customers?

Many thanks,
Andrea

@moroa

After joining the tables, you have 614 rows.
Use this:

SELECT
   i.* 
FROM invoice i
INNER JOIN customer c ON i.customer_id = c.customer_id;

Or this:

SELECT
   COUNT(i.total)
FROM invoice i
INNER JOIN customer c ON i.customer_id = c.customer_id;

to confirm 614 rows.

If you sum the total in the invoice table you get 4709.429999999994

SELECT
   SUM(i.total)
FROM invoice i
INNER JOIN customer c ON i.customer_id = c.customer_id;

So if you run the code without GROUP BY, you are summing over the entire table:

SELECT
   c.first_name || " " || c.last_name customer_name,
   COUNT(i.invoice_id) number_of_purchases,
   SUM(i.total) total_spent,
   CASE
       WHEN sum(i.total) < 40 THEN 'small spender'
       WHEN sum(i.total) > 100 THEN 'big spender'
       ELSE 'regular'
       END
       AS customer_category
FROM invoice i
INNER JOIN customer c ON i.customer_id = c.customer_id
ORDER BY 1;

So you get 614 rows and the sum of invoice.total is t 4709.429999999994

thank you @monorienaghogho !

1 Like