Why Group By not Order By, and JOIN after SELECT?

https://app.dataquest.io/m/190/building-and-organizing-complex-queries/7/multiple-named-subqueries

WITH
    customers_india AS
        (
        SELECT * FROM customer
        WHERE country = "India"
        ),
    sales_per_customer AS
        (
         SELECT
             customer_id,
             SUM(total) total
         FROM invoice
         GROUP BY 1
        )

SELECT
    ci.first_name || " " || ci.last_name customer_name,
    spc.total total_purchases
FROM customers_india ci
INNER JOIN sales_per_customer spc ON ci.customer_id = spc.customer_id
ORDER BY 1;

The above is the solution for this section. The total_per_customer is grouped by customer_id, GROUP BY 1. Why doesn’t this work if I do ORDER BY 1 instead?
Also, if the tables need to be joined before the columns can be selected, why don’t we JOIN then SELECT?

Why should it work if ORDER BY 1?
GROUP BY and ORDER BY do totally different things.
ORDER BY has no concept of aggregation while GROUP BY does. There is no total_per_customer in the snippet pasted, i assume you mean sales_per_customer. To get per customer statistics, there has to be a grouping on customers and aggregation on whatever metric of interest (total invoice value here). ORDER BY cannot do this aggregation.


This is the kind of discussion you can search for. There is also a 3rd method of filter during join, using the syntax of tableA JOIN tableB ON <insert filter conditions/expression>.

Filtering first makes the code easier to read and logically think about, and more amenable to copy paste the smaller, more directly purposeful transformations for reuse. If join first, colleagues who don’t need the join for their job would be doing unnecessary work if copying from your code. Besides direct copy paste for others to do interactive query, the smaller CTE could be saved as views also.

2 Likes