My different solution to the Challenge: Each Country's Best Customer

So I created an solution for the Challenge: Each Country’s Best Customer of Building and Organizing Complex Queries, but my solution was quite different from the suggested solution and I’m wondering why and how I can take a lesson from this.

Challenge to be found on: https://app.dataquest.io/m/465/building-and-organizing-complex-queries/8/challenge-each-countrys-best-customer

The suggested solution is as follows:

WITH
    customer_country_purchases AS
        (
         SELECT
             i.customer_id,
             c.country,
             SUM(i.total) total_purchases
         FROM invoice i
         INNER JOIN customer c ON i.customer_id = c.customer_id
         GROUP BY 1, 2
        ),
    country_max_purchase AS
        (
         SELECT
             country,
             MAX(total_purchases) max_purchase
         FROM customer_country_purchases
         GROUP BY 1
        ),
    country_best_customer AS
        (
         SELECT
            cmp.country,
            cmp.max_purchase,
            (
             SELECT ccp.customer_id
             FROM customer_country_purchases ccp
             WHERE ccp.country = cmp.country AND cmp.max_purchase = ccp.total_purchases
            ) customer_id
         FROM country_max_purchase cmp
        )
SELECT
    cbc.country country,
    c.first_name || " " || c.last_name customer_name,
    cbc.max_purchase total_purchased
FROM customer c
INNER JOIN country_best_customer cbc ON cbc.customer_id = c.customer_id
ORDER BY 1 ASC

My solution is like this:

WITH
    customer_purchase_amount AS
        (
        SELECT
            customer_id,
            SUM(total) total_purchased
        FROM invoice
        GROUP BY 1
        ),
    best_customer_country AS
        (
        SELECT
            customer_id,
            first_name || " " || last_name customer_name,
            country
        FROM customer
        )
        
SELECT
    bcc.country,
    bcc.customer_name,
    MAX(cpa.total_purchased) total_purchased
FROM best_customer_country bcc
INNER JOIN customer_purchase_amount cpa ON cpa.customer_id = bcc.customer_id
GROUP BY 1
ORDER BY 1 ASC;

All feedback is welcome. Thanks!

2 Likes

Hi @DataBuzzer,

Great Work!

If you are wondering about the length of the Dataquest solution, then the approach taken by the content author here is not for finding the fastest solution but for making it easy to do queries in the real world. For example, By creating 3 separate temporary tables using with clause, we can use any of those for answering other questions, so instead of recreating temporary tables for a specific case every single time, this approach kind of makes your query modular. So that you can use the same for multiple cases.

I hope it helps you to understand the perspective behind it.

Best,
Sahil

2 Likes

Makes sense, thanks!

1 Like

Hey @DataBuzzer,

I authored this course, and I wanted to provide some extra insight into why I didn’t decide things this way. Let’s look at your outermost query:

SELECT
    bcc.country,
    bcc.customer_name,
    MAX(cpa.total_purchased) total_purchased
FROM best_customer_country bcc
INNER JOIN customer_purchase_amount cpa ON cpa.customer_id = bcc.customer_id
GROUP BY 1
ORDER BY 1 ASC;

In this query, you have three columns:

  • bcc.country, which you’ve grouped by
  • MAX(cpa.total_purchased), which uses an aggregate function
  • bcc.customer_name, which uses neither.

In many variants of SQL (but notably not SQLite), this query would raise an error, because when you use aggregates/group by, every column has to be one or the other. This is because it’s not obvious to the database what you want it to do if you don’t specify it.

If you look at https://www.sqlite.org/quirks.html, they actually talk about this:

In most SQL implementations, output columns of an aggregate query may only reference aggregate functions or columns named in the GROUP BY clause. It does not make good sense to reference an ordinary column in an aggregate query because each output row might be composed from two or more rows in the input table(s).

SQLite does not impose this restriction. The output columns from an aggregate query can be arbitrary expressions that include columns not found in GROUP BY clause

While your query does work in SQLite, it fails in many other SQL implementations, which is why we’ve chosen to do things in a way that avoids this SQLite ‘feature’ completely.

I hope this helps you to understand our approach (and potentially save you some pain later on by preventing you from running into this issue).

3 Likes

Thanks for your in depth reply!