Comparing My Answer with DQ Answer: Please Critique

Screen Link:

This is my code that I came up with on my own which arrives at the correct answer:

 WITH sum_by_cust AS
  (
   SELECT
          c.country AS country, 
          c.first_name||" "||c.last_name as customer_name,
          SUM(i.total) AS customer_total_purchased
     FROM customer as c
    INNER JOIN invoice as i ON i.customer_id = c.customer_id
    GROUP BY c.country, customer_name
    ORDER BY customer_total_purchased DESC
  )
SELECT country, customer_name, MAX(customer_total_purchased) as total_purchased
 FROM sum_by_cust
GROUP BY country;

This is the DQ Answer which is much different:

WITH
    customer_country_purchases AS
        (
         SELECT
             i.customer_id,
             c.country,
             SUM(i.total) total_purchases
         FROM invoice i
         INNER JOIN customer c ON i.customer_id = c.customer_id
         GROUP BY 1, 2
        ),
    country_max_purchase AS
        (
         SELECT
             country,
             MAX(total_purchases) max_purchase
         FROM customer_country_purchases
         GROUP BY 1
        ),
    country_best_customer AS
        (
         SELECT
            cmp.country,
            cmp.max_purchase,
            (
             SELECT ccp.customer_id
             FROM customer_country_purchases ccp
             WHERE ccp.country = cmp.country AND cmp.max_purchase = ccp.total_purchases
            ) customer_id
         FROM country_max_purchase cmp
        )
SELECT
    cbc.country country,
    c.first_name || " " || c.last_name customer_name,
    cbc.max_purchase total_purchased
FROM customer c
INNER JOIN country_best_customer cbc ON cbc.customer_id = c.customer_id
ORDER BY 1 ASC

When I came up with my answer after trial and error. I mapped it out on paper and tried to visualize it simple steps using the knowledge that I have so far. I would not have thought of what the DQ answer did.

Is my answer bad or not acceptable in some way? What would people expect in practice? I feel like I’m missing something.

Thank you for your time

Comments on your code:

SELECT country, customer_name, MAX(customer_total_purchased) as total_purchased
 FROM sum_by_cust
GROUP BY country;

Here you SELECT customer_name, a column that does not appear in GROUP and does not get aggregated. This will error in other sql engines because it logically makes no sense. SQLite is lax so doesn’t error but returns the 1st row in the group by default. You can add customer_name into GROUP to solve it. It can cause harder to debug problems: Why is the Solution Ignoring 'Quantity' in invoice_line when Calculating Sold Tracks per Genre per Country?

In CTE you grouped by customer_name, what if 2 customers have same name? That’s why practically tables give real world objects ids that are unique so separate things don’t get treated the same, so when you want to analyze anything, look for their id first and if not there, try to combine multiple columns of information to create unique identifiers for each object. For people, their emails can be used too.

What is ORDER BY customer_total_purchased DESC for? I don’t see the DQ answer ordering by that but by country instead, maybe you are using it to make use of SQLite’s quirk as I describe below.

Comments on DQ code:

GROUP BY numbers. Bad practice that should never be done. If SELECT columns changed, numbers refer to wrong columns.

It works with customer_id and only in final step JOINs names in. It creates a dictionary of country_max_purchase from customer_country_purchases, leaving out the customer_name information which you included (maybe because you wanted to do everything in 1 step), because it is awkward to select (and thus have to group by) more than necessary and doesn’t consider ties.

country_best_customer uses correlated subquery to filter the outer query with the dictionary from the previous CTE to get customer_id. The difference with yours is this is identifying rows with same country and total_purchase as the country’s max, so if more than 1 customer meets these conditions, both will be returned. Your code only returns a single row for each country, with a customer_name that is selected from the 1st customer in that country (which is only correct because you did ORDER BY customer_total_purchased DESC previously, but still wrong when ties need to be returned, and wrong because it doesn’t follow sql standard).

To maximize your learning, think about why DQ answer is doing each step, are they necessary or logically correct? Then go to forums and find other answers. I have seen SQL answers better than DQ answers here.

1 Like