# Solution for 190-8 "Challenge: Each Country's Best Customer" seems overly complicated?

The challenge hints that the solution needs

• One or more named subqueries defined in a `WITH` clause
• Aggregate functions like `SUM()` and `MAX()`
• Several `INNER JOIN` s
• A subquery to define a column
• `GROUP BY` and `ORDER BY` clauses

and that the solution is

``````WITH
customer_country_purchases AS
(
SELECT
i.customer_id,
c.country,
SUM(i.total) total_purchases
FROM invoice i
INNER JOIN customer c ON i.customer_id = c.customer_id
GROUP BY 1, 2
),
country_max_purchase AS
(
SELECT
country,
MAX(total_purchases) max_purchase
FROM customer_country_purchases
GROUP BY 1
),
country_best_customer AS
(
SELECT
cmp.country,
cmp.max_purchase,
(
SELECT ccp.customer_id
FROM customer_country_purchases ccp
WHERE ccp.country = cmp.country AND cmp.max_purchase = ccp.total_purchases
) customer_id
FROM country_max_purchase cmp
)
SELECT
cbc.country country,
c.first_name || " " || c.last_name customer_name,
cbc.max_purchase total_purchased
FROM customer c
INNER JOIN country_best_customer cbc ON cbc.customer_id = c.customer_id
ORDER BY 1 ASC
``````

However, I was able to achieve the correct answer in a much much much simpler way:

``````with c_total_sales as (
select
customer.customer_id as id,
customer.first_name || ' ' || customer.last_name as customer_name,
customer.country as country,
sum(invoice.total) as total_purchased
from customer
inner join invoice
on customer.customer_id = invoice.customer_id
group by id
order by country
)

select
country,
customer_name,
max(total_purchased) as total_purchased
from c_total_sales
group by country
``````

Am I missing something? Or is the provided solution actually more complicated than needed to be?

1 Like

Hey.

Your solution has insidious issues that arenâ€™t easy to spot, it works here because of some specific circumstances.

The issue is that youâ€™re selecting columns without grouping by them. In a query where you use `GROUP BY`, every column must either by in the `GROUP BY` clause or an aggregate function must be used on it.

I hope this helps.

8 Likes

Hey @Bruno and @siuangie91

The solution also seemed to be overly complicated to me.

My solution was different again but also shorter than the solution. In brief, my logic was to:

• Return a table `ST` of customer_idâ€™s with the sum_total of the purchases for that customer.

• Join the sum_totals table `ST` to the `customer` table and display the best customer by finding the `max(total_purchased)` grouped by country.

My understanding of the logic in the given solution is:

• Return table `customer_country_purchases` with customer_id, country field from customer table and total_purchases

• Create another table `country_max_purchase` of maximum purchases by country using `customer_country_purchases`

• Create a third table `country_best_customer` that matches the country and max purchase from each country to a customer_id

• Join `country_best_customer` back to the `customer` table to match the `customer_name`

Do you see any issues or reasons why my solution below may work here but not in general? I suspect the type of problem you are talking about relates to this GROUP BY lesson where the last value of a column that hasnâ€™t been aggregated is displayed in the resultant table. In my solution below the aggregation is being performed on the inner joined table.

The columns selected are in each of the `GROUP BY` clauses Iâ€™ve used.

``````WITH
st AS
(
SELECT
customer_id,
SUM(total) AS sum_total
FROM invoice
GROUP BY customer_id
)

SELECT
country,
first_name || " " || last_name customer_name,
MAX(st.sum_total) total_purchased
FROM customer c
INNER JOIN st ON st.customer_id = c.customer_id
GROUP BY country
ORDER BY 1
``````
4 Likes

Hey, Mattrosinski.

Your understanding is great, you got the spirit right.

Not so. In the last part of the query youâ€™re selecting three columns, one has an aggregate function, the other two donâ€™t, but youâ€™re only grouping by one of the remaining ones.

Some database systems wonâ€™t accept this as valid code, it will just yield an error. And some will do something that isnâ€™t what you expect. Details can be read in the first link in my reply above.

4 Likes

I noticed that the given solution seemed overly complicated as well. My solution involved making a single view for my initial query, and then getting a query from that using an aggregate function. I did not use ORDER BY because I assumed that GROUP BY already covered the country orderingâ€¦ Is this not correct thinking?

My solution:

CREATE VIEW customer_total AS
SELECT c.first_name || â€™ â€™ || c.last_name customer_name,
c.country,
SUM(i.total) total_purchases
FROM customer c
INNER JOIN invoice i ON i.customer_id = c.customer_id
GROUP BY 1;

SELECT customer_name,
country,
ROUND(MAX(total_purchases),2) total_purchased
FROM customer_total
GROUP BY country;

Regarding your view, I repeat my words from above.

Hi everyone,

Could someone tell if this code is potentially correct for non-educational purposes and working with other DBs.

It worked fine as a solution.

WITH sq1 AS (
select c.country country, c.customer_id, c.first_name || " " || c.last_name customer_name, SUM(i.total) total_purchased
from customer c
inner join invoice i ON c.customer_id = i.customer_id
group by c.customer_id
)

SELECT sq1.country, sq1.customer_name, max(sq1.total_purchased) total_purchased from sq1
INNER JOIN customer c ON c.customer_id = sq1.customer_id
GROUP BY sq1.country
ORDER BY sq1.country

Hi @1116, as @Bruno has well explained in his responses, your query will not be accepted in DBMSs (e.g., SQL Server) which require that when an aggregate function is used in a SELECT clause, all other columns used in the SELECT clause must be used in the GROUP BY clause.

3 Likes

Hi @Bruno and everyone,

I hope that you guys can check my solution.
It has the same result but is more simple than the solution mentioned in the exercise.
Thank you.

WITH country_customer AS
(
SELECT c.country,
c.first_name || " " || c.last_name AS customer_name,
SUM(i.total) AS total_purchased
FROM customer c
INNER JOIN invoice i ON c.customer_id = i.customer_id
GROUP BY 1, 2
ORDER BY 1
)

SELECT c1.country, c2.customer_name, MAX (c1.total_purchased) AS total_purchased
FROM country_customer c1
INNER JOIN country_customer c2 ON c1.customer_name = c2.customer_name
GROUP BY 1
ORDER BY 1

Hi everyone , please this is my own solution to the challenge, if you could help me check if there is anything to improve
@Bruno i will be glad if you could too.
thanks

Interesting. I was not aware of this. I think this needs to be stated and/or covered in the Introduction to SQL Subqueries course.

On reviewing some of my previous SQL solutions I see I had not included ALL of the SELECT columns in the corresponding GROUP BY yet my solution was accepted when I submitted it.

I too arrived here because of my â€śsimpleâ€ť answer to this exercise.

â€“ Edit

I did some more research on this. According to the sqlite specification the following happens.

Side note: Bare columns in an aggregate queries. The usual case is that all column names in an aggregate query are either arguments to aggregate functions or else appear in the GROUP BY clause. A result column which contains a column name that is not within an aggregate function and that does not appear in the GROUP BY clause (if one exists) is called a â€śbareâ€ť column. Example:

SELECT a, b, sum ( c ) FROM tab1 GROUP BY a;

In the query above, the â€śaâ€ť column is part of the GROUP BY clause and so each row of the output contains one of the distinct values for â€śaâ€ť. The â€ścâ€ť column is contained within the sum()aggregate function and so that output column is the sum of all â€ścâ€ť values in rows that have the same value for â€śaâ€ť. But what is the result of the bare column â€śbâ€ť? The answer is that the â€śbâ€ť result will be the value for â€śbâ€ť in one of the input rows that form the aggregate. The problem is that you usually do not know which input row is used to compute â€śbâ€ť, and so in many cases the value for â€śbâ€ť is undefined.

Special processing occurs when the aggregate function is either min() or max(). Example:

SELECT a, b, max( c ) FROM tab1 GROUP BY a;

When the min() or max() aggregate functions are used in an aggregate query, all bare columns in the result set take values from the input row which also contains the minimum or maximum. So in the query above, the value of the â€śbâ€ť column in the output will be the value of the â€śbâ€ť column in the input row that has the largest â€ścâ€ť value. There is still an ambiguity if two or more of the input rows have the same minimum or maximum value or if the query contains more than one min() and/or max() aggregate function. Only the built-in min() and max()functions work this way

So while bare columns sometimes work for Min Max aggregate functions I guess the moral of the story would be to avoid use of bare columns in sqlite GROUP BY calls.

2 Likes

Hi All -
Just chiming in as fellow Dataquest student. @Bruno 's response was great as I donâ€™t recall the Dataquestâ€™s intermediate SQL course mentioning that when an aggregate function is included in the SELECT clause, all of the non-aggregated columns - aka columns without a function applied - need to be included in the GROUP BY clause.

I found this link on stackoverflow explaining how GROUP BY can be interpreted as â€śfor eachâ€ť. So, â€śfor eachâ€ť country, customer, and purchase amount, return X, which helped me in my understanding.

Also helpful for this challenge was recalling the order of operations that SQL executes, which is:

1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY
7. LIMIT

Hereâ€™s my code answer for this below. Open to feedback!

`````` WITH customer_country_purchases AS
(
SELECT invoice.customer_id,
customer.first_name,
customer.last_name,
customer.country,
SUM(invoice.total) AS total_purchases

FROM invoice
INNER JOIN customer
ON customer.customer_id = invoice.customer_id

GROUP BY invoice.customer_id,
customer.first_name,
customer.last_name,
customer.country
)

/*
below query states:
for each country where the sum of the total purchased of a customer is the highest amongst
other customers from the same country, return the column with the countries' name, the
customers' names, and the accompanying purchased amounts
*/

SELECT country,
first_name || " " || last_name AS customer_name,
total_purchases AS total_purchased
FROM customer_country_purchases
GROUP BY country
HAVING total_purchases = MAX(total_purchases);``````
1 Like

There is no aggregate function here, and GROUP BY was only applied to `country`

@semenchuk.ivan is it required to only use GROUP BY where there is an aggregate function?

Also - should GROUP BY have been applied to more than one column?

@orichh, my understanding after reading Brunoâ€™s comments is that not applying GROUP BY to `customer_name` in your proposed solution is not advised and could cause problems with some other flavors of SQL.

In this mission this is not a problem because we are using SQLite.
I am glad I came across this thread to learn this potential issue with other SQL flavors.

when I saw my output for an sql i made it was:

WITH customerlist AS
(SELECT *
FROM customer),
billing AS
(SELECT customer_id, SUM(total) total_purchased
FROM invoice i
GROUP BY 1)

SELECT cl.country, cl.first_name ||" "|| cl.last_name customer_name, MAX(b.total_purchased) total_purchased
FROM customerlist cl INNER JOIN billing b ON cl.customer_id = b.customer_id
GROUP BY 1
ORDER BY 1 ASC

Even though the output is identical to the table with 3 columns dataquest refuses to acknowledge it as being correctis it a glitch, as when I use dataquestâ€™s solution it automatically says correct solution even though the output are identical.