280-5 Analyzing Sales by Country

Can someone explain the logic of the case statement in the solution, step-by-step (inside of the MAIN query)?

My query is different and more complicated than it needs to be because I’m doing all the queries for each calculation separately.
So I need to get the logic in order to incorporate the case statement into my own code.

Thanks in advance.

sales_by_country = '''
WITH country_or_other AS
(
    SELECT
    CASE
        WHEN (
                SELECT count(*)
                FROM customer
                where country = c.country
            ) = 1 THEN "Other"
        ELSE c.country
    END AS country,
    c.customer_id,
    il.*
    FROM invoice_line il
    INNER JOIN invoice i ON i.invoice_id = il.invoice_id
    INNER JOIN customer c ON c.customer_id = i.customer_id
)

SELECT
country,
customers,
total_sales,
average_order,
customer_lifetime_value
FROM
(
SELECT
    country,
    count(distinct customer_id) customers,
    SUM(unit_price) total_sales,
    SUM(unit_price) / count(distinct customer_id) customer_lifetime_value,
    SUM(unit_price) / count(distinct invoice_id) average_order,
    CASE
        WHEN country = "Other" THEN 1
        ELSE 0
    END AS sort
FROM country_or_other
GROUP BY country
ORDER BY sort ASC, total_sales DESC
);
1 Like

I’m only putting my code here because the responses usually take long, but either way, I attempted to incorporate the CASE statements into my existing code.

Again, I’m not sure how the CASE statement works in the MAIN query of the solution and I understand my code is redundant.

sales_by_country = '''
WITH
tot_cust_country AS
(
SELECT
    CASE
        WHEN(
            SELECT COUNT(*)
            FROM customer
            WHERE country = c.country
            ) = 1
        THEN "Other"
        ELSE c.country
    END AS country,
    COUNT(customer_id) tot_customers
FROM customer c
GROUP BY country
),
tot_amt_country AS
(
SELECT
    c.country country,
    CAST(COUNT(i.invoice_id) AS FLOAT) invoice_id,
    CAST(SUM(i.total) AS FLOAT) total_amt
FROM invoice i
INNER JOIN customer c
    ON i.customer_id = c.customer_id
GROUP BY country
),
avg_ord_amt AS
(
SELECT
    country,
    total_amt / invoice_id avg_per_ord
FROM tot_amt_country
GROUP BY country
),
avg_sales_per AS
(
SELECT 
    tac.country,
    (tac.total_amt / tcc.tot_customers) avg_per_cust
FROM tot_amt_country tac
INNER JOIN tot_cust_country tcc
    ON tac.country = tcc.country
GROUP BY 1
)

SELECT
tcc.country,
tcc.tot_customers,
tac.total_amt,
asp.avg_per_cust,
aom.avg_per_ord,
CASE
    WHEN tcc.country = "Other"
    THEN 1
    ELSE 0
END AS sort
FROM tot_cust_country tcc
INNER JOIN tot_amt_country tac
ON tcc.country = tac.country
INNER JOIN avg_sales_per asp
ON tac.country = asp.country
INNER JOIN avg_ord_amt aom
ON asp.country = aom.country
GROUP BY tcc.country
ORDER BY sort ASC, tac.total_amt DESC
'''
print(run_query(sales_by_country))

Would anybody happen to have any clues as to what might be causing a new column to be displayed (sort) instead of the row for “Other”?

Just to reiterate, my code above is redundant, but the data was correct.

I’m trying to understand the case statement, but in my code, I simply used the solution case statement in both the WITH and MAIN query.

Because you’re creating a column called sort in this part of the code:

CASE
    WHEN tcc.country = "Other"
    THEN 1
    ELSE 0
END AS sort

Let’s focus on the following query:

SELECT CASE
         WHEN (SELECT COUNT(*)
                 FROM customer
                WHERE country = c.country
              ) = 1 THEN "Other"
         ELSE c.country
       END AS country,
       c.customer_id,
       il.*
  FROM invoice_line il
 INNER JOIN invoice AS i ON i.invoice_id = il.invoice_id
 INNER JOIN customer AS c ON c.customer_id = i.customer_id

There are several tables in play, one of them even occurs twice, and that’s where we’ll focus. I’m talking about country. This table occurs as part of the “outer query” in the line INNER JOIN customer c ON c.customer_id = i.customer_id and is given the alias c.

It occurs again in the case statement:

CASE
  WHEN (SELECT COUNT(*)
          FROM customer
         WHERE country = c.country
       ) = 1 THEN "Other"
  ELSE c.country
END AS country

In here, it’s as if customer is two different tables; we have both customer and c.

The way this works is that the SQL engine will loop through all rows in c. Each row has a country, c.country. This is a specific value as this point, something like Dataquest Country.

Then a subquery is ran:

SELECT COUNT(*)
  FROM customer
WHERE country='Dataquest Country`
;

This counts the number of rows in customer whose country is Dataquest Country. In other words, it counts the number of customers whose country is Dataquest Country.

Then the case statements checks to see if the result is 1. If it is, it assigns the value Other, if it doesn’t, it assigns the value c.country, which is Dataquest Country.

I hope this helps.

4 Likes

Hey Bruno,

Thanks for that. The explanation on the order of operations and exactly what occurs helps greatly.

With that said, I still don’t understand what you mean about creating a column called sort. Isn’t this the point to create that pseudo column, but somehow for it to become a row? The solution does exactly the same? I’m just not sure why it doesn’t work within mine?

I guess, firstly, I don’t get how the “column” becomes a row in the first place.

CASE
   WHEN tcc.country = "Other"
   THEN 1
   ELSE 0
END AS sort

Here is a screenshot of my code (main query) for both mine and the solution. I can just redo my code, but I just wanted to see if it’s because of how I structured the rest of my query.

All the data is correct with my query, except “sort” being a column instead of a row as “Other”.

Thanks again

I really don’t know what you mean with a row becoming a column or the other way around. We’re just dealing with columns here, that’s part of the essence of SQL.


Your query has at least a couple of issues. Your first CTE, that is, tot_cust_country, results in the following:

country tot_customers
Other 1
Other 1
Other 1
Other 1
Brazil 5
Canada 8
Other 1
Czech Republic 2
Other 1
Other 1
France 5
Germany 4
Other 1
India 2
Other 1
Other 1
Other 1
Other 1
Other 1
Portugal 2
Other 1
Other 1
USA 13
United Kingdom 3

This isn’t what you want to get. I suspect you want the table below.

country tot_customers
Brazil 5
Canada 8
Czech Republic 2
France 5
Germany 4
India 2
Other 15
Portugal 2
USA 13
United Kingdom 3

But even after you fix this, you’re going to run into problems because you later join on country, but there’s no country named Other in the other tables that participate in the join. That’s why your table is missing a row.


I suggest that you take a look at what each of your CTEs looks like. If you had done this, you’d have notice immediately that you got something wrong right at the beginning.

Hello all!

I´m currently struggling with the first step, to aggregate customers for each country. It works fine for all countries except for my custom bucket called “Others”. Here´s my code so far:

WITH countries AS(
             SELECT
                 COUNT(customer_id)AS count,
                 country
             FROM customer
             GROUP BY country
             ORDER by count ASC)
         
    SELECT
        count,
        bins
    FROM
        (
        SELECT
            c.*,
            CASE
                WHEN c.count = 1 THEN "Other"
                ELSE c.country
            END AS bins
            
        FROM countries AS c
       
        
       )
    GROUP BY bins
    ORDER BY count ASC

giving me the following outpout:

bins count
Brazil 5
Canada 8
Czech Republic 2
France 5
Germany 4
India 2
Other 1
Portugal 2
USA 13
United Kingdom 3

It´s counting everything alright, but not summing up “Others” correctly. Looking for something like this:

I refuse to check the solution just yet but can´t figure out where I´m going wrong :-/
any help/hint would be appreciated.

thanks all! (hope it formats correctly)

br
Marina

Nice :muscle:

The countries CTE is fine. The subquery

SELECT c.*, CASE
             WHEN c.count = 1 THEN "Other"
             ELSE c.country
            END AS bins
  FROM countries AS c;

is also fine and it results in

count country bins
1 Poland Other
1 Spain Other
1 Sweden Other
2 Czech Republic Czech Republic
2 India India
2 Portugal Portugal
3 United Kingdom United Kingdom
4 Germany Germany
5 Brazil Brazil
5 France France
8 Canada Canada
13 USA USA

And in the above table you’re basically selecting count and grouping by bins, but you’re not using any aggregate function. Do you see why this wouldn’t work?

1 Like

Hey Bruno!

Thank you so much for taking the time to check my code and giving hints rather than the full solution :slight_smile: With your help I was able to figure it out: sum(count) was what I was missing, now it works :slight_smile:

on to the next mission!!!
many thanks and a happy new year !!

Marina

1 Like