Guided Project: Answering Business Questions using SQL - 5.Analyzing Sales by Country - Odd Group By Quirk

Screen Link:

https://app.dataquest.io/m/191/guided-project%3A-answering-business-questions-using-sql/5/analyzing-sales-by-country
My Code:

%%sql

WITH 
    country_count AS
        (SELECT
            country,        
            (CASE
                 WHEN COUNT(country) = 1 THEN 'Other'
                 ELSE country
             END) as country_or_other,
            COUNT(customer_id) as total_customers_per_country
         
         FROM customer
         GROUP BY country
        ),
        
    country_sales AS
        (SELECT 
            c.country,
            SUM(i.total) as total_sales,
            COUNT(i.invoice_id) as count_of_sales

        FROM customer as c
        LEFT JOIN invoice as i ON i.customer_id = c.customer_id

        GROUP BY c.country
        ),
    
    sort_table AS
        (SELECT
            country_or_other as country_name_for_sort,
            (CASE
               WHEN country_or_other == 'Other' THEN 1
              ELSE 0
            END) AS sort
         FROM country_count
        )
    
SELECT
    cc.country_or_other as country_name,
    SUM(cc.total_customers_per_country) as total_customers,
    ROUND(SUM(DISTINCT cs.total_sales), 2) as total_sales,
    ROUND((SUM(cs.total_sales)) / (SUM(cc.total_customers_per_country)), 2) as avg_sales_per_customer,
    ROUND((SUM(cs.total_sales)) / (SUM(cs.count_of_sales)), 2) as avg_order_value
    
FROM country_count as cc
LEFT JOIN country_sales as cs ON cs.country = cc.country
LEFT JOIN sort_table as st ON st.country_name_for_sort = cc.country_or_other

GROUP BY country_name
ORDER BY st.sort, total_sales DESC

What I expected to happen:

country_name total_customers total_sales avg_sales_per_customer avg_order_value
USA 13 1040.49 80.04 7.94
Canada 8 535.59 66.95 7.05
Brazil 5 427.68 85.54 7.01
France 5 389.07 77.81 7.78
Germany 4 334.62 83.65 8.16
Czech Republic 2 273.24 136.62 9.11
United Kingdom 3 245.52 81.84 8.77
Portugal 2 185.13 92.56 6.38
India 2 183.15 91.57 8.72
Other 15 1094.94 73.0 7.45

What actually happened:

country_name total_customers total_sales avg_sales_per_customer avg_order_value
USA 13 1040.49 80.04 7.94
Canada 8 535.59 66.95 7.05
Brazil 5 427.68 85.54 7.01
France 5 389.07 77.81 7.78
Germany 4 334.62 83.65 8.16
Czech Republic 2 273.24 136.62 9.11
United Kingdom 3 245.52 81.84 8.77
Portugal 2 185.13 92.56 6.38
India 2 183.15 91.57 8.72
Other 225 1094.94 73.0 7.45

And just for visual help I also ran the country_count table on its own:

country country_or_other total_customers_per_country
Argentina Other 1
Australia Other 1
Austria Other 1
Belgium Other 1
Brazil Brazil 5
Canada Canada 8
Chile Other 1
Czech Republic Czech Republic 2
Denmark Other 1
Finland Other 1
France France 5
Germany Germany 4
Hungary Other 1
India India 2
Ireland Other 1
Italy Other 1
Netherlands Other 1
Norway Other 1
Poland Other 1
Portugal Portugal 2
Spain Other 1
Sweden Other 1
USA USA 13
United Kingdom United Kingdom 3
No Error

I have checked the solution on this problem but my code is quite different than the code provided. Everything is working except for the summed up customer count for the Other countries group. I have figured out that it is counting the Other countries as 15, 15 times giving 15 * 15 = 225. It was also doing that to the total_sales column until I added DISTINCT. However, I can’t do that for the country name or it only counts the Other country one time giving me 1.

My first attempt at this problem was so far from right I just deleted everything and started over to come up with the code above. I have now spent multiple hours on this one problem and I am not sure how I can fix this last number.

Can you give me some direction? Will the code above work with a small change or do I need to start over again?

Thanks for your help!
David

2 Likes

Very nicely posed question! A sight for sore eyes.

Let’s focus on the last part of the query. We’ll remove the the aggregations so that we can look at the result of the joined tables.

We’ll focus on the first join and its most important columns:

SELECT country_or_other, total_customers_per_country, total_sales, count_of_sales
  FROM country_count as cc
  LEFT JOIN country_sales as cs ON cs.country = cc.country;

The result of this join is below.

country_or_other total_customers_per_country total_sales count_of_sales
Other 1 39.6 5
Other 1 81.18 10
Other 1 69.3 9
Other 1 60.39 7
Brazil 5 427.68 61
Canada 8 535.59 76
Other 1 97.02 13
Czech Republic 2 273.24 30
Other 1 37.62 10
Other 1 79.2 11
France 5 389.07 50
Germany 4 334.62 41
Other 1 78.21 10
India 2 183.15 21
Other 1 114.84 13
Other 1 50.49 9
Other 1 65.34 10
Other 1 72.27 9
Other 1 76.23 10
Portugal 2 185.13 29
Other 1 98.01 11
Other 1 75.24 10
USA 13 1040.49 131
United Kingdom 3 245.52 28

In your query, the table above is then joined with sort_table (as given by the line of code LEFT JOIN sort_table as st ON st.country_name_for_sort = cc.country_or_other). The table sort_table looks like this:

country_name_for_sort sort
Other 1
Other 1
Other 1
Other 1
Brazil 0
Canada 0
Other 1
Czech Republic 0
Other 1
Other 1
France 0
Germany 0
Other 1
India 0
Other 1
Other 1
Other 1
Other 1
Other 1
Portugal 0
Other 1
Other 1
USA 0
United Kingdom 0

So in st.country_name_for_sort = cc.country_or_other, the value Other makes it match multiple times.

You don’t have to redo everything to fix this particular issue.

Expand for a hint on how to fix it

Modify sort_table to lump the Other countries into a single line.

2 Likes

Thank you so much Bruno! Great explanation. I was able to solve the problem and I learned a great lesson on joins.

Is it just me, or this guided project was just too hard?
I mean in the previous Module: “Building and Organizing Complex Queries”, I got almost no prolem at all, but in this project, I feel a bit overwhelmed :crazy_face:

2 Likes

How were you able to go about this please?
I need your help

Hi @chiemela39! If you read through Bruno’s answer you can see when you get down to the last table he printed out that the “Other” in country_name_for_sort is getting counted multiple times. The way to solve this is to only have it match Distinct countries. It’s been a while since I was working on this but I think I had to use Distinct when matching the join keys. Give it a try and if you are still having issues let me know and I can try to help more.

Hope this helps!
David

Totally agree @DngNguyn

I went ahead and looked at the answers for this one, after spending several hours on the last 2 parts, for the sake of time and moving on.

This on is a toughy!