This is not really a technical question, since my code provides the correct solution table, but I am looking for input on how to better approach formulating my queries in general by looking at this one in specific. The code provided in the solution set seems to have started by producing a single, non-grouped, table with columns to provide the answers by aggregating in the main query. My approach was based on another posting that made sense to me, but basically, I never felt like I had a plan to get the desired result, I just kept fumbling ahead. But here is roughly my thought process:
-
all of the final questions are grouped by country, so my first step was to create a table grouped by country, and create the ‘Other’ category. I will be able to later join this table based on the original Country column
-
I created another table to do the aggregate functions and grouped them by the original Country column as well.
-
so now I had all the data I need, but I still have to group the ‘Other’ countries together so I used the ‘trick’ shown on this challenge and created a table with the country name or word ‘Other’ and created a column where countries get a 0 and Other gets a 1. Then I grouped the ‘Other’ countries together.
-
Lastly, I just joined all my tables together and selected the requested columns.
So, the code looks disjointed because my thought process was disjointed. Also when putting this together I had a hard time testing each subquery individually since they were linked to each other. Because this project was in a Jupyter notebook, I just opened a new cell, pasted the existing code and ran it with the main query showing only all columns of the subquery I was working on. Wondering if there is a way like in python you can just hide everything you arent working on with hastags? Or anything better than what I’m doing? Any insights welcome, Thanks!
Screen Link:
https://app.dataquest.io/m/191/guided-project%3A-answering-business-questions-using-sql/5/analyzing-sales-by-country
My Code:
%%sql
WITH
a AS (
SELECT c.country country,
CASE
WHEN count(c.customer_id) = 1 THEN 'Other'
ELSE c.country END AS final_list,
count(c.customer_id) customers
FROM customer c
GROUP BY 1
),
b AS (
SELECT c.country country,
SUM(i.total) total,
SUM(i.total) / COUNT(DISTINCT(i.customer_id)) total_per_cust,
SUM(i.total) / COUNT(i.invoice_id) total_per_invoice
FROM customer c
LEFT JOIN invoice i on i.customer_id = c.customer_id
GROUP BY 1
),
c AS ( SELECT a.final_list,
CASE
WHEN a.final_list == "Other" THEN 1
ELSE 0 END AS sort
FROM a
GROUP BY 1,2
)
SELECT a.final_list,
SUM(a.customers) customers,
ROUND(b.total, 2) total,
ROUND(b.total_per_cust, 2) avg_sales_per_customer,
ROUND(b.total_per_invoice, 2) avg_order_value
FROM a
LEFT JOIN b ON a.country = b.country
LEFT JOIN c ON c.final_list = a.final_list
GROUP BY 1
ORDER BY c.sort, 3 DESC;