sales_by_country = """
WITH country_or_other AS
COUNT(*) FROM customer
***WHERE country = c.country***
) = 1
END AS country,
FROM invoice_line il
INNER JOIN invoice i ON i.invoice_id = il.invoice_id
INNER JOIN customer c ON c.customer_id = i.customer_id
SUM(unit_price) / COUNT(DISTINCT customer_id) customer_lifetime_value,
SUM(unit_price) / COUNT(DISTINCT invoice_id) avg_order_value
GROUP BY 1
ORDER BY 2 DESC
Could some please help me to understand why we need the row country = c.country in CASE statement but not GROUP BY country instead?
The word in CASE statement literally means,
“Should there be BUYING and SELLING in ‘Holy Sea’ (a unrecognized country, with only transactions with other world which makes them recognized somewhat as a ‘country’ within a country).”
Those I believe, the statement means. Having read through your program, the validity of the program would be tested with the use of in CASE or ORDER BY.
Did you ever figure out what the WHERE country = c.country means?
I got stuck at this exact spot. I can see the answer in the answer key, and I can see that I need the line for the code to run properly. Thankfully StackOverflow had an answer.
(I know that you posted this several months ago, so you probably don’t need this information anymore. However, in an effort to minimize duplicate posts about this question, I’m responding to your post, rather than creating my own.)
It seems as though, somehow, this inner query is referencing the outer query. I thought that the subquery always executed first, but it turns out that this is not the case. I found this post on StackOverflow which explains that if a subquery is correlated to the outer query, it will run the outer query first so that the entire query will run.
So, I think the WHERE country = c.country tells the SQL engine that it needs to look at lines where the country of the subquery is equal to the country in customer of the outer query’s second INNER JOIN.
I hope this helps someone, because I got stuck here for a LONG time, stubbornly refusing to look at the answer key. Maybe @Bruno or @nityesh can add the 191-5 tag so that it pops up when searching by tags?
I have been puzzling over this question for a bit now too and am going to try to explain it as a way of figuring it out for myself as well. disclaimer: this is just my best guess, Im not a pro.
Short answer: You cant compress one column with grouping, unless you are also compressing the rest of the columns with either grouping or aggregate functions. This code produces a table with 4757 ‘rows’ but if we were to group the table by country, it would be only 24 ‘rows’.
Longer answer: We dont actually need this specific line of code, but using a CASE statement is a handy way to fulfill the task of converting country names that only have one customer to ‘Other’. I did it a slightly different way, but the beauty of the way this case statement works is that it doesnt require aggregation to work, so they can produce a table with all of the data needed for the challenge with one subquery. Here is how this case statement works. The query inside the case statement SELECT COUNT(*) FROM customer
produces a table that just counts all the rows on the customer table, 59. Adding the Where clause WHERE country = c.country
creates a table with a single column, 59 rows long, with each ‘row’ of the column being the count of the number of times that each Country is in the table. Try running this
WHERE country = c.country
FROM customer c
I honestly dont know how or why it does this, or if we have been shown this technique, but this is what they are doing. Then the case statement turns the country name of rows that have a 1 value, meaning there was only one customer, into ‘Other’. Hopefully someone smarter can shed more light on this, and someone newer can get some value from it as well