Mission link: https://app.dataquest.io/m/191/guided-project%3A-answering-business-questions-using-sql/5/analyzing-sales-by-country
Hi there @lingkans, welcome to the community!
I haven’t done this mission in Python, but I did do the one in R, where some approaches were different, this being one of them. So I’d love to hear the input of someone who’s actually done it.
From looking at the solutions key, it seems like the code you just highlighted is a neat trick to identify the countries with only 1 customer.
From reading the text in the Guided Project screen, we see the following:
So this gives you an idea that the equating of the sub-query to 1 must have been to identify the countries with only 1 customer.
I’ll try to explain the query from the inside out:
where country = c.country
) = 1 THEN "Other"
END AS country,
First, we’ll look at the
This is being used of course to filter individual rows. Let’s say there were 5 different customers in a particular country. These 5 rows would be pinpointed by that
WHERE clause. Likewise if there was only 1 country - that 1 country would be singled out as its own row.
SELECT COUNT(*) FROM customer
So what we’re doing here is we’re counting the number of rows left after the
WHERE clause we ran. If a given country had 5 customers,
SELECT COUNT(*) FROM customer would return 5. If there was only 1 customer in a country, it would return 1.
3. Setting value as “Other”.
This is where your counting of the rows ties back in to what the mission outlined in the quoted text I posted above.
You know that if a given country had only 1 customer, that the following chain of events happen:
where country = c.country clause results in only 1 row remaining,
SELECT COUNT(*) FROM customer line only counts that 1 row, thus returning a value of 1,
- Since the
WHEN (truncated for simplicity) = 1 THEN "Other" condition is now fulfilled, the
Other value is assigned.