Screen Link:
https://app.dataquest.io/m/191/guided-project%3A-answering-business-questions-using-sql/5/analyzing-sales-by-country
My Code:
%%sql
SELECT c.country, COUNT(i.customer_id) AS customers_count, SUM(i.total) AS total_sales
FROM invoice AS i
INNER JOIN customer AS c ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 2 DESC
What I expected to happen:
I was trying to put together a working table grouping number of customers and total sales amounts by country. I joined two tables and grouped data by country.
What actually happened:
The table returned was oβkay except for the fact that number of customers for some countries (for example, USA, Canada and Brazil) were greater than total number of customers (which is 59) and, of course, if I sum up total number of customers as per my table it would exceed 59 by several times. I do not understand why this grouping does not work properly and I feel I cannot figure this out myself. If anyone can give me a brief explanation, I would be very grateful. Also, please, note I am not yet done with the Step 5 of this Guided Project and at this point I need a help with this particular issue, not the whole project (at least for now).
Thanks!
1 Like
You should include DISTINCT
SELECT c.country, COUNT(DISTINCT i.customer_id) AS customers_count, SUM(i.total) AS total_sales
FROM invoice AS i
INNER JOIN customer AS c ON c.customer_id = i.customer_id
GROUP BY 1
ORDER BY 2 DESC
1 Like
Thanks! It worked! I will search the web to understand the issue.
2 Likes
@ntokoldoshev were you able to understand the issue? I have the same thing happening, just want to better understand why. Hoping you found some reasoning to it.
1 Like
It looks like the issue is that the abnormally high number without DISTINCT represents unique customers who ordered more than once. So itβs counting person A multiple times because person A ordered multiple times, when we only want them counted once. In web analytics, this would be the difference between a unique user and a visit. One unique user could account for multiple visits, but itβs still only one person.
1 Like
@ghighcove
Kindly check this W3Schools tutorial to see exactly how it works.
To figure out what is going on, run this three codes separately and inspect the customer_id columns:
%%sql
SELECT * From customer
%%sql
SELECT * from invoice
%%sql
SELECT c.country, i.customer_id
FROM invoice as i
INNER JOIN customer AS c ON c.customer_id = i.customer_id
1 Like