Don't understand why grouping yields wrong numbers

Screen Link:

https://app.dataquest.io/m/191/guided-project%3A-answering-business-questions-using-sql/5/analyzing-sales-by-country

My Code:

%%sql
SELECT c.country, COUNT(i.customer_id) AS customers_count, SUM(i.total) AS total_sales
    FROM invoice AS i
    INNER JOIN customer AS c ON c.customer_id = i.customer_id
    GROUP BY 1
    ORDER BY 2 DESC

What I expected to happen:

I was trying to put together a working table grouping number of customers and total sales amounts by country. I joined two tables and grouped data by country.

What actually happened:

The table returned was o’kay except for the fact that number of customers for some countries (for example, USA, Canada and Brazil) were greater than total number of customers (which is 59) and, of course, if I sum up total number of customers as per my table it would exceed 59 by several times. I do not understand why this grouping does not work properly and I feel I cannot figure this out myself. If anyone can give me a brief explanation, I would be very grateful. Also, please, note I am not yet done with the Step 5 of this Guided Project and at this point I need a help with this particular issue, not the whole project (at least for now).

Thanks!

1 Like

You should include DISTINCT

SELECT c.country, COUNT(DISTINCT i.customer_id) AS customers_count, SUM(i.total) AS total_sales
    FROM invoice AS i
    INNER JOIN customer AS c ON c.customer_id = i.customer_id
    GROUP BY 1
    ORDER BY 2 DESC
1 Like

Thanks! It worked! I will search the web to understand the issue.

2 Likes

@ntokoldoshev were you able to understand the issue? I have the same thing happening, just want to better understand why. Hoping you found some reasoning to it.

1 Like

It looks like the issue is that the abnormally high number without DISTINCT represents unique customers who ordered more than once. So it’s counting person A multiple times because person A ordered multiple times, when we only want them counted once. In web analytics, this would be the difference between a unique user and a visit. One unique user could account for multiple visits, but it’s still only one person.

1 Like

@ghighcove

Kindly check this W3Schools tutorial to see exactly how it works.

To figure out what is going on, run this three codes separately and inspect the customer_id columns:

%%sql
SELECT * From customer
%%sql
SELECT * from invoice
%%sql
SELECT c.country, i.customer_id
FROM invoice as i
INNER JOIN customer AS c ON c.customer_id = i.customer_id
1 Like