Going fast! #DataquestChallenge Premium Annual Offer:
500 get 50% & the next 1000 get 40% off.
GET OFFER CODE

Alternative solution for a review

Screen Link: Learn data science with Python and R projects

Here´s a short story of how I came up with the solution. First I wrote a solution that gave the right output. Then I checked the DQ answer and found it way more complicated than my first solution. So, I went to the community to check what other learners thought about the DQ solution and such. Looking through the topics I realized that I made quite a common mistake of leaving one of the non-aggregate columns from the SELECT clause outside the GROUP BY clause. So, I came back to the DQ app to try to find another way around but still somewhat simpler than the one offered by the DQ .

So, here´s my solution:

WITH 
    customer_purchases AS
                    (
                     SELECT i.customer_id,
                            c.country,
                            SUM(i.total) total_purchases
                       FROM invoice i
                      INNER JOIN customer c ON c.customer_id = i.customer_id
                      GROUP BY i.customer_id, c.country
                    ),
    max_purchase AS
                    (
                     SELECT country,
                            MAX(total_purchases) country_max_purchase
                       FROM customer_purchases 
                      GROUP BY country
                     )
                     
SELECT mp.country country,
       c.first_name || " " || c.last_name customer_name,
       cp.total_purchases total_purchased
  FROM customer c
 INNER JOIN customer_purchases cp ON c.customer_id = cp.customer_id
 INNER JOIN max_purchase mp ON mp.country = cp.country AND mp.country_max_purchase = cp.total_purchases 
 ORDER BY country;

What do you think about it? Am I missing some possible pitfalls?

1 Like

Hello @ksenia.kustanovich,

I agree with you. When I faced this task I think the way DQ suggested was safer because there is also 1 thing we need to pay attention to when we use ON…AND with JOIN which is explained in detail here especially the answer given by John Woo.

BUT when I did the Guide Project: Answering Business Questions Using SQL, I came up with 1 question: find the best-selling album in each country based on the number of tracks sold from this album, I realize there is a problem if I use the same way that DQ suggested. The problem is that in USA, there are 2 albums that has the same number of tracks sold and both are the highest (i.e. there are 2 top albums for the US).

From my point of view I explain this problem like this:
when you use a subquery in a SELECT clause, it’s critical that subquery returns exactly 1 value (0 dimension) because the result of that being used to fill in just 1 cell of result. In my work I try to find the album that has the highest number of tracks sold but 2 or 3 albums can have the same number of tracks sold, which are the highest too. => the result of SELECT subquery is 1 dimension, so the engine should return error BUT sqlite doesn’t work like that, it still returns a number and miss other values.

So to solve this problem I try to JOIN 2 tables based on 2 columns and there is an interesting link too.

I am on my way to completing the data science path too, so this is just my thought. I am very please to discuss with others on this and maybe @Bruno and @hanqi can help us to be clear about this? :))
Thank you in advance.

I’m too lazy and too busy at the moment to gather all the context to discuss this, however. . .

. . . this is probably right. SQLite is often very permissive in ways that can be harmful not only for learning, but also for getting the results we actually want.

1 Like

Hi! I too am looking for feedback, and hoping that my “more simple” solution is not a fluke:

WITH sumer AS
    (
     SELECT 
        i.customer_id,
        c.first_name||" "||c.last_name AS customer_name,
        c.country,
        SUM(i.total) AS total_purchased
       FROM customer AS c
      INNER JOIN invoice AS i ON i.customer_id = c.customer_id
      GROUP BY i.customer_id, c.country
    ),
    
    maxer AS
    (
     SELECT *, MAX(total_purchased) AS maxed
       FROM sumer
      GROUP BY country
    )

SELECT country, customer_name, total_purchased FROM maxer;

Thanks!

You can actually get the same answer with even less code:

WITH sumer AS
(
SELECT
i.customer_id,
c.first_name||" "||c.last_name AS customer_name,
c.country,
SUM(i.total) AS total_purchased
FROM customer AS c
INNER JOIN invoice AS i ON i.customer_id = c.customer_id
GROUP BY i.customer_id, c.country
)

SELECT
country,
customer_name,
MAX(total_purchased) AS total_purchased
FROM sumer
GROUP BY country