Why won't this work? Answering Business Questions Using SQL

Hi, I’m exploring additional methods in Data Quest, and decided to (try) a more simplistic method. However, I’m not really sure as to why ‘Other’ won’t group together.

Screen Link:

My Code:

    Country_Stats AS 
            CASE COUNT(DISTINCT(c.customer_id))
                    WHEN 1
                        THEN "Other"
            END Country,
            COUNT(DISTINCT(c.customer_id)) AS Number_of_Customers,
            SUM(i.total) AS total_sales_value
        FROM customer AS c
        INNER JOIN invoice AS i ON i.customer_id = c.customer_id
        GROUP BY Country
    total_sales_value / Number_of_Customers AS average_value
FROM Country_Stats
ORDER BY Number_of_Customers DESC;

What I expected to happen:

What actually happened:

If you do add an additional ‘GROUP BY’ at the main query, you’ll only get the last row of 'Other in the second image, with the remaining countries.

Any idea?


GROUP BY is executed before SELECT. It’s just an illusion that you think the Country in GROUP BY is referring to the post-CASE processed Country rather than raw country. The Country you are grouping on has not been converted to Other at that moment, then later those countries got editted to show ‘Other’, but that’s just a cosmetic display change that did not affect Grouping logic. Sqlite and Postgres are lax enough to let you use column alias in GROUP BY but not sql server.

Other thing to watch out is if you use Country to ORDER BY in the same query (not like inner-outer as you do here). Because ORDER BY happens after SELECT, now there is confusion as to whether the Country is referring to the raw data, or the aliased column name after CASE converted some countries to ‘Other’. I wrote about it here, ( Looking for insights to approaching my SQL queries at 1st paragraph after 2nd code block, it contains link to stackoverflow explaining docs) and am still unclear about it’s behaviour, so I just edit the column alias to a new name not clashing with any existing column name and ORDER BY the new non-clashing column alias.

Most important paragraph in that stackoverflow:

If an ORDER BY expression is a simple name that matches both an output column name and an input column name, ORDER BY will interpret it as the output column name. This is the opposite of the choice that GROUP BY will make in the same situation. This inconsistency is made to be compatible with the SQL standard.

Input column name refers to Country before CASE converts to some countries to Other, output column name refers to after converted but aliased to the same Country name. I may be wrong, but interpret this statement to mean GROUP BY uses input column name to adhere to the normal sql order of execution behaviour that it does not know the new Country values yet at moment of GROUP BY, and thus uses input Country values.

1 Like

Oh I see. I’ll continue to try and am reading your advice further now. Does this mean using END [Different name to Country] would work?

I wrote an article on this: https://hanqi01.medium.com/how-badly-named-sql-column-aliases-confuse-when-using-group-by-and-order-by-261541f86f9a.

Yes a different column alias and grouping on that new alias will work. If forcefully want to name clash for the sake of it, using subqueries will put the order of execution right and give the same proper result.

	CASE WHEN name < 'c' THEN 'combined'
         ElSE name
    END as alias_name, 
FROM test 
GROUP BY alias_name;

-- name clashing below but subqueried same result as non-name clashing above
SELECT name, SUM(value)
      CASE WHEN name < 'c' THEN 'combined'
           ElSE name
      END as name, 
  FROM test
GROUP BY name;

Thanks for your efforts hanqi. It’s much appreciated! Figured it out thanks to your help.


I don’t have a subscription to Medium but would like to read that article. Is there another link you could share? Thank you

This one may work
1 Like