CYBER WEEK - EXTRA SAVINGS EVENT
TRY A FREE LESSON

Sql except operator not working properly

except is supposed to Identify customers who are in the USA and have not spent $90, but instead it returns one person who has spent over 1000 dollars and even that is wrong as there are two who have please explain what is problem here?

%%sql


        
SELECT
        c.country,
        c.first_name,
        sum(i.total)
    FROM 
        invoice i
    INNER JOIN 
            customer c 
        ON 
            i.customer_id = c.customer_id
         WHERE country = "USA"
    
    GROUP BY 1
    
    except
      SELECT
        c.country,
        c.first_name,
        sum(i.total)
    FROM 
        invoice i
    INNER JOIN 
            customer c 
        ON 
            i.customer_id = c.customer_id
    
    GROUP BY 1
        HAVING SUM(i.total) > 90
2 Likes

1.How many steps does this SQL query have?
2.What is the 2nd last step of this SQL query doing?
3.Can you print out the output of the 2nd last step of this SQL query?

Repeat steps 2+3 (change 2nd to 3rd, 4th, …)

3 Likes

Hi,

After doing below I realise problem is with the group by
I group by customer instead of country it seems ok

1.How many steps does this SQL query have?
Two ? Since there are two select statements

2.What is the 2nd last step of this SQL query doing?
The table after except is filtering first with group by then with this to find those who spent >90$

HAVING SUM(i.total) > 90

3.Can you print out the output of the 2nd last step of this SQL query?

Here is output . It seems correct, but is wrong
as 1040.489 is the total for the whole of the USA.
For some reason it is not including the breakdown by customer
but only aggregating on country when I include customer in the selec t
statement —
SELECT
c.country,
c.first_name,
sum(i.total)


is output of above query

this screenshot is output of below

%%sql

-- NOT IN = EXCEPT

SELECT
        c.country,
        c.first_name,
        sum(i.total)
    FROM 
        invoice i
    inner JOIN 
            customer c 
        ON 
            i.customer_id = c.customer_id
  
    GROUP BY 1
        HAVING SUM(i.total) > 90
        order by 3 desc
1 Like

Great, so now you see for yourself why it’s bad practice to GROUP BY a number. Always use the column name to GROUP BY, not number.

When I asked how many steps, i was actually looking for something like
last step: EXCEPT
2nd last: GROUP BY or GROUPBY + HAVING
3rd last: JOIN

Printing output after every SQL step helps you debug, just like how you do for python programs

3 Likes

yes i see how it improves readability by not grouping by a number,
but I was under wrong impression that as long as I had customer in select
then it would include customer breakdown in output even though grouping
on country.

May I know your reason for defining steps as you do please?
I see a common thread as filtering result by EXC EPT and GROUPBY
but I do not see where the JOIN fits in/rationale for it to be a step.

1 Like

You can test sql conveniently using pandas to design data and pandasql to execute SQL on that data, to save time typing long DDL commands to set up data in sql.

The step thing is an exercise in breaking down SQL code so you know the general order of steps an database engine takes to execute a query. https://sqlbolt.com/lesson/select_queries_order_of_execution.

This helps debug queries, refactor them so you can take a chunk out for reuse, or refactor for pipelines with CTE. It has nothing related to which clause or specifically to JOIN, just a generally concept of “know what are the transformations, and what does the input/output data look like (could be the raw or summarized (usually for ML data validation) stats at every step.”

Writing a giant query at once may solve the problem, but if something goes wrong, unless you have strong SQL theory, it’s hard to know why. Also, it doesn’t promote SQL code reuse. By visualizing intermediate outputs, you may discover more use cases of a subquery (DQ will not teach this because every exercise is very focused on 1 task, so have to be creative to think about other uses)

2 Likes

Many thanks again hanqi. I shall try that method it certainly seems more efficient.

1 Like