Does it really matter which table that I'm calling "FROM"?

Screen Link:
Building And Organizing Complex Queries — Creating Views | Dataquest

My Code:

DROP VIEW IF EXISTS chinook.customer_gt_90_dollars;
CREATE VIEW chinook.customer_gt_90_dollars AS
select chi.*
from chinook.customer as chi
INNER JOIN invoice as i on i.customer_id=chi.customer_id
GROUP BY 1
HAVING sum(i.total>90);

SELECT * 
FROM chinook.customer_gt_90_dollars;

What I expected to happen:

What actually happened:

[0 rows x 13 columns]
customer_idfirst_namelast_namecompanyaddresscitystatecountrypostal_codephone

The correct answer according to DQ is :


DROP VIEW IF EXISTS chinook.customer_gt_90_dollars;
CREATE VIEW chinook.customer_gt_90_dollars AS 
    SELECT
        c.*
    FROM chinook.invoice i
    INNER JOIN chinook.customer c ON i.customer_id = c.customer_id
    GROUP BY 1
    HAVING SUM(i.total) > 90;
SELECT * FROM chinook.customer_gt_90_dollars;

I just don’t understand it. Maybe I’m not seeing it but from what I see, the ONLY difference is that I used chinook.customer as chi rather than chinook.invoice i as my “from”. I joined my customer and invoice table by the customer_id. So what seems to be the issue?

HAVING SUM(i.total) > 90; vs HAVING sum(i.total>90);

You faced a silent failure of the code. Even harder to debug than raised errors.

When writing sql, try to break down thoughts into really granular bits. Then the latter code won’t happen.
Think of these.

  1. HAVING requires an aggregation → Immediately you will choose a column and close the bracket here already
  2. HAVING requires a predicate (Predicates in SQL – True, False and UnKnown | A Programmer Inner Voice), it should have as input a Boolean Value used to filter rows after grouping

The wrong query is returning 0 rows because i.total>90 is all False. False is treated as 0 in sqlite. Sum all rows of 0 is 0. 0 is treated as False as input predicate to HAVING for each row. False means that row is not chosen to be in result set.

You can try doing HAVING 2, HAVING 0, HAVING -1 to see how True/ False maps to integers. (Datatypes In SQLite I have no idea why negative numbers are treated as True by SQLite.
Practically there’s no reason to hardcode integers into clauses (WHERE/HAVING) using predicates. Usually this is seen during debugging only when you want to control sql behaviour.

1 Like

Why is the i.total>90 false? in my code? I am pulling the total column from the invoice table.
The correct answer is doing that too from my understanding.

Your closing bracket is at the wrong place
Dataquest is SUM, then compare. Yours is compare, then SUM.

2 more examples to highlight the difference (don’t need to answer me):

  1. What’s the difference between (1+2)*3 vs 1+2 * 3?
  2. What’s the difference between DQ answer: (50+60) > 90 and Your answer: (50 > 90) + (60 > 90) ?

This is what your wrong i.total>90 is doing.

  1. For each group, look at values from i.total (they are each <= 90, because every value in overall column is <=90, so group or not makes no difference, but to be clear, HAVING looks at statistics group by group)
  2. Compare each value against 90
  3. If it’s more than 90 that row will be True, else False.
  4. The whole term was wrapped in SUM
  5. Sqlite stores True as 1, False as 0
  6. Every row in step 3 returned False (Imagine simple x < y comparison done row by row)
  7. SUM(as many 0 as there are rows in i.total FOR THE CURRENT GROUP, NOT WHOLE COLUMN) = 0
  8. HAVING uses output of predicate (0 in this case) to decide if the group should appear in resultset
  9. In the correct way, HAVING uses SUM(i.total) > 90. total of each group is summed then compared against 90 to decide if the grouped data stays in resultset.
  10. Your sum(i.total>90) , mixed with the data of all <=90 means HAVING sees a 0 as it’s predicate input for every group it decides to include/exclude in resultset, so every group is excluded (0 → False) and overall result is empty.

Try the following in sqliteonline.com using sqlite engine.
Demonstrating behaviour of WRONG HAVING given different data

DROP TABLE if EXISTS sales;
CREATE TABLE sales (id int, amt INT);
INSERT INTO sales VALUES
(1, 10),
(1, 90),
(2, 20),
(2, 30)

New tab

SELECT id, SUM(amt) from sales
GROUP by id 
HAVING SUM(amt>90)

See that when all amt are <= 90, the query will return nothing.
If you change any of these 4 amount to > 90, it’s corresponding id {1,2} will appear in results, because that group will then have at least 1 row returning True to amt>90 , causing predicate SUM(amt>90) to return some non-zero positive number, which causes HAVING to see the group as “should be in result”.

1 Like

Thank makes sense! thanks! :slight_smile: