BLACK FRIDAY EXTRA SAVINGS EVENT - EXTENDED
START FREE

SQL Practice Problems - Logic of using GROUP BY with IS NULL

Screen Link:
Practice Your Skills | Dataquest

My Code:

SELECT s.student_id IS NULL AS survey_missing,
       COUNT(*) AS n_occurences
  FROM performance_updated AS pu
  LEFT JOIN survey AS s
    ON pu.student_id = s.student_id
 GROUP BY survey_missing;

Above is the DQ solution. I have a simple query on how GROUP BY and IS NULL are interacting with each other, but I can’t seem to understand the flow of operation.

Query is that if we are using the condition IS NULL to select rows which are NULL in the table , it would mean we are discarding the NOT NULL rows from further consideration. In that case, how will GROUP BY segregate them into two different groups of IS NULL and NOT NULL as shown in the output, if we have only IS NULL rows ?

image

Or are we saying that IS NULL’s output will be 2 boolean values and GROUP BY segregates them ?

1 Like

Hi Prashant,

IS NULL returns a Series of 1 and 0 values saying if a particular value is null (1) or not (0). To check it, run the following code (it’s the same your code but without COUNT and GROUP BY):

SELECT s.student_id IS NULL AS survey_missing   
  FROM performance_updated AS pu
  LEFT JOIN survey AS s
    ON pu.student_id = s.student_id;

Now, if you add only COUNT, you’ll count all the 1s, i.e., in our case, all the occurences of null values:
изображение_2021-11-20_145517

Finally, when you add GROUP BY (and now you have your intial code), you’ll group the values by being or not null, and count the occurences of both groups.

Series of 1 and 0 is the key to understand. These are small conceptual doubts which may have been covered in missions, but we do not remember. Thank you for responding.

For the future adventurers on DS path, can we also mention is that 1 is True and 0 is False ?
Because if i change the condition to IS NOT NULL , the count of 0s and 1s flip - there are 1900 entries against 0 and 100 against 1

1 Like

For the future adventurers on DS path, can we also mention is that 1 is True and 0 is False ?

Yes, exactly.

if i change the condition to IS NOT NULL , the count of 0s and 1s flip - there are 1900 entries against 0 and 100 against 1

Correct, they will flip. Because if you use IS NOT NULL (or NOTNULL, which is the same thing), then for you the True (i.e., 1s) values will become those cases where the value is not null, and viceversa.

COUNT(*) does not care about value (null or not), it counts number of rows.
This 2000 output includes both 1900 null and 100 non-null

2 Likes

Good point @hanqi, thank you!