SQL summing down columns and across rows, and COALESCE

I’m exploring how to sum down columns and across rows in SQL on https://sqliteonline.com/ with Postgres.

I was surprised that googling "postgres sum across columns" (simply df.sum(axis=1) in pandas) gave nothing obviously useful.

I’m trying to find all possible operations between columns, but most of what I see is aggregating rows down a single column. Only through Stack Overflow did I find the syntax SELECT col1 + col2.

How does one read the Postgres docs properly? It seems much harder to find how to do something in them than in the pandas docs.

Is it allowed, or good practice, to transpose rows and columns in SQL just so we can change the ugly + syntax to SUM() and get access to the many other functions that only work on a single column?
Pandas seems to freely allow df.T and playing around with rows and columns, pivoting/melting/stacking. I guess SQL enforces the same type within a column, so transposing is impossible? Then how does pandas deal with it?
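
For what it's worth, the unpivot idiom below (CROSS JOIN LATERAL with a VALUES list, Postgres syntax) does let you turn each row's columns into rows so that SUM() applies; whether it's good practice is exactly what I'm unsure about. A sketch:

-- Sketch: unpivot each row's columns into rows, then SUM down them
SELECT t.num1, t.num2, sum(cols.v) AS row_sum
FROM (VALUES (10, 100),
             (11, NULL)) t(num1, num2)
CROSS JOIN LATERAL (VALUES (t.num1), (t.num2)) cols(v)
GROUP BY t.num1, t.num2
-- row_sum: 110 and 11 (SUM skips the NULL; more on that below)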

Learning how SUM deals with NULL

The query below gives 33, 2. It looks like SUM treats the NULLs as 0 automatically when summing down a column, both for num1 and for the output of CASE (what do you call that datatype? A list? A recordset?).

-- Query 1
SELECT sum(num1),
       SUM(CASE WHEN num1 < 12 THEN 1 END) -- CASE returns NULL when no branch matches and there is no ELSE clause
FROM (
  VALUES (10, 100),
         (11, NULL),
         (12, NULL),
         (NULL, NULL)
) cases(num1, num2)

However, when going across columns:

-- Query 2
SELECT *, num1 + num2
FROM (
  VALUES (10, 100),
         (11, NULL),
         (12, NULL),
         (NULL, NULL)
) cases(num1, num2)

This returns NULL for num1+num2 in every row except row 1 (110), where none of the values are NULL, so I guess any NULL among col1 + col2 + ... makes the whole expression NULL.

Is SUM() doing something behind the scenes to convert NULL to 0, or ignoring the row?
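
(For what it's worth, the Postgres docs on aggregate functions note that the standard aggregates ignore null inputs rather than converting them to 0.) A minimal sketch that isolates the two behaviours:

-- Aggregates skip NULL inputs; operators propagate them
SELECT sum(x),   -- 1: the NULL row is simply ignored
       1 + NULL  -- NULL: any NULL operand makes + return NULL
FROM (VALUES (1), (NULL)) t(x)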

I tried changing all num1 values to NULL and it errors with function sum(text) does not exist. This error does not appear when at least one of the values in num1 is not NULL.

-- Query 3
SELECT sum(num1)
FROM (
  VALUES (NULL, 100),
         (NULL, NULL),
         (NULL, NULL),
         (NULL, NULL)
) cases(num1, num2)

1. How did it become sum(text)? Was it text because there were no integers in num1 to help it infer that the type should be a number and not text? Could this error be prevented if I had defined types during the table construction using VALUES()? Is that even possible to do? I can't find how to define column types when using VALUES() to construct a table. (See the sketch after this list.)
2. If the 4 NULLs in num1 were coincidentally created by some dynamic calculation, how do we program around SQL erroring like that? Or do we have to accept such behaviour and test with except.
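
On question 1: a bare NULL literal carries no type information, so with nothing else in the column to go on, Postgres falls back to text, and there is no sum(text). It is possible to pin the type inside VALUES with an explicit cast on the first row; a sketch:

-- Query 3 with the column type made explicit
SELECT sum(num1)
FROM (
  VALUES (NULL::integer, 100),
         (NULL, NULL),
         (NULL, NULL),
         (NULL, NULL)
) cases(num1, num2)
-- returns NULL, not 0: SUM over zero non-null inputs is NULL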

Learning COALESCE to fix issues with summing across a row

-- Query 4
SELECT *, COALESCE(num1, 0) + COALESCE(num2, 0)
FROM (
  VALUES (10, 100),
         (11, NULL),
         (12, NULL),
         (NULL, NULL)
) cases(num1, num2)

This query is basically Query 2, but wrapping each column in COALESCE to replace NULL with 0 produces the results 110, 11, 12, 0.

I am still not sure how to practically use COALESCE.

The definitions are mostly "Return the first non-null value in a list", with examples like SELECT COALESCE(NULL, NULL, NULL, 'W3Schools.com', NULL, 'Example.com'), which is pretty useless for teaching how to use COALESCE on real-life problems.

How is replacing NULL with a default value (like in Query 4) related to the above definition?
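
The two-argument case spells out the connection: in COALESCE(num1, 0) the "list" is just (num1, 0), and since the literal 0 is never null, "first non-null value in the list" collapses into "num1, or 0 as the default". A sketch of the equivalence:

-- COALESCE(num1, 0) written out as the CASE it behaves like
SELECT num1,
       COALESCE(num1, 0) AS with_default,
       CASE WHEN num1 IS NOT NULL THEN num1 ELSE 0 END AS same_thing
FROM (VALUES (11), (NULL::integer)) cases(num1)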


Hi @hanqi

Welcome to the "Club of Confused and Lost Students of Data Science" (the topics may vary, the thoughts are the same!) :smile:

This post might help you somewhat with COALESCE - https://databasetips.net/2019/01/14/coalesce-nvl-isnull-case-what-should-a-i-use/

A database table is a 2-D structure only (and it's kind of static). It is just rows and columns. The structure is such that we can keep increasing the number of rows/records continuously, but we refrain from adding columns beyond a certain limit.

Essentially, we pre-determine the columns to be created in a table before we start loading the data. Adding a column doesn't matter much (you can have a default value added to retro records), but deleting a column might require referential integrity checks such as primary key - foreign key enforcement, dependent views, etc.
So the data type, the expected size of the value, and the integrity checks, such as duplicates or NULL values allowed/not allowed, primary key, etc. constraints (we call them so in Oracle DB), are identified early on and then implemented.

For example - months in a year are fixed, i.e. 12, but measures or metrics can be anything and any number of them, so a table like this would make sense but the other one won't:

[image: metrics table with the 12 months as columns]

the above table transposed:
[image: the same table with months as rows]

So it may not be a question of whether transposing is possible, but of whether it is truly required, and whether the query can instead be tweaked and optimized to get the desired output.

About the sum by row and the sum by column: based on the above example, it is evident we would have to add all 12 month columns to get a year-total value. Or we can perhaps create calculated columns and fix the calculation for them, as below:

[image: the table with an added year-total calculated column]
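
In Postgres, such a calculated column could be a generated column, for instance. A sketch (the table and month column names are made up, only three months are shown, and generated columns need Postgres 12+):

-- Hypothetical table with a stored year-total computed from the month columns
CREATE TABLE metrics (
  metric     text,
  jan        integer,
  feb        integer,
  mar        integer,  -- ...through dec in the real table
  year_total integer GENERATED ALWAYS AS
    (COALESCE(jan, 0) + COALESCE(feb, 0) + COALESCE(mar, 0)) STORED
);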

However, in a real-world scenario, the table structure would somewhat resemble this:

[image: sample sales table with fields such as discount, TOV, and Balance Amt]

In this example, adding up all the columns would be nonsensical even if they are integers and floats. And for this structure, we perform row-wise aggregation using GROUP BY and other clauses.
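
For instance, a sketch of that row-wise aggregation (the table and column names are assumptions based on the fields mentioned below):

-- Hypothetical sales table: totals summed down the rows, one group per month
SELECT month, SUM(balance_amt) AS total_balance
FROM sales
GROUP BY month
ORDER BY month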

Using the same example, let's work on the NULL data. A NULL is similar to NumPy's NaN and is treated as a void/empty space, or "nothing", and we can't do anything with nothing. So we need to replace it with some default value: ideally 0 when dealing with int or float data types; for string data types it is more often left blank or set to "n/a", etc.

If we don't replace the NULL in the discount field, the TOV and Balance Amt fields will also end up being NULL.
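
A sketch of that fix (amount is an assumed column name, and the TOV formula here is a guess; discount and TOV come from the example above):

-- A NULL discount would otherwise make the derived TOV NULL as well
SELECT amount,
       discount,
       amount - COALESCE(discount, 0) AS tov  -- default a missing discount to 0
FROM sales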

The bottom line is that each structure is different and brings its own purpose and importance. So even though the two share some similarities, we can't substitute one for the other and expect similar behaviour, even if we try to distort them to be exactly the same.

And dude, your answers are always based on deep and detailed study. My answers are simple and only touch the surface. So thanks for your patience if you are still reading!

Thanks for the link, concise and comprehensive.