Doing arithmetic with nully data
This is the fourth post in a series of twelve in Tim Fords Entry-Level Content Challenge.
In this blog post, I will explain how SQL Server deals with nulls in expressions involving arithmetic (a posh word for 4th grade math… :) )
Contrary to my hero Chris Date, I don’t mind NULLs. But the introduction of NULL to arithmetic in SQL makes it necessary to know how NULL values are treated in the SQL standard and in the T-SQL implementation.
First of all, in the SQL language NULL represents an unknown value.
And so it begins…
If NULL is used in an in-row expression, the full expression will be NULL. This makes sense for most arithmetic operations, because NULL means unknown.
Here is an example with NULL plus/minus ONE:
And everything is still honky-dory when we divide/multiply 1 (one) with NULL:
(we still get NULLs as result, that is).
Any number multiplied by zero will be zero (because the integers/rational numbers/real numbers are what mathematicians call rings):
0*x=x*0 = 0, for all numbers x
What if we multiply NULL with zero? Well, we get…
NULL as the result. This kind of sounds wrong, but maybe this is actually the best we can do. Because if the result was not NULL, then we could do something like
0 = 0*NULL = (1-1)*NULL = NULL-NULL,
which is definitely not sound (provided that the distributive law holds for NULL).
What about dividing NULL with zero? Well, we get an (at first) unexpected result:
But if we take a closer look, then we can see that the result makes sense (sort of):
NULL/0 = NULL/(1-1)=1/(1/NULL – 1/NULL)=1/(NULL-NULL) =1/NULL=NULL
Finally, lets divide NULL by itself:
The result is NULL, because NULL is never equal to itself (well at least in-row…)
Confused? Well, I’m afraid things are just getting worse…
When we have NULL values across rows, the rules for arithmetic change…
For starters, let’s add two numbers across two rows (where one of them is NULL):
Now we see that the sum is 1 (one), which is an example of the arithmetic with NULLs are different across rows that for in-row calculations.
Let’s count, then. Just as in the SUM operator, the COUNT operator ignores NULL values:
(unless we count rows and not values, as seen in the last example above).
Even the AVG (average) operator ignores NULLs, as we can see in this example:
This might give unexpected results, if you thought the semantics were like the third example of the average calculations above.
So, if you have attributes which can hold NULL values, and if you do arithmetic on these using SQL, just please remember that things are a little different than you might expect.
Read more about NULL here:
Read more about rings here:
These are the other posts that I wrote in the 2016 entry level series:
Data warehousing 101 - what are data warehouses, why do we build them and how? http://www.pontop.dk/#!Data-warehousing-101-what-are-data-warehouses-why-do-we-build-them-and-how/whpr3/56b78f2a0cf2062bd41b6280
Hyper-V 101 - Your own local data center http://www.pontop.dk/#!HyperV-101-Your-own-local-data-center/whpr3/56c78ff30cf25df9372037c6
Database joins 101 - introduction to join algorithms http://www.pontop.dk/#!Database-joins-101-introduction-to-join-algorithms/whpr3/56db1e330cf2bc6add1a719d