Discussion:
Possible server bug, would welcome your views
jan
2012-02-20 10:21:03 UTC
Permalink
Hi all, I've run into a bug and AFAICS it's not me (you may well
decide otherwise though).
Setup is Win7, 32-bit, dual core and sufficient memory, SQL Server
2008 R2, patched to SP1 (the bug appears pre and post patch).


select @@version
->
Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (Intel X86) Jun 17
2011 00:57:23 Copyright (c) Microsoft Corporation Developer Edition
on Windows NT 6.1 <X86> (Build 7601: Service Pack 1)


I'm creating functions and views and using those functions in views. I
end up with a handy view 'numeric_cols_diffs_classdefault'. It has two
fields (minclassdata and maxclassdata) which should, if present, never
be the same, so let's see if there are any rows where they differ:


select *
from numeric_cols_diffs_classdefault
where minclassdata <> maxclassdata
->
Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to float.


Doesn't like it.
Try in two steps instead:


select *
into #blaggle
from numeric_cols_diffs_classdefault
->
Warning: Null value is eliminated by an aggregate or other SET
operation.
(16309 row(s) affected)

- followed by -

select *
from #blaggle
where minclassdata <> maxclassdata
->
(84 row(s) affected) & I see the rows.


This cannot be right.
I started manually macro'ing out the functions and views to make one
large, clunky select, and got it to work as expected, i.e. it produced
rows without errors, as per the two-step approach above.
Views & funcs are not excessively nested, IMO.

Any thoughts?

I will try to isolate the code & data, anonymise it, and maybe post it
here if people want it.

cheers

jan
jan
2012-02-20 11:12:22 UTC
Permalink
On Feb 20, 10:21 am, jan <***@googlemail.com> wrote:
[snip]

Further experimenting suggests it's to do with a mix of the data and
the number of rows. If I clear the table except for the first 40 rows,
it runs fine. But if I do

insert into remainders
select *
from remainders

(remainders is the table that lies at the bottom of the views)
repeatedly until there are a decent number of rows (not sure how many;
>1280 does it, sometimes >81920), then it produces the error. Any fewer
and it produces a result set (albeit blank). It's obviously the same
data, just more of it.
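
For the record, here's roughly how I blow the table up -- a sketch
only (the loop count is arbitrary, and it assumes remainders has no
identity column):

declare @i int = 0
while @i < 12   -- 40 rows doubled 12 times is roughly 164k rows
begin
    insert into remainders
    select *
    from remainders
    set @i = @i + 1
end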

jan
Erland Sommarskog
2012-02-20 11:35:59 UTC
Permalink
Post by jan
select *
from numeric_cols_diffs_classdefault
where minclassdata <> maxclassdata
->
Msg 8114, Level 16, State 5, Line 1
Error converting data type varchar to float.
Doesn't like it.
select *
into #blaggle
from numeric_cols_diffs_classdefault
->
Warning: Null value is eliminated by an aggregate or other SET
operation.
(16309 row(s) affected)
- followed by -
select *
from #blaggle
where minclassdata <> maxclassdata
->
(84 row(s) affected) & I see the rows.
This cannot be right.
Since I don't see your code, I can't say whether this is something that
can be classified as a bug, or just incorrect coding on your side. But
whatever the case, you need to fix your code in the way I describe below.

First, keep in mind that SQL is not a traditional imperative language,
but a declarative one. You tell the engine what result you want. The
optimizer figures out how to compute it. Therefore, different query plans
can give different outcomes where there is a potential source of error.

As for what the error might be, here are two examples:

SELECT *
FROM tbl
WHERE isnumeric(col) = 1
AND convert(float, col) > 2

This is a common error from programmers who are used to short-circuiting
in languages like C or C#. But SQL does not have short-circuiting, so
there is no guarantee that "convert(float, col)" will not be executed
when isnumeric returns 0. (A second problem is that just because
isnumeric returns 1, that does not mean the string is convertible to
float, but that's another story.)

The remedy is to write the WHERE clause as

WHERE CASE WHEN isnumeric(col) = 1
           THEN convert(float, col)
      END > 2

Now you are guaranteed that convert will not be evaluated if the string is
not numeric.

Here is the second example:

SELECT convert(float, col)
FROM tbl
WHERE isnumeric(col) = 1

From a purely logical point of view of how SQL is defined, this should
be safe. The WHERE clause is evaluated first, and only later the SELECT
list. So if this fails, it could be construed to be a bug. However, it
is a bug by design. In order to achieve better optimisation, SQL Server
may compute the SELECT list before the WHERE clause is evaluated. Which
is alright,
as long as there are no errors. So the remedy is the same as above: use
CASE.
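
Here is a minimal self-contained sketch of both the problem and the
remedy (the table and the data are made up for illustration):

   CREATE TABLE #t (col varchar(20))
   INSERT #t (col) VALUES ('1.5'), ('abc'), ('3')

   -- May die with Msg 8114, depending on the plan chosen:
   -- SELECT convert(float, col) FROM #t WHERE isnumeric(col) = 1

   -- Guaranteed safe:
   SELECT CASE WHEN isnumeric(col) = 1
               THEN convert(float, col)
          END AS floatcol
   FROM   #t
   WHERE  isnumeric(col) = 1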
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
jan
2012-02-20 22:53:18 UTC
Permalink
Hi Erland, thanks for replying. Your name is familiar, you may have
answered some prior questions of mine (though a long time ago)

[snip]
Post by Erland Sommarskog
Since I don't see your code, I can't say whether this is something that
can be classified as a bug, or just incorrect coding on your side. But
whatever the case, you need to fix your code in the way I describe below.
First, keep in mind that SQL is not a traditional imperative language,
but a declarative one. You tell the engine what result you want. The
optimizer figures out how to compute it. Therefore, different query plans
can give different outcomes where there is a potential source of error.
[snip]
Post by Erland Sommarskog
This is a common error from programmers who are used to short-circuiting
in languages like C or C#. But SQL does not have short-circuiting, so
there is no guarantee that "convert(float, col)" will not be executed
when isnumeric returns 0.
I see your point, however what you're describing *is* shortcutting
(aka short-circuiting, although I like your description of it), only
without explicit or predictable ordering. Which is lethal.
If it did not have shortcutting it would evaluate both sides of the
and/or.
Post by Erland Sommarskog
(A second problem is that just because isnumeric
returns 1, that does not mean the string is convertible to float, but
that's another story.)
sigh. As I found out.
Post by Erland Sommarskog
The remedy is to write the WHERE clause as
[snip]
Post by Erland Sommarskog
   SELECT convert(float, col)
   FROM   tbl
   WHERE  isnumeric(col) = 1
From a purely logical point of view of how SQL is defined, this should
be safe. The WHERE clause is evaluated first, and only later the SELECT
list. So if this fails, it could be construed to be a bug. However, it
is a bug by design. In order to achieve better optimisation, SQL Server
may compute the SELECT list before the WHERE clause is evaluated. Which
is alright, as long as there are no errors. So the remedy is the same
as above: use CASE.
Hmm. So MS ditched transparent and comprehensible behaviour for
performance? Without providing a static analysis tool to sniff out
such possible faults? So subtle bugs could hide in live systems
(really *fast*, *performant* live systems!) until the frequency
histogram skewed enough that the query plan changed and then... boom?
Unbelievable. And I'm trying to get my lot off mysql, and MS do this?

Okay, back to the point. Unfortunately I can't restore the DB on my
home machine (it seems sql2k8 and sql2k8R2 are format-incompatible; I am
surprised), but examining my code there are no conjunctions or
disjunctions in there at all. It's fairly unconditional stuff with
some case statements, as functions and views.
I have a suspicion of what it just might be, I'll dig some more and
get back to you.

thanks again

jan
Post by Erland Sommarskog
--
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Gene Wirchenko
2012-02-21 01:08:24 UTC
Permalink
[snip]
Post by jan
Post by Erland Sommarskog
From a pure logical point of view how SQL is defined, this shold be safe.
The WHERE clause is evaluated first, only later the SELECT list. So if
this fails, this could be construed to be a bug. However, it is a bug by
design. In order to achieve better optimisation, SQL Server may compute
the SELECT lsit before the WHERE clause is evaluated. Which is alright,
as long as there are no errors. So the remedy is the same as above: use
CASE.
Hmm. So MS ditched transparent and comprehensible behaviour for
performance? Without providing a static analysis tool to sniff out
such possible faults? So subtle bugs could hide in live systems
(really *fast*, *performant* live systems!) until the frequency
histogram skewed until the query plan changed then... boom?
Unbelievable. And I'm trying to get my lot off mysql and MS do this?
     I do not think that Erland is saying that.  I think what he means
is that SQL is defined without short-circuiting and that Microsoft has
implemented it -- at least, in this respect -- as defined.  The fault
then lies with the SQL standards bunch.  Erland, would you please
clarify this?

I, for one, wish that SQL did have short-circuiting. I expect
some dialects do. I use Visual FoxPro which has short-circuiting in
its native, xBASE evaluation, so I expect it is in the VFP SQL
expression handling (since VFP SQL expressions can contain VFP
functions).

[snip]

Sincerely,

Gene Wirchenko
Erland Sommarskog
2012-02-21 08:56:22 UTC
Permalink
Post by Gene Wirchenko
I do not think that Erland is saying that. I think what he means
is that SQL is defined without short-circuiting and that Microsoft has
implemented it -- at least, in this respect -- as defined. The fault
then lies with the SQL standards bunch. Erland, would you please
clarify on this?
I don't know what the standards say, but short-circuiting is completely out
of place in a declarative language like SQL.
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Gene Wirchenko
2012-02-21 18:10:32 UTC
Permalink
On Tue, 21 Feb 2012 08:56:22 +0000 (UTC), Erland Sommarskog
Post by Erland Sommarskog
Post by Gene Wirchenko
I do not think that Erland is saying that. I think what he means
is that SQL is defined without short-circuiting and that Microsoft has
implemented it -- at least, in this respect -- as defined. The fault
then lies with the SQL standards bunch. Erland, would you please
clarify on this?
I don't know what the standards say, but short-circuiting is completely out
of place in a declarative language like SQL.
Why?

Sincerely,

Gene Wirchenko
Erland Sommarskog
2012-02-21 21:23:01 UTC
Permalink
Post by Gene Wirchenko
On Tue, 21 Feb 2012 08:56:22 +0000 (UTC), Erland Sommarskog
Post by Erland Sommarskog
I don't know what the standards say, but short-circuiting is completely
out of place in a declarative language like SQL.
Why?
Consider:

SELECT ...
FROM verybigtable
WHERE nonindexecol1 = @par1
AND nonindexecol2 = @par2
AND unselectiveindexcol = @par3
AND veryselectiveindexcol = @par4

With legislated short-circuiting this would be a slow query. Since you
want the condition on nonindexecol1 to be evaluated first, the engine
has to scan the table.

Of course what you want to happen is that the last condition is evaluated
first by locating the matching rows, which then are filtered against
the remaining conditions.

This becomes even more pronounced if you have a query that joins ten
tables and there are conditions on columns from all tables.
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Gene Wirchenko
2012-02-21 21:35:33 UTC
Permalink
On Tue, 21 Feb 2012 22:23:01 +0100, Erland Sommarskog
Post by Erland Sommarskog
Post by Gene Wirchenko
On Tue, 21 Feb 2012 08:56:22 +0000 (UTC), Erland Sommarskog
Post by Erland Sommarskog
I don't know what the standards say, but short-circuiting is completely
out of place in a declarative language like SQL.
Why?
SELECT ...
FROM verybigtable
With legislated short-circuiting this would be a slow query. Since you
Good answer. Thank you.

Mind you, there seem to be cases where short-circuiting could be
useful. Any I have seen mention of have a guard conjunct to prevent
execution of the second conjunct when that would result in an error or
exception. Is there a practical way to avoid all of these?

[snip]

Sincerely,

Gene Wirchenko
Gert-Jan Strik
2012-02-21 21:55:05 UTC
Permalink
Post by Gene Wirchenko
Mind you, there seem to be cases where short-circuiting could be
useful. Any I have seen mention of have a guard conjunct to prevent
execution of the second conjunct when that would result in an error or
exception. Is there a practical way to avoid all of these?
You cannot prevent short-circuiting. Finding suitable shortcuts is what
makes the query optimizer so effective.

But of course the point is that you cannot rely on any specific short
circuit. So if an "unfavorable" query plan would lead to errors, then
your query is not properly written.

The most common protection against runtime conversion errors or division
by zero errors is to encapsulate the expression in a case expression.
For example:

SELECT CASE WHEN my_varchar NOT LIKE '%[^0-9]%'
            THEN CAST(my_varchar AS int)
       END
     , CASE WHEN div <> 0 THEN col1 / div END
     , col1 / NULLIF(div, 0)
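
A quick demonstration sketch (the table variable and the data are
invented for illustration):

DECLARE @t TABLE (my_varchar varchar(10), col1 int, div int)
INSERT @t VALUES ('123', 10, 0), ('12x', 10, 2)

SELECT CASE WHEN my_varchar NOT LIKE '%[^0-9]%'
            THEN CAST(my_varchar AS int)
       END
     , CASE WHEN div <> 0 THEN col1 / div END
     , col1 / NULLIF(div, 0)
FROM @t
-- returns (123, NULL, NULL) and (NULL, 5, 5); no error is raised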
--
Gert-Jan
Gene Wirchenko
2012-02-21 22:57:12 UTC
Permalink
On Tue, 21 Feb 2012 22:55:05 +0100, Gert-Jan Strik
Post by Gert-Jan Strik
Post by Gene Wirchenko
Mind you, there seem to be cases where short-circuiting could be
useful. Any I have seen mention of have a guard conjunct to prevent
execution of the second conjunct when that would result in an error or
exception. Is there a practical way to avoid all of these?
You cannot prevent short-circuiting. Finding suitable shortcuts is what
makes the query optimizer so effective.
But of course the point is that you cannot rely on any specific short
circuit. So if an "unfavorable" query plan would lead to errors, then
your query is not properly written.
The most common protection against runtime conversion errors or division
by zero errors is to encapsulate the expression in a case expression.
SELECT CASE WHEN my_varchar NOT LIKE '%[^0-9]%'
            THEN CAST(my_varchar AS int)
       END
     , CASE WHEN div <> 0 THEN col1 / div END
     , col1 / NULLIF(div, 0)
Is such a construct as above guaranteed to be safe?

     Am I reading the code correctly?  There are three columns
selected.  The first two are defined by one-condition cases that do
not span the possible values, making the results sometimes either --
I know not which -- undefined or some default value (null?).  The
third uses nullif() as a guard.

Sincerely,

Gene Wirchenko
Erland Sommarskog
2012-02-22 08:03:55 UTC
Permalink
Post by Gene Wirchenko
On Tue, 21 Feb 2012 22:55:05 +0100, Gert-Jan Strik
Post by Gert-Jan Strik
SELECT CASE WHEN my_varchar NOT LIKE '%[^0-9]%' THEN CAST(my_varchar AS
int) END
, CASE WHEN div <> 0 THEN col1 / div END
, col1 / NULLIF(div,0)
Is such a construct as above guaranteed to be safe?
Yes. WHEN is always computed before THEN, and they are computed in order.
Post by Gene Wirchenko
Am I reading the code correctly? There are three columns
selected. The first two are defined by one-condition cases that do
not span the possible values so making the results sometimes either --
I know not which -- undefined or some default value (null?). The
third uses nullif() as a guard.
If there is no ELSE in a CASE expression and no WHEN branch matches, the
return value is NULL.
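
A trivial sketch:

   SELECT CASE WHEN 1 = 0 THEN 'x' END   -- returns NULL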
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Gene Wirchenko
2012-02-22 17:10:49 UTC
Permalink
On Wed, 22 Feb 2012 09:03:55 +0100, Erland Sommarskog
<***@sommarskog.se> wrote:

[snip]

Thank you.

Sincerely,

Gene Wirchenko
jan
2012-02-23 00:24:04 UTC
Permalink
Hi Erland,
I'm afraid I've been very busy and haven't had a chance to get back to my
possibly buggy code, however I'd like to question the terminology of
your example:

SELECT ...
FROM verybigtable
WHERE nonindexecol1 = @par1
AND nonindexecol2 = @par2
AND unselectiveindexcol = @par3
AND veryselectiveindexcol = @par4

What you are describing is most definitely short-circuiting, because
your veryselectiveindexcol is (hopefully) being evaluated first and
therefore a large number of rows are being discarded without
evaluating the other AND's. These other predicates are subsequently
invoked on the hopefully much reduced set of rows.
What this is called, I don't know. However it is obviously desirable
behaviour *from a practical standpoint*. Which brings me on to my
main concern.
To be clear, what we're talking about here are issues around
*performance*. What concerns me is the *semantics* of the query.
Performance is critical in the real world, but it's not much use if we
don't know the semantics of the query -- in other words, what that
query is supposed to return.
Theoretically your query should return the same results whatever order
is chosen (presuming no divisions-by-zero or other timebombs hidden
inside it). The only difference in the various orders of evaluation
of predicates is how much time/space may be used.
So, stuff performance for the moment: how do I know the query is
semantically correct? How do I know it's not going to blow up on me
when the query plan changes? This is absolutely critical on a
live system where your business may be riding on it.
So, how am I supposed to know whether my query is safe or not? That
is my question.
It was partly answered, I think, by some of the links you posted, which
lead on to a discussion by
Chris Date <http://www.dbdebunk.com/page/page/1351381.htm>:

<<
SELECT *
FROM R1
WHERE NUM > 0 ;

is defined as follows:

1. Evaluate R1.
2. Restrict the result of the previous step to just those rows
satisfying NUM > 0.
3. Project the result of the previous step over all of its
columns (which is effectively a no op, of course).
>>
Therefore this query

SELECT convert(float, col)
FROM tbl
WHERE isnumeric(col) = 1

is unambiguous and should never fail.
I'll go back to Chris Date's paper and try and do it justice in the
next few days.
But in contrast, going back to another of your examples:

SELECT *
FROM tbl
WHERE isnumeric(col) = 1
AND convert(float, col) > 2

What are the intended semantics of this? What should it produce? I
agree with you that this is probably flawed, but could its faults be
automatically detected and the entire query rejected, even if the check
erred on the conservative side? Unfortunately, for all the stupid bells
and whistles that Microsoft throws into its SQL Server products, there
is nothing I'm aware of to pick up on horrible little timebombs like
this. I don't think it should be too difficult to do either (famous
last words). Even some syntactic support for (what I would call)
explicit short-circuiting would be nice; x ANDTHEN y and x ORELSE y
is, I think, the notation used in Modula-3. But we don't have that,
so we have to use your case statement, and quite frankly I think this
is the first time I've ever seen this discussed or this solution
proposed, which is worrying. But thanks for bringing it to my
attention.
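
If I've understood your guarantee correctly, a hypothetical x ANDTHEN y
could be spelt today with a nested case -- a sketch only, reusing your
earlier example:

select *
from tbl
where case when isnumeric(col) = 1
           then case when convert(float, col) > 2 then 1 end
      end = 1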
Post by Gene Wirchenko
On Tue, 21 Feb 2012 08:56:22 +0000 (UTC), Erland Sommarskog
Post by Erland Sommarskog
I don't know what the standards say, but short-circuiting is completely
out of place in a declarative language like SQL.
     Why?
   SELECT ...
   FROM   verybigtable
With legislated short-circuiting this would be a slow query. Since you
want the condition on nonindexecol1 to be evaluated first, the engine
has to scan the table.
Of course what you want to happen is that the last condition is evaluated
first by locating the matching rows, which then are filtered against
the remaining conditions.
This becomes even more pronounced if you have a query that joins ten
tables and there are conditions on columns from all tables.
--
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Gene Wirchenko
2012-02-23 04:14:15 UTC
Permalink
Post by jan
I'm afraid I've been very busy and haven't had a chance to get back to my
possibly buggy code, however I'd like to question the terminology of
SELECT ...
FROM verybigtable
What you are describing is most definitely short-circuiting, because
your veryselectiveindexcol is (hopefully) being evaluated first and
therefore a large number of rows are being discarded without
evaluating the other AND's. These other predicates are subsequently
invoked on the hopefully much reduced set of rows.
NO, that is not short-circuiting. That is optimising by
selectively reordering evaluation.

Short-circuiting is very precise in the order of evaluation. For
example, logical AND is commonly short-circuited, and it works like
this:
Original expression:

          a and b

Evaluated as follows:

          if a
             if b
                result is true
             else
                result is false
          else
             result is false

Note that if a is false, b will never be evaluated. For simple
variables, this is usually not a big deal, but the conjuncts could be
expressions in their own right and include function calls.

[snip]

Sincerely,

Gene Wirchenko
jan
2012-02-23 21:58:00 UTC
Permalink
Post by jan
I'm afraid I've been very busy and haven't had a chance to get back to my
possibly buggy code, however I'd like to question the terminology of
   SELECT ...
   FROM   verybigtable
What you are describing is most definitely short-circuiting, because
your veryselectiveindexcol is (hopefully) being evaluated first and
therefore a large number of rows are being discarded without
evaluating the other AND's.  These other predicates are subsequently
invoked on the hopefully much reduced set of rows.
     NO, that is not short-circuiting.  That is optimising by
selectively reordering evaluation.
     Short-circuiting is very precise in the order of evaluation.  For
example, logical AND is commonly short-circuited, and it works like
this:

          a and b

is evaluated as follows:

          if a
             if b
                result is true
             else
                result is false
          else
             result is false
Note that if a is false, b will never be evaluated.  For simple
variables, this is usually not a big deal, but the conjuncts could be
expressions in their own right and include function calls.
[snip]
Sincerely,
Gene Wirchenko
Yes, you are right, I agree.

jan
Erland Sommarskog
2012-02-24 19:05:25 UTC
Permalink
Post by jan
What you are describing is most definitely short-circuiting, because
your veryselectiveindexcol is (hopefully) being evaluated first and
therefore a large number of rows are being discarded without
evaluating the other AND's.
As Gene pointed out, this is not short-circuiting. Short-circuiting is
deterministic. Some languages have it, some have not. And there are
languages like Ada that have both. (AND/OR will/may evaluate both whereas
AND THEN and OR ELSE will short-circuit.)
Post by jan
To be clear, what we're talking about here are issues around
*performance*. What concerns me is the *semantics* of the query.
Performance is critical in the real world, but it's not much use if we
don't know the semantics of the query -- in other words, what that
query is supposed to return.
Theoretically your query should return the same results whatever order
is chosen (presuming no divisions-by-zero or other timebombs hidden
inside it). The only difference in the various orders of evaluation
of predicates is how much time/space may be used.
As long as we disregard errors like division by zero, most SELECT queries
are deterministic. That is, they will return the same result no matter
the query plan. The exception to the rule is when you have an ORDER BY
clause which does not impose a unique ordering, or when you leave out
ORDER BY altogether, for instance: "SELECT TOP 1 col FROM tbl".
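
A sketch of the difference (assuming keycol is a unique key on tbl):

   SELECT TOP 1 col FROM tbl                  -- which row you get is
                                              -- plan-dependent
   SELECT TOP 1 col FROM tbl ORDER BY keycol  -- deterministic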
Post by jan
So, how am I supposed to know whether my query is safe or not? That
is my question.
And the answer from Microsoft is that we should use CASE as shown
previously in the thread.
Post by jan
Therefore this query
SELECT convert(float, col)
FROM tbl
WHERE isnumeric(col) = 1
is unambiguous and should never fail.
But it may do. Both on SQL Server and on Oracle. And I would not be
surprised if it happens in DB2 as well.

And, no, I am not entirely happy with it myself. Not least if you
go even further and try:

WITH CTE AS (
SELECT col FROM tbl WHERE isnumeric(col) = 1
)
SELECT col FROM CTE

It still doesn't help.
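
What does help is, as before, moving the conversion inside CASE.
A sketch:

   WITH CTE AS (
      SELECT CASE WHEN isnumeric(col) = 1
                  THEN convert(float, col)
             END AS floatcol
      FROM tbl
   )
   SELECT floatcol FROM CTE WHERE floatcol IS NOT NULL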
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
jan
2012-02-27 00:21:11 UTC
Permalink
Post by Erland Sommarskog
Post by jan
What you are describing is most definitely short-circuiting, because
your veryselectiveindexcol is (hopefully) being evaluated first and
therefore a large number of rows are being discarded without
evaluating the other AND's.
As Gene pointed out, this is not short-circuiting. Short-circuiting is
deterministic. Some languages have it, some has not. And there are
languages like Ada that has both. (AND/OR will/may evaluate both whereas
AND THEN and OR ELSE will short-circuit.)
I agree, I was wrong.

[snip]
Post by Erland Sommarskog
As long as we disregard errors like division by zero, most SELECT queries
are deterministic. That is, they will return the same result no matter
the query plan. The exception to the rule is when you have an ORDER BY
clause which does not identify a unique condition, or when you left out
ORDER BY altogether, for instance: "SELECT TOP 1 col FROM tbl".
Well, any error that can cause a failure, which would include type
errors. But yes, these shouldn't be in there either. It would be a
great idea to have a static checker. On consideration, such a utility
would be impossible in general, but perhaps useful enough even if very
conservative.
But that's now not my main concern because I've tracked down the
problem and it certainly looks a lot like the faulty optimisation you
describe.

[snip]
Post by Erland Sommarskog
Post by jan
Therefore this query
   SELECT convert(float, col)
   FROM   tbl
   WHERE  isnumeric(col) = 1
is unambiguous and should never fail.
But it may do. Both on SQL Server and on Oracle. And I would not be
surprised if it happens in DB2 as well.
Here's the core of my query, sufficient to reproduce the bug. All
functions/views inlined, all crud trimmed out:

select
    min(data_got) as mindata,
    from_col,
    unique_id
from
(
    select cast(data_got as float) as data_got, from_col, unique_id
    from remainders
    where from_col in
    (
        select colname
        from numeric_cols
        where discr_or_cont = 'c'
    )
) as yy
group by from_col, unique_id

It's data scrubbing so it does odd things like put all data into one
column (called data_got) and tag it (with from_col) and cast it to
float if from_col is a numeric column. Not very relational but it is
ok for this.
The inner select runs fine; it only fails when it's invoked by the outer
select, which is where the cast is incorrectly applied to non-numeric
data. Sometimes. When it feels like it. Depending on the quantity of
data and its distribution.
I notice, a little surprisingly, that if you add option(force order)
to the outer select then it works. Worth knowing. Now I can feel
confident of correct answers, if I live long enough to see them.

Joking aside, this is really bad. I'm trying to persuade my company to
move from mysql to mssql because I've had a lot of very positive experiences
with mssql 2000. It was almost unstoppable, and I drove it very, very
hard.
In comparison I've found 2k8 to feel very much like a decent beta, not
RTM. I've broken it too many times, and now a few days wasted because
it's bust by design. I'll have a look at postgres.
BTW can anyone provide a link to a detailed statement/KB by MS that
this is behaviour by design? I'd like to know a bit more.
Post by Erland Sommarskog
And, no, I am not entirely happy with it myself. Not least if you
go even further and try:
   WITH CTE AS (
     SELECT col FROM tbl WHERE isnumeric(col) = 1
   )
   SELECT col FROM CTE
It still doesn't help.
I'll bet. BOHICA

Thanks Erland, and everyone else. Much appreciated.

jan
Post by Erland Sommarskog
--
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Erland Sommarskog
2012-02-27 06:57:23 UTC
Permalink
Post by jan
Joking aside, this is really bad. I'm trying to persuade my company to
move from mysql to mssql because I've had a lot of very positive experiences
with mssql 2000. It was almost unstoppable, and I drove it very, very
hard.
In comparison I've found 2k8 to feel very much like a decent beta, not
RTM. I've broken it too many times, and now a few days wasted because
it's bust by design. I'll have a look at postgres.
BTW can anyone provide a link to a detailed statement/KB by MS that
this is behaviour by design? I'd like to know a bit more.
You may or may not like it, but you are only being silly if you write
this off as bad quality. It is a conscious design, and Microsoft has made
it very clear that you need to use CASE to get around this. And, no, there
is no difference with regard to SQL 2000. Well, maybe the optimizations
are more aggressive now. And, no, adding OPTION (FORCE ORDER) is not the
correct approach. The correct approach is to use CASE and nothing else.
That's the only time that Microsoft gives you the guarantee.

And, no, I don't have any reference to KB articles or similar. They
probably exist. I know because I have discussed this with Microsoft
people more than once.

However, I gave you a link to a blog post that covered this precise
problem. Did you read it? Did you notice that it discussed this
precise problem, only that the product was Oracle, not SQL Server?
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
jan
2012-02-27 23:06:16 UTC
Permalink
Post by Erland Sommarskog
Post by jan
Joking aside this is really bad. I'm trying to persuade my company off
mysql to mssql because I've had a lot of very positive experiences
with mssql 2000. It was almost unstoppable, and I drove it very, very
hard.
In comparison I've found 2k8 to feel very much like a decent beta, not
RTM. I've broken it too many times, and now a few days wasted because
it's bust by design. I'll have a look at postgres.
BTW can anyone provide a link to a detailed statement/KB by MS that
this is behaviour by design? I'd like to know a bit more.
You may or may not like it, but you are only being silly if you write
this off as bad quality. It is a conscious design, and Microsoft has made
it very clear that you need to use CASE to get around this.
Hi Erland,
well, let me make some points and you decide.
* I have never seen this 'broken' optimisation documented in BOL, nor
any way of dealing with it, until your helpful reply.
* the working/suddenly-not-working behaviour is deeply unintuitive (do
you disagree?)
* according to the blog link you gave, this optimisation is incorrect
according to the SQL standard
* Chris Date thinks it's wrong.
* the semantics of it are undefined (can you provide a definition?)
* the optimisation can be made safe manually (assuming you ever find
out that it was unsafe in the first place, before it goes live) by
introducing an ordering via a suitable construct (a case statement in
this example), so why is the ordering not introduced *automatically*,
so as to keep the original semantics?
* the first line of the wikipedia entry on declarative programming:
"In computer science, declarative programming is a programming
paradigm that expresses the logic of a computation without describing
its control flow". SQL *is* a declarative language (do you agree?)
but to get the correct answers (from a statement which was *correct in
the first place*!!), I am expected to provide explicit control flow?
* very importantly, are there any other nasty issues like this which I
(or even you) are not aware of? How can you or I tell if they're not
documented?

Further to my point that this correctness-knackering optimisation is
not documented anywhere that I'm aware of, I believe the civil
aviation authority considers documentation to *be* software, and that
failure in documentation is just as serious as failure in code. Does
that make sense to you as it makes sense to me? (Afraid I can't find
this reference, but I believe I saw it originally in the totally
excellent book Safeware by Nancy Leveson).

Here's my manual application of your case statement trick to my
core query. The original statement:


select
    min(data_got) as mindata,
    from_col,
    unique_number
from
(
    select cast(data_got as float) as data_got, from_col, unique_number
    from remainders
    where from_col in
    (
        select colname
        from numeric_cols
        where discr_or_cont = 'c'
    )
) as yy
group by from_col, unique_number


Fairly readable. And now with the predicate hoisted up to guard the
cast:


select
    min(data_got) as mindata,
    from_col,
    unique_number
from
(
    select
        case
            when from_col in
            (
                select colname
                from numeric_cols
                where discr_or_cont = 'c'
            )
            then cast(data_got as float)
        end as data_got,
        from_col,
        unique_number
    from remainders
    where from_col in
    (
        select colname
        from numeric_cols
        where discr_or_cont = 'c'
    )
) as yy
group by from_col, unique_number


I don't think this is any more readable, to be honest. Do you think
this should have been done by SQL Server, or by me?

In addition I'm trying to wean my colleagues off their excel-is-a-db
viewpoint, and they are actually looking forward to learning sql on a
proper database given the havoc they've had from 100 meg workbooks
crashing (really, they've had it bad). I was going to introduce it as
a simple set of composable operations... but it turns out they're not.
How am I going to explain this one to them?

I don't care if MS and oracle do it. So far, this absolutely looks
like bad quality to me. I am open to persuasion that it is
otherwise.
Post by Erland Sommarskog
And, no, there is no
difference with regards to SQL 2000. Well, maybe the optimizations are more
aggressive.
I'm very surprised. I'm certain I never hit this before in SQL 2000.
Probably the type of work I'm doing now is very different.
Post by Erland Sommarskog
.And, no, adding OPTION (FORCE ORDER) is not the correct
approach.
I mentioned that as a joke because, to my surprise, it actually worked.
But it is clearly not a solution, for a number of reasons, but most of
all because I *don't know why it worked*.
Post by Erland Sommarskog
The correct approach is to use CASE and nothing else. That's the
only time that Microsoft gives you the guarantee.
What guarantee is this? You have given me one, and I'm very much obliged
to you, but where is anything written down by Microsoft that is easily
findable?
Where is this guarantee?
Post by Erland Sommarskog
And, no, I don't have any reference to KB articles or similar. They
probably exist. I know because I have discussed this with Microsoft
people more than once.
However, I gave you a link to a blog post that covered this precise
problem. Did you read it? Did you notice that it discussed this
precise problem - only that the product was Oracle not SQL Server.
I did read it, but only at high speed (pressure of work, I'm afraid);
that is how I came to find Chris Date's comment on this matter. Had I
read it more carefully, I might have twigged where the problem was in
my code sooner and saved you from having to respond quite so much to
my posts. Mea culpa.

thanks again

jan
Post by Erland Sommarskog
--
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Erland Sommarskog
2012-02-28 07:10:40 UTC
Permalink
Post by jan
* I have never seen this 'broken' optimisation documented in BOL, nor
any way of dealing with it, until your helpful reply.
There are a lot of things you may be unaware of until someone tells you,
even if they are documented.
Post by jan
* the working/suddenly-not-working behaviour is deeply unintuitive (do
you disagree?)
Disagree for several reasons. First of all, SQL is declarative, and the
optimizer may recast the computation order. This can lead, as in this
case, to a query that sometimes crashes and sometimes does not. It can
also lead to very big differences in performance.

Next, there are other kinds of bad queries that may work for a long time,
just because you don't have any data that causes them to fail, for
instance "SELECT a/b FROM tbl" when you incorrectly think that b is never 0.

Furthermore, it is not uncommon for a program in a traditional language
to appear to work and then stop working because you have forgotten
to initialize a variable, or a data cell is overwritten by a bad pointer.

In short, it is not uncommon for unrobust code to behave erratically,
and a good programmer understands this.
Post by jan
* according to the blog link you gave, this optimisation is incorrect
according to the SQL standard
Yup. But still they are doing it. I'm not entirely fond of it either, but
there is always a trade-off. You can get better performance, but you
have to follow those extra rules.
Post by jan
* very importantly, are there any other nasty issues like this which I
(or even you) are not aware of? How can you or I tell if they're not
documented?
One thing that traps people is the UPDATE FROM syntax:

UPDATE tbla
SET col = tblb.col
FROM tbla
JOIN tblb ON tbla.keycol = tblb.keycol

This is alright as long as one row in tbla can never map to more than one
row in tblb. But if one row in tbla can map to multiple rows in tblb, the
result of the above is not deterministic.
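
If you want to be told about the ambiguity, rather than have an
arbitrary row silently win, there is the ANSI form with a scalar
subquery, which raises an error when a row in tbla has more than one
match. A sketch:

   UPDATE tbla
   SET    col = (SELECT tblb.col
                 FROM   tblb
                 WHERE  tblb.keycol = tbla.keycol)
   WHERE  EXISTS (SELECT *
                  FROM   tblb
                  WHERE  tblb.keycol = tbla.keycol)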

And there are more constructs that for one reason or another are not
deterministic.
Post by jan
Further to my point that this correctness-knackering optimisation is
not documented anywhere that I'm aware of, I believe the civil
aviation authority considers documentation to *be* software, and that
failure in documentation is just as serious as failure in code. Does
that make sense to you as it makes sense to me?
Well, it may or may not be in Books Online. It is nevertheless a well-
known fact among well-experienced SQL Server programmers. In any forum
where you asked about this, it would not take long before you were
told to use CASE.
Post by jan
I don't care if MS and oracle do it. So far, this absolutely looks
like bad quality to me.
Sorry, but if you insist on calling this "bad quality", I can only find
that unprofessional on your side. "Bad quality" implies that it is sloppy
and without thought. It is a deliberate design decision. Maybe not one
that you and I agree with, but just because you disagree with something
does not make it bad quality.
Post by jan
What guarantee is this?  You have given me one, and I'm very much obliged
to you, but where is anything written down by Microsoft that is easily
findable?
You could always try reading the topic on CASE.
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
jan
2012-02-29 00:52:17 UTC
Permalink
Hi Erland,

[snip]
Post by Erland Sommarskog
Post by jan
* the working/suddenly-not-working behaviour is deeply unintuitive (do
you disagree?)
Disagree for several reasons. First of all, SQL is declarative, and the
optimizer may recast the computation order. This can lead, as in this
case, to a query that sometimes crashes and sometimes does not. It can
also lead to very big differences in performance.
The optimiser can do what it pleases so long as the original semantics
are preserved. In my example above, which tripped me up, they were
not. A guard was given but not respected. That's all.
Explicit precedence (by bracketing) and implicit precedence (by
ordering of operations) must be respected. If a C compiler evaluated
the following expression

b != 0 ? a / b : c * d

such that it speculated down both the true and false branches (to
exploit instruction-level parallelism, perhaps), and the division by
zero in the true branch *blew up my program*, that would be a compiler
bug. No different from here. C being a procedural rather than
declarative language makes no difference; it would hold just as true
in a functional language.
Post by Erland Sommarskog
Next, there are other kinds of bad queries that may work for a long time,
just because you don't have any data that causes them to fail, for
instance "SELECT a/b FROM tbl" when you incorrectly think that b is never 0.
Furthermore, it is not uncommon for a program in a traditional language
to appear to work and then stop working because you have forgotten
to initialize a variable, or a data cell is overwritten by a bad pointer.
In short, it is not uncommon for unrobust code to behave erratically,
and a good programmer understands this.
Yes. In such cases, the mess-up is at my end. And if I now wrote one of
your original examples:

SELECT *
FROM tbl
WHERE isnumeric(col) = 1
AND convert(float, col) > 2

then that'd be my problem too, as I now see. But all these are
strictly programmer errors.
Post by Erland Sommarskog
Post by jan
* according to the blog link you gave, this optimisation is incorrect
according to the SQL standard
Yup. But still they are doing it. I'm not entirely fond of it either, but
there is always a trade-off. You can get better performance, but you
have to follow those extra rules.
Post by jan
* very importantly, are there any other nasty issues like this which I
(or even you) are not aware of?  How can you or I tell if they're not
documented?
   UPDATE tbla
   SET    col = tblb.col
   FROM   tbla
   JOIN   tblb ON tbla.keycol = tblb.keycol
This is alright as long as one row in tbla can never map to more than one
row in tblb. But if one row in tbla can map to multiple rows in tblb, the
result of the above is not deterministic.
And there are more constructs that for one reason or another are not
deterministic.
I'm aware of the problems in update..join and something similar with
delete, also "select @var =... from...", TOP or cursors without
sufficient ordering, and others, but I'm referring not to these -
these are my fault if I write them - but to other optimisations which
may alter the meaning of what I've written.
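
The variable-assignment case, for instance -- a sketch reusing your
tbl/col names:

declare @var varchar(100)
select @var = col
from tbl
-- no ORDER BY: with several rows in tbl, which value ends up
-- in @var is plan-dependent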
Post by Erland Sommarskog
Post by jan
Further to my point that this correctness-knackering optimisation is
not documented anywhere that I'm aware of, I believe the civil
aviation authority considers documentation to *be* software, and that
failure in documentation is just as serious as failure in code.  Does
that make sense to you as it makes sense to me?
Well, it may or may not be in Books Online. It is nevertheless a well-
known fact among well-experienced SQL Server programmers. In any forum
where you asked about this, it would not take long before you were
told to use CASE.
Just a thought, but if it were just a case of explicitly letting the
optimiser off the leash with a hypothetical OPTION (MERGED EVALUATION ON),
that would be fine. Nothing wrong with that.
Post by Erland Sommarskog
Post by jan
I don't care if MS and oracle do it. So far, this absolutely looks
like bad quality to me.
Sorry, if you insist of calling this "bad quality", I only find that
unprofessional from your side. "Bad quality" implies that it is sloppy
and without thought. It is a deliberate design decision. Maybe not one
that I and you agree with, but just because you disagree with something
does ont make it bad quality.
Well, if you want to call me unprofessional for not working out the
dangerous implications of your "WHERE isnumeric(col) = 1 AND
convert(float, col) > 2", then I hold my hands up to that and am
wiser, but unprofessional for being deeply unhappy about semantic-
mutating optimisations? I don't know...

Also thanks for digging up the link you posted below.

cheers

jan
Post by Erland Sommarskog
Post by jan
What guarantee is this?  You have given me one, and I'm very much obliged
to you, but where is anything written down by Microsoft that is easily
findable?
You could always try reading the topic on CASE.
--
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Erland Sommarskog
2012-02-29 02:30:49 UTC
Permalink
Post by jan
If a C compiler evaluated the following expression
b != 0 ? a / b : c * d
such that it speculated down both true and false branches (to exploit
instruction-level parallelism perhaps) such that division by zero
occurs in the true branch *and blows up my program*, that's a compiler
bug.
But that is because C is defined to not compute the true-branch in this
case.

SQL has a computation order, but it does not really say that some things
will not be computed. So, no, it is not equally apparent.
Post by jan
Well, if you want to call me unprofessional for not working out the
dangerous implications of your "WHERE isnumeric(col) = 1 AND
convert(float, col) > 2", then I hold my hands up to that and am
wiser, but unprofessional for being deeply unhappy about semantic-
mutating optimisations? I don't know...
You may be unhappy with the optimisations this time. You may not be next
time when they give you better performance. But my remark was specifically
directed against "bad quality".
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Erland Sommarskog
2012-02-28 17:25:16 UTC
Permalink
Post by jan
* I have never seen this 'broken' optimisation documented in BOL, nor
any way of dealing with it, until your helpful reply.
I asked around, and I was pointed to
http://msdn.microsoft.com/en-us/library/ms188295%28v=sql.100%29.aspx
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Erland Sommarskog
2012-02-21 08:54:44 UTC
Permalink
Post by jan
I see your point, however what you're describing *is* shortcutting
(aka short circuiting, although I like your description of it), only
without explicit or predicatable ordering. Which is lethal.
If it did not have shortcutting it would evaluate both sides of the
and/or.
When I was a young student and learnt languages like Fortran and Pascal,
both of which are languages that do not have strict short-circuiting, I
was taught not to rely on anything. For instance:

IF ptr != nil AND ptr->val = 0

could work one day, and fail the next day because the compiler had been
upgraded, or because I ran with different optimization-level switches.

So generally, there is nothing you can rely on, nor should you. That said:

WHERE @param = 0
AND EXISTS (SELECT * FROM ...)

I would be very surprised if the subquery is evaluated when @param has a
non-zero value. I would expect the execution plan to include a startup
filter that prevents the table in the subquery from being touched. But,
still, it is nothing you can rely on.
Post by jan
Hmm. So MS ditched transparent and comprehensible behaviour for
performance? Without providing a static analysis tool to sniff out
such possible faults? So subtle bugs could hide in live systems
(really *fast*, *performant* live systems!) until the frequency
histogram skewed until the query plan changed then... boom?
Unbelievable. And I'm trying to get my lot off mysql and MS do this?
This blog post is interesting: http://gennick.com/madness.html
A different engine - the same problem. Hey, I even think this concerns
the company that owns MySQL these days.

I can't say that I like it, but to some extent I think the SQL standards
are to blame. In T-SQL, in contrast to many other typed languages, there
is an implicit conversion between strings and numbers. This means that if
you confuse the data types, you are not caught early, and then you get
beaten by bad data later on.
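
A trivial sketch of the implicit conversion at work:

   SELECT '2' + 1     -- succeeds and returns 3: the varchar is converted
                      -- to int, since int has the higher precedence
   SELECT 'abc' + 1   -- compiles, but fails at run time with a
                      -- conversion error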

I don't know why T-SQL has this implicit conversion, but it wasn't like
that in SQL 6.5 and before, so I'm suspecting ANSI.
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx