Discussion: SQL Server 2000 locking
stainless
2013-02-13 08:37:00 UTC
We have a locking issue that is causing a data integrity problem.

Our customers can create spatial information that requires unique key identifiers on a SQL table. Please note that this is not a question of whether or not to use an identity field, as this unique key is a construct of a fixed piece of text (e.g. "Pegasus") plus an incremental number, so as to support strict data rules imposed by embedded spatial database structures.

We have a table called Next_Num that holds the next value to be allocated to this key in a column called next_number and identified by a specific type value, in this case "Pegasus".

Our SQL is basically as follows, with @type in this case being set to "Pegasus":

update next_num set next_number = next_number + 1 where type = @type

select @next_number = next_number - 1 from next_num


@next_number is output to the calling process to give the spatial data the unique key.

So basically we first increment next_number on the table by 1, then select the new value and subtract 1 to get the value to be used. I know the subtraction seems a little convoluted, but it has been like this for some time, and it should at least return the value that was on the table before the update.

However, we have thousands of customers accessing this process during the day, and some of their actions retrieve more than one value.

It appears that at times, in the split second between the update and the select, a second customer has also updated the table, so two separate customers get the same @next_number value returned from the select statement. This creates data integrity issues.

The issue we have is finding a way of returning a @next_number value that reflects exactly the update applied for that customer without locking the whole table (as other processes are using the same table for different @type values with no issues).

We could, of course, have a dedicated table for just the "Pegasus" next number value, which may help in some ways (allowing us to distance any solution from affecting other non-related key processes), but the issue is still there for this key. Select for Update seems to be the elegant method in later SQL Server versions, but we are tied to SQL Server 2000. Any ideas as to how we can solve this?
Erland Sommarskog
2013-02-13 09:42:59 UTC
And this code is not called inside a transaction?

So why not add BEGIN TRANSACTION before the UPDATE and COMMIT after the
SELECT? If another customer tries to update the number for the same type,
that customer will be blocked.
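
That is, something like this (untested, and with a WHERE added to the
SELECT as well):

BEGIN TRANSACTION

UPDATE next_num
SET next_number = next_number + 1
WHERE type = @type

SELECT @next_number = next_number - 1
FROM next_num
WHERE type = @type

COMMIT TRANSACTION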

You need to have an index on the type column, else the lock may be on the
entire table.
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
stainless
2013-02-13 15:03:32 UTC
Currently the index is the constraint on the type column.

I must admit I am not certain why the transaction option is not being used and will have to check into the history of this. I cannot believe it has been overlooked, as we have had some serious SQL expertise over the years, so I need to find out if there were issues using it, particularly as there are comments at the top of the relevant stored procedure referring to a rewrite to get around "concurrence problems", which suggests locking to me.

It seems that before this, the Select was done first, followed by the Update command. At that time, logically, the Select just returned the actual value, i.e. no subtraction was required.

We certainly would want to avoid locking the whole table so the individual row would need to be the lock level. Based on what you stated, TRANSACTION processing would definitely just lock the individual row?
Erland Sommarskog
2013-02-13 21:13:27 UTC
Post by stainless
We certainly would want to avoid locking the whole table so the
individual row would need to be the lock level. Based on what you
stated, TRANSACTION processing would definitely just lock the individual
row?
Yes, as long as there is an index.
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Gert-Jan Strik
2013-02-13 18:06:23 UTC
What you need to do is place the statements in a transaction.

Assuming next_num.type is unique, and you have a (unique) index on type, then only one row would be locked at a time.
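
For example (a sketch; the index name is just illustrative):

CREATE UNIQUE INDEX IX_next_num_type ON next_num (type)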

Please note that your code is missing the "where type = @type" in the Select statement...
--
Gert-Jan
Jeroen Mostert
2013-02-13 23:46:53 UTC
Post by stainless
We have a locking issue that is causing a data integrity problem.
Our customers can create spatial information that requires unique key
identifiers on a SQL table. Please note that this is not a question of
whether or not to use an identity field, as this unique key is a construct
of a fixed piece of text (e.g. "Pegasus") plus an incremental number, so as
to support strict data rules imposed by embedded spatial database
structures.
We have a table called Next_Num that holds the next value to be allocated
to this key in a column called next_number and identified by a specific
type value, in this case "Pegasus".
As Gert-Jan has pointed out, I hope you just forgot to write down the WHERE on
that SELECT, otherwise that's your bug right there. :-) The statement as given
is legal, but it will nondeterministically select a row to get next_number from,
which can't possibly be what you want. Insidiously, statements like these may
actually yield correct results by happy accident until, one day, the accident
stops being happy.
Post by stainless
@next_number is output to the calling process to give the spatial data the unique key.
So basically we first increment next_number on the table by 1, then select
the new value and subtract 1 to get the value to be used. I know the
subtraction seems a little convoluted, but it has been like this for some
time, and it should at least return the value that was on the table before
the update.
However, we have thousands of customers accessing this process during the
day, and some of their actions retrieve more than one value.
It appears that at times, in the split second between the update and the
select, a second customer has also updated the table, so two separate
customers get the same @next_number value returned from the select
statement. This creates data integrity issues.
The issue we have is finding a way of returning a @next_number value that
reflects exactly the update applied for that customer without locking the
whole table (as other processes are using the same table for different
@type values with no issues).
We could, of course, have a dedicated table for just the "Pegasus" next
number value, which may help in some ways (allowing us to distance any
solution from affecting other non-related key processes), but the issue is
still there for this key. Select for Update seems to be the elegant method
in later SQL Server versions, but we are tied to SQL Server 2000. Any ideas
as to how we can solve this?
Aside from what the others have commented (namely, you need a transaction to
even get this right) even *with* a transaction, code like this is not easy to
get right. Obviously, on the basic level your functional problems are fixed if
you do SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; BEGIN TRANSACTION, because
that will give you all the atomicky goodness of an RDBMS. However, this can get
you into serious contention issues (even if the locks are perfectly granular to
the row) and reducing the number and duration of locks then becomes relevant. It
is, of course, a no-brainer that things need to be made correct before they are
made fast, but the definition of "correct" is sometimes flexible.

You should have a unique index on "type", otherwise it's much harder to
guarantee minimal locking (and the statements will be inefficient regardless).
To minimize the chance of deadlocks, this should (if possible) be the clustered
index of the table with no other index including "next_number", so there is only
one index involved in the update. The best situation (for this particular query,
at least, not necessarily for your entire workload) would be if the table has
only one index, which is the clustered index, which is the primary key, which is
"type".

Without having tested it (and that's a really important disclaimer, and if
possible you really should test this on a separate setup with lots of concurrent
calls), on SQL Server 2000, this is how I'd do it:

SET NOCOUNT ON;
DECLARE @next_number INT;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
UPDATE next_num
SET next_number = next_number + 1, @next_number = next_number
WHERE type = @type;
SELECT @next_number;
COMMIT;

(@next_number will contain the value before the update, which achieves the same
thing as subtracting afterwards.)

If it is not important that the sequence contains no gaps (just that no number
is ever issued twice), you can do slightly better:

SET NOCOUNT ON;
DECLARE @next_number INT;
UPDATE next_num WITH (SERIALIZABLE)
SET next_number = next_number + 1, @next_number = next_number
WHERE type = @type;
SELECT @next_number;

The UPDATE will happen transactionally, but because the SELECT is not in that
transaction, it's possible to lose this value before returning it to the client
application (if the connection aborts, for example), leaving a gap. On the plus
side, the server can complete the transaction without waiting for the client
application to acknowledge the result set.

If you can bundle requests, you can do *vastly* better (in terms of performance)
by adding more than 1 and reserving a range of values, since every update
requires a transaction log write to complete. Reserving a bunch of values at
once makes much better use of I/O. However, this obviously requires that your
application logic is suited to this (and/or can be rewritten), so it's not a
drop-in solution. I thought I'd mention it regardless.
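
A sketch of the range idea (untested; @count stands for a new parameter
giving the number of values requested):

SET NOCOUNT ON;
DECLARE @first_number INT;
UPDATE next_num WITH (SERIALIZABLE)
SET next_number = next_number + @count,
    @first_number = next_number
WHERE type = @type;
-- the caller now owns @first_number .. @first_number + @count - 1
SELECT @first_number;
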
--
J.
Erland Sommarskog
2013-02-14 08:54:23 UTC
Post by Jeroen Mostert
Aside from what the others have commented (namely, you need a
transaction to even get this right) even *with* a transaction, code like
this is not easy to get right. Obviously, on the basic level your
functional problems are fixed if you do SET TRANSACTION ISOLATION LEVEL
SERIALIZABLE; BEGIN TRANSACTION, because that will give you all the
atomicky goodness of an RDBMS. However, this
I don't think SERIALIZABLE is a good idea here.

It is definitely not needed: SERIALIZABLE protects you against "phantom
reads". That is, it guarantees that if you read a range, and later re-read
it in the same transaction, you will not see any more or fewer rows than the
first time.

Since this involves UPDATE, there are no problems with phantoms, so there is
no need to use SERIALIZABLE. But since SQL Server takes out range locks,
this could cause blocking to other types than the current type.

I think the original code is fine if it's wrapped in a transaction and WHERE
clause added. The UPDATE will take out an exclusive lock on the row (if
there is an index), and then we can read the number safely. We can make the
code shorter by putting the variable assignment in the UPDATE, and then we
would not need any explicit transaction. But then you need to be sure that
the engine really takes the locks in the order you think it does. With an
UPDATE and a separate SELECT, there is no such concern.
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Jeroen Mostert
2013-02-14 20:21:19 UTC
Post by Erland Sommarskog
Post by Jeroen Mostert
Aside from what the others have commented (namely, you need a
transaction to even get this right) even *with* a transaction, code like
this is not easy to get right. Obviously, on the basic level your
functional problems are fixed if you do SET TRANSACTION ISOLATION LEVEL
SERIALIZABLE; BEGIN TRANSACTION, because that will give you all the
atomicky goodness of an RDBMS. However, this
I don't think SERIALIZABLE is a good idea here.
It is definitely not needed: SERIALIZABLE protects you against "phantom
reads". That is, it guarantees that if you read a range, and later re-read
it in the same transaction, you will not see any more or fewer rows than the
first time.
Since this involves UPDATE, there are no problems with phantoms
I am always extremely wary of UPDATE, because UPDATE involves two phases, one
where the data is selected, another where it's actually updated. When more than
one index is involved, locking quickly gets just as complicated as with separate
statements, but with less chance for guiding the engine (except through the use
of the serialization level). Then have that UPDATE run concurrently with other
instances of itself, and it's deadlock chasing fun for the whole family.

In the case of just an UPDATE and a unique index, you are correct, and we did in
fact assume that here. SERIALIZABLE is slight overkill -- but not as much as
you'd think.
Post by Erland Sommarskog
so there is no need to use SERIALIZABLE. But since SQL Server takes out range
locks, this could cause blocking to other types than the current type.
Just tested it: SQL Server 2012, at least, takes no range locks if you have a
unique index. The engine is smart enough to see that a "range" of one row isn't.
Even if it wasn't, however, the additional lock should be irrelevant, exactly
because the range is one row, so nothing can overlap.

That is, assuming you're using a unique index. If not, then the SERIALIZABLE
could actually help -- but I'll admit that you don't want to go into that
scenario in the first place, since it's not at all what you want if you're going
to minimize locking (and it makes no semantic sense for this scenario).

I must immediately admit that I don't know if SQL Server 2000 is clever enough
to omit range locks -- I know really old versions had no key-range locks at all,
and just resorted to locking entire pages. That would of course completely kill
concurrency in this scenario.
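
If someone has a SQL Server 2000 box handy, something like this should show
what is actually taken (sp_lock is the tool there; newer versions have
sys.dm_tran_locks):

BEGIN TRANSACTION;
UPDATE next_num WITH (SERIALIZABLE)
SET next_number = next_number + 1
WHERE type = @type;
EXEC sp_lock @@SPID;  -- look for RangeX-X modes vs. a plain KEY X lock
ROLLBACK;
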
Post by Erland Sommarskog
I think the original code is fine if it's wrapped in a transaction and WHERE
clause added. The UPDATE will take out an exclusive lock on the row (if there
is an index), and then we can read the number safely. We can make the code
shorter by putting the variable assignment in the UPDATE, and then we would
not need any explicit transaction. But then you need to be sure that the
engine really takes the locks in the order you think it does. With an UPDATE
and a separate SELECT, there is no such concern.
Adding more statements that touch tables inevitably introduces more locks, even
if it's just a shared lock for the SELECT. I'm not sure how that simplifies
matters. Adding a variable assignment in the UPDATE introduces no locks, since
variables aren't shared resources and reading the original data is part of the
update process anyway. In my book, that's a clear win.

In fact, the code becomes plain wrong if the SELECT is placed before the UPDATE
(with the subtraction removed). Now of course "don't do that then", but it does
show that more statements are not necessarily simpler in terms of (correct) locking.
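
To spell out that race (an interleaving sketch):

-- time 1, session A: SELECT @n = next_number FROM next_num WHERE type = @type
-- time 2, session B: SELECT @n = next_number FROM next_num WHERE type = @type
-- time 3, session A: UPDATE next_num SET next_number = next_number + 1
--                    WHERE type = @type
-- time 4, session B: the same UPDATE
-- A and B both got the same @n; the counter ends up advanced twice, but the
-- duplicate key has already escaped.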

So this is all a very long-winded way of saying "yes, you're quite right,
SERIALIZABLE is unnecessary here". But I'll stick with the single UPDATE, if
it's all the same to you. :-)
--
J.
Erland Sommarskog
2013-02-14 22:33:38 UTC
Post by Jeroen Mostert
Adding more statements that touch tables inevitably introduces more
locks, even if it's just a shared lock for the SELECT. I'm not sure how
that simplifies matters. Adding a variable assignment in the UPDATE
introduces no locks, since variables aren't shared resources and reading
the original data is part of the update process anyway. In my book,
that's a clear win.
Putting the variable assignment in the UPDATE is alright if the value
is read with an UPDLOCK (which it should be). But if it is read with a
shared lock, things can go bad.

I've seen other examples where putting it all in one statement did not work
out as expected. Tony Rogerson did something like:

INSERT tbl (....)
SELECT ...
FROM ...
WHERE NOT EXISTS (SELECT *
                  FROM tbl (SERIALIZABLE)
                  WHERE ...)

And still got PK violations in a concurrent environment.
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Jeroen Mostert
2013-02-15 00:27:00 UTC
Post by Erland Sommarskog
Post by Jeroen Mostert
Adding more statements that touch tables inevitably introduces more
locks, even if it's just a shared lock for the SELECT. I'm not sure how
that simplifies matters. Adding a variable assignment in the UPDATE
introduces no locks, since variables aren't shared resources and reading
the original data is part of the update process anyway. In my book,
that's a clear win.
Putting the variable assignment in the UPDATE is alright if the value
is read with an UPDLOCK (which it should be). But if it is read with a
shared lock, things can go bad.
It's vanishingly unlikely it works this way. Consider a table "foo" with a
column "n" that contains only odd values. Under your semantics, the batch

DECLARE @bar INT = 5;
UPDATE foo SET @bar = n WHERE n % 2 = 0

which performs intermediate assignments while it's scanning rows under shared
locks would have to go out of its way to reset "@bar" to 5 at the end of the
statement, since "@bar" should be unaffected as we updated no rows (OK,
technically, Books Online doesn't document that it should do this, but it
definitely works this way). An implementation along the obvious lines (perform
the assignment only when we're updating rows) has no such trouble.

But you might have meant that the assignments don't actually happen, but all the
data is first read under shared locks anyway, to later be used. That would
obviously be an even worse implementation, but it would avoid this particular
problem. However, even in this case you could end up with a variable that
contains a value that demonstrably cannot have been used to determine the
updated column value, which violates one of the fundamental tenets of assignment
(the substitution principle). I'd file a bug on Connect if the assignments could
work this way, because that smells like a consistency violation to me.

I do get your point now about "you need to know how the engine locks". Yes, if
this sort of thing is allowed, obviously you're better off sticking to separate
statements. However, this would render the construct almost useless (the
equivalent of a SELECT-UPDATE rather than the other way around, if you really
wanted that, which you don't). If it works like this, to quote Farnsworth, "I
don't want to live on this planet anymore".

I happen to know for a fact that it doesn't work this way for a unique index,
because I traced the locks -- no separate S-lock is issued to read the data
first prior to the update, everything happens under a single X-lock -- unless
the assignment happened under no lock at all just prior to the X-lock, which I
refuse to believe! However, it would obviously be impossible to tell what's
happening for sure if the UPDATE involves a scan.

I wish I could get some feedback from the developers on what the semantics are,
because if it could work like you suggested, I definitely don't want to ever
recommend this to someone, under any circumstances. If you need UPDLOCK, might
as well stick it on a SELECT. Or use snapshot isolation, or a temp table, or
lots of other alternatives that are possible if all you want to do is avoid
writing the same WHERE twice.
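
For completeness, the UPDLOCK-on-a-SELECT variant would be something like
this (untested):

BEGIN TRANSACTION;
DECLARE @next_number INT;
SELECT @next_number = next_number
FROM next_num WITH (UPDLOCK)
WHERE type = @type;
UPDATE next_num
SET next_number = next_number + 1
WHERE type = @type;
COMMIT;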

And no, I will not simply let this go and use UPDATE-SELECT from now on since it
has incontestable semantics, dammit. The line must be drawn here.
Post by Erland Sommarskog
I've seen other examples where putting it all in one statement did not work
INSERT tbl (....)
SELECT ...
FROM ...
WHERE NOT EXISTS (SELECT *
                  FROM tbl (SERIALIZABLE)
                  WHERE ...)
And still got PK violations in a concurrent environment.
Yes, but... that's obviously not one statement.

Well, OK, you know what I mean. Obviously that execution plan is going to read
like an exciting novella. It is technically a single statement, but it's going
to be compiled as an ungodly mess of operations, and of course the optimizer is
under no obligation to perform any particular operation first. In particular,
WHERE NOT EXISTS should compile to a left anti semi join somewhere near the end,
not the beginning, where it's too late to start taking locks. This is not
comparable to a simple UPDATE.

I agree that if you're going down this route, separate statements are far less
painful (and easier to understand to boot). "A single statement" does not mean
some sort of super-lock strategy is computed at the beginning appropriate to the
statement's ultimate semantics. Might be nice, but that's not how it works.

Incidentally, I *have* used the following (just once, though):

BEGIN TRANSACTION;
DECLARE @dummy INT;
SELECT TOP (0) @dummy = ID FROM [table] WITH (TABLOCKX);

Obviously forcing an exclusive table lock for the duration of the transaction is
not something you'd use as a first approach, but believe it or not, complete
mutual exclusion was an appropriate solution for taming contention (I would have
used applocks, but couldn't -- I forget why). I'll admit this system was abusing
the RDBMS for purposes it wasn't really suited for, though. (I really hope
nobody is going to "optimize" the lock for the obviously non-functional SELECT
away in a future version. :-))
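
(For reference, the applock approach would have looked roughly like this;
sp_getapplock exists as of SQL Server 2000, and the resource name is made
up:)

BEGIN TRANSACTION;
DECLARE @rc INT;
EXEC @rc = sp_getapplock
    @Resource = 'next_num_mutex',
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction';
IF @rc < 0
    ROLLBACK;  -- could not acquire the lock
ELSE
BEGIN
    -- ... the serialized work goes here ...
    COMMIT;  -- releases the applock along with the transaction
END
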
--
J.
stainless
2013-02-14 12:56:30 UTC
< As Gert-Jan has pointed out, I hope you just forgot to write down the WHERE on that SELECT>

Yep, just missed it on the initial post. There is a "where type = @type" on the select.
stainless
2013-02-14 14:34:29 UTC
< UPDATE next_num WITH (SERIALIZABLE) SET next_number = next_number + 1, @next_number =
next_number WHERE type = @type >

In all the years I have worked with SQL Server, I have never seen a variable being set within an update statement and, for whatever reason, did not think it was possible. I had only ever updated actual columns within such statements. So simple!

We should always have a row in place for any Update or Select, so serializable should not be absolutely necessary for us. I suppose it would do no harm to have this in place.

Thanks for your help
Jeroen Mostert
2013-02-14 20:31:46 UTC
Post by stainless
In all the years I have worked with SQL Server, I have never seen a variable
being set within an update statement and, for whatever reason, did not think it
was possible. I had only ever updated actual columns within such statements. So
simple!
It is, but be careful that this only works if you're absolutely sure you're
updating a single row. Otherwise, the variable will be assigned "some" value.
The same is true for SELECT, of course, but for a SELECT it's more likely that
it doesn't really matter which row you get. For an UPDATE you will update all
rows, but assign only one value, which is not intuitive at all.
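
For example (contrived):

-- if this matched ten rows, all ten would be updated, but @next_number
-- would end up with the pre-update value of just one arbitrary row
UPDATE next_num
SET next_number = next_number + 1, @next_number = next_number
WHERE type LIKE 'Peg%';
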
Post by stainless
We should always have a row in place for any Update or Select, so
serializable should not be absolutely necessary for us. I suppose it would do
no harm to have this in place.
Per Erland, it *might* do a tiny bit of harm depending on how SQL Server 2000
handles locking. If you have a unique index, the default isolation level (READ
COMMITTED) is sufficient, so no locking hint is needed.
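
So the minimal form for your case would be just (again, untested):

SET NOCOUNT ON;
DECLARE @next_number INT;
UPDATE next_num
SET next_number = next_number + 1, @next_number = next_number
WHERE type = @type;
SELECT @next_number;
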
--
J.