Discussion:
Lazy Spool operator turns 1,500 rows into 7,000,000?
Ian Boyd
2003-09-10 20:23:34 UTC
i'm looking at a query, trying to see why it takes so long.

An obvious culprit is where an Employees table (~1,500 rows) is being run
through "Lazy Spool" operator. The output of the operation is over 7 million
rows. There is then a join against these 7 million rows.

Is this normal?

The BOL says:
<quote>
The Lazy Spool logical operator stores each row from its input in a hidden
temporary object stored in the tempdb database. If the operator is rewound
(for example, by a Nested Loops operator) but no rebinding is needed, the
spooled data is used instead of rescanning the input. If rebinding is
needed, the spooled data is discarded and the spool object is rebuilt by
rescanning the (rebound) input.
The Lazy Spool operator will build its spool file in a lazy (noneager)
manner. Each time the spool's parent operator asks for a row, the spool
operator gets a row from its input operator and stores it in the spool.
</quote>


Looking for any possible explanation of this 1,000x expansion in the number
of records, i see the paraphrased line:

'Each time the operator asks for a row, it gets it from the input, and
stores it in the spool.'

Is this an indicator that our Employees table is being asked 7 million times
for a piece of information, and so it adds 7 million rows to the spool
table? If that is the case, then there would be duplicate rows in the spool
table, and that doesn't seem right.

1,500 rows -> Table Spool/Lazy Spool -> 7,000,000 rows?

Does this seem right?
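
The only arithmetic i can make fit is that the figure is a running total,
i.e. the spool contents get counted once for every row on the other side of
the nested loop. Pure guesswork on my part, with made-up numbers:

-- not the real schema; just the multiplication that would explain the plan
DECLARE @SpoolRows int, @OuterRows int
SET @SpoolRows = 1500    -- rows actually stored in the spool (the Employees table)
SET @OuterRows = 4700    -- guessed row count on the other side of the join
SELECT @SpoolRows * @OuterRows AS RowsReportedOnSpool   -- 7,050,000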
Steve Kass
2003-09-10 23:35:52 UTC
Ian,

You may be looking at a query, so you've got the edge. There are many
reasons a query that refers to a 1500-row table could look at a row
7000000 times. Can you provide table definitions, a small set of sample
data given as INSERT INTO <tablename> VALUES (...) statements, and
the query or a simplified version of it?
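
Something tiny is usually enough; for example (placeholder names, not your
real schema):

CREATE TABLE Employees (
  EmployeeID int PRIMARY KEY,
  DepartmentID int NOT NULL,
  LastName varchar(40) NOT NULL
)
INSERT INTO Employees VALUES (1, 10, 'Smith')
INSERT INTO Employees VALUES (2, 10, 'Jones')
INSERT INTO Employees VALUES (3, 20, 'Boyd')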

-- Steve Kass
-- Drew University
-- Ref: D21DDCF4-82A0-41CC-AE8D-01BCCD3F0292
Delbert Glass
2003-09-10 23:59:51 UTC
My guess is he is looking at the execution count thingy
rather than the estimated row count thingy.
In which case, we can expect the other part of
the join to have about 4,700 rows.
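
Both numbers show up side by side if he runs it with the profile option
turned on; something along these lines:

-- statistics profile adds Rows and Executes columns to the plan output;
-- on a spool, Rows is a total across all executions, so roughly
-- Rows = (rows per rewind) x Executes, i.e. about 1,500 x 4,700 here
SET STATISTICS PROFILE ON
GO
-- paste the slow query here and run it
GO
SET STATISTICS PROFILE OFF
GO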

Hopefully the query can be improved
and avoid the table spool altogether along the way.

Well, at least he has not drowned in the pond yet ;-)

Bye,
Delbert Glass
Ian Boyd
2003-09-11 13:48:48 UTC
Post by Steve Kass
Ian,
There are many reasons a query that refers to a 1500-row table could look
at a row
7000000 times.
Are there? So this is entirely normal? And is it the result of the server
actually requiring a piece of data from the table 7 million individual
times?
Post by Steve Kass
Can you provide table definitions, a small set of sample
data given as INSERT INTO <tablename> VALUES (...) statements, and
the query or a simplified version of it?
No, not really. It's not my project. It's 4 databases spread across 3
servers, implementing concepts i'm not familiar with, in a project i've
never worked on.

i just tried to help him find the slow spot in a larger query.

The actual slow-down happens in:
The first half of a unioned result set
that gets returned by a function
which is used as part of a WHERE IN clause
of a higher level query

i just saw SQL Server giving itself a 4,000-fold increase in work with this
Lazy Spool operation, and wondered what the hell it was thinking.
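
The shape of it, very roughly, is something like this (every name below is
invented; the real function and tables live on somebody else's servers):

CREATE FUNCTION dbo.fnGetEmployeeKeys (@GroupID int)
RETURNS @Keys TABLE (EmployeeID int)
AS
BEGIN
    INSERT @Keys (EmployeeID)
    SELECT e.EmployeeID              -- the first half of the union is the slow part
    FROM Employees e
    WHERE e.GroupID = @GroupID
    UNION ALL
    SELECT x.EmployeeID
    FROM SomeOtherTable x
    WHERE x.GroupID = @GroupID
    RETURN
END
GO

-- and the higher-level query uses it roughly like this
SELECT b.*
FROM BigTable b
WHERE b.EmployeeID IN (SELECT EmployeeID FROM dbo.fnGetEmployeeKeys(42))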
Raymond D'Anjou (raydan)
2003-09-11 13:56:51 UTC
Post by Ian Boyd
The first half of a unioned result set
that gets returned by a function
which is used as part of a WHERE IN clause
of a higher level query
Functions are notoriously slow.
Here is an extract from this link:
http://www.novicksoftware.com/UDFofWeek/Vol1/T-SQL-UDF-Volume-1-Number-10-udf_Txt_CharIndexRev.htm

However useful a user-defined function is, from a performance
point of view, using the UDFs is often the equivalent of using
cursors. They're great when you need them. Sometimes they're
essential. But they can be a performance headache.

Using UDFs is great! Most of the time. But there are times when
the extra overhead of the UDF execution method become too slow.
Eliminating the UDF, where possible, is a strategy that usually
produces dramatically faster query execution times. By
"dramatically" I mean 2 times to 500 times faster. Enough to make
a real difference in the response time of your application.
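
In Ian's case, if the function really is just a union of two SELECTs,
inlining it is the kind of rewrite the article means. Made-up names, but
roughly:

-- before: the work inside the UDF is hidden from the optimizer
SELECT b.*
FROM BigTable b
WHERE b.EmployeeID IN (SELECT EmployeeID FROM dbo.fnGetEmployeeKeys(42))

-- after: the same logic written inline, so it's optimized as one statement
SELECT b.*
FROM BigTable b
WHERE b.EmployeeID IN (SELECT e.EmployeeID
                       FROM Employees e
                       WHERE e.GroupID = 42
                       UNION ALL
                       SELECT x.EmployeeID
                       FROM SomeOtherTable x
                       WHERE x.GroupID = 42)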
Steve Kass
2003-09-11 23:59:17 UTC
Ian,

Without seeing the query, it's pretty hard to guess how to
improve things. But here's an example of a meaningless query
with a lazy spool that's executed millions of times - whether
you have anything like this, I can't say:


select *
from Northwind..Orders O
join Northwind..[Order Details] D1
  on D1.ProductID % 20 in (
       select D2.ProductID + D3.ProductID
       from Northwind..[Order Details] D2
       join Northwind..[Order Details] D3
         on right(D2.OrderID, 4) in (left(D1.OrderID, 4), right(D3.OrderID, 4))
       union all
       select D2.ProductID + D3.ProductID + O.OrderID
       from Northwind..[Order Details] D2
       join Northwind..[Order Details] D3
         on left(D2.OrderID, 4) in (right(D1.OrderID, 4), left(D3.OrderID, 4))
     )
  and O.OrderID = D1.OrderID
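
You can check for the spools without waiting for it to finish; put the
plan-only option in its own batch first:

SET SHOWPLAN_ALL ON
GO
-- paste the query above here; it isn't executed, and the plan rows that come
-- back show the spools in the PhysicalOp/LogicalOp columns
GO
SET SHOWPLAN_ALL OFF
GO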

SK