Discussion:
Adjacency List and nested set -- J celko example -- Guru needed
(too old to reply)
Martin
2004-07-18 09:46:14 UTC
Permalink
Hi,

I am designing a tree structure in my database and I found nearly the
perfect solution to my problem at
http://www.intelligententerprise.com/001020/celko1_2.jhtml

The only problem is I can't get it to work (at least not perfectly) in sql
server. I am basically trying to produce a nested set from an adjacency
list.

The original article was displayed in another implementaion of sql (P/SQL i
think.)

anyway, I took it and converted it to T-sql however it appears to not work
correctly.

If anybody has seen this problem before or could point out where I am going
wrong then I would very much appreciate it.

I have provided the T-SQL that I have at present below
===================================================================

use master
go
DROP DATABASE HIERACHY
GO
CREATE DATABASE HIERACHY
go
use HIERACHY
go

CREATE TABLE Tree
(emp CHAR(10) NOT NULL,
boss CHAR(10));

go
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
emp CHAR(10) NOT NULL,
lft INTEGER,
rgt INTEGER);
go
====================================
--May have to run below code multiple times
delete from Tree
delete from stack
INSERT INTO Tree SELECT 'Albert',NULL--,1000
INSERT INTO Tree SELECT 'Bert','Albert'--,900
INSERT INTO Tree SELECT 'Chuck','Albert'--,900
INSERT INTO Tree SELECT 'Donna','Chuck'--,800
INSERT INTO Tree SELECT 'Eddie','Chuck'--,700
INSERT INTO Tree SELECT 'Fred','Chuck'--,600


--BEGIN ATOMIC
--delete from stack
--select * from stack
DECLARE @counter INTEGER
DECLARE @max_counter INTEGER
DECLARE @current_top INTEGER

SET @counter = 2;
SET @max_counter = 2 * (SELECT COUNT(*) FROM Tree);
SET @current_top = 1;

INSERT INTO Stack
SELECT 1, emp, 1, NULL
FROM Tree
WHERE boss IS NULL;

DELETE FROM Tree
WHERE boss IS NULL;

WHILE @counter <= (@max_counter - 2)
BEGIN
IF EXISTS (SELECT *
FROM Stack AS S1, Tree AS T1
WHERE S1.emp = T1.boss
AND S1.stack_top = @current_top)
BEGIN -- push when top has subordinates, set lft value
INSERT INTO Stack
SELECT (@current_top + 1), MIN(T1.emp), @counter, NULL
FROM Stack AS S1, Tree AS T1
WHERE S1.emp = T1.boss
AND S1.stack_top = @current_top;

DELETE FROM Tree
WHERE emp = (SELECT emp
FROM Stack
WHERE stack_top = @current_top + 1);

SET @counter = @counter + 1;
SET @current_top = @current_top + 1;
END
ELSE
BEGIN -- pop the stack and set rgt value
UPDATE Stack
SET rgt = @counter,
stack_top = stack_top -- pops the stack
WHERE stack_top = @current_top
SET @counter = @counter + 1;
SET @current_top = @current_top - 1;
END
END
--Finally output the resultset
select * from stack
--CELKO--
2004-07-18 19:28:58 UTC
Permalink
Here is the link on Amazon.com for my new book on "Trees & Hierarchies
in SQL"

http://www.amazon.com/exec/obidos/tg/detail/-/1558609202/qid=1080772873/sr=1-1/ref=sr_1_1/102-7683601-6345721?v=glance&s=books#product-details

To convert a nested sets model into an adjacency list model:

SELECT B.emp AS boss, E.emp
FROM OrgChart AS E
LEFT OUTER JOIN
OrgChart AS B
ON B.lft
= (SELECT MAX(lft)
FROM OrgChart AS S
WHERE E.lft > S.lft
AND E.lft < S.rgt);

To convert an adjacency list to a nested set model, use a push down
stack. Here is version with a stack in SQL/PSM which is easy to
translate into T-SQL, Informix 4GL, PL/SQL or whatever:

-- Tree holds the adjacency model
CREATE TABLE Tree
(node CHAR(10) NOT NULL,
parent CHAR(10));

-- Stack starts empty, will holds the nested set model
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
node CHAR(10) NOT NULL,
lft INTEGER,
rgt INTEGER);

CREATE PROCEDURE TreeTraversal ()
LANGUAGE SQL
DETERMINISTIC
BEGIN ATOMIC
DECLARE counter INTEGER;
DECLARE max_counter INTEGER;
DECLARE current_top INTEGER;

SET counter = 2;
SET max_counter = 2 * (SELECT COUNT(*) FROM Tree);
SET current_top = 1;

--clear the stack
DELETE FROM Stack;

-- push the root
INSERT INTO Stack
SELECT 1, node, 1, max_counter
FROM Tree
WHERE parent IS NULL;

-- delete rows from tree as they are used
DELETE FROM Tree WHERE parent IS NULL;

WHILE counter <= max_counter- 1
DO IF EXISTS (SELECT *
FROM Stack AS S1, Tree AS T1
WHERE S1.node = T1.parent
AND S1.stack_top = current_top)
THEN BEGIN -- push when top has subordinates and set lft value
INSERT INTO Stack
SELECT (current_top + 1), MIN(T1.node), counter, NULL
FROM Stack AS S1, Tree AS T1
WHERE S1.node = T1.parent
AND S1.stack_top = current_top;

-- delete rows from tree as they are used
DELETE FROM Tree
WHERE node = (SELECT node
FROM Stack
WHERE stack_top = current_top + 1);
-- housekeeping of stack pointers and counter
SET counter = counter + 1;
SET current_top = current_top + 1;
END;
ELSE
BEGIN -- pop the stack and set rgt value
UPDATE Stack
SET rgt = counter,
stack_top = -stack_top -- pops the stack
WHERE stack_top = current_top;
SET counter = counter + 1;
SET current_top = current_top - 1;
END;
END IF;
END WHILE;
-- SELECT node, lft, rgt FROM Stack;
-- the top column is not needed in the final answer
-- move stack contents to new tree table
END;

If you have a big table, use another language. I am faking stacks and
arrays with tables.
martin
2004-07-18 20:38:32 UTC
Permalink
Hi,

thanks for the answer.
I will certainly be buying that book and am hoping shipping to nz will be
relativly fast.

However (before that happens) what would you say is the best way to
permanently store a hierachy in sql server.
The two main methods that I am thinking of are.

1. Store as an adjacency list and everytime the hieracy is requested (say in
a webpage) convert to an adjacency list.
(or maybe have a backgound process to convert to an adjacency list every
so often)
I guess this would make inserts / deletes easier to handle as they would
only being going in the adjacency list and I would not have to worry about
connecting the tree.


2. store as a nested set.

does not have the overhead of converting an adjacency list to a nested
set everytime it is requested (say in a web page)
however inserts and deletes are more complicated.
fo example in the case of an insert I am guessing you would have to
"split the tree" and increment each right colunm by 1.
I guess that this would result in a whole heap of rows being updated,
and would require / result in a table level lock. This would severly impede
concurrency, especially in a heavily used news board for example.

I am also wondering if sql 2005 will let me run your algoithm below in one
of the .net languages from inside SQL SERVER, resulting in increased
performance.

I look formard to any comments / feedback that you might have.

cheers

martin.
Post by --CELKO--
Here is the link on Amazon.com for my new book on "Trees & Hierarchies
in SQL"
http://www.amazon.com/exec/obidos/tg/detail/-/1558609202/qid=1080772873/sr=1-1/ref=sr_1_1/102-7683601-6345721?v=glance&s=books#product-details
Post by --CELKO--
SELECT B.emp AS boss, E.emp
FROM OrgChart AS E
LEFT OUTER JOIN
OrgChart AS B
ON B.lft
= (SELECT MAX(lft)
FROM OrgChart AS S
WHERE E.lft > S.lft
AND E.lft < S.rgt);
To convert an adjacency list to a nested set model, use a push down
stack. Here is version with a stack in SQL/PSM which is easy to
-- Tree holds the adjacency model
CREATE TABLE Tree
(node CHAR(10) NOT NULL,
parent CHAR(10));
-- Stack starts empty, will holds the nested set model
CREATE TABLE Stack
(stack_top INTEGER NOT NULL,
node CHAR(10) NOT NULL,
lft INTEGER,
rgt INTEGER);
CREATE PROCEDURE TreeTraversal ()
LANGUAGE SQL
DETERMINISTIC
BEGIN ATOMIC
DECLARE counter INTEGER;
DECLARE max_counter INTEGER;
DECLARE current_top INTEGER;
SET counter = 2;
SET max_counter = 2 * (SELECT COUNT(*) FROM Tree);
SET current_top = 1;
--clear the stack
DELETE FROM Stack;
-- push the root
INSERT INTO Stack
SELECT 1, node, 1, max_counter
FROM Tree
WHERE parent IS NULL;
-- delete rows from tree as they are used
DELETE FROM Tree WHERE parent IS NULL;
WHILE counter <= max_counter- 1
DO IF EXISTS (SELECT *
FROM Stack AS S1, Tree AS T1
WHERE S1.node = T1.parent
AND S1.stack_top = current_top)
THEN BEGIN -- push when top has subordinates and set lft value
INSERT INTO Stack
SELECT (current_top + 1), MIN(T1.node), counter, NULL
FROM Stack AS S1, Tree AS T1
WHERE S1.node = T1.parent
AND S1.stack_top = current_top;
-- delete rows from tree as they are used
DELETE FROM Tree
WHERE node = (SELECT node
FROM Stack
WHERE stack_top = current_top + 1);
-- housekeeping of stack pointers and counter
SET counter = counter + 1;
SET current_top = current_top + 1;
END;
ELSE
BEGIN -- pop the stack and set rgt value
UPDATE Stack
SET rgt = counter,
stack_top = -stack_top -- pops the stack
WHERE stack_top = current_top;
SET counter = counter + 1;
SET current_top = current_top - 1;
END;
END IF;
END WHILE;
-- SELECT node, lft, rgt FROM Stack;
-- the top column is not needed in the final answer
-- move stack contents to new tree table
END;
If you have a big table, use another language. I am faking stacks and
arrays with tables.
Joe Celko
2004-07-18 21:32:49 UTC
Permalink
... hoping shipping to nz will be relatively fast. <<
That I do not know -- does "relatively fast" mean within the memory of
living man? Let me tell you about the United States Postal Service ...
what would you say is the best way to permanently store a hierarchy
in SQL Server. <<

Most of the time, I would use nested sets because you can do aggregate
reports so easily with it. A persistent hierarchy is most often used
for that kind of thing. Structure changes are not really that expensive
in practice and are fairly easy to verify.

Insertions into an adjacency list model are the fastest, but searches
are very slow, and structure change are VERY costly -- no cycles? no
forests? it takes a lot of procedural code to test for all that.
... news board for example. <<
Then the nodes (messages) are persisent and the structure is in flux and
there are no aggregations up the tree! New game! Nested intervals would
be a good way.

No single magic answer here.

--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
martin
2004-07-18 23:09:36 UTC
Permalink
Hi joe,

I appreciate your time in answer.
perhaps you could just answer one further set of questions please :)


JC>> Most of the time, I would use nested sets because you can do aggregate
JC>>reports so easily with it. A persistent hierarchy is most often used
JC>> for that kind of thing. Structure changes are not really that expensive
JC>>in practice and are fairly easy to verify.

but can you please clarify if a full table lock is neccesary when updating a
nested set. I am assuming that all "rgt" integer values will need to be
increment by 1. There for we must stop any other inserts that could
potentially clash.
If this were the case then image two inserts were done at the same time,
some of the "rgt" would be incremented by 2.

would the reverse not be true for deleting

Any chance of providing an example of how to insert into the nested set and
increment all other values...




JC>> Insertions into an adjacency list model are the fastest, but searches
JC>>are very slow, and structure change are VERY costly -- no cycles? no
JC>> forests? it takes a lot of procedural code to test for all that.

I am assuming structural changes are costly as the corresponding nested set
has to be updated.

JC>>Then the nodes (messages) are persisent and the structure is in flux and
JC>> there are no aggregations up the tree! New game! Nested intervals would
JC>> be a good way.

can you please clarify what yo mean by a "nested interval"
updating the nested set from an adjacency list at frequent interval of say
every 1/2 hour perhaps????


many thanks in advance.

cheers

martin.
Post by Joe Celko
No single magic answer here.
--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.
*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Joe Celko
2004-07-19 00:12:02 UTC
Permalink
Post by martin
if a full table lock is neccesary when updating a
nested set. I am assuming that all "rgt" integer values will need to be
increment by 1. <<

Not quite. Each lft and rgt value increases by twice the number of
nodes in the inserted subtree to make room. Use a clustered index to
order the physical storage on the rows. The rows are short (lft, rgt,
node_key) so a large number of them fit into one data page. The rows
that are "below" the insertion point are not locked. As each page is
updated, it is freed. How fast can a modern disk drive do a sequential
read/write on physically contigous storage locations?

--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Steve Kass
2004-07-19 01:25:25 UTC
Permalink
Very funny!

Last year someone named Sarah posted "From what I understand, a database
will tend to be physically ordered by when it was entered, but won't
always necessarily be.", and you answered "The guy is wrong. The idea of
a database is that you don't know or care about the physical storage. A
relational database is based on sets and sets have no ordering."

I'm doubt Sarah was a guy, but you were right about physical storage
then. Now you're getting the physical storage model all wrong as you
try to explain how to affect it, because it matters. You can't even
correctly explain what happens with your own model, either. (If lft and
rgt both increase by 2N, that opens up /how/ much room for a subtree?).

Are you serious, are you an impostor, or should I compliment you on one
of the best bits of self-parody I've ever seen? Simply brilliant!

Unfortunately, I understand parody better than your nested set model.
Why is this a good model to use, when finding a difference between two
tables, Joe_1 and Joe_2, cannot let you conclude that they represent
different hierarchies? (I'm cool with Tropashko's version of the nested
set model, which doesn't have this defect.) Not being able to tell if
two tables represent the same thing in the real world is one of the big
problems with artificial keys, too. I think I'd understand this all
better if you could explain to me what attributes of the real-world
hierarchy are represented by the integer columns lft and rgt. Could
you? Please accept my apologies if I missed your answer the last time I
asked.

Steve Kass
Drew University
Post by martin
Post by martin
if a full table lock is neccesary when updating a
nested set. I am assuming that all "rgt" integer values will need to be
increment by 1. <<
Not quite. Each lft and rgt value increases by twice the number of
nodes in the inserted subtree to make room. Use a clustered index to
order the physical storage on the rows. The rows are short (lft, rgt,
node_key) so a large number of them fit into one data page. The rows
that are "below" the insertion point are not locked. As each page is
updated, it is freed. How fast can a modern disk drive do a sequential
read/write on physically contigous storage locations?
--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.
*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Steve Kass
2004-07-19 02:22:21 UTC
Permalink
Sorry - I meant to key this thread for future reference.

-- Steve Kass
-- Drew University
-- Ref: 0FD8AB3A-7D98-4AB2-B494-90A6D4782138
Post by Steve Kass
Very funny!
Last year someone named Sarah posted "From what I understand, a
database will tend to be physically ordered by when it was entered,
but won't always necessarily be.", and you answered "The guy is wrong.
The idea of a database is that you don't know or care about the
physical storage. A relational database is based on sets and sets have
no ordering."
I'm doubt Sarah was a guy, but you were right about physical storage
then. Now you're getting the physical storage model all wrong as you
try to explain how to affect it, because it matters. You can't even
correctly explain what happens with your own model, either. (If lft
and rgt both increase by 2N, that opens up /how/ much room for a
subtree?).
Are you serious, are you an impostor, or should I compliment you on
one of the best bits of self-parody I've ever seen? Simply brilliant!
Unfortunately, I understand parody better than your nested set model.
Why is this a good model to use, when finding a difference between two
tables, Joe_1 and Joe_2, cannot let you conclude that they represent
different hierarchies? (I'm cool with Tropashko's version of the
nested set model, which doesn't have this defect.) Not being able to
tell if two tables represent the same thing in the real world is one
of the big problems with artificial keys, too. I think I'd understand
this all better if you could explain to me what attributes of the
real-world hierarchy are represented by the integer columns lft and
rgt. Could you? Please accept my apologies if I missed your answer
the last time I asked.
Steve Kass
Drew University
Post by martin
Post by martin
if a full table lock is neccesary when updating a
nested set. I am assuming that all "rgt" integer values will need to be
increment by 1. <<
Not quite. Each lft and rgt value increases by twice the number of
nodes in the inserted subtree to make room. Use a clustered index to
order the physical storage on the rows. The rows are short (lft, rgt,
node_key) so a large number of them fit into one data page. The rows
that are "below" the insertion point are not locked. As each page is
updated, it is freed. How fast can a modern disk drive do a sequential
read/write on physically contigous storage locations?
--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.
*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Roji. P. Thomas
2004-07-19 07:06:56 UTC
Permalink
Are you seriously expecting an answer?
--
Roji. P. Thomas
Net Asset Management
https://www.netassetmanagement.com
Post by Steve Kass
Sorry - I meant to key this thread for future reference.
-- Steve Kass
-- Drew University
-- Ref: 0FD8AB3A-7D98-4AB2-B494-90A6D4782138
Post by Steve Kass
Very funny!
Last year someone named Sarah posted "From what I understand, a
database will tend to be physically ordered by when it was entered,
but won't always necessarily be.", and you answered "The guy is wrong.
The idea of a database is that you don't know or care about the
physical storage. A relational database is based on sets and sets have
no ordering."
I'm doubt Sarah was a guy, but you were right about physical storage
then. Now you're getting the physical storage model all wrong as you
try to explain how to affect it, because it matters. You can't even
correctly explain what happens with your own model, either. (If lft
and rgt both increase by 2N, that opens up /how/ much room for a
subtree?).
Are you serious, are you an impostor, or should I compliment you on
one of the best bits of self-parody I've ever seen? Simply brilliant!
Unfortunately, I understand parody better than your nested set model.
Why is this a good model to use, when finding a difference between two
tables, Joe_1 and Joe_2, cannot let you conclude that they represent
different hierarchies? (I'm cool with Tropashko's version of the
nested set model, which doesn't have this defect.) Not being able to
tell if two tables represent the same thing in the real world is one
of the big problems with artificial keys, too. I think I'd understand
this all better if you could explain to me what attributes of the
real-world hierarchy are represented by the integer columns lft and
rgt. Could you? Please accept my apologies if I missed your answer
the last time I asked.
Steve Kass
Drew University
Post by martin
Post by martin
if a full table lock is neccesary when updating a
nested set. I am assuming that all "rgt" integer values will need to be
increment by 1. <<
Not quite. Each lft and rgt value increases by twice the number of
nodes in the inserted subtree to make room. Use a clustered index to
order the physical storage on the rows. The rows are short (lft, rgt,
node_key) so a large number of them fit into one data page. The rows
that are "below" the insertion point are not locked. As each page is
updated, it is freed. How fast can a modern disk drive do a sequential
read/write on physically contigous storage locations?
--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.
*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
--CELKO--
2004-07-19 15:20:40 UTC
Permalink
Post by Steve Kass
Why is this a good model to use, when finding a difference between
two tables, Joe_1 and Joe_2, cannot let you conclude that they
represent different hierarchies? <<

Unh? Read Section 4.12 Comparing Nodes and Structure in the book. You
have three kinds of comparisons:

1) Same nodes in both trees: that is a common query and easily done on
the nodes tables.

2) Same structure in both trees: the same as above, but with (lft,rgt)
pairs.
Remember that the nested sets model has ordered siblings.

3) Same nodes in the same structure in both tables; do both of the
queries above.
Post by Steve Kass
I think I'd understand this all better if you could explain to me
what attributes of the real-world hierarchy are represented by the
integer columns lft and rgt. <<

A position within the hierarchy, just like (longtitude, latitude)
pairs represent a position on the globe.
Steve Kass
2004-07-19 15:57:13 UTC
Permalink
Post by --CELKO--
Post by Steve Kass
Why is this a good model to use, when finding a difference between
two tables, Joe_1 and Joe_2, cannot let you conclude that they
represent different hierarchies? <<
Unh? Read Section 4.12 Comparing Nodes and Structure in the book. You
1) Same nodes in both trees: that is a common query and easily done on
the nodes tables.
2) Same structure in both trees: the same as above, but with (lft,rgt)
pairs.
Remember that the nested sets model has ordered siblings.
3) Same nodes in the same structure in both tables; do both of the
queries above.
A year ago, the row (Elmira,103, 236) was in the database. We hired
some people and fired some people, and we think, but are not sure, that
we are back to where we were. But now Elmira's coordinates are (Elmira,
276, 300). How do we find out if the hierarchy is just as it was before
or not? [We only can if we canonicalize, and the fact that your nested
sets model is not canonicalized is a problem - this is the crucial
difference between your model and Tropashko's.]

I have the same names in Joe_1 and Joe_2 but none of the (lft, rgt)
numbers match. Joe_1 has all even numbers and they range from 10000 to
900000, and Joe_2 has odd numbers ranging from 11 to 97. Unfortunately,
I have no clue as to whether Joe_1 and Joe_2 represent the same
hierarchy. They might, and they might not.
Post by --CELKO--
Post by Steve Kass
I think I'd understand this all better if you could explain to me
what attributes of the real-world hierarchy are represented by the
integer columns lft and rgt. <<
A position within the hierarchy, just like (longtitude, latitude)
pairs represent a position on the globe.
Let me be clearer. Exactly /how/ do (lft,rgt) represent position?
What's the formula that gives me the clear position from the (lft,rgt)
values? Is Elmira the CEO of the company? Her coordinates are (321,
4409). [Answer: "I don't know."] This must have a clear answer for
this to be a useful model. In other words, given a position, how to I
compute the exact lft,rgt values, and vice versa. I know how to do this
for latitude and longitude, given maps, or compasses, or a GPS receiver,
but the bijection between (lft,rgt) and (position in the hierarchy)
isn't in my encyclopedia.

No, (lft,rgt) and not at all just like (longtitude [sic], latitude).
Geographical coordinates have absolute meanings, and (lft,rgt) don't.
Knowing only the (lft, rgt) coordinates of a person tells you nothing,
while (latitude, longitude) tells you global location precisely. It
doesn't tell you altitude, of course, but that's fine, since flat maps
and geographical coordinates aren't intended to model altitude But the
(lft,rgt) values in one person's row *don't* allow you to discover any
attribute of the hierarchy that this model is supposed to be providing,
and least not in the way databases should work - the values in an
isolated row should give you information on their own, or through FK
relationships to attributes in another row whose key is in the given row.

If you specify a location on the globe in any precise way (exactly 372.9
kilometers east of Ouagadougou), the longitude and latitude are not in
doubt. If you give me to (long,lat) pairs, the distance between the
cities can be found, etc., etc. If you walk into an office and ask
Elmira what her position is in the hierarchy, she will tell you who her
boss is and perhaps also who she supervises, but her (lft, rgt) values
are artificial aspects of the database mode and you will never find them
out. I doubt she knows them herself. They might be (12345, 13459) and
they might be (13, 17) in two (different!) faithful representations of
the same organization, depending on irrelevant factors like the order in
which employees were entered into the database, or how many people and
who were hired and fired before she came on board.


SK
Jack D. Ripper
2004-07-19 20:56:34 UTC
Permalink
Post by Steve Kass
Let me be clearer. Exactly /how/ do (lft,rgt) represent position?
What's the formula that gives me the clear position from the (lft,rgt)
values? Is Elmira the CEO of the company? Her coordinates are (321,
4409). [Answer: "I don't know."] This must have a clear answer for
this to be a useful model. In other words, given a position, how to I
compute the exact lft,rgt values, and vice versa. I know how to do this
for latitude and longitude, given maps, or compasses, or a GPS receiver,
but the bijection between (lft,rgt) and (position in the hierarchy)
isn't in my encyclopedia.
No, (lft,rgt) and not at all just like (longtitude [sic], latitude).
Geographical coordinates have absolute meanings, and (lft,rgt) don't.
Knowing only the (lft, rgt) coordinates of a person tells you nothing,
while (latitude, longitude) tells you global location precisely. It
doesn't tell you altitude, of course, but that's fine, since flat maps
and geographical coordinates aren't intended to model altitude But the
(lft,rgt) values in one person's row *don't* allow you to discover any
attribute of the hierarchy that this model is supposed to be providing,
and least not in the way databases should work - the values in an
isolated row should give you information on their own, or through FK
relationships to attributes in another row whose key is in the given row.
If you specify a location on the globe in any precise way (exactly 372.9
kilometers east of Ouagadougou), the longitude and latitude are not in
doubt. If you give me to (long,lat) pairs, the distance between the
cities can be found, etc., etc. If you walk into an office and ask
Elmira what her position is in the hierarchy, she will tell you who her
boss is and perhaps also who she supervises, but her (lft, rgt) values
are artificial aspects of the database mode and you will never find them
out. I doubt she knows them herself. They might be (12345, 13459) and
they might be (13, 17) in two (different!) faithful representations of
the same organization, depending on irrelevant factors like the order in
which employees were entered into the database, or how many people and
who were hired and fired before she came on board.
In essence lft,rgt are identity columns (placeholders).The irony
is delicious:)
--CELKO--
2004-07-20 16:54:30 UTC
Permalink
I have tried posting this twice and it got lost!
two tables, Joe_1 and Joe_2, cannot let you conclude that they
represent different hierarchies? <<

If you are not in a union shop, a university or other hieratrchy where
seniority maters so you don't have sibling ordering, as for example:

Joe_1:
Albert (1,12)
/ \
/ \
Bert (2,3) Chuck (4,11)
/ | \
/ | \
/ | \
/ | \
Donna (5,6) Eddie (7,8) Fred (9,10)


Joe_2:
Albert (1,12)
/ \
/ \
Chuck (2,9) Bert (10,11)
/ | \
/ | \
/ | \
/ | \
Fred (3,4) Eddie (5,6) Donna (7,8)

Give each node a weight, and then sum them hierarchically.

Weights
emp wgt
==================
'Albert' 1
'Bert' 2
'Chuck' 3
'Donna' 4
'Eddie' 5
'Fred' 6

The query is:

SELECT O2.emp, SUM(W1.wgt)
FROM OrgChart AS O1, OrgChart AS O2,
Weights AS W1
WHERE O1.lft BETWEEN O2.lft AND O2.rgt
AND O1.emp = W1.emp
GROUP BY O2.emp;

If both tables are the same, the trees have the same structure without
regard to sibling orderings. This simple trick came from Tropashko
several weeks ago and I feel like an idiot for not seeing it
immediately.

martin
2004-07-19 01:28:38 UTC
Permalink
Joe, Thanks for your time and answers.

cheers

martin
Post by martin
Post by martin
if a full table lock is neccesary when updating a
nested set. I am assuming that all "rgt" integer values will need to be
increment by 1. <<
Not quite. Each lft and rgt value increases by twice the number of
nodes in the inserted subtree to make room. Use a clustered index to
order the physical storage on the rows. The rows are short (lft, rgt,
node_key) so a large number of them fit into one data page. The rows
that are "below" the insertion point are not locked. As each page is
updated, it is freed. How fast can a modern disk drive do a sequential
read/write on physically contigous storage locations?
--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.
*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Joe Celko
2004-07-19 02:43:29 UTC
Permalink
Joe, Thanks for your time and answers. <<
Please, please, don't thank me -- Buy the book! Royalties, like
plagiarism, are the sincerest form of flattery :)

--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Loading...