Discussion:
SQL Challenge: Collapsing overlapping date ranges
(too old to reply)
Brian Baker [MVP]
2003-07-25 22:12:11 UTC
Permalink
Hello all,

I have a little challenge that I've been struggling with. The scenario is
below. Note that the source table has over 2 million rows. The first
solution we tried copied the table to a temporary table and then used a
cursor to iterate over the rows and delete or modify from the temporary
table as necessary. This worked okay for a small set of the data but the
performance was extremely poor when ran against the whole table.

Thanks for taking a look and please let me know if you need more info!

/*

Scenario:
I have a table that lists enrollment periods for an insurance company.
It is possible for a member to be enrolled in more than one contract on
a given date. I need to create a query on this table that has no
overlapping date periods for a given member_num.

The following are some assumptions / notes:

1. If the enrollment_begin_dt and enrollment_end_dt fall between the
enrollment_begin_dt and enrollment_end_dt of another row for the same
member_num, the row with the earliest enrollment_begin_dt takes
precedence. If both rows have the same enrollment_begin_dt, the row with
the latest enrollment_end_dt takes precedence.

2. If there is an overlap where the dates do not fall in the range then
the earliest enrollment_begin_dt value has to be retained and the latest
enrollment_end_dt value is retained. All other columns in the resulting
row are taken from the source row with the latest enrollment_end_dt.

3. Rows are considered overlapping if the enrollment_end_dt of one record
= the enrollment_begin_dt of the next record.

4. Only overlapping rows should be modified. Rows with contiguous dates
are OK.

5. The actual table has several more columns that will be present in the
query, but they should not have an impact on the date logic used.

Given the sample data below, here is what I need as the results:

member_num contract_num enrollment_begin_dt enrollment_end_dt
---------- ------------ ------------------- -----------------
107272 61927 03/01/1989 12/31/1992
107272 61930 01/01/1993 09/04/1998
107272 61928 9/5/98 12/31/98
107272 61928 1/1/99 6/30/99
107272 61928 7/1/99 12/31/99
107272 61928 1/1/00 10/31/00

257668 120716 10/23/95 12/31/95
257668 120716 1/1/96 5/31/96
257668 120716 6/1/96 12/31/96
257668 120716 1/1/97 12/31/1998
257668 120716 1/1/99 12/31/99
257668 120716 1/1/00 12/31/00


*/

CREATE TABLE [overlap] (
[cdw_id] [bigint] IDENTITY (1, 1) NOT NULL ,
[member_num] [varchar] (25) NULL,
[contract_num] [int] NOT NULL ,
[enrollment_begin_dt] [datetime] NOT NULL ,
[enrollment_end_dt] [datetime] NULL ,
)

INSERT INTO overlap values ('107272', 61930, '3/1/89', '12/31/92')
INSERT INTO overlap values ('107272', 61927, '4/1/90', '12/31/92')
INSERT INTO overlap values ('107272', 61928, '1/1/93', '12/31/93')
INSERT INTO overlap values ('107272', 61930, '1/1/93', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '1/1/94', '12/31/95')
INSERT INTO overlap values ('107272', 61928, '1/1/96', '2/29/96')
INSERT INTO overlap values ('107272', 61928, '3/1/96', '12/31/96')
INSERT INTO overlap values ('107272', 61928, '1/1/97', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '9/5/98', '12/31/98')
INSERT INTO overlap values ('107272', 61928, '1/1/99', '6/30/99')
INSERT INTO overlap values ('107272', 61928, '7/1/99', '12/31/99')
INSERT INTO overlap values ('107272', 61928, '1/1/00', '10/31/00')

INSERT INTO overlap values ('257668', 120716, '10/23/95', '12/31/95')
INSERT INTO overlap values ('257668', 120716, '1/1/96', '5/31/96')
INSERT INTO overlap values ('257668', 120716, '6/1/96', '12/31/96')
INSERT INTO overlap values ('257668', 120716, '1/1/97', '6/16/97')
INSERT INTO overlap values ('257668', 161507, '1/1/97', '6/13/98')
INSERT INTO overlap values ('257668', 120716, '6/17/97', '6/30/97')
INSERT INTO overlap values ('257668', 120716, '7/1/97', '10/31/97')
INSERT INTO overlap values ('257668', 120716, '11/1/97', '12/31/97')
INSERT INTO overlap values ('257668', 120716, '1/1/98', '12/31/98')
INSERT INTO overlap values ('257668', 120716, '1/1/99', '12/31/99')
INSERT INTO overlap values ('257668', 120716, '1/1/00', '12/31/00')


SELECT * FROM overlap
order by member_num, enrollment_begin_dt

DROP TABLE overlap


Brian Baker
--
Microsoft MVP -- ASP / ASP.NET
Please post responses to the group
Joe Celko
2003-07-26 04:41:40 UTC
Permalink
I have done this one a fews times in the past, but I don't have the code
in front of me and I need to get started on the weekend. Let me give
you an outline, without checking the code.

1) Delete all durations properly contained in another; be careful to
avoid the case where the duration contains itself.

DELETE FROM Overlaps
WHERE EXISTS
(SELECT *
FROM Overlaps AS O1
WHERE O1.begin_dt BETWEEN Overlaps.begin_dt
AND Overlaps.end_dt
AND O1.end_dt BETWEEN Overlaps.begin_dt
AND Overlaps.end_dt
AND (O1.begin_dt <> O2.begin_dt
OR O1.end_dt <> O2.end_dt))

2) Build a calendar table -- a list of dates in the range you need.

3) For each account, pick all possible pairs of start and end dates from
different durations. Do not compare a duration to itself.

4) Look for a calendar date which is not in a duration for that account
that falls between the pairs of start and end dates. Reject it. If all
days are covered by one or more durations, then that pairs of start and
end dates is complete.

5) You can do either do a min-max or stuff all such pairs into the table
and run step #1 again.

--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!
Steve Kass
2003-07-26 05:55:05 UTC
Permalink
Brian,

I can't guarantee that this will handle all possibilities correctly,
and it seems a little too slick, but it might be a start. Check the
results carefully, and if you see cases it doesn't handle, let me know.
The indexing I've added should make it reasonably efficient, even
if it gives the wrong answers ;)

CREATE TABLE [overlap] (
[cdw_id] [bigint] IDENTITY (1, 1) NOT NULL ,
[member_num] [varchar] (25) NOT NULL,
[contract_num] [int] NOT NULL ,
[enrollment_begin_dt] [datetime] NOT NULL ,
[enrollment_end_dt] [datetime] NULL ,
PRIMARY KEY (member_num, enrollment_begin_dt, cdw_id)
)
create index overlap_nci1 on overlap(member_num, enrollment_end_dt)


INSERT INTO overlap values ('107272', 61930, '3/1/89', '12/31/92')
INSERT INTO overlap values ('107272', 61927, '4/1/90', '12/31/92')
INSERT INTO overlap values ('107272', 61928, '1/1/93', '12/31/93')
INSERT INTO overlap values ('107272', 61930, '1/1/93', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '1/1/94', '12/31/95')
INSERT INTO overlap values ('107272', 61928, '1/1/96', '2/29/96')
INSERT INTO overlap values ('107272', 61928, '3/1/96', '12/31/96')
INSERT INTO overlap values ('107272', 61928, '1/1/97', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '9/5/98', '12/31/98')
INSERT INTO overlap values ('107272', 61928, '1/1/99', '6/30/99')
INSERT INTO overlap values ('107272', 61928, '7/1/99', '12/31/99')
INSERT INTO overlap values ('107272', 61928, '1/1/00', '10/31/00')

INSERT INTO overlap values ('257668', 120716, '10/23/95', '12/31/95')
INSERT INTO overlap values ('257668', 120716, '1/1/96', '5/31/96')
INSERT INTO overlap values ('257668', 120716, '6/1/96', '12/31/96')
INSERT INTO overlap values ('257668', 120716, '1/1/97', '6/16/97')
INSERT INTO overlap values ('257668', 161507, '1/1/97', '6/13/98')
INSERT INTO overlap values ('257668', 120716, '6/17/97', '6/30/97')
INSERT INTO overlap values ('257668', 120716, '7/1/97', '10/31/97')
INSERT INTO overlap values ('257668', 120716, '11/1/97', '12/31/97')
INSERT INTO overlap values ('257668', 120716, '1/1/98', '12/31/98')
INSERT INTO overlap values ('257668', 120716, '1/1/99', '12/31/99')
INSERT INTO overlap values ('257668', 120716, '1/1/00', '12/31/00')

SELECT
O1.member_num,
O1.enrollment_begin_dt,
MIN(O2.enrollment_end_dt) as enrollment_end_dt
FROM overlap O1, overlap O2
WHERE O2.enrollment_end_dt >= O1.enrollment_end_dt
AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_begin_dt < O1.enrollment_begin_dt
AND O3.enrollment_end_dt >= O1.enrollment_begin_dt
AND O3.member_num = O1.member_num
) AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_end_dt > O2.enrollment_end_dt
AND O3.enrollment_begin_dt <= O2.enrollment_end_dt
AND O3.member_num = O2.member_num
)
group by O1.member_num, O1.enrollment_begin_dt
order by O1.member_num, O1.enrollment_begin_dt
go

DROP TABLE overlap
Post by Brian Baker [MVP]
Hello all,
I have a little challenge that I've been struggling with. The scenario is
below. Note that the source table has over 2 million rows. The first
solution we tried copied the table to a temporary table and then used a
cursor to iterate over the rows and delete or modify from the temporary
table as necessary. This worked okay for a small set of the data but the
performance was extremely poor when ran against the whole table.
Thanks for taking a look and please let me know if you need more info!
/*
I have a table that lists enrollment periods for an insurance company.
It is possible for a member to be enrolled in more than one contract on
a given date. I need to create a query on this table that has no
overlapping date periods for a given member_num.
1. If the enrollment_begin_dt and enrollment_end_dt fall between the
enrollment_begin_dt and enrollment_end_dt of another row for the same
member_num, the row with the earliest enrollment_begin_dt takes
precedence. If both rows have the same enrollment_begin_dt, the row with
the latest enrollment_end_dt takes precedence.
2. If there is an overlap where the dates do not fall in the range then
the earliest enrollment_begin_dt value has to be retained and the latest
enrollment_end_dt value is retained. All other columns in the resulting
row are taken from the source row with the latest enrollment_end_dt.
3. Rows are considered overlapping if the enrollment_end_dt of one record
= the enrollment_begin_dt of the next record.
4. Only overlapping rows should be modified. Rows with contiguous dates
are OK.
5. The actual table has several more columns that will be present in the
query, but they should not have an impact on the date logic used.
member_num contract_num enrollment_begin_dt enrollment_end_dt
---------- ------------ ------------------- -----------------
107272 61927 03/01/1989 12/31/1992
107272 61930 01/01/1993 09/04/1998
107272 61928 9/5/98 12/31/98
107272 61928 1/1/99 6/30/99
107272 61928 7/1/99 12/31/99
107272 61928 1/1/00 10/31/00
257668 120716 10/23/95 12/31/95
257668 120716 1/1/96 5/31/96
257668 120716 6/1/96 12/31/96
257668 120716 1/1/97 12/31/1998
257668 120716 1/1/99 12/31/99
257668 120716 1/1/00 12/31/00
*/
CREATE TABLE [overlap] (
[cdw_id] [bigint] IDENTITY (1, 1) NOT NULL ,
[member_num] [varchar] (25) NULL,
[contract_num] [int] NOT NULL ,
[enrollment_begin_dt] [datetime] NOT NULL ,
[enrollment_end_dt] [datetime] NULL ,
)
INSERT INTO overlap values ('107272', 61930, '3/1/89', '12/31/92')
INSERT INTO overlap values ('107272', 61927, '4/1/90', '12/31/92')
INSERT INTO overlap values ('107272', 61928, '1/1/93', '12/31/93')
INSERT INTO overlap values ('107272', 61930, '1/1/93', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '1/1/94', '12/31/95')
INSERT INTO overlap values ('107272', 61928, '1/1/96', '2/29/96')
INSERT INTO overlap values ('107272', 61928, '3/1/96', '12/31/96')
INSERT INTO overlap values ('107272', 61928, '1/1/97', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '9/5/98', '12/31/98')
INSERT INTO overlap values ('107272', 61928, '1/1/99', '6/30/99')
INSERT INTO overlap values ('107272', 61928, '7/1/99', '12/31/99')
INSERT INTO overlap values ('107272', 61928, '1/1/00', '10/31/00')
INSERT INTO overlap values ('257668', 120716, '10/23/95', '12/31/95')
INSERT INTO overlap values ('257668', 120716, '1/1/96', '5/31/96')
INSERT INTO overlap values ('257668', 120716, '6/1/96', '12/31/96')
INSERT INTO overlap values ('257668', 120716, '1/1/97', '6/16/97')
INSERT INTO overlap values ('257668', 161507, '1/1/97', '6/13/98')
INSERT INTO overlap values ('257668', 120716, '6/17/97', '6/30/97')
INSERT INTO overlap values ('257668', 120716, '7/1/97', '10/31/97')
INSERT INTO overlap values ('257668', 120716, '11/1/97', '12/31/97')
INSERT INTO overlap values ('257668', 120716, '1/1/98', '12/31/98')
INSERT INTO overlap values ('257668', 120716, '1/1/99', '12/31/99')
INSERT INTO overlap values ('257668', 120716, '1/1/00', '12/31/00')
SELECT * FROM overlap
order by member_num, enrollment_begin_dt
DROP TABLE overlap
Brian Baker
Brian Baker [MVP]
2003-07-28 13:33:25 UTC
Permalink
Hi Steve:

You are correct with the statement in the first paragraph--the only factor
in collapsing the date ranges is the member_num. However, I need to pull
contract_num (as well as several other non-key columns that exist in my
actual table but not in this example) out in my query. In the case where
there are overlapping date values, the values for these other columns
should come from the row with the latest enrollment_end_dt. (See note 2
from the original post.)

You are also correct that the query posed in your last message will not
get the results I'm looking for. Here is the data I'm expecting for
member_num = 107272:

member_num contract_num enrollment_begin_dt enrollment_end_dt
---------- ------------ ------------------- -----------------
107272 61927 03/01/1989 12/31/1992
107272 61930 01/01/1993 09/04/1998
107272 61928 9/5/98 12/31/98
107272 61928 1/1/99 6/30/99
107272 61928 7/1/99 12/31/99
107272 61928 1/1/00 10/31/00

Notice that the enrollment_begin_dt comes from the first row that was
inserted, while the member_num, contract_num, and enrollment_end_dt come
from the second row. Does this explain it well enough? Would it help to
include some additional columns in the sample dataset?

Thanks!
Brian Baker
Post by Steve Kass
Brian,
Maybe I didn't understand the original problem. I ignored
contract_num because I thought you wanted date ranges for
members regardless of the particular contract number(s) in
place during those date ranges. The query I suggested will
collapse ranges for different contract numbers into one range
as long as the pieces overlap.
If you only want to collapse ranges when the contract
numbers are the same, I would think it will work to add
contract_num everywhere I have member_num (as if the member
is now (member_num, contract_num), and if it doesn't, then
there's something I don't understand.
If you want to collapse ranges that have pieces corresponding
to different contract_num values but somehow list the relevant
contract numbers, you'll have to be clearer on what you want the
results to look like.
Ccan you show what you want as a result from your sample
SELECT
O1.member_num,
O1.contract_num,
O1.enrollment_begin_dt,
MIN(O2.enrollment_end_dt) as enrollment_end_dt
FROM overlap O1, overlap O2
WHERE
O1.member_num = O2.member_num
and O1.contract_num = O2.contract_num
and O2.enrollment_end_dt >= O1.enrollment_end_dt
AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_begin_dt < O1.enrollment_begin_dt
AND O3.enrollment_end_dt >= O1.enrollment_begin_dt
AND O3.member_num = O1.member_num
and O3.contract_num = O1.contract_num
) AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_end_dt > O2.enrollment_end_dt
AND O3.enrollment_begin_dt <= O2.enrollment_end_dt
AND O3.member_num = O2.member_num
and O3.contract_num = O2.contract_num
)
group by O1.member_num, O1.enrollment_begin_dt, O1.contract_num
order by O1.member_num, O1.contract_num, O1.enrollment_begin_dt
go*
Whether it's what you want or not, this query should
return more rows than before. In the original
query I posted, for example, there are few results
for member 107272, since contract *61930 spans many
of the rows for that customer. But if separate
contracts can't be collapsed, all the non-overlapping
pieces of contract 61928 now must be listed separately
in the result.
Steve
*
*
Thanks for offering your code. I did have to add "o1.member_num =
o2.member_num" to the where clause as seen below. That seemed to
return
Post by Steve Kass
the results I needed. However, the trick now is getting the additional
columns that will be needed. If I try to add O1.contract_num to the
select expression (with a corresponding O1.contract_num in the group
by), I get extraneous rows added to the resultset. I was reading a
couple of posts you made recently that were trying to demonstrate how
to use TOP 1 or NOT EXISTS to simulate the Access LAST() function, but
I didn't quite follow the sample...
/* updated query */
SELECT
O1.member_num,
O1.enrollment_begin_dt,
MIN(O2.enrollment_end_dt) as enrollment_end_dt
FROM overlap O1, overlap O2
WHERE
O1.member_num = O2.member_num
and O2.enrollment_end_dt >= O1.enrollment_end_dt
AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_begin_dt < O1.enrollment_begin_dt
AND O3.enrollment_end_dt >= O1.enrollment_begin_dt
AND O3.member_num = O1.member_num
) AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_end_dt > O2.enrollment_end_dt
AND O3.enrollment_begin_dt <= O2.enrollment_end_dt
AND O3.member_num = O2.member_num
)
group by O1.member_num, O1.enrollment_begin_dt
order by O1.member_num, O1.enrollment_begin_dt
go
Brian Baker
Post by Steve Kass
Brian,
I can't guarantee that this will handle all possibilities correctly,
and it seems a little too slick, but it might be a start. Check the
results carefully, and if you see cases it doesn't handle, let me know.
The indexing I've added should make it reasonably efficient, even
if it gives the wrong answers ;)
CREATE TABLE [overlap] (
[cdw_id] [bigint] IDENTITY (1, 1) NOT NULL ,
[member_num] [varchar] (25) NOT NULL,
[contract_num] [int] NOT NULL ,
[enrollment_begin_dt] [datetime] NOT NULL ,
[enrollment_end_dt] [datetime] NULL ,
PRIMARY KEY (member_num, enrollment_begin_dt, cdw_id)
)
create index overlap_nci1 on overlap(member_num, enrollment_end_dt)
INSERT INTO overlap values ('107272', 61930, '3/1/89', '12/31/92')
INSERT INTO overlap values ('107272', 61927, '4/1/90', '12/31/92')
INSERT INTO overlap values ('107272', 61928, '1/1/93', '12/31/93')
INSERT INTO overlap values ('107272', 61930, '1/1/93', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '1/1/94', '12/31/95')
INSERT INTO overlap values ('107272', 61928, '1/1/96', '2/29/96')
INSERT INTO overlap values ('107272', 61928, '3/1/96', '12/31/96')
INSERT INTO overlap values ('107272', 61928, '1/1/97', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '9/5/98', '12/31/98')
INSERT INTO overlap values ('107272', 61928, '1/1/99', '6/30/99')
INSERT INTO overlap values ('107272', 61928, '7/1/99', '12/31/99')
INSERT INTO overlap values ('107272', 61928, '1/1/00', '10/31/00')
INSERT INTO overlap values ('257668', 120716, '10/23/95', '12/31/95')
INSERT INTO overlap values ('257668', 120716, '1/1/96', '5/31/96')
INSERT INTO overlap values ('257668', 120716, '6/1/96', '12/31/96')
INSERT INTO overlap values ('257668', 120716, '1/1/97', '6/16/97')
INSERT INTO overlap values ('257668', 161507, '1/1/97', '6/13/98')
INSERT INTO overlap values ('257668', 120716, '6/17/97', '6/30/97')
INSERT INTO overlap values ('257668', 120716, '7/1/97', '10/31/97')
INSERT INTO overlap values ('257668', 120716, '11/1/97', '12/31/97')
INSERT INTO overlap values ('257668', 120716, '1/1/98', '12/31/98')
INSERT INTO overlap values ('257668', 120716, '1/1/99', '12/31/99')
INSERT INTO overlap values ('257668', 120716, '1/1/00', '12/31/00')
SELECT
O1.member_num,
O1.enrollment_begin_dt,
MIN(O2.enrollment_end_dt) as enrollment_end_dt
FROM overlap O1, overlap O2
WHERE O2.enrollment_end_dt >= O1.enrollment_end_dt
AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_begin_dt < O1.enrollment_begin_dt
AND O3.enrollment_end_dt >= O1.enrollment_begin_dt
AND O3.member_num = O1.member_num
) AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_end_dt > O2.enrollment_end_dt
AND O3.enrollment_begin_dt <= O2.enrollment_end_dt
AND O3.member_num = O2.member_num
)
group by O1.member_num, O1.enrollment_begin_dt
order by O1.member_num, O1.enrollment_begin_dt
go
DROP TABLE overlap
Post by Brian Baker [MVP]
Hello all,
I have a little challenge that I've been struggling with. The
scenario is below. Note that the source table has over 2 million
rows. The first solution we tried copied the table to a temporary
table and then used a cursor to iterate over the rows and delete or
modify from the temporary table as necessary. This worked okay for a
small set of the data but the performance was extremely poor when ran
against the whole table.
Thanks for taking a look and please let me know if you need more info!
/*
I have a table that lists enrollment periods for an insurance company.
It is possible for a member to be enrolled in more than one contract
on a given date. I need to create a query on this table that has no
overlapping date periods for a given member_num.
1. If the enrollment_begin_dt and enrollment_end_dt fall between the
enrollment_begin_dt and enrollment_end_dt of another row for the same
member_num, the row with the earliest enrollment_begin_dt takes
precedence. If both rows have the same enrollment_begin_dt, the row
with the latest enrollment_end_dt takes precedence.
2. If there is an overlap where the dates do not fall in the range
then the earliest enrollment_begin_dt value has to be retained and
the
Post by Steve Kass
latest
Post by Steve Kass
Post by Brian Baker [MVP]
enrollment_end_dt value is retained. All other columns in the
resulting
Post by Steve Kass
Post by Brian Baker [MVP]
row are taken from the source row with the latest enrollment_end_dt.
3. Rows are considered overlapping if the enrollment_end_dt of one
record = the enrollment_begin_dt of the next record.
4. Only overlapping rows should be modified. Rows with contiguous
dates
Post by Steve Kass
Post by Brian Baker [MVP]
are OK.
5. The actual table has several more columns that will be present in
the
Post by Steve Kass
Post by Brian Baker [MVP]
query, but they should not have an impact on the date logic used.
member_num contract_num enrollment_begin_dt enrollment_end_dt
---------- ------------ ------------------- -----------------
107272 61927 03/01/1989 12/31/1992
107272 61930 01/01/1993 09/04/1998
107272 61928 9/5/98 12/31/98
107272 61928 1/1/99 6/30/99
107272 61928 7/1/99 12/31/99
107272 61928 1/1/00 10/31/00
257668 120716 10/23/95 12/31/95
257668 120716 1/1/96 5/31/96
257668 120716 6/1/96 12/31/96
257668 120716 1/1/97 12/31/1998
257668 120716 1/1/99 12/31/99
257668 120716 1/1/00 12/31/00
*/
CREATE TABLE [overlap] (
[cdw_id] [bigint] IDENTITY (1, 1) NOT NULL ,
[member_num] [varchar] (25) NULL,
[contract_num] [int] NOT NULL ,
[enrollment_begin_dt] [datetime] NOT NULL ,
[enrollment_end_dt] [datetime] NULL ,
)
INSERT INTO overlap values ('107272', 61930, '3/1/89', '12/31/92')
INSERT INTO overlap values ('107272', 61927, '4/1/90', '12/31/92')
INSERT INTO overlap values ('107272', 61928, '1/1/93', '12/31/93')
INSERT INTO overlap values ('107272', 61930, '1/1/93', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '1/1/94', '12/31/95')
INSERT INTO overlap values ('107272', 61928, '1/1/96', '2/29/96')
INSERT INTO overlap values ('107272', 61928, '3/1/96', '12/31/96')
INSERT INTO overlap values ('107272', 61928, '1/1/97', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '9/5/98', '12/31/98')
INSERT INTO overlap values ('107272', 61928, '1/1/99', '6/30/99')
INSERT INTO overlap values ('107272', 61928, '7/1/99', '12/31/99')
INSERT INTO overlap values ('107272', 61928, '1/1/00', '10/31/00')
INSERT INTO overlap values ('257668', 120716, '10/23/95', '12/31/95')
INSERT INTO overlap values ('257668', 120716, '1/1/96', '5/31/96')
INSERT INTO overlap values ('257668', 120716, '6/1/96', '12/31/96')
INSERT INTO overlap values ('257668', 120716, '1/1/97', '6/16/97')
INSERT INTO overlap values ('257668', 161507, '1/1/97', '6/13/98')
INSERT INTO overlap values ('257668', 120716, '6/17/97', '6/30/97')
INSERT INTO overlap values ('257668', 120716, '7/1/97', '10/31/97')
INSERT INTO overlap values ('257668', 120716, '11/1/97', '12/31/97')
INSERT INTO overlap values ('257668', 120716, '1/1/98', '12/31/98')
INSERT INTO overlap values ('257668', 120716, '1/1/99', '12/31/99')
INSERT INTO overlap values ('257668', 120716, '1/1/00', '12/31/00')
SELECT * FROM overlap
order by member_num, enrollment_begin_dt
DROP TABLE overlap
--
Microsoft MVP -- ASP / ASP.NET
Please post responses to the group
Steve Kass
2003-07-28 14:37:10 UTC
Permalink
Brian,

See below for a two solution. I don't think your choice of contract_num
is completely defined, however. In your sample data, there are two
contracts for 107272 with enrollment end 12/31/1992, and there are
as well for enrollment end 9/4/1998. In the first case, you picked 61927,
which has a later begin date, but in the second case, you 61930, which
has an earlier begin date.

This example breaks ties using the latest begin date, just because
that's worth showing (it was trickier to do).

The first query uses something you might call a trick, that is sometimes
a useful replacement for TOP 1 or NOT EXISTS. First note that the
enrollment_end my query already prints is MIN(O2.enrollment_end_dt).
So to find the contract number that goes with this printed enrollment_end,
a first start is to find MIN(O2.enrollment_end_dt concatenated with
STR(contract_num,10)) and choose RIGHT(10) of that. That will be a
contract_num value corresponding to the correct enrollment_end if
O2.enrollment_end_dt is converted before concatenation to a character
value that sorts in the same order as dates.

But if there is more than one contract_num for the printed value of
MIN(O2.enrollment_end_dt), this first try will choose the smallest
value of contract_num. In order for MIN to choose the largest value
of enrollment_begin_dt instead, something has to be spliced into the
concatenation between O2.enrollment_end_dt and contract_num that
has the property that it becomes smaller (since we are taking MIN())
when enrollment_begin_dt becomes larger. I chose "days remaining
until 12/31/9999", which I convert to a right-justified fixed with string
for correct sorting.

Does that make sense? Here's the query. The down side of this
trick is that it can make the query less efficient, if an index was already
helping out the MIN. If so, it may be rewritable in terms of TOP 1 inside
subqueries, but I think this is easier to get right.

If you need a bunch of additional columns, you can use
the same trick, or you can do what I put as a second query
below, which my former solution as a view to join against for
the additional columns. The second query risks returning extra
rows if the tie-breaking criteria don't resolve a tie with too many
identical values.

One solution might be more efficient than the other.


-- Solution 1
SELECT
O1.member_num,
CAST(RIGHT(MIN(CONVERT(char(8), O2.enrollment_end_dt, 112)+
STR(DATEDIFF(day,O2.enrollment_begin_dt,CONVERT(datetime,'
99991231',112)),8)+
STR(O2.contract_num,10,0)),10) AS int) as contract_num,
O1.enrollment_begin_dt,
MIN(O2.enrollment_end_dt) as enrollment_end_dt
FROM overlap O1, overlap O2
WHERE
O1.member_num = O2.member_num
and O2.enrollment_end_dt >= O1.enrollment_end_dt
AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_begin_dt < O1.enrollment_begin_dt
AND O3.enrollment_end_dt >= O1.enrollment_begin_dt
AND O3.member_num = O1.member_num
) AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_end_dt > O2.enrollment_end_dt
AND O3.enrollment_begin_dt <= O2.enrollment_end_dt
AND O3.member_num = O2.member_num
)
group by O1.member_num, O1.enrollment_begin_dt
order by O1.member_num, O1.enrollment_begin_dt
go

-- Solution 2

CREATE VIEW V as
SELECT
O1.member_num,
O1.enrollment_begin_dt,
MIN(O2.enrollment_end_dt) as enrollment_end_dt
FROM overlap O1, overlap O2
WHERE
O1.member_num = O2.member_num
and O2.enrollment_end_dt >= O1.enrollment_end_dt
AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_begin_dt < O1.enrollment_begin_dt
AND O3.enrollment_end_dt >= O1.enrollment_begin_dt
AND O3.member_num = O1.member_num
) AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_end_dt > O2.enrollment_end_dt
AND O3.enrollment_begin_dt <= O2.enrollment_end_dt
AND O3.member_num = O2.member_num
)
group by O1.member_num, O1.enrollment_begin_dt
go

select
V.member_num,
O.contract_num,
V.enrollment_begin_dt,
V.enrollment_end_dt
from V join overlap O
on V.member_num = O.member_num
and V.enrollment_end_dt = O.enrollment_end_dt
where not exists (
select *
from overlap O2
where V.member_num = O2.member_num
and V.enrollment_end_dt = O2.enrollment_end_dt
and O2.enrollment_begin_dt > O.enrollment_begin_dt
)
order by V.member_num, V.enrollment_begin_dt
go

go
drop view V

Steve
Post by Brian Baker [MVP]
You are correct with the statement in the first paragraph--the only factor
in collapsing the date ranges is the member_num. However, I need to pull
contract_num (as well as several other non-key columns that exist in my
actual table but not in this example) out in my query. In the case where
there are overlapping date values, the values for these other columns
should come from the row with the latest enrollment_end_dt. (See note 2
from the original post.)
You are also correct that the query posed in your last message will not
get the results I'm looking for. Here is the data I'm expecting for
member_num contract_num enrollment_begin_dt enrollment_end_dt
---------- ------------ ------------------- -----------------
107272 61927 03/01/1989 12/31/1992
107272 61930 01/01/1993 09/04/1998
107272 61928 9/5/98 12/31/98
107272 61928 1/1/99 6/30/99
107272 61928 7/1/99 12/31/99
107272 61928 1/1/00 10/31/00
Notice that the enrollment_begin_dt comes from the first row that was
inserted, while the member_num, contract_num, and enrollment_end_dt come
from the second row. Does this explain it well enough? Would it help to
include some additional columns in the sample dataset?
Thanks!
Brian Baker
Post by Steve Kass
Brian,
Maybe I didn't understand the original problem. I ignored
contract_num because I thought you wanted date ranges for
members regardless of the particular contract number(s) in
place during those date ranges. The query I suggested will
collapse ranges for different contract numbers into one range
as long as the pieces overlap.
If you only want to collapse ranges when the contract
numbers are the same, I would think it will work to add
contract_num everywhere I have member_num (as if the member
is now (member_num, contract_num), and if it doesn't, then
there's something I don't understand.
If you want to collapse ranges that have pieces corresponding
to different contract_num values but somehow list the relevant
contract numbers, you'll have to be clearer on what you want the
results to look like.
Ccan you show what you want as a result from your sample
SELECT
O1.member_num,
O1.contract_num,
O1.enrollment_begin_dt,
MIN(O2.enrollment_end_dt) as enrollment_end_dt
FROM overlap O1, overlap O2
WHERE
O1.member_num = O2.member_num
and O1.contract_num = O2.contract_num
and O2.enrollment_end_dt >= O1.enrollment_end_dt
AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_begin_dt < O1.enrollment_begin_dt
AND O3.enrollment_end_dt >= O1.enrollment_begin_dt
AND O3.member_num = O1.member_num
and O3.contract_num = O1.contract_num
) AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_end_dt > O2.enrollment_end_dt
AND O3.enrollment_begin_dt <= O2.enrollment_end_dt
AND O3.member_num = O2.member_num
and O3.contract_num = O2.contract_num
)
group by O1.member_num, O1.enrollment_begin_dt, O1.contract_num
order by O1.member_num, O1.contract_num, O1.enrollment_begin_dt
go*
Whether it's what you want or not, this query should
return more rows than before. In the original
query I posted, for example, there are few results
for member 107272, since contract *61930 spans many
of the rows for that customer. But if separate
contracts can't be collapsed, all the non-overlapping
pieces of contract 61928 now must be listed separately
in the result.
Steve
*
*
Thanks for offering your code. I did have to add "o1.member_num =
o2.member_num" to the where clause as seen below. That seemed to
return
Post by Steve Kass
the results I needed. However, the trick now is getting the additional
columns that will be needed. If I try to add O1.contract_num to the
select expression (with a corresponding O1.contract_num in the group
by), I get extraneous rows added to the resultset. I was reading a
couple of posts you made recently that were trying to demonstrate how
to use TOP 1 or NOT EXISTS to simulate the Access LAST() function, but
I didn't quite follow the sample...
/* updated query */
SELECT
O1.member_num,
O1.enrollment_begin_dt,
MIN(O2.enrollment_end_dt) as enrollment_end_dt
FROM overlap O1, overlap O2
WHERE
O1.member_num = O2.member_num
and O2.enrollment_end_dt >= O1.enrollment_end_dt
AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_begin_dt < O1.enrollment_begin_dt
AND O3.enrollment_end_dt >= O1.enrollment_begin_dt
AND O3.member_num = O1.member_num
) AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_end_dt > O2.enrollment_end_dt
AND O3.enrollment_begin_dt <= O2.enrollment_end_dt
AND O3.member_num = O2.member_num
)
group by O1.member_num, O1.enrollment_begin_dt
order by O1.member_num, O1.enrollment_begin_dt
go
Brian Baker
Post by Steve Kass
Brian,
I can't guarantee that this will handle all possibilities correctly,
and it seems a little too slick, but it might be a start. Check the
results carefully, and if you see cases it doesn't handle, let me
know.
Post by Steve Kass
Post by Steve Kass
The indexing I've added should make it reasonably efficient, even
if it gives the wrong answers ;)
CREATE TABLE [overlap] (
[cdw_id] [bigint] IDENTITY (1, 1) NOT NULL ,
[member_num] [varchar] (25) NOT NULL,
[contract_num] [int] NOT NULL ,
[enrollment_begin_dt] [datetime] NOT NULL ,
[enrollment_end_dt] [datetime] NULL ,
PRIMARY KEY (member_num, enrollment_begin_dt, cdw_id)
)
create index overlap_nci1 on overlap(member_num, enrollment_end_dt)
INSERT INTO overlap values ('107272', 61930, '3/1/89', '12/31/92')
INSERT INTO overlap values ('107272', 61927, '4/1/90', '12/31/92')
INSERT INTO overlap values ('107272', 61928, '1/1/93', '12/31/93')
INSERT INTO overlap values ('107272', 61930, '1/1/93', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '1/1/94', '12/31/95')
INSERT INTO overlap values ('107272', 61928, '1/1/96', '2/29/96')
INSERT INTO overlap values ('107272', 61928, '3/1/96', '12/31/96')
INSERT INTO overlap values ('107272', 61928, '1/1/97', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '9/5/98', '12/31/98')
INSERT INTO overlap values ('107272', 61928, '1/1/99', '6/30/99')
INSERT INTO overlap values ('107272', 61928, '7/1/99', '12/31/99')
INSERT INTO overlap values ('107272', 61928, '1/1/00', '10/31/00')
INSERT INTO overlap values ('257668', 120716, '10/23/95', '12/31/95')
INSERT INTO overlap values ('257668', 120716, '1/1/96', '5/31/96')
INSERT INTO overlap values ('257668', 120716, '6/1/96', '12/31/96')
INSERT INTO overlap values ('257668', 120716, '1/1/97', '6/16/97')
INSERT INTO overlap values ('257668', 161507, '1/1/97', '6/13/98')
INSERT INTO overlap values ('257668', 120716, '6/17/97', '6/30/97')
INSERT INTO overlap values ('257668', 120716, '7/1/97', '10/31/97')
INSERT INTO overlap values ('257668', 120716, '11/1/97', '12/31/97')
INSERT INTO overlap values ('257668', 120716, '1/1/98', '12/31/98')
INSERT INTO overlap values ('257668', 120716, '1/1/99', '12/31/99')
INSERT INTO overlap values ('257668', 120716, '1/1/00', '12/31/00')
SELECT
O1.member_num,
O1.enrollment_begin_dt,
MIN(O2.enrollment_end_dt) as enrollment_end_dt
FROM overlap O1, overlap O2
WHERE O2.enrollment_end_dt >= O1.enrollment_end_dt
AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_begin_dt < O1.enrollment_begin_dt
AND O3.enrollment_end_dt >= O1.enrollment_begin_dt
AND O3.member_num = O1.member_num
) AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_end_dt > O2.enrollment_end_dt
AND O3.enrollment_begin_dt <= O2.enrollment_end_dt
AND O3.member_num = O2.member_num
)
group by O1.member_num, O1.enrollment_begin_dt
order by O1.member_num, O1.enrollment_begin_dt
go
DROP TABLE overlap
Post by Brian Baker [MVP]
Hello all,
I have a little challenge that I've been struggling with. The
scenario is below. Note that the source table has over 2 million
rows. The first solution we tried copied the table to a temporary
table and then used a cursor to iterate over the rows and delete or
modify from the temporary table as necessary. This worked okay for a
small set of the data but the performance was extremely poor when ran
against the whole table.
Thanks for taking a look and please let me know if you need more
info!
Post by Steve Kass
Post by Steve Kass
Post by Brian Baker [MVP]
/*
I have a table that lists enrollment periods for an insurance
company.
Post by Steve Kass
Post by Steve Kass
Post by Brian Baker [MVP]
It is possible for a member to be enrolled in more than one contract
on a given date. I need to create a query on this table that has no
overlapping date periods for a given member_num.
1. If the enrollment_begin_dt and enrollment_end_dt fall between the
enrollment_begin_dt and enrollment_end_dt of another row for the same
member_num, the row with the earliest enrollment_begin_dt takes
precedence. If both rows have the same enrollment_begin_dt, the row
with the latest enrollment_end_dt takes precedence.
2. If there is an overlap where the dates do not fall in the range
then the earliest enrollment_begin_dt value has to be retained and
the
Post by Steve Kass
latest
Post by Steve Kass
Post by Brian Baker [MVP]
enrollment_end_dt value is retained. All other columns in the
resulting
Post by Steve Kass
Post by Brian Baker [MVP]
row are taken from the source row with the latest enrollment_end_dt.
3. Rows are considered overlapping if the enrollment_end_dt of one
record = the enrollment_begin_dt of the next record.
4. Only overlapping rows should be modified. Rows with contiguous
dates
Post by Steve Kass
Post by Brian Baker [MVP]
are OK.
5. The actual table has several more columns that will be present in
the
Post by Steve Kass
Post by Brian Baker [MVP]
query, but they should not have an impact on the date logic used.
member_num contract_num enrollment_begin_dt enrollment_end_dt
---------- ------------ ------------------- -----------------
107272 61927 03/01/1989 12/31/1992
107272 61930 01/01/1993 09/04/1998
107272 61928 9/5/98 12/31/98
107272 61928 1/1/99 6/30/99
107272 61928 7/1/99 12/31/99
107272 61928 1/1/00 10/31/00
257668 120716 10/23/95 12/31/95
257668 120716 1/1/96 5/31/96
257668 120716 6/1/96 12/31/96
257668 120716 1/1/97 12/31/1998
257668 120716 1/1/99 12/31/99
257668 120716 1/1/00 12/31/00
*/
CREATE TABLE [overlap] (
[cdw_id] [bigint] IDENTITY (1, 1) NOT NULL ,
[member_num] [varchar] (25) NULL,
[contract_num] [int] NOT NULL ,
[enrollment_begin_dt] [datetime] NOT NULL ,
[enrollment_end_dt] [datetime] NULL ,
)
INSERT INTO overlap values ('107272', 61930, '3/1/89', '12/31/92')
INSERT INTO overlap values ('107272', 61927, '4/1/90', '12/31/92')
INSERT INTO overlap values ('107272', 61928, '1/1/93', '12/31/93')
INSERT INTO overlap values ('107272', 61930, '1/1/93', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '1/1/94', '12/31/95')
INSERT INTO overlap values ('107272', 61928, '1/1/96', '2/29/96')
INSERT INTO overlap values ('107272', 61928, '3/1/96', '12/31/96')
INSERT INTO overlap values ('107272', 61928, '1/1/97', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '9/5/98', '12/31/98')
INSERT INTO overlap values ('107272', 61928, '1/1/99', '6/30/99')
INSERT INTO overlap values ('107272', 61928, '7/1/99', '12/31/99')
INSERT INTO overlap values ('107272', 61928, '1/1/00', '10/31/00')
INSERT INTO overlap values ('257668', 120716, '10/23/95', '12/31/95')
INSERT INTO overlap values ('257668', 120716, '1/1/96', '5/31/96')
INSERT INTO overlap values ('257668', 120716, '6/1/96', '12/31/96')
INSERT INTO overlap values ('257668', 120716, '1/1/97', '6/16/97')
INSERT INTO overlap values ('257668', 161507, '1/1/97', '6/13/98')
INSERT INTO overlap values ('257668', 120716, '6/17/97', '6/30/97')
INSERT INTO overlap values ('257668', 120716, '7/1/97', '10/31/97')
INSERT INTO overlap values ('257668', 120716, '11/1/97', '12/31/97')
INSERT INTO overlap values ('257668', 120716, '1/1/98', '12/31/98')
INSERT INTO overlap values ('257668', 120716, '1/1/99', '12/31/99')
INSERT INTO overlap values ('257668', 120716, '1/1/00', '12/31/00')
SELECT * FROM overlap
order by member_num, enrollment_begin_dt
DROP TABLE overlap
Steve Kass
2003-07-28 14:41:12 UTC
Permalink
I hope the rest of my post wasn't as bad as the first sentence:
Should be "See below for two solutions." - oops. I looked at
the resulting query plans, and they're both fine on the sample
data. Whether one or the other solution works better probably
depends on how many values you're typically taking the MIN() of.

SK
Post by Steve Kass
Brian,
See below for a two solution. I don't think your choice of contract_num
is completely defined, however. In your sample data, there are two
[...]
Brian Baker [MVP]
2003-07-31 22:04:15 UTC
Permalink
Steve:

OK, I think this will be the final followup. To get rid of my duplicates,
I added contract_num to the view using the method you described in
Solution 1. Then I added contract_num as an additional join criteria on
the query against the view.

Thanks!
Brian Baker
I ended up using method 2 from your post (creating a view and then
joining to it). This seemed to be a little cleaner to me and worked
better since I needed to get about 20 additional fields. We did find 70
records (out of 2.3 million) where there were identical begin and end
dates for a given member_num. I'm working with the user to see if we
can come up with a "tie-breaker" for this situation.
Thanks again!
Brian Baker
[snip]
Post by Steve Kass
If you need a bunch of additional columns, you can use
the same trick, or you can do what I put as a second query
below, which my former solution as a view to join against for
the additional columns. The second query risks returning extra
rows if the tie-breaking criteria don't resolve a tie with too many
identical values.
[snip]
--
Microsoft MVP -- ASP / ASP.NET
Please post responses to the group
Loading...