SQL Challenge: Collapsing overlapping date ranges

Discussion:

(too old to reply)

Brian Baker [MVP]

2003-07-25 22:12:11 UTC

Hello all,

I have a little challenge that I've been struggling with. The scenario is
below. Note that the source table has over 2 million rows. The first
solution we tried copied the table to a temporary table and then used a
cursor to iterate over the rows and delete or modify from the temporary
table as necessary. This worked okay for a small set of the data but the
performance was extremely poor when ran against the whole table.

Thanks for taking a look and please let me know if you need more info!

/*

Scenario:
I have a table that lists enrollment periods for an insurance company.
It is possible for a member to be enrolled in more than one contract on
a given date. I need to create a query on this table that has no
overlapping date periods for a given member_num.

The following are some assumptions / notes:

1. If the enrollment_begin_dt and enrollment_end_dt fall between the
enrollment_begin_dt and enrollment_end_dt of another row for the same
member_num, the row with the earliest enrollment_begin_dt takes
precedence. If both rows have the same enrollment_begin_dt, the row with
the latest enrollment_end_dt takes precedence.

2. If there is an overlap where the dates do not fall in the range then
the earliest enrollment_begin_dt value has to be retained and the latest
enrollment_end_dt value is retained. All other columns in the resulting
row are taken from the source row with the latest enrollment_end_dt.

3. Rows are considered overlapping if the enrollment_end_dt of one record
= the enrollment_begin_dt of the next record.

4. Only overlapping rows should be modified. Rows with contiguous dates
are OK.

5. The actual table has several more columns that will be present in the
query, but they should not have an impact on the date logic used.

Given the sample data below, here is what I need as the results:

member_num contract_num enrollment_begin_dt enrollment_end_dt
---------- ------------ ------------------- -----------------
107272 61927 03/01/1989 12/31/1992
107272 61930 01/01/1993 09/04/1998
107272 61928 9/5/98 12/31/98
107272 61928 1/1/99 6/30/99
107272 61928 7/1/99 12/31/99
107272 61928 1/1/00 10/31/00

257668 120716 10/23/95 12/31/95
257668 120716 1/1/96 5/31/96
257668 120716 6/1/96 12/31/96
257668 120716 1/1/97 12/31/1998
257668 120716 1/1/99 12/31/99
257668 120716 1/1/00 12/31/00

*/

CREATE TABLE [overlap] (
[cdw_id] [bigint] IDENTITY (1, 1) NOT NULL ,
[member_num] [varchar] (25) NULL,
[contract_num] [int] NOT NULL ,
[enrollment_begin_dt] [datetime] NOT NULL ,
[enrollment_end_dt] [datetime] NULL ,
)

INSERT INTO overlap values ('107272', 61930, '3/1/89', '12/31/92')
INSERT INTO overlap values ('107272', 61927, '4/1/90', '12/31/92')
INSERT INTO overlap values ('107272', 61928, '1/1/93', '12/31/93')
INSERT INTO overlap values ('107272', 61930, '1/1/93', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '1/1/94', '12/31/95')
INSERT INTO overlap values ('107272', 61928, '1/1/96', '2/29/96')
INSERT INTO overlap values ('107272', 61928, '3/1/96', '12/31/96')
INSERT INTO overlap values ('107272', 61928, '1/1/97', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '9/5/98', '12/31/98')
INSERT INTO overlap values ('107272', 61928, '1/1/99', '6/30/99')
INSERT INTO overlap values ('107272', 61928, '7/1/99', '12/31/99')
INSERT INTO overlap values ('107272', 61928, '1/1/00', '10/31/00')

INSERT INTO overlap values ('257668', 120716, '10/23/95', '12/31/95')
INSERT INTO overlap values ('257668', 120716, '1/1/96', '5/31/96')
INSERT INTO overlap values ('257668', 120716, '6/1/96', '12/31/96')
INSERT INTO overlap values ('257668', 120716, '1/1/97', '6/16/97')
INSERT INTO overlap values ('257668', 161507, '1/1/97', '6/13/98')
INSERT INTO overlap values ('257668', 120716, '6/17/97', '6/30/97')
INSERT INTO overlap values ('257668', 120716, '7/1/97', '10/31/97')
INSERT INTO overlap values ('257668', 120716, '11/1/97', '12/31/97')
INSERT INTO overlap values ('257668', 120716, '1/1/98', '12/31/98')
INSERT INTO overlap values ('257668', 120716, '1/1/99', '12/31/99')
INSERT INTO overlap values ('257668', 120716, '1/1/00', '12/31/00')

SELECT * FROM overlap
order by member_num, enrollment_begin_dt

DROP TABLE overlap

Brian Baker

--
Microsoft MVP -- ASP / ASP.NET
Please post responses to the group

Joe Celko

2003-07-26 04:41:40 UTC

Permalink

I have done this one a fews times in the past, but I don't have the code
in front of me and I need to get started on the weekend. Let me give
you an outline, without checking the code.

1) Delete all durations properly contained in another; be careful to
avoid the case where the duration contains itself.

DELETE FROM Overlaps
WHERE EXISTS
(SELECT *
FROM Overlaps AS O1
WHERE O1.begin_dt BETWEEN Overlaps.begin_dt
AND Overlaps.end_dt
AND O1.end_dt BETWEEN Overlaps.begin_dt
AND Overlaps.end_dt
AND (O1.begin_dt <> O2.begin_dt
OR O1.end_dt <> O2.end_dt))

2) Build a calendar table -- a list of dates in the range you need.

3) For each account, pick all possible pairs of start and end dates from
different durations. Do not compare a duration to itself.

4) Look for a calendar date which is not in a duration for that account
that falls between the pairs of start and end dates. Reject it. If all
days are covered by one or more durations, then that pairs of start and
end dates is complete.

5) You can do either do a min-max or stuff all such pairs into the table
and run step #1 again.

--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Steve Kass

2003-07-26 05:55:05 UTC

Permalink

Brian,

I can't guarantee that this will handle all possibilities correctly,
and it seems a little too slick, but it might be a start. Check the
results carefully, and if you see cases it doesn't handle, let me know.
The indexing I've added should make it reasonably efficient, even
if it gives the wrong answers ;)

CREATE TABLE [overlap] (
[cdw_id] [bigint] IDENTITY (1, 1) NOT NULL ,
[member_num] [varchar] (25) NOT NULL,
[contract_num] [int] NOT NULL ,
[enrollment_begin_dt] [datetime] NOT NULL ,
[enrollment_end_dt] [datetime] NULL ,
PRIMARY KEY (member_num, enrollment_begin_dt, cdw_id)
)
create index overlap_nci1 on overlap(member_num, enrollment_end_dt)

INSERT INTO overlap values ('107272', 61930, '3/1/89', '12/31/92')
INSERT INTO overlap values ('107272', 61927, '4/1/90', '12/31/92')
INSERT INTO overlap values ('107272', 61928, '1/1/93', '12/31/93')
INSERT INTO overlap values ('107272', 61930, '1/1/93', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '1/1/94', '12/31/95')
INSERT INTO overlap values ('107272', 61928, '1/1/96', '2/29/96')
INSERT INTO overlap values ('107272', 61928, '3/1/96', '12/31/96')
INSERT INTO overlap values ('107272', 61928, '1/1/97', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '9/5/98', '12/31/98')
INSERT INTO overlap values ('107272', 61928, '1/1/99', '6/30/99')
INSERT INTO overlap values ('107272', 61928, '7/1/99', '12/31/99')
INSERT INTO overlap values ('107272', 61928, '1/1/00', '10/31/00')

INSERT INTO overlap values ('257668', 120716, '10/23/95', '12/31/95')
INSERT INTO overlap values ('257668', 120716, '1/1/96', '5/31/96')
INSERT INTO overlap values ('257668', 120716, '6/1/96', '12/31/96')
INSERT INTO overlap values ('257668', 120716, '1/1/97', '6/16/97')
INSERT INTO overlap values ('257668', 161507, '1/1/97', '6/13/98')
INSERT INTO overlap values ('257668', 120716, '6/17/97', '6/30/97')
INSERT INTO overlap values ('257668', 120716, '7/1/97', '10/31/97')
INSERT INTO overlap values ('257668', 120716, '11/1/97', '12/31/97')
INSERT INTO overlap values ('257668', 120716, '1/1/98', '12/31/98')
INSERT INTO overlap values ('257668', 120716, '1/1/99', '12/31/99')
INSERT INTO overlap values ('257668', 120716, '1/1/00', '12/31/00')

SELECT
O1.member_num,
O1.enrollment_begin_dt,
MIN(O2.enrollment_end_dt) as enrollment_end_dt
FROM overlap O1, overlap O2
WHERE O2.enrollment_end_dt >= O1.enrollment_end_dt
AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_begin_dt < O1.enrollment_begin_dt
AND O3.enrollment_end_dt >= O1.enrollment_begin_dt
AND O3.member_num = O1.member_num
) AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_end_dt > O2.enrollment_end_dt
AND O3.enrollment_begin_dt <= O2.enrollment_end_dt
AND O3.member_num = O2.member_num
)
group by O1.member_num, O1.enrollment_begin_dt
order by O1.member_num, O1.enrollment_begin_dt
go

DROP TABLE overlap

Post by Brian Baker [MVP]
Hello all,
I have a little challenge that I've been struggling with. The scenario is
below. Note that the source table has over 2 million rows. The first
solution we tried copied the table to a temporary table and then used a
cursor to iterate over the rows and delete or modify from the temporary
table as necessary. This worked okay for a small set of the data but the
performance was extremely poor when ran against the whole table.
Thanks for taking a look and please let me know if you need more info!
/*
I have a table that lists enrollment periods for an insurance company.
It is possible for a member to be enrolled in more than one contract on
a given date. I need to create a query on this table that has no
overlapping date periods for a given member_num.
1. If the enrollment_begin_dt and enrollment_end_dt fall between the
enrollment_begin_dt and enrollment_end_dt of another row for the same
member_num, the row with the earliest enrollment_begin_dt takes
precedence. If both rows have the same enrollment_begin_dt, the row with
the latest enrollment_end_dt takes precedence.
2. If there is an overlap where the dates do not fall in the range then
the earliest enrollment_begin_dt value has to be retained and the latest
enrollment_end_dt value is retained. All other columns in the resulting
row are taken from the source row with the latest enrollment_end_dt.
3. Rows are considered overlapping if the enrollment_end_dt of one record
= the enrollment_begin_dt of the next record.
4. Only overlapping rows should be modified. Rows with contiguous dates
are OK.
5. The actual table has several more columns that will be present in the
query, but they should not have an impact on the date logic used.
member_num contract_num enrollment_begin_dt enrollment_end_dt
---------- ------------ ------------------- -----------------
107272 61927 03/01/1989 12/31/1992
107272 61930 01/01/1993 09/04/1998
107272 61928 9/5/98 12/31/98
107272 61928 1/1/99 6/30/99
107272 61928 7/1/99 12/31/99
107272 61928 1/1/00 10/31/00
257668 120716 10/23/95 12/31/95
257668 120716 1/1/96 5/31/96
257668 120716 6/1/96 12/31/96
257668 120716 1/1/97 12/31/1998
257668 120716 1/1/99 12/31/99
257668 120716 1/1/00 12/31/00
*/
CREATE TABLE [overlap] (
[cdw_id] [bigint] IDENTITY (1, 1) NOT NULL ,
[member_num] [varchar] (25) NULL,
[contract_num] [int] NOT NULL ,
[enrollment_begin_dt] [datetime] NOT NULL ,
[enrollment_end_dt] [datetime] NULL ,
)
INSERT INTO overlap values ('107272', 61930, '3/1/89', '12/31/92')
INSERT INTO overlap values ('107272', 61927, '4/1/90', '12/31/92')
INSERT INTO overlap values ('107272', 61928, '1/1/93', '12/31/93')
INSERT INTO overlap values ('107272', 61930, '1/1/93', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '1/1/94', '12/31/95')
INSERT INTO overlap values ('107272', 61928, '1/1/96', '2/29/96')
INSERT INTO overlap values ('107272', 61928, '3/1/96', '12/31/96')
INSERT INTO overlap values ('107272', 61928, '1/1/97', '9/4/98')
INSERT INTO overlap values ('107272', 61928, '9/5/98', '12/31/98')
INSERT INTO overlap values ('107272', 61928, '1/1/99', '6/30/99')
INSERT INTO overlap values ('107272', 61928, '7/1/99', '12/31/99')
INSERT INTO overlap values ('107272', 61928, '1/1/00', '10/31/00')
INSERT INTO overlap values ('257668', 120716, '10/23/95', '12/31/95')
INSERT INTO overlap values ('257668', 120716, '1/1/96', '5/31/96')
INSERT INTO overlap values ('257668', 120716, '6/1/96', '12/31/96')
INSERT INTO overlap values ('257668', 120716, '1/1/97', '6/16/97')
INSERT INTO overlap values ('257668', 161507, '1/1/97', '6/13/98')
INSERT INTO overlap values ('257668', 120716, '6/17/97', '6/30/97')
INSERT INTO overlap values ('257668', 120716, '7/1/97', '10/31/97')
INSERT INTO overlap values ('257668', 120716, '11/1/97', '12/31/97')
INSERT INTO overlap values ('257668', 120716, '1/1/98', '12/31/98')
INSERT INTO overlap values ('257668', 120716, '1/1/99', '12/31/99')
INSERT INTO overlap values ('257668', 120716, '1/1/00', '12/31/00')
SELECT * FROM overlap
order by member_num, enrollment_begin_dt
DROP TABLE overlap
Brian Baker

Brian Baker [MVP]

2003-07-28 13:33:25 UTC

Permalink

Hi Steve:

You are correct with the statement in the first paragraph--the only factor
in collapsing the date ranges is the member_num. However, I need to pull
contract_num (as well as several other non-key columns that exist in my
actual table but not in this example) out in my query. In the case where
there are overlapping date values, the values for these other columns
should come from the row with the latest enrollment_end_dt. (See note 2
from the original post.)

You are also correct that the query posed in your last message will not
get the results I'm looking for. Here is the data I'm expecting for
member_num = 107272:

member_num contract_num enrollment_begin_dt enrollment_end_dt
---------- ------------ ------------------- -----------------
107272 61927 03/01/1989 12/31/1992
107272 61930 01/01/1993 09/04/1998
107272 61928 9/5/98 12/31/98
107272 61928 1/1/99 6/30/99
107272 61928 7/1/99 12/31/99
107272 61928 1/1/00 10/31/00

Notice that the enrollment_begin_dt comes from the first row that was
inserted, while the member_num, contract_num, and enrollment_end_dt come
from the second row. Does this explain it well enough? Would it help to
include some additional columns in the sample dataset?

Thanks!
Brian Baker

Post by Steve Kass
Brian,
Maybe I didn't understand the original problem. I ignored
contract_num because I thought you wanted date ranges for
members regardless of the particular contract number(s) in
place during those date ranges. The query I suggested will
collapse ranges for different contract numbers into one range
as long as the pieces overlap.
If you only want to collapse ranges when the contract
numbers are the same, I would think it will work to add
contract_num everywhere I have member_num (as if the member
is now (member_num, contract_num), and if it doesn't, then
there's something I don't understand.
If you want to collapse ranges that have pieces corresponding
to different contract_num values but somehow list the relevant
contract numbers, you'll have to be clearer on what you want the
results to look like.
Ccan you show what you want as a result from your sample
SELECT
O1.member_num,
O1.contract_num,
O1.enrollment_begin_dt,
MIN(O2.enrollment_end_dt) as enrollment_end_dt
FROM overlap O1, overlap O2
WHERE
O1.member_num = O2.member_num
and O1.contract_num = O2.contract_num
and O2.enrollment_end_dt >= O1.enrollment_end_dt
AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_begin_dt < O1.enrollment_begin_dt
AND O3.enrollment_end_dt >= O1.enrollment_begin_dt
AND O3.member_num = O1.member_num
and O3.contract_num = O1.contract_num
) AND NOT EXISTS (
SELECT * FROM overlap O3
WHERE O3.enrollment_end_dt > O2.enrollment_end_dt
AND O3.enrollment_begin_dt <= O2.enrollment_end_dt
AND O3.member_num = O2.member_num
and O3.contract_num = O2.contract_num
)
group by O1.member_num, O1.enrollment_begin_dt, O1.contract_num
order by O1.member_num, O1.contract_num, O1.enrollment_begin_dt
go*
Whether it's what you want or not, this query should
return more rows than before. In the original
query I posted, for example, there are few results
for member 107272, since contract *61930 spans many
of the rows for that customer. But if separate
contracts can't be collapsed, all the non-overlapping
pieces of contract 61928 now must be listed separately
in the result.
Steve
*
*

Thanks for offering your code. I did have to add "o1.member_num =
o2.member_num" to the where clause as seen below. That seemed to

return