Discussion:
Selecting rows with repeating values in one column
(too old to reply)
Andy Dufresne
2013-05-29 13:33:12 UTC
Permalink
In SQL Server R2 this query:

SELECT id, code, sequence_number
FROM mytable

returns this data:

id code sequence_number
1 381 0
2 381 1
3 382 0
4 382 1
5 383 0
6 383 1
7 384 0
8 384 1
9 384 1
10 385 0
11 385 1
12 386 0
13 386 1
14 387 0
15 387 1
16 388 0
17 388 1
18 388 1
19 389 0
20 389 1
21 389 1
22 390 0
23 390 1
24 391 0
25 391 1
26 391 1
27 392 0
28 392 1
29 392 1
30 393 0
31 393 1
32 394 0
33 394 1
34 395 0
35 395 1

Analysing the above you see that code 381 has two rows. Those two rows have sequence_numbers of 0 and 1 respectively.

This is repeated until code 384 which has three rows - one row where (sequence_number = 0) but TWO rows where sequence_number = 1

Code 385 and on repeats the origional pattern of 2 rows with sequence_numbers of 0 and 1, until code 388 which has TWO rows where sequence_number = 1 again.
Code 389, 391 and 392 repeat this "1 row for (sequence_number = 0) and two rows for (sequence_number = 1)" pattern.

How do I SELECT the MAX(ID) column for each of the rows where the code and sequence_number columns have TWO or more rows where (sequence_number = 1)?

In other words, I want to SELECT just these rows:

9 384 1
18 388 1
21 389 1
26 391 1
29 392 1

Does that make sense?

Thanks.
Gert-Jan Strik
2013-05-29 14:45:44 UTC
Permalink
Try this:

SELECT MAX(id), code, 1
FROM mytable
WHERE sequence_number = 1
GROUP BY code
HAVING COUNT(*) > 1

--
Gert-Jan
Andy Dufresne
2013-05-30 11:16:46 UTC
Permalink
Thanks.

That query works but is limited to "WHERE sequence_number = 1"

If I now change the SQL Server 2008 R2 data a bit and run the query again:


SELECT id, code, sequence_number
FROM mytable


it now returns this data:


id code sequence_number
1 381 0
2 381 1

3 382 0
4 382 1

5 383 0
6 383 1

7 383 1

8 384 0

9 384 1

10 385 0
11 385 1

12 385 2

13 386 0
14 386 1
15 386 1
16 386 2

17 387 0
18 387 1
19 387 1
20 387 2
21 387 3
22 387 3

23 388 0
24 388 1
25 388 1
26 388 2
27 388 2

28 389 0
29 389 1

How do I SELECT just these rows:

7 383 1
15 386 1
19 387 1
22 387 3
25 388 1
27 388 2

These are the the MAX(id) rows where there is more than one record with the same sequence_number.

I want all the highest id for each unique combination of code and sequence_number.

Thanks.
Erland Sommarskog
2013-05-30 21:14:58 UTC
Permalink
Post by Andy Dufresne
These are the the MAX(id) rows where there is more than one record with
the same sequence_number.
I want all the highest id for each unique combination of code and sequence_number.
WITH ranking AS (
SELECT id, code, sequence_number,
row_no = row_number() OVER(PARTITION BY code
ORDER BY sequence_number DESC, id DESC)
FROM mytable
)
SELECT id, code, sequence_number
FROM ranking
WHERE row_no = 1
--
Erland Sommarskog, Stockholm, ***@sommarskog.se
rpresser
2013-05-31 12:38:49 UTC
Permalink
Post by Erland Sommarskog
WITH ranking AS (
SELECT id, code, sequence_number,
row_no = row_number() OVER(PARTITION BY code
ORDER BY sequence_number DESC, id DESC)
FROM mytable
)
SELECT id, code, sequence_number
FROM ranking
WHERE row_no = 1
Is there any performance difference between your CTE version and this
subquery version?

SELECT id, code, sequence_number
FROM (
SELECT id, code, sequence_number,
row_no = row_number() OVER(PARTITION BY code
ORDER BY sequence_number DESC, id DESC)
FROM mytable
) AS ranking
where row_no = 1
Gert-Jan Strik
2013-05-31 18:00:22 UTC
Permalink
Post by rpresser
Is there any performance difference between your CTE version and this
subquery version?
No.
--
Gert-Jan

Gert-Jan Strik
2013-05-29 14:46:06 UTC
Permalink
Try this:

SELECT MAX(id), code, 1
FROM mytable
WHERE sequence_number = 1
GROUP BY code
HAVING COUNT(*) > 1

--
Gert-Jan
Post by Andy Dufresne
SELECT id, code, sequence_number
FROM mytable
id code sequence_number
1 381 0
2 381 1
3 382 0
4 382 1
5 383 0
6 383 1
7 384 0
8 384 1
9 384 1
10 385 0
11 385 1
12 386 0
13 386 1
14 387 0
15 387 1
16 388 0
17 388 1
18 388 1
19 389 0
20 389 1
21 389 1
22 390 0
23 390 1
24 391 0
25 391 1
26 391 1
27 392 0
28 392 1
29 392 1
30 393 0
31 393 1
32 394 0
33 394 1
34 395 0
35 395 1
Analysing the above you see that code 381 has two rows. Those two rows have sequence_numbers of 0 and 1 respectively.
This is repeated until code 384 which has three rows - one row where (sequence_number = 0) but TWO rows where sequence_number = 1
Code 385 and on repeats the origional pattern of 2 rows with sequence_numbers of 0 and 1, until code 388 which has TWO rows where sequence_number = 1 again.
Code 389, 391 and 392 repeat this "1 row for (sequence_number = 0) and two rows for (sequence_number = 1)" pattern.
How do I SELECT the MAX(ID) column for each of the rows where the code and sequence_number columns have TWO or more rows where (sequence_number = 1)?
9 384 1
18 388 1
21 389 1
26 391 1
29 392 1
Does that make sense?
Thanks.
Andy Dufresne
2013-05-30 08:24:06 UTC
Permalink
Try this: SELECT MAX(id), code, 1 FROM mytable WHERE sequence_number = 1 GROUP BY code HAVING COUNT(*) > 1 -- Gert-Jan Andy Dufresne wrote: > In SQL Server R2 this query: > > SELECT id, code, sequence_number > FROM mytable > > returns this data: > > id code sequence_number > 1 381 0 > 2 381 1 > 3 382 0 > 4 382 1 > 5 383 0 > 6 383 1 > 7 384 0 > 8 384 1 > 9 384 1 > 10 385 0 > 11 385 1 > 12 386 0 > 13 386 1 > 14 387 0 > 15 387 1 > 16 388 0 > 17 388 1 > 18 388 1 > 19 389 0 > 20 389 1 > 21 389 1 > 22 390 0 > 23 390 1 > 24 391 0 > 25 391 1 > 26 391 1 > 27 392 0 > 28 392 1 > 29 392 1 > 30 393 0 > 31 393 1 > 32 394 0 > 33 394 1 > 34 395 0 > 35 395 1 > > Analysing the above you see that code 381 has two rows. Those two rows have sequence_numbers of 0 and 1 respectively. > > This is repeated until code 384 which has three rows - one row where (sequence_number = 0) but TWO rows where sequence_number = 1 > > Code 385 and on repeats the origional pattern of 2 rows with sequence_numbers of 0 and 1, until code 388 which has TWO rows where sequence_number = 1 again. > Code 389, 391 and 392 repeat this "1 row for (sequence_number = 0) and two rows for (sequence_number = 1)" pattern. > > How do I SELECT the MAX(ID) column for each of the rows where the code and sequence_number columns have TWO or more rows where (sequence_number = 1)? > > In other words, I want to SELECT just these rows: > > 9 384 1 > 18 388 1 > 21 389 1 > 26 391 1 > 29 392 1 > > Does that make sense? > > Thanks.
Thanks Gert-Jan.

What is the purpose of the "1" in that select: SELECT MAX(id), code, 1 ...
Andy Dufresne
2013-05-30 08:25:06 UTC
Permalink
Thanks Gert-Jan.

What is the purpose of the "1" in that select: SELECT MAX(id), code, 1 ...
Gert-Jan Strik
2013-05-30 17:49:16 UTC
Permalink
Post by Andy Dufresne
Thanks Gert-Jan.
What is the purpose of the "1" in that select: SELECT MAX(id), code, 1 ...
You asked for this output:
9 384 1
18 388 1
21 389 1
26 391 1
29 392 1

The "1" serves no other purpose than to add this 3rd column with a 1 in it for
every row.

To answer your other question, if you want a generic duplicates selection, you
can use:

SELECT MAX(id) AS max_id, code, sequence_number
FROM mytable
GROUP BY code, sequence_number
HAVING COUNT(*) > 1
ORDER BY max_id

--
Gert-Jan
Andy Dufresne
2013-05-30 11:06:58 UTC
Permalink
Thanks.

That query works but is limited to "WHERE sequence_number = 1"

If I now change the SQL Server 2008 R2 data a bit and run the query again:

SELECT id, code, sequence_number
FROM mytable

it now returns this data:

id code sequence_number
1 381 0
2 381 1

3 382 0
4 382 1

5 383 0
6 383 1
7 383 1

8 384 0
9 384 1

10 385 0
11 385 1
12 385 2

13 386 0
14 386 1
15 386 1
16 386 2

17 387 0
18 387 1
19 387 1
20 387 2
21 387 1

22 388 0
23 388 1
24 388 1
25 388 2
26 388 2

27 389 0
28 389 1

How do I SELECT just these rows:

7 383 1
15 386 1
21 387 1
24 388 1
26 388 2

These are the the MAX(id) rows where there is more than one record with the same sequence_number.

I want all the highest id for each unique combination of code and sequence_number.

Thanks.
Continue reading on narkive:
Loading...