Split string function

--CELKO--

2015-02-18 19:57:18 UTC

This function is really a BASIC program written in the old Sybase T-SQL dialect. Until you learn how to use a declarative language, like SQL, you will keep doing this over and over. We do not use loops and if-then-else control flow logic. There are many versions of the same idea if you google it. Look at using PATINDEX (),if you are determined to keep avoiding learning SQL.

The better answer is that the data going into the database should already be scrubbed. You do that in an input layer in the tiered architecture.

Let me give you a "cut and paste" I use in the SQL Server groups:

1) The dangerous, slow kludge is to use dynamic SQL and admit that any random future user is a better programmer than you are. It is used by Newbies who do not understand SQL or even what a compiled language is. A string is a string; it is a scalar value like any other parameter; it is not code. Again, this is not just an SQL problem; this is a basic misunderstanding of programming principles.

2) Passing a list of parameters to a stored procedure can be done by putting them into a string with a separator. I like to use the traditional comma. Let's assume that you have a whole table full of such parameter lists:

CREATE TABLE InputStrings
(keycol CHAR(10) NOT NULL PRIMARY KEY,
input_string VARCHAR(255) NOT NULL);

INSERT INTO InputStrings
VALUES ('first', '12,34,567,896'),
('second', '312,534,997,896'),
etc.

This will be the table that gets the outputs, in the form of the original key column and one parameter per row.

It makes life easier if the lists in the input strings start and end with a comma. You will need a table of sequential numbers -- a standard SQL programming trick, Now, the query,

CREATE VIEW ParmList (keycol, place, parm)
AS
SELECT keycol,
COUNT(S2.seq), -- reverse order
CAST (SUBSTRING (I1.input_string
FROM S1.seq
FOR MIN(S2.seq) - S1.seq -1)
AS INTEGER)
FROM InputStrings AS I1, Series AS S1, Series AS S2
WHERE SUBSTRING (',' + I1.input_string + ',', S1.seq, 1) = ','
AND SUBSTRING (',' + I1.input_string + ',', S2.seq, 1) = ','
AND S1.seq < S2.seq
GROUP BY I1.keycol, I1.input_string, S1.seq;

The S1 and S2 copies of Series are used to locate bracketing pairs of commas, and the entire set of substrings located between them is extracted and cast as integers in one non-procedural step. The trick is to be sure that the right hand comma of the bracketing pair is the closest one to the first comma. The relative position of each element in the list is given by the value of "place", but it does a count down so you can plan horizontal placement in columns.

This might be faster now:

WITH Commas(keycol, comma_seq, comma_place)
AS
(SELECT I1.keycol, S1.seq,
ROW_NUMBER() OVER (PARTITION BY I1.keycol ORDER BY S1.seq)
FROM InputStrings AS I1, Series AS S1
WHERE SUBSTRING (',' + I1.input_string + ','
FROM S1.seq
FOR 1) = ','
AND S1.seq <= CHARLENGTH (I1.input_string))

SELECT SUBSTRING(',' + I1.input_string + ','
FROM C1.comma_place +1
FOR C2.comma_place - C1.comma_place - 1)
FROM Commas AS C1, Commas AS C2
WHERE C2.comma_seq = C1.comma_seq + 1
AND C1.keycol = C2.keycol;

The idea is to get all the positions of the commas in the CTE and then use (n, n+1) pairs of positions to locate substrings. The hope is that the ROW_NUMBER() is faster than the GROUP BY in the first attempt. Since it is materialized before the body of the query (in theory), there are opportunities for parallelism indexing and other things to speed up the works.

Hey, I can write kludges with the best of them, but I don't. You need to at the very least write a routine to clean out blanks, handle double commas and non-numerics in the strings, take care of floating point and decimal notation, etc. Basically, you must write part of a compiler in SQL. Yeeeech! Or decide that you do not want to have data integrity, which is what most Newbies do in practice altho they do not know it.

A procedural loop is even worse. You have no error checking, no ability to pass local variables or expressions, etc.

CREATE PROCEDURE HomemadeParser(@input_string VARCHAR(8000))
AS
BEGIN
DECLARE @comma_position INTEGER;

CREATE TABLE #Slices
(slice_value INTEGER);

SET @input_string = @input_string + ','; --add sentinel comma
SET @comma_position = CHARINDEX(',', @input_string);

WHILE @comma_position > 1
BEGIN
INSERT INTO #Slices (slice_value)
VALUES(CAST(LEFT(@input_string, (@comma_position - 1)) AS INTEGER));
SET @input_string = RIGHT(@input_string, LEN(@input_string)-@comma_position)
SET @comma_position = CHARINDEX(',', @input_string)
END;
END;

Better answer:
http://www.simple-talk.com/sql/learn-sql-server/values()-and-long-parameter-lists/

http://www.simple-talk.com/sql/learn-sql-server/values()-and-long-parameter-lists---part-ii/

Do this with a long parameter list. You can pass up to 2000+ parameters in T-SQL, which is more than you probably will ever need. The compiler will do all that error checking that the query version and the procedural code simply do not have unless you write a full parser with the standard error codes. You can now pass local variables to your procedure; you can pass other data types and get automatic conversions, etc. In short, this is just good software engineering.

CREATE PROCEDURE LongList
(@p1 INTEGER = NULL,
@p2 INTEGER = NULL,
@p3 INTEGER = NULL,
@p4 INTEGER = NULL,
@p5 INTEGER = NULL)

x IN (SELECT parm
FROM (VALUES (@p1), (@p2), (@p3), (@p4), (@p5)) AS X(parm)
WHERE parm IS NOT NULL;

You get all the advantages of the real compiler and can do all kinds of things with the values.