Discussion:
Regexp: match values that differ by one character
(too old to reply)
Andy Dufresne
2013-09-13 11:52:07 UTC
Permalink
In MSSQL 2008 R2:

create table mytable(
id (bigint),
productname(nvarcharmax),
productcode(nvarcharmax)
)

select id,productname,productcode
from mytable
where productcode like '12345a'

will return:

id,productname,productcode
2121,super cleaner,12345a

Is it possible to include slight variations of that productcode so that it returns any other record where the value differs by one character?

Example, can I get it to also return productcode = 19345a or 12335a?

id,productname,productcode
2121,super cleaner,12345a
4321,really good cleaner,19345a
4343,wow cleaner,12335a

I don't care what the postion of the differing values are, so 12375a, 42345a, 12342a, 123455, 11345a, etc would all be acceptable.

I want all records "where productcode like '12345a'"

productcode will always have a length of 6 characters (i.e. '12345a7' or '123456a' is not valid).

I suppose I'm looking for something like a regexp for sql. Can this be done?

Thanks
Erland Sommarskog
2013-09-13 18:45:54 UTC
Permalink
Post by Andy Dufresne
Is it possible to include slight variations of that productcode so that
it returns any other record where the value differs by one character?
Example, can I get it to also return productcode = 19345a or 12335a?
To something like this, I would recommend writing a CLR function where
you have access to the RegEx classes in .Net. This is not suitable to do
in T-SQL.
--
Erland Sommarskog, Stockholm, ***@sommarskog.se
rpresser
2013-09-17 03:44:30 UTC
Permalink
Post by Andy Dufresne
Is it possible to include slight variations of that productcode so that it returns any other record where the value differs by one character?
You may be interested in the Levenshein distance, which is a non-regex solution, but it's a function that returns an integer representing the difference between two strings.

http://www.kodyaz.com/articles/fuzzy-string-matching-using-levenshtein-distance-sql-server.aspx

So if the l-distance is zero, the strings are identical; if it's 1, then only one character is different. That difference could be an insert or a delete, but if you also constrain the strings to be the same length then you can eliminate that possibility.
Babu M
2013-10-09 20:11:30 UTC
Permalink
Post by Andy Dufresne
create table mytable(
id (bigint),
productname(nvarcharmax),
productcode(nvarcharmax)
)
select id,productname,productcode
from mytable
where productcode like '12345a'
id,productname,productcode
2121,super cleaner,12345a
Is it possible to include slight variations of that productcode so that it returns any other record where the value differs by one character?
Example, can I get it to also return productcode = 19345a or 12335a?
id,productname,productcode
2121,super cleaner,12345a
4321,really good cleaner,19345a
4343,wow cleaner,12335a
I don't care what the postion of the differing values are, so 12375a, 42345a, 12342a, 123455, 11345a, etc would all be acceptable.
I want all records "where productcode like '12345a'"
productcode will always have a length of 6 characters (i.e. '12345a7' or '123456a' is not valid).
I suppose I'm looking for something like a regexp for sql. Can this be done?
Thanks
Loading...