Discussion:
table join on varbinary column
(too old to reply)
j***@yahoo.com
2008-08-19 07:46:34 UTC
Permalink
Hi there,
Anyone have any idea if it cause performance issue to inner join 2
tables with clustered indexed column which is varbinary data type?

CREATE TABLE [dbo].[table1](
[ID] [int] IDENTITY(1,1) NOT NULL,
----//////others columns here,
[MD5Key] [varbinary](8000) NOT NULL
CONSTRAINT [PK_rptFctCpCpk_201] PRIMARY KEY CLUSTERED
(
[MD5Key] ASC,
[ID] ASC
) ON [PRIMARY]
) ON [PRIMARY]

create table #table2(MID int, MD5Key2 varbinary(8000))
CREATE CLUSTERED INDEX [#tmpMD5Key2_IDX01] on [dbo].[#tempMD5Key2]
(MD5Key2)


Estimated there is 20 millions record in table 1 and 200k records in
temp table.

It hit performance issue during execution. May I know if it is
practical to use the varbinary column as key to join 2 tables?

The idea behind is to encode the data in MD5 format using hashbytes()
and the temp table was constructed dynamically to encode all search
criteria using hashbytes as well, the 2 tables are joined to find the
target records and output to report.

Please give advice if it is possible to use the varbinary column as
key to join two huge tables.

Many thanks!


j4m35bond
Eric Isaacs
2008-08-19 08:47:13 UTC
Permalink
Post by j***@yahoo.com
Please give advice if it is possible to use the varbinary column as
key to join two huge tables.
Yes it is possible, but it might not be your best option. The longer
those keys are, the more expensive the join will be.

Does the MD5Key value just contain a 128 bit value that represents the
key as your field name suggests, or is it the encrypted data itself?
If it's just the key, and the key is always 128 bits, then a
BINARY(128) value would be a better and more efficient choice than a
VARBINARY(8000).

-Eric Isaacs
j***@yahoo.com
2008-08-19 09:36:39 UTC
Permalink
Post by j***@yahoo.com
Please give advice if it is possible to use the varbinary column as
key to join two huge tables.
Yes it is possible, but it might not be your best option.  The longer
those keys are, the more expensive the join will be.
Does the MD5Key value just contain a 128 bit value that represents the
key as your field name suggests, or is it the encrypted data itself?
If it's just the key, and the key is always 128 bits, then a
BINARY(128) value would be a better and more efficient choice than a
VARBINARY(8000).
-Eric Isaacs
Eric,
I set it to 8000 because I read the documentation from MSDN, it shows
that the output of hashbytes function will be varbinary(8000)

select top 10000 datalength(md5key) from table1
select top 10000 len(md5key) from table1


both the query return "16" for all records... So can I assume the
hashbytes() always return 128bit? Is it safe to set to 128bit
(16bytes)? I need to take into account all possible input values for
the encoded string.
Eric Isaacs
2008-08-19 16:32:36 UTC
Permalink
Post by j***@yahoo.com
both the query return "16" for all records... So can I assume the
hashbytes() always return 128bit? Is it safe to set to 128bit
(16bytes)? I need to take into account all possible input values for
the encoded string.
Yes, the hashbytes key is aways 128 bits, so yes, that would be
BINARY(16), which is even better yet.

-Eric Isaacs
j***@yahoo.com
2008-08-20 03:50:43 UTC
Permalink
Post by Eric Isaacs
Post by j***@yahoo.com
both the query return "16" for all records... So can I assume the
hashbytes() always return 128bit? Is it safe to set to 128bit
(16bytes)? I need to take into account all possible input values for
the encoded string.
Yes, the hashbytes key is aways 128 bits, so yes, that would be
BINARY(16), which is even better yet.
-Eric Isaacs
Alright then.
The information below might be misleading. It shows return value of
varbinary (maximum 8000 bytes)

http://msdn.microsoft.com/en-us/library/ms174415.aspx
Eric Isaacs
2008-08-20 05:40:30 UTC
Permalink
Post by j***@yahoo.com
Alright then.
The information below might be misleading. It shows return value of
varbinary (maximum 8000 bytes)
http://msdn.microsoft.com/en-us/library/ms174415.aspx
Yes, but that's because it can be used with any hash algorithm. If
you're using MD5, it will return a 128 bit/16 byte key...

http://en.wikipedia.org/wiki/MD5

You can make it VARBINARY(8000) bytes, but only the first 16 bytes/128
bits will be used with when it returns the MD5 key.

-Eric Isaacs
j***@yahoo.com
2008-08-20 10:30:21 UTC
Permalink
Post by j***@yahoo.com
Alright then.
The information below might be misleading. It shows return value of
varbinary (maximum 8000 bytes)
http://msdn.microsoft.com/en-us/library/ms174415.aspx
Yes, but that's because it can be used with any hash algorithm.  If
you're using MD5, it will return a 128 bit/16 byte key...
http://en.wikipedia.org/wiki/MD5
You can make it VARBINARY(8000) bytes, but only the first 16 bytes/128
bits will be used with when it returns the MD5 key.
-Eric Isaacs
Thanks for the info.

My query is still slow when doing a dynamic sql through table_1-31
join with the md5key temp table. =(

j***@yahoo.com
2008-08-19 09:42:09 UTC
Permalink
Post by j***@yahoo.com
Please give advice if it is possible to use the varbinary column as
key to join two huge tables.
Yes it is possible, but it might not be your best option.  The longer
those keys are, the more expensive the join will be.
Does the MD5Key value just contain a 128 bit value that represents the
key as your field name suggests, or is it the encrypted data itself?
If it's just the key, and the key is always 128 bits, then a
BINARY(128) value would be a better and more efficient choice than a
VARBINARY(8000).
-Eric Isaacs
For MD5Key in my table, I used hashbytes('MD5', 'some very long string
up to 8000bytes') function to encode the data.


select top 10000 datalength(md5key) from table1
select top 10000 len(md5key) from table1

However both the query return 128bit. I am not familiar with the
hashbytes functions. Documents said it return varbinary8000 but so far
I can see only 128bit which is 16bytes.

Any idea?
Roy Harvey (SQL Server MVP)
2008-08-19 13:08:49 UTC
Permalink
Post by j***@yahoo.com
For MD5Key in my table, I used hashbytes('MD5', 'some very long string
up to 8000bytes') function to encode the data.
select top 10000 datalength(md5key) from table1
select top 10000 len(md5key) from table1
However both the query return 128bit. I am not familiar with the
hashbytes functions. Documents said it return varbinary8000 but so far
I can see only 128bit which is 16bytes.
It appears that the length of the actual value returned by HASHBYTES
depends on the algorithm chosen, but not on the value hashed.

SELECT DISTINCT 'MD2', LEN(HASHBYTES('MD2', name)) FROM sysobjects
UNION ALL
SELECT DISTINCT 'MD4', LEN(HASHBYTES('MD4', name)) FROM sysobjects
UNION ALL
SELECT DISTINCT 'MD5', LEN(HASHBYTES('MD5', name)) FROM sysobjects
UNION ALL
SELECT DISTINCT 'SHA', LEN(HASHBYTES('SHA', name)) FROM sysobjects
UNION ALL
SELECT DISTINCT 'SHA1', LEN(HASHBYTES('SHA1', name)) FROM sysobjects

---- -----------
MD2 16
MD4 16
MD5 16
SHA 20
SHA1 20

So I would store the hash in a column with the length matching the
algorithm. And since they are all the same length it would not be
varying length.

Roy Harvey
Beacon Falls, CT
Loading...