What is Checksum?
Checksum is a function available in sql server 2005 and sql server 2008. It is intended to build a hash index based on an expression or column list.
Determining if two rows or expressions are equal can be a difficult and resource intensive process. This can be the case where the update was conditional based on all of the columns being equal or not for a specific row. Without checksums it is a long process of using innerjoins to identify the exact row to update in the update statements.
When is Checksum used?
When you need to compare the unique characteristics of an expression, columns or a table consider using the Checksum function.
When you have to update a row where many columns are compared to determine if the data is unique, use CHECKSUM function to build a unique value using checksum function and then compare the CHECKSUM values.
What *IS* a HASHBYTE?
HASHBYTE is a function in SQL Server which can be used for creating hash values . Hash value is nothing but a string generated as per the hash algorithm (MDx, SHAx) and the input. All of these algorithms have their own advantages and disadvantages.
Why would I use it?
Most of the times I use this function for table lookups to detect if there is any change in the data.For Ex. If there is any Update in a row, the value created using the function can detect it as the hash value also gets changed when data in a row gets changed.
What the heck are MD2, MD4, MD5, SHA and SHA1?
All of these are Hash Algorithms. These are big topics and also you can find alot of whitepapers if you want to go in detail.
MD2 Algorithm : This algorithm is optimized for 8 bit computers.16 bytes hash value will be created by using this algorithm.
MD4 and MD5 are optimized for 32 bit machines. MD5 is slower than MD4 but MD5 is more secure to use.
SHAx algorithms are more secure than MDx (Ofcourse there are controversies on which one to be used). Performance wise I havent found any difference between SHA1 and MD5, though I have heard people have realized the difference. Personally I would go for SHA1.
SHA2 is not still supported in SQL Server 2008 R2.
what is the advantage of using Hash bytes over SCD and LOOKUP tables in SSIS???
SCD performs row by row comparisons between the source data and target dimension table. By using SCD we will lose the advantage of set based processing and the fast loading process cannot be achieved by SCD. Implementing Hash function will be much faster than using SCD in SSIS. But there can be a problem using Hash functions. I have seen people reverting back to SCD from Hash Functions as hash functions sometime produce duplicate records. It all depends on the data you are working on.
Checksum is a function available in sql server 2005 and sql server 2008. It is intended to build a hash index based on an expression or column list.
Determining if two rows or expressions are equal can be a difficult and resource intensive process. This can be the case where the update was conditional based on all of the columns being equal or not for a specific row. Without checksums it is a long process of using innerjoins to identify the exact row to update in the update statements.
When is Checksum used?
When you need to compare the unique characteristics of an expression, columns or a table consider using the Checksum function.
When you have to update a row where many columns are compared to determine if the data is unique, use CHECKSUM function to build a unique value using checksum function and then compare the CHECKSUM values.
What *IS* a HASHBYTE?
HASHBYTE is a function in SQL Server which can be used for creating hash values . Hash value is nothing but a string generated as per the hash algorithm (MDx, SHAx) and the input. All of these algorithms have their own advantages and disadvantages.
Why would I use it?
Most of the times I use this function for table lookups to detect if there is any change in the data.For Ex. If there is any Update in a row, the value created using the function can detect it as the hash value also gets changed when data in a row gets changed.
What the heck are MD2, MD4, MD5, SHA and SHA1?
All of these are Hash Algorithms. These are big topics and also you can find alot of whitepapers if you want to go in detail.
MD2 Algorithm : This algorithm is optimized for 8 bit computers.16 bytes hash value will be created by using this algorithm.
MD4 and MD5 are optimized for 32 bit machines. MD5 is slower than MD4 but MD5 is more secure to use.
SHAx algorithms are more secure than MDx (Ofcourse there are controversies on which one to be used). Performance wise I havent found any difference between SHA1 and MD5, though I have heard people have realized the difference. Personally I would go for SHA1.
SHA2 is not still supported in SQL Server 2008 R2.
what is the advantage of using Hash bytes over SCD and LOOKUP tables in SSIS???
SCD performs row by row comparisons between the source data and target dimension table. By using SCD we will lose the advantage of set based processing and the fast loading process cannot be achieved by SCD. Implementing Hash function will be much faster than using SCD in SSIS. But there can be a problem using Hash functions. I have seen people reverting back to SCD from Hash Functions as hash functions sometime produce duplicate records. It all depends on the data you are working on.
Checksum | Hashbytes |
Faster but can produce lot of duplicate values | Slower but efficient than checksum |
Returns an int value | Returns a varbinary(8000) |
Microsoft does NOT recommend using CHECKSUM for change detection purposes | Use Hashbytes for change detection purposes |
No comments:
Post a Comment