Labels

Showing posts with label SSIS. Show all posts
Showing posts with label SSIS. Show all posts

Monday, August 3, 2015

Programatically invoking multiple instances of SSIS package to execute them in parallel

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.IO;
using System.Data.OleDb;
using System.Threading;
using System.Collections.Generic;

 public void Main()
        {
           
            var totalInstances = Convert.ToInt32(Dts.Variables["User::ParallelExecutionCount"].Value);
            var RowIteratorPkg = Dts.Variables["User::RowIteratorFilePath"].Value.ToString();

            try
            {
                if (File.Exists(RowIteratorPkg))
                {
                    var taskList = new System.Threading.Tasks.Task[totalInstances];

                    for (var i = 0; i < totalInstances; i++)
                    {
                        int runningInstance = i;
                        taskList[i] = System.Threading.Tasks.Task.Factory.StartNew(() => TriggerNewPackageInstance(RowIteratorPkg, runningInstance, totalInstances));
                    }

                    System.Threading.Tasks.Task.WaitAll(taskList);
                Dts.TaskResult = (int)ScriptResults.Success;
            }
                else
                {
                    logException(999, "RowIterator package does not exist in the configured location: " + RowIteratorPkg);
                    Dts.TaskResult = (int)ScriptResults.Failure;
                }
            }
            catch (Exception ex)
            {
                logException(999, "Exception in executing the RowIterator package: " + ex.Message);
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }

        private void TriggerNewPackageInstance(string RowIteratorPkg, Int32 runningInstance, Int32 totalInstances)
        {
            try
            {
            var App = new Application();
            var pkg = new Package();
            pkg = App.LoadPackage(RowIteratorPkg, null);
                pkg.Variables["PackageInstanceNumber"].Value = (Int32)runningInstance;
                pkg.Variables["ParallelInstances"].Value = (Int32)totalInstances;
                pkg.Variables["DBConString"].Value = DbConnString;
            pkg.Execute();
        }
            catch (Exception ex)
            {
                logException(999, "Exception in executing the package: " + ex.Message + " Running Instance: " + runningInstance.ToString() + "totalInstance : " + totalInstances.ToString());
            }
        }

Thursday, June 25, 2015

SSIS - How to allow the foreach loop continue its execution even if any of the task inside it fails


  1. Click on Event Handlers.
  2. Click on the Grid icon under variables tab.
  3. Check the Show System Variables option.
  4. Under the variable list search for the system variable called Propagate.
  5. Change its value to False.

SSIS - SQL Data Type Mapping


SSIS Data Type
SSIS Expression
SQL Server
single-byte signed integer
(DT_I1)

two-byte signed integer
(DT_I2)
smallint
four-byte signed integer
(DT_I4)
int
eight-byte signed integer
(DT_I8)
bigint
single-byte unsigned integer
(DT_UI1)
tinyint
two-byte unsigned integer
(DT_UI2)

four-byte unsigned integer
(DT_UI4)

eight-byte unsigned integer
(DT_UI8)

float
(DT_R4)
real
double-precision float
(DT_R8)
float
string
(DT_STR, «length», «code_page»)
char, varchar
Unicode text stream
(DT_WSTR, «length»)
nchar, nvarchar, sql_variant, xml
date
(DT_DATE)
date
Boolean
(DT_BOOL)
bit
numeric
(DT_NUMERIC, «precision», «scale»)
decimal, numeric
decimal
(DT_DECIMAL, «scale»)
decimal
currency
(DT_CY)
smallmoney, money
unique identifier
(DT_GUID)
uniqueidentifier
byte stream
(DT_BYTES, «length»)
binary, varbinary, timestamp
database date
(DT_DBDATE)
date
database time
(DT_DBTIME)

database time with precision
(DT_DBTIME2, «scale»)
time(p)
database timestamp
(DT_DBTIMESTAMP)
datetime, smalldatetime
database timestamp with precision
(DT_DBTIMESTAMP2, «scale»)
datetime2
database timestamp with timezone
(DT_DBTIMESTAMPOFFSET, «scale»)
datetimeoffset(p)
file timestamp
(DT_FILETIME)

image
(DT_IMAGE)
image
text stream
(DT_TEXT, «code_page»)
text
Unicode string
(DT_NTEXT)
ntext

Thursday, August 1, 2013

NULL Handling in SSIS

Using Derived Column Transformation

Samples:
-----------

TRIM([CONVERSION_DATE  ]) == "" ? (DT_DATE)"1/1/1900" : (DT_DATE)TRIM([CONVERSION_DATE  ])

TRIM([CICD RESOURCE_SEQ_NUM]) == "" ? NULL(DT_I4) : (DT_I4)TRIM([CICD RESOURCE_SEQ_NUM])

TRIM([CICD USAGE_RATE_OR_AMOUNT]) == "" ? NULL(DT_NUMERIC,19,4) : (DT_NUMERIC,19,4)TRIM([CICD USAGE_RATE_OR_AMOUNT])

TRIM([BR STANDARD_RATE_FLAG]) == "" ? "-" : (DT_WSTR,1)TRIM([BR STANDARD_RATE_FLAG])

TRIM(ORGANIZATION_ID) == "" ? NULL(DT_I8) : (DT_I8)TRIM(ORGANIZATION_ID)

Wednesday, November 2, 2011

Checksum Vs. Hashbytes

What is Checksum?
Checksum is a function available in sql server 2005 and sql server 2008. It is intended to build a hash index based on an expression or column list.

Determining if two rows or expressions are equal can be a difficult and resource intensive process. This can be the case where the update was conditional based on all of the columns being equal or not for a specific row. Without checksums it is a long process of using innerjoins to identify the exact row to update in the update statements.

When is Checksum used?
When you need to compare the unique characteristics of an expression, columns or a table consider using the Checksum function.

When you have to update a row where many columns are compared to determine if the data is unique, use CHECKSUM function to build a unique value using checksum function and then compare the CHECKSUM values.


What *IS* a HASHBYTE?

HASHBYTE  is a function in SQL Server  which can be used for creating hash values . Hash value is nothing but a string generated as per the hash algorithm (MDx, SHAx) and the input. All of these algorithms have their own advantages and disadvantages.

Why would I use it?

Most of the times I use this function for table lookups to detect if there is any change in the data.For Ex. If there is any Update in a row, the value created using the function can detect it as the hash value also gets changed when data in a row gets changed.

What the heck are MD2, MD4, MD5, SHA and SHA1?

All of these are Hash Algorithms. These are big topics and also you can find alot of whitepapers if you want to go in detail.

MD2 Algorithm : This algorithm is optimized for 8 bit computers.16 bytes hash value will be created by using this algorithm.

MD4 and MD5 are optimized for 32 bit machines. MD5 is slower than MD4 but MD5 is more secure to use.

SHAx algorithms are more secure than MDx  (Ofcourse there are controversies on which one to be used). Performance wise I havent found any difference between SHA1 and MD5, though I have heard people have realized the difference. Personally I would go for SHA1.

SHA2 is not still supported in SQL Server 2008 R2.

what is the advantage of using Hash bytes over SCD and LOOKUP tables in SSIS???

SCD performs row by row comparisons between the source data and target dimension table. By using SCD we will lose the advantage of set based processing and the fast loading process cannot be achieved by SCD. Implementing Hash function will be much faster than using SCD in SSIS. But there can be a problem using Hash functions.  I have seen people reverting back to SCD from Hash Functions as hash functions sometime produce duplicate records. It all depends on the data you are working on.


Checksum
Hashbytes
Faster but can produce lot of duplicate values
Slower but efficient than checksum
Returns an int value
Returns a varbinary(8000)
Microsoft does NOT recommend using CHECKSUM for change detection purposes
Use Hashbytes for change detection purposes