SQL Server Merge Statement

I have two tables: DimBrand, which holds historical data as a type 2 SCD (slowly changing dimension), and LatestDimBrand, which holds only the latest dimension data. Using the MERGE statement I will insert new records from LatestDimBrand into DimBrand, archive (apply an end date to) any DimBrand records that do not appear in the latest data, and, for any brands that have changed, archive the old record and insert a new one.

DimBrand is the target of our inserts, updates, and deletes, and LatestDimBrand is the source:

DimBrand table
LatestDimBrand table
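
To follow along, the two tables can be created as temp tables. This is only a sketch: the column names come from the merge statement below, but the data types, the convention that a NULL EndDate marks the current row, and the sample brands are all assumptions.

```sql
-- Hypothetical setup: data types and sample rows are assumptions, not the original tables.
IF OBJECT_ID('tempdb..#DimBrand') IS NOT NULL DROP TABLE #DimBrand;
IF OBJECT_ID('tempdb..#LatestDimBrand') IS NOT NULL DROP TABLE #LatestDimBrand;

CREATE TABLE #DimBrand
(
    [BrandCode] INT         NOT NULL,
    [BrandName] VARCHAR(50) NOT NULL,
    [StartDate] DATETIME    NOT NULL,
    [EndDate]   DATETIME    NULL      -- assumed: NULL means "current version"
);

CREATE TABLE #LatestDimBrand
(
    [BrandCode] INT         NOT NULL,
    [BrandName] VARCHAR(50) NOT NULL,
    [StartDate] DATETIME    NOT NULL,
    [EndDate]   DATETIME    NULL
);

-- Existing history
INSERT #DimBrand ([BrandCode], [BrandName], [StartDate], [EndDate])
VALUES (1, 'Acme',    '20190101', NULL),
       (2, 'Globex',  '20190101', NULL),
       (3, 'Initech', '20190101', NULL);

-- Latest snapshot
INSERT #LatestDimBrand ([BrandCode], [BrandName], [StartDate], [EndDate])
VALUES (1, 'Acme',        '20200101', NULL),   -- unchanged
       (2, 'Globex Corp', '20200101', NULL),   -- renamed
       (4, 'Umbrella',    '20200101', NULL);   -- new
```

With these rows, brand 1 is unchanged, brand 2 has been renamed, brand 3 is missing from the latest data, and brand 4 is new, so every branch of the merge has something to do.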

Here is the complete merge statement:

-- Outer INSERT: add a new current row for every brand whose name changed.
INSERT #DimBrand ([BrandCode], [BrandName], [StartDate])
SELECT [BrandCode], [BrandName], GETDATE()
FROM (
    MERGE #DimBrand AS Target
    USING (
        SELECT [BrandCode], [BrandName], [StartDate], [EndDate]
        FROM #LatestDimBrand
    ) AS Source
    ON (Target.[BrandCode] = Source.[BrandCode])
    -- Brand exists but its name changed: archive the old row.
    WHEN MATCHED AND Target.[BrandName] <> Source.[BrandName]
        THEN UPDATE SET Target.[EndDate] = GETDATE()
    -- Brand is new: insert it as-is.
    WHEN NOT MATCHED BY TARGET
        THEN INSERT ([BrandCode], [BrandName], [StartDate], [EndDate])
             VALUES (Source.[BrandCode], Source.[BrandName], Source.[StartDate], Source.[EndDate])
    -- Brand no longer appears in the latest data: archive it.
    WHEN NOT MATCHED BY SOURCE
        THEN UPDATE SET Target.[EndDate] = GETDATE()
    -- Emit the action taken plus the source columns for each affected row.
    OUTPUT $action, Source.*
) AS i ([Action], [BrandCode], [BrandName], [StartDate], [EndDate])
-- Keep only the "name changed" updates. Rows archived by NOT MATCHED BY SOURCE
-- have no source row, so their BrandCode comes back as NULL.
WHERE [Action] = 'UPDATE'
  AND [BrandCode] IS NOT NULL;
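
After running the statement, a quick query like the following can help check the outcome. It assumes, as a convention not stated above, that rows still open have a NULL EndDate; adjust the CASE expression if your dimension uses a different marker.

```sql
-- Assumption: a NULL EndDate marks the current version of a brand.
SELECT [BrandCode], [BrandName], [StartDate], [EndDate],
       CASE WHEN [EndDate] IS NULL THEN 'current' ELSE 'archived' END AS [RowStatus]
FROM #DimBrand
ORDER BY [BrandCode], [StartDate];
```

A changed brand should show up twice: once archived under its old name and once current under its new name.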

Resources

https://docs.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver15

https://www.mssqltips.com/sqlservertip/2883/using-the-sql-server-merge-statement-to-process-type-2-slowly-changing-dimensions/

Extract numeric values from text in SQL Server

I have this paragraph:

Speeding ticket is 210.99USD. Aggravated DUI could add up 1 year jail time.

This is standard text that follows this pattern:

Speeding ticket is [Amount]. Aggravated DUI could add up [Term] year jail time.

The task is to extract Amount and Term from this text. The approach is to split the paragraph into words and use the ISNUMERIC function to pick out the values.

Here is a sample prototype:

DECLARE @ParagraphText NVARCHAR(MAX) = N'Speeding ticket is 210.99USD. Aggravated DUI could add up 1 year jail time.';

-- Table variable holding one word per row
DECLARE @Test TABLE (ValueColumn VARCHAR(8000));

-- I am using a custom split function, but you can use STRING_SPLIT() on SQL Server 2016 and later
INSERT @Test
SELECT * FROM dbo.fnSplitString(@ParagraphText, ' ');

-- Number the extracted values with a window ranking function.
-- Note: the ORDER BY sorts by the word text, not by its position in the paragraph.
SELECT ROW_NUMBER() OVER(ORDER BY ValueColumn) [ROW_NUMBER], *
FROM
(
    SELECT
    CONVERT(DECIMAL(20,8),
    CASE
        -- The whole word is numeric, e.g. '1'
        WHEN ISNUMERIC(ValueColumn) = 1 THEN CONVERT(FLOAT, ValueColumn)
        -- Otherwise keep the leading digits and dots, e.g. '210.99USD.' -> '210.99';
        -- the '0' prefix turns words with no leading number into 0
        ELSE CONVERT(FLOAT, '0' + LEFT(ValueColumn, PATINDEX('%[^0-9.]%', ValueColumn) - 1))
    END) AS ExtractedColumn
    ,ValueColumn
    FROM @Test
) x
WHERE x.ExtractedColumn > 0;
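
If a custom split function is not available, the same idea can be sketched with the built-in STRING_SPLIT() (SQL Server 2016 and later, database compatibility level 130 or higher) and TRY_CONVERT() (SQL Server 2012 and later). The sentinel character N'x' appended inside PATINDEX is an addition for this sketch, so that a word made only of digits and dots still yields a valid length. Treat it as an illustration rather than a drop-in replacement.

```sql
DECLARE @ParagraphText NVARCHAR(MAX) =
    N'Speeding ticket is 210.99USD. Aggravated DUI could add up 1 year jail time.';

SELECT s.value AS ValueColumn,
       n.ExtractedColumn
FROM STRING_SPLIT(@ParagraphText, N' ') AS s
CROSS APPLY
(
    -- Keep the leading run of digits and dots; TRY_CONVERT returns NULL
    -- (instead of raising an error) when that prefix is not a valid number.
    SELECT TRY_CONVERT(DECIMAL(20, 8),
               LEFT(s.value, PATINDEX(N'%[^0-9.]%', s.value + N'x') - 1)) AS ExtractedColumn
) AS n
WHERE n.ExtractedColumn > 0;   -- NULLs (non-numeric words) are filtered out here
```

For the sample paragraph this should leave only the words that start with a number, namely the amount and the term.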

SQL Server window functions

Ranking functions are among the most obvious and useful window functions: they rank the rows of a data set according to a given order. This post covers three of them:

ROW_NUMBER()
RANK()
DENSE_RANK()

The difference is easy to remember. For the examples, let’s assume we have this stocks data set.

IF OBJECT_ID('tempdb..#stocks') IS NOT NULL DROP TABLE #stocks;
;With Stocks AS
(
    SELECT 'MSFT' Symbol UNION ALL
    SELECT 'MSFT' Symbol UNION ALL
    SELECT 'MSFT' Symbol UNION ALL
    SELECT 'AAPL' Symbol UNION ALL
    SELECT 'GOOG' Symbol UNION ALL
    SELECT 'GOOG' Symbol UNION ALL
    SELECT 'YHOO' Symbol UNION ALL
    SELECT 'T' Symbol
)
SELECT * INTO #stocks FROM Stocks;
--SELECT * FROM #stocks

ROW_NUMBER()

ROW_NUMBER() assigns a unique sequential number to each row within its partition, in the order given by the ORDER BY clause. SQL Server requires an ORDER BY clause inside OVER() for the ranking functions, regardless of the data type of the column (string, INT, DECIMAL, FLOAT …).

SELECT Symbol, ROW_NUMBER() OVER(ORDER BY Symbol) [ROW_NUMBER]
FROM #stocks
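
The examples here use only ORDER BY, so the whole result set is treated as one partition. As a small sketch of what PARTITION BY changes, the query below restarts the numbering for each symbol. The inline VALUES list simply repeats the sample symbols so the snippet runs on its own; ordering by Symbol inside each partition is arbitrary, since all rows in a partition are equal.

```sql
SELECT Symbol,
       ROW_NUMBER() OVER(PARTITION BY Symbol ORDER BY Symbol) [ROW_NUMBER]
FROM (VALUES ('MSFT'), ('MSFT'), ('MSFT'), ('AAPL'),
             ('GOOG'), ('GOOG'), ('YHOO'), ('T')) AS s (Symbol)
ORDER BY Symbol, [ROW_NUMBER];
-- MSFT is numbered 1, 2, 3; GOOG is numbered 1, 2; every other symbol gets 1.
```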

RANK()

This behaves like ROW_NUMBER(), except that “equal” rows are ranked the same. If we substitute RANK() into the previous query:

SELECT Symbol, RANK() OVER(ORDER BY Symbol) [RANK]
FROM #stocks

As you can see, there are gaps between the ranks: after a group of tied rows, the next rank skips ahead by the size of the group. We can avoid those gaps by using the following:

DENSE_RANK()

DENSE_RANK() is a rank with no gaps, i.e. it is “dense”. We can write:

SELECT Symbol, DENSE_RANK() OVER(ORDER BY Symbol) [DENSE_RANK]
FROM #stocks

A good way to understand these three ranking functions is to see them all in action side by side. Run this query:

SELECT
    SYMBOL,
    ROW_NUMBER() OVER(ORDER BY Symbol) [ROW_NUMBER],
    RANK() OVER(ORDER BY Symbol) [RANK],
    DENSE_RANK() OVER(ORDER BY Symbol) [DENSE_RANK]
FROM #stocks
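
For reference, here is the same comparison as a standalone sketch, with the values it should return for the sample symbols. The inline VALUES list stands in for #stocks, and the expected values assume a typical collation in which the symbols sort alphabetically.

```sql
SELECT
    Symbol,
    ROW_NUMBER() OVER(ORDER BY Symbol) [ROW_NUMBER],
    RANK()       OVER(ORDER BY Symbol) [RANK],
    DENSE_RANK() OVER(ORDER BY Symbol) [DENSE_RANK]
FROM (VALUES ('MSFT'), ('MSFT'), ('MSFT'), ('AAPL'),
             ('GOOG'), ('GOOG'), ('YHOO'), ('T')) AS s (Symbol)
ORDER BY Symbol, [ROW_NUMBER];

-- Symbol  ROW_NUMBER  RANK  DENSE_RANK
-- AAPL    1           1     1
-- GOOG    2           2     2
-- GOOG    3           2     2
-- MSFT    4           4     3
-- MSFT    5           4     3
-- MSFT    6           4     3
-- T       7           7     4
-- YHOO    8           8     5
```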

Sometimes we don’t have a column to order by and simply want to return row numbers using ROW_NUMBER(). Here is the same query with a constant in the ORDER BY clause:

SELECT
	SYMBOL,
    ROW_NUMBER() OVER(ORDER BY (SELECT 1)) [ROW_NUMBER],
    RANK() OVER(ORDER BY (SELECT 1)) [RANK],
    DENSE_RANK() OVER(ORDER BY (SELECT 1)) [DENSE_RANK]
FROM #stocks

If you compare this result with the earlier one, you can see that with a constant ordering every row is a tie, so RANK() and DENSE_RANK() return 1 for every row, while ROW_NUMBER() still assigns a distinct number to each row.

The constant can be any literal value, as long as it is wrapped in a subquery:

order by (select 0)
order by (select 1)
order by (select null)
order by (select 'test')

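With a constant like this there is nothing to sort on, so the query optimizer does not perform a sort for the window function, and the row numbers are assigned in whatever order the rows happen to be read. That order is not guaranteed.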

Resources

https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-ver15

https://stackoverflow.com/questions/44105691/row-number-without-order-by

Word searching/matching in SQL Server

This is a sample of how to match strings that are not identical and whose words appear in a different order. Usually the strings have similar digit patterns, but the words may be in a different order.

https://stackoverflow.com/questions/48380545/fuzzy-string-matching-sql-words-in-different-order
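
One simple way to make the comparison independent of word order is to split both strings into words and compare the resulting sets. The sketch below does this with STRING_SPLIT() (SQL Server 2016 and later) and a Jaccard-style score: shared words divided by distinct words overall. It is not necessarily the technique from the linked answer, and the two sample strings are made up.

```sql
DECLARE @a NVARCHAR(200) = N'Acme 500 Widget Blue';
DECLARE @b NVARCHAR(200) = N'Blue Widget Acme 500';

WITH WordsA AS
(
    SELECT DISTINCT LOWER(value) AS Word
    FROM STRING_SPLIT(@a, N' ')
    WHERE value <> N''
),
WordsB AS
(
    SELECT DISTINCT LOWER(value) AS Word
    FROM STRING_SPLIT(@b, N' ')
    WHERE value <> N''
),
Counts AS
(
    SELECT
        -- Words that appear in both strings
        (SELECT COUNT(*) FROM WordsA AS a INNER JOIN WordsB AS b ON a.Word = b.Word) AS SharedWords,
        -- Distinct words across both strings (UNION removes duplicates)
        (SELECT COUNT(*) FROM (SELECT Word FROM WordsA UNION SELECT Word FROM WordsB) AS u) AS TotalWords
)
SELECT SharedWords,
       TotalWords,
       CAST(SharedWords AS FLOAT) / NULLIF(TotalWords, 0) AS Similarity
FROM Counts;
```

A score of 1 means the two strings contain exactly the same words, whatever their order; lower scores mean fewer words in common.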

Another approach is TF*IDF (term frequency times inverse document frequency). Here is a reference:

TF*IDF in C# example
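
For a rough idea of what TF*IDF computes, here is a minimal T-SQL sketch using one common formulation: TF is a term's share of the words in a document, and IDF is the natural log of (number of documents / number of documents containing the term). The @Docs table and its contents are invented for illustration, and other TF*IDF variants weight or smooth the terms differently.

```sql
-- Hypothetical sample documents
DECLARE @Docs TABLE (DocId INT, DocText NVARCHAR(200));
INSERT @Docs (DocId, DocText)
VALUES (1, N'blue widget large'),
       (2, N'blue gadget'),
       (3, N'red widget');

DECLARE @DocCount INT = (SELECT COUNT(*) FROM @Docs);

WITH Terms AS
(
    -- One row per word per document
    SELECT d.DocId, LOWER(s.value) AS Term
    FROM @Docs AS d
    CROSS APPLY STRING_SPLIT(d.DocText, N' ') AS s
    WHERE s.value <> N''
),
TermFreq AS
(
    -- TF: occurrences of the term divided by the total words in the document
    SELECT DocId, Term,
           COUNT(*) * 1.0 / SUM(COUNT(*)) OVER(PARTITION BY DocId) AS TF
    FROM Terms
    GROUP BY DocId, Term
),
DocFreq AS
(
    -- IDF: log of (all documents / documents containing the term)
    SELECT Term,
           LOG(@DocCount * 1.0 / COUNT(DISTINCT DocId)) AS IDF
    FROM Terms
    GROUP BY Term
)
SELECT tf.DocId, tf.Term, tf.TF, df.IDF,
       tf.TF * df.IDF AS TfIdf
FROM TermFreq AS tf
INNER JOIN DocFreq AS df ON df.Term = tf.Term
ORDER BY tf.DocId, TfIdf DESC;
```

Terms that occur in few documents score higher than terms shared by many, which is what makes the score useful for matching on the distinctive words.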

SSIS Fuzzy Lookup also has good support for this:

Fuzzy lookup using SSIS

Using class library in SSIS Projects without GAC installation

I don’t want to install a custom assembly in the GAC. Here is my research.

There is no direct way to do this from an Integration Services package. The usual workaround is a CurrentDomain_AssemblyResolve handler (for the AppDomain.CurrentDomain.AssemblyResolve event) that loads the assembly from a known location instead of installing it into the GAC.

There are three ways to use custom DLLs on deployment:

  • Installing the DLLs in the GAC
  • Using the AssemblyResolve event handler
  • Copying all DLLs to the SQL Server DTS assemblies folder (example for SQL Server 2008: C:\Program Files\Microsoft SQL Server\100\DTS\Binn) and to the .NET Framework assemblies folder

An interesting workaround is to use a web service instead of referencing the DLL directly: create a web service that exposes the methods you need and add a web reference instead of a local assembly reference.

Here is an example.

This might be your starting point.

Using custom DLL in script task

Resources

https://docs.microsoft.com/en-us/answers/questions/92003/unable-to-reference-custom-class-library-in-ssis-s.html

https://www.codeproject.com/Articles/895028/Refering-dlls-in-SSIS-Script-Task

https://www.dotnetspider.com/resources/45645-How-to-use-custom-DLL-in-SSIS-Package.aspx