SSIS: Use regex in the Foreach File Enumerator

Regex filter for Foreach Loop

Case
The standard Foreach Loop File Enumerator only has a wildcard filter, but in some cases a regular expression filter would be more useful. See/vote this request at Microsoft Connect.Solution
At the moment I’m working on a custom File Enumerator with regular expression support, but for those who don’t want to wait or don’t want to use custom components… You could achieve the same result with a Script Task.

*update: Now also available as Custom Foreach Enumerator. *

1) Foreach Loop
For this case I will use a standard Foreach File Enumerator that fills a variable FilePath. The filter is *.* so all files will be returned.

Standard foreach loop

2) Variables
I will use two extra variables in this example: RegexFilter (string) for storing the regular expression and PassesRegexFilter (boolean) for indicating whether the filename passes the regular expression filter.

Variables

3) Script Task
Add a Script Task in front of the Data Flow Task and give it a suitable name.

Script Task

4) Add variables
Edit the Script Task and add the FilePath and RegexFilter as ReadOnlyVariables and the PassesRegexFilter as ReadWriteVariable.

Variables

5) The Script
Copy the following script to the Script Task.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// C# Code for filtering filenames with Regex
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Text.RegularExpressions;   // Added
using System.IO;                        // Added
namespace ST_02b6595da2274d7182409fb43af929ae.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion
        public void Main()
        {
            // Get the filename from the complete filepath
            String FileName = Path.GetFileName(Dts.Variables["User::FilePath"].Value.ToString());
            // Create a regex object with the pattern from the SSIS variable
            Regex FileFilterRegex = new Regex(Dts.Variables["User::RegexFilter"].Value.ToString());
            
            // Check if it is match and return that value (boolean) to the SSIS variable
            Dts.Variables["User::PassesRegexFilter"].Value = FileFilterRegex.IsMatch(FileName);
            
            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}

or with VB.Net

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
' VB.Net Code for filtering filenames with Regex
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Text.RegularExpressions   ' Added
Imports System.IO                        ' Added
<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
 Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
 Enum ScriptResults
  Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
  Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
 End Enum
 
    Public Sub Main()
        ' Get the filename from the complete filepath
        Dim FileName As String = Path.GetFileName(Dts.Variables("User::FilePath").Value.ToString())
        ' Create a regex object with the pattern from the SSIS variable
        Dim FileFilterRegex As Regex = New Regex(Dts.Variables("User::RegexFilter").Value.ToString())
        ' Check if it is match and return that value (boolean) to the SSIS variable
        Dts.Variables("User::PassesRegexFilter").Value = FileFilterRegex.IsMatch(FileName)
        Dts.TaskResult = ScriptResults.Success
    End Sub
End Class

6) Precedence Constraints Expression
Connect the Script Task to the Data Flow Task and add an expression that does the real filtering:
@[User::PassesRegexFilter] == true

Precedence Constraint with expression

7) The result
For testing the result, I added an other Script Task below the Data Flow Task that logs all files. My regular expression in this test case: .*[0-9]\.csv$    (all csv files that end with a number in the filename)

The result

Note 1: to gain some performance I should have changed the Foreach Loop File Enumerator wildcard from *.* to *.csv

Note 2: this method could be a bit inefficient if your file collection contains thousands of files and you only need two of them.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s