C# SSIS: Skip first and last row in a flat file conneciton

There are many situations which you can use power of scripting in SSIS data flow. Today I encounter a situation here , there is source flat file here with header and trailer records which have different structure from data rows. Data rows delimited by vertical line ( | ) , but header and trailer not. Also Trailer rows information needs to import to another destination.

So we have a situation to read both data rows delimited and trailer row which isn’t delimited ( it’s fixed length on each value )

This is one of the situations which Power of script component makes sense.

In this sample I used a Script Component as Source.

Problem:

This is content of source flat file:

HVWBB630 201012302010123005473333
Iseyapcm            |88071523|E|Y|N|N|N|Y|
Anitakaul1          |88072163|E|Y|N|N|N|Y|
efarrellssb         |88072233|E|Y|N|N|N|Y|
markghouse          |88072450|E|Y|N|N|N|Y|
dechopra            |88074988|E|Y|N|N|N|Y|
TVWBB630 00000005122010123020101230TRADE PROFILE

first row is header row which should be skipped so we don’t consider this .

and last row is trailer which have useful information for us, like:

Program Name : VWBB630

Record Count: 512

Date:  20101230

Data rows should import into a data table, and trailer information should import in another destination table.

Solution:

1- Add a Data flow task.

2- Add a Script Component Transformation as SOURCE .


3- Double click on script component, and in General tab, set language as Visual Basic


4- Go to Inputs and Outputs Tab,

rename the existing Output0 to OutputMainRows

and add another output named OutputTrailerRow


Under OutputMainRows add 8 columns with names : Column 0 , Column 1 , … , Column 7 and set data type of them as DT_STR

( this is because we have 8 data columns in this source file sample )

Under OutputTrailerRow add 3 Columns as below:

Column name                      data type

————————————–

ProgramName                      DT_STR

RecordCount                       DT_I4

Date                                  DT_DATE


4- Go back to general tab, and click on Edit Script,

Here is where we should read data from source file, in this sample I used System.IO.StreamReader .NET class which get us the ability to read data from file.

First of all , in the class definition, declare a variable of this type:

Dim sr As System.IO.StreamReader

the StreamReader class should be instantiated once the script component is in pre execute step, so I write this line in the PreExecute() method as below:

Public Overrides Sub PreExecute()
MyBase.PreExecute()
sr = New System.IO.StreamReader(“D:\SSIS\ScriptComponentAsSource\sourcefile.txt”)
End Sub


Now we opened the file and we can read it with .ReadLine() method line by line, this is where we process source data and redirect each data to appropriate output.

We should use the CreateNewOutputRows method, and write our main code there, also note that for adding row to every output we should use this structure<OUTPUTNAME>Buffer.AddRow()

This is the code which should be written in CreateNewOutputRows() :

Public Overrides Sub CreateNewOutputRows()
Dim lineIndex As Integer = 0

While (Not sr.EndOfStream)
Dim line As String = sr.ReadLine()
If (lineIndex <> 0) Then ‘remove header row
Dim columnArray As String() = line.Split(Convert.ToChar(“|”))
If (columnArray.Length > 1) Then
‘main rows
OutputMainRowsBuffer.AddRow()
OutputMainRowsBuffer.Column0 = columnArray(0)
OutputMainRowsBuffer.Column1 = columnArray(1)
OutputMainRowsBuffer.Column2 = columnArray(2)
OutputMainRowsBuffer.Column3 = columnArray(3)
OutputMainRowsBuffer.Column4 = columnArray(4)
OutputMainRowsBuffer.Column5 = columnArray(5)
OutputMainRowsBuffer.Column6 = columnArray(6)
OutputMainRowsBuffer.Column7 = columnArray(7)
Else
‘trailer row
OutputTrailerRowBuffer.AddRow()
OutputTrailerRowBuffer.ProgramName = line.Substring(1, 7)
OutputTrailerRowBuffer.RecordCount = line.Substring(9, 10)
OutputTrailerRowBuffer.Date = DateTime.ParseExact(line.Substring(19, 8), “yyyyMMdd”, New   System.Globalization.CultureInfo(“en-US”))
End If
End If
lineIndex = lineIndex + 1
End While

End Sub


When reading data completed we should close the StreamReader , the PostExecute() method is the best place to close connections.

Public Overrides Sub PostExecute()
MyBase.PostExecute()
sr.Close()
End Sub


This is whole code in this script component:

‘ Microsoft SQL Server Integration Services Script Component
‘ Write scripts using Microsoft Visual Basic 2008.
‘ ScriptMain is the entry point class of the script.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

<Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute()> _
<CLSCompliant(False)> _
Public Class ScriptMain
Inherits UserComponent
Dim sr As System.IO.StreamReader

Public Overrides Sub PreExecute()
MyBase.PreExecute()
sr = New System.IO.StreamReader(“D:\SSIS\ScriptComponentAsSource\sourcefile.txt”)
End Sub

Public Overrides Sub PostExecute()
MyBase.PostExecute()
sr.Close()
End Sub

Public Overrides Sub CreateNewOutputRows()
Dim lineIndex As Integer = 0

While (Not sr.EndOfStream)
Dim line As String = sr.ReadLine()
If (lineIndex <> 0) Then ‘remove header row
Dim columnArray As String() = line.Split(Convert.ToChar(“|”))
If (columnArray.Length > 1) Then
‘main rows
OutputMainRowsBuffer.AddRow()
OutputMainRowsBuffer.Column0 = columnArray(0)
OutputMainRowsBuffer.Column1 = columnArray(1)
OutputMainRowsBuffer.Column2 = columnArray(2)
OutputMainRowsBuffer.Column3 = columnArray(3)
OutputMainRowsBuffer.Column4 = columnArray(4)
OutputMainRowsBuffer.Column5 = columnArray(5)
OutputMainRowsBuffer.Column6 = columnArray(6)
OutputMainRowsBuffer.Column7 = columnArray(7)
Else
‘trailer row
OutputTrailerRowBuffer.AddRow()
OutputTrailerRowBuffer.ProgramName = line.Substring(1, 7)
OutputTrailerRowBuffer.RecordCount = line.Substring(9, 10)
OutputTrailerRowBuffer.Date = DateTime.ParseExact(line.Substring(19, 8), “yyyyMMdd”, New System.Globalization.CultureInfo(“en-US”))
End If
End If
lineIndex = lineIndex + 1
End While

End Sub

End Class

Save and close the script editor.

5- Now you have two outputs in script component. connect OutputMainRows to the table which you want to insert data rows.


and connect OutputTrailerRow to the destination table which should store trailer information.


6- Now is the time to run package, This is output showed in the OutputMainRows:


and this is output in the OutputTrailerRow:


Hope this be helpful post.