This project is read-only.

What is RegExtractor

RegExtractor is an SSIS component that allows you to define a regular expression with captured matches (backreferences) that you can apply to a column in your data flow. The captured matches are output as new columns in the data flow.

Installation 32 bit

copy RegExtractor.dll "%ProgramFiles%\Microsoft SQL Server\100\DTS\PipelineComponents\"
gacutil /u RegExtractor
gacutil.exe -iF "%ProgramFiles%\Microsoft Sql Server\100\DTS\PipelineComponents\RegExtractor.dll"

If you do not have gacutil.exe, I think you can replace the last two steps with copying the .dll to C:\WINDOWS\assembly in Window explorer.

Installation 64 bit

copy RegExtractor.dll "%ProgramFiles%\Microsoft SQL Server\100\DTS\PipelineComponents\"
copy RegExtractor.dll "%ProgramFiles%\Microsoft SQL Server (x86)\100\DTS\PipelineComponents\"
gacutil /u RegExtractor
gacutil.exe -iF "%ProgramFiles%\Microsoft Sql Server\100\DTS\PipelineComponents\RegExtractor.dll"


If you do not have gacutil.exe, I think you can replace the last two steps with copying the .dll to C:\WINDOWS\assembly in Window explorer.

Configure Visual Studio

  • In the toolbar on the left, right click and select 'Choose Items...'
  • Click SSIS Data Flow Items
  • Select RegExtractor
  • Click OK
You should now see 'RegExtractor' show up in the control flow items.

Use

You must understand Regular Expressions and Backreferneces to use this component. A great tutorial is here: http://www.regular-expressions.info/tutorial.html
  • To use the RegExtractor component, drag the component into a Data Flow.
  • Attach it to an upstream component
  • Right click on component and select 'Show Advanced Editor'
  • Click the 'Input Columns' tab
  • Select each column from the input that you either want to use in this component or pass through to other downstream components
  • Click 'Comonent Properties' tab
  • You will see a 'Properties' screen pop up.
  • The bottom section 'Custom Properties' is where you will configure the RegExtractor
  • There are four things to configure
  • FirstMatchOnly (True or False): Set to true if you only want to extract value from the first instance found for the regex
  • RegEx: Here is where the 'magic' happens. Put in a regular expression with backreferences (using parenthesis), preferably using named backreferences (http://www.regular-expressions.info/named.html). All caputured matches will be written to new columns in your data flow. All instances of regextractor also generate a column called match_0 that outputs the entire expression match. Example to extract the first word after Name:Name:\s+(<?first_nm>[A-Z]+)
  • RegexOptions: See http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regexoptions.aspx. Separate multiple options with pipes. All options available in .NET are also available in the RegExtractor. Options I frequently use:
    1. RegexOptions.ExplicitCapture : only outputs named backreferences into output stream
    2. RegexOptions.IgnoreCase: Ignore case for letters
    3. Example: RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase
  • TextColumn: You have to tell RegExtractor which column contains the text you want to apply the regular expression to. Type the name of the column here.

That's really all there is to it. Please let me know if any of the documentation can be clarified. Also, how I can attach graphics. I have some screen shots that might be handy.

Last edited Jan 5, 2010 at 6:06 PM by ejust, version 19

Comments

DarrenSQLIS Mar 10, 2011 at 12:56 AM 
To add images to these Wiki pages you first need to be in Edit mode. Then at the bottom of the page use the "Add File Attachment" section to upload the image files. You can then reference these images in the Wiki text using the "image" tag, e.g. [image:ImageFilename.png]