Rhino Etl is an extract, transform and load utility that enables you to move data from many different sources, transform them however you like and then load it into a different destination source. If you have worked with ETL's in the past you may be familiar with technologies such as: DTS (Data Transformation Services) or SSIS (SQL Server Integration Services) in the Microsoft world.

Rhino Etl processes can be created by either writing C# classes or through a DSL. The basic idea behind each process is a Pipeline. Each process has input rows and output rows. I will demonstrate using the DSL. Let's assume for a moment that you have 5 records in a database as depicted in the picture above. You would like to get those records out of the database, format them and finally into a pipe delimited text file.
Process 1: Extract data from the database
The input command below will execute the SQL statement and return all rows into the pipeline.
input "SourceDatabase", Command = """
SELECT
FirstName, LastName, LoyaltyNumber
FROM LoyaltyTable
"""
It's important to note here that each step in the pipeline does not process or gather all of the rows before it passes it to the next step in the pipeline. As each row is emitted from one process and it is sent to the next. This can dramatically reduce the memory footprint required to run the complete process. The next step is to transform the rows from the first process.
Process 2: Transform the data
This operation uses simple boo syntax (python-like) to manipulate each row. You can modify existing columns or create new ones just by setting their value.
operation transform_names:
for row in rows:
row.Name = row.FirstName + " " + row.LastName
row.Loyalty = row.LoyaltyNumber.Substring(0,3) + "-" + row.LoyaltyNumber.Substring(3,row.LoyaltyNumber.Length-3)
yield row
This operation will take each row and create a new field called Name with the First and Last names combined together. Along with parsing out the Loyalty the way we'd like it.
Process 3: Export the data to a text file
In order to export the data we must define what format should be exported. For text files Rhino Etl comes with the FileHelpers library that allows you to define the format by simply creating a class. Since we're demonstrating this using the DSL here is how the class would look for our export file.
import FileHelpers
[DelimitedRecord("|")]
public class LoyaltyRow:
public Name as string
public Loyalty as string
Then we need a simple operation to write that file to disk.
operation export_file:
engine = Rhino.Etl.Core.Files.FluentFile.For[of LoyaltyRow]()
file = engine.To("""DatabaseExport.txt""")
for row in rows:
record = row.ToObject[of LoyaltyRow]()
file.Write(record)
yield row
file.Dispose()
Complete File
Those are the major pieces of an etl script. Here is what the complete file looks like.
import FileHelpers
[DelimitedRecord("|")]
public class LoyaltyRow:
public Name as string
public Loyalty as string
operation transform_names:
for row in rows:
row.Name = row.FirstName + " " + row.LastName
row.Loyalty = row.LoyaltyNumber.Substring(0,3) + "-" + row.LoyaltyNumber.Substring(3,row.LoyaltyNumber.Length-3)
yield row
operation export_file:
engine = Rhino.Etl.Core.Files.FluentFile.For[of LoyaltyRow]()
file = engine.To("""DatabaseExport.txt""")
for row in rows:
record = row.ToObject[of LoyaltyRow]()
file.Write(record)
yield row
file.Dispose()
process Test:
input "SourceDatabase", Command = """
SELECT
FirstName, LastName, LoyaltyNumber
FROM LoyaltyTable
"""
transform_names()
export_file()
Configuration
Since we are referencing a database connection string named "SourceDatabase" it must pull that information from a configuration file. Here is a very simple one. You need to specify the name, connectionString along with the providerName to indicate which ADO.NET provider to use.
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
<connectionStrings>
<add name="SourceDatabase"
connectionString="Data Source=.\sqlexpress;Initial Catalog=rhino-etl-examples;Integrated Security=SSPI;Timeout=300;"
providerName="System.Data.SqlClient.SqlConnection, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"/>
</connectionStrings>
</configuration>
Execution
Once you have your two files (Example.boo for the ETL and Example.config for the configuration) you can execute it on the command line using Rhino.Etl.Cmd.
Rhino.Etl.Cmd -f:Example.boo -c:Example.config -v