Thursday, October 13, 2011

Sqoop and Microsoft SQL Server

From Microsoft's technet:
With SQL Server-Hadoop Connector [1], you import data from:
Tables in SQL Server to delimited text files on HDFS
Tables in SQL Server to SequenceFiles files on HDFS
Tables in SQL Server to tables in Hive*
Queries executed in SQL Server to delimited text files on HDFS
Queries executed in SQL Server to SequenceFiles files on HDFS
Queries executed in SQL Server to tables in Hive*
 
With SQL Server-Hadoop Connector, you can export data from:
Delimited text files on HDFS to SQL Server
SequenceFiles on HDFS to SQL Server
Hive Tables* to tables in SQL Server
But before it works you have to setup the connector. First get the MS JDBC driver [2]:
You have just to download the driver, unpack them and copy the driver (sqljdbc4.jar) file to the $SQOOP_HOME/lib/ directory. Now download the connector (.tar.gz) from [1], unpack them and set the MSSQL_CONNECTOR_HOME into that directory. Let's assume you unpack into /usr/sqoop/connector/mssql, do:
# export MSSQL_CONNECTOR_HOME=/usr/sqoop/connector/mssql 

control the export:
# echo $MSSQL_CONNECTOR_HOME
/usr/sqoop/connector/mssql


and run the install.sh in the unpacked directory.
sh ./install.sh

Tip: create a profile.d file:
# cat /etc/profile.d/mssql.sh
export MSSQL_CONNECTOR_HOME=/usr/sqoop/connector/mssql
and chmod into 755

An example:
Sqoop <=> MS SQL Server and hadoop processing works well. Just setup a larger PoC with split the data in 3 maps:
# sqoop import --connect 'jdbc:sqlserver://<IP>;username=dbuser;password=dbpasswd;database=<DB>' --table <table> --target-dir /path/to/hdfs/dir --split-by <KEY> -m 3

=> export of 1.3 GB data tooks around one minute. After processing just send back:
# sqoop export --connect 'jdbc:sqlserver://<IP>;username=dbuser;password=dbpasswd;database=<DB>' --table=<table> --direct --export-dir /path/from/hdfs/dir

You can do the same operations as you know from oracle or mysql sqoop scripts.

[1] http://www.microsoft.com/download/en/details.aspx?id=27584
[2] http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=21599