Sqoop and Microsoft SQL Server

From Microsoft's TechNet:
With SQL Server-Hadoop Connector [1], you can import data from:
Tables in SQL Server to delimited text files on HDFS
Tables in SQL Server to SequenceFiles on HDFS
Tables in SQL Server to tables in Hive*
Queries executed in SQL Server to delimited text files on HDFS
Queries executed in SQL Server to SequenceFiles on HDFS
Queries executed in SQL Server to tables in Hive*
 
With SQL Server-Hadoop Connector, you can export data from:
Delimited text files on HDFS to SQL Server
SequenceFiles on HDFS to SQL Server
Hive Tables* to tables in SQL Server
But before it works you have to set up the connector. First get the MS JDBC driver [2]:
Just download the driver, unpack it, and copy the driver file (sqljdbc4.jar) into the $SQOOP_HOME/lib/ directory. Then download the connector (.tar.gz) from [1], unpack it, and point MSSQL_CONNECTOR_HOME at that directory. Assuming you unpacked it into /usr/sqoop/connector/mssql, do:
# export MSSQL_CONNECTOR_HOME=/usr/sqoop/connector/mssql 

Check that the variable is exported:
# echo $MSSQL_CONNECTOR_HOME
/usr/sqoop/connector/mssql


Then run install.sh in the unpacked directory:
# sh ./install.sh
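
For reference, the driver and connector file handling described above might look roughly like this sketch; the archive and directory names are assumptions and depend on the versions you downloaded:

# unpack the MS JDBC driver [2] and copy the JAR into Sqoop's lib directory
tar xzf sqljdbc_4.0_enu.tar.gz
cp sqljdbc_4.0/enu/sqljdbc4.jar $SQOOP_HOME/lib/

# unpack the connector [1] into the directory MSSQL_CONNECTOR_HOME points to
mkdir -p /usr/sqoop/connector/mssql
tar xzf sqoop-sqlserver-1.0.tar.gz -C /usr/sqoop/connector/mssql --strip-components=1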

Tip: create a profile.d file:
# cat /etc/profile.d/mssql.sh
export MSSQL_CONNECTOR_HOME=/usr/sqoop/connector/mssql
and chmod it to 755.
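
For example, as root (the heredoc is just one way to write the file):

# write the variable into a profile.d snippet and make it world-readable/executable
cat > /etc/profile.d/mssql.sh <<'EOF'
export MSSQL_CONNECTOR_HOME=/usr/sqoop/connector/mssql
EOF
chmod 755 /etc/profile.d/mssql.sh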

An example:
Sqoop <=> MS SQL Server and Hadoop processing work well together. I set up a larger PoC, splitting the data into 3 map tasks:
# sqoop import --connect 'jdbc:sqlserver://<IP>;username=dbuser;password=dbpasswd;database=<DB>' --table <table> --target-dir /path/to/hdfs/dir --split-by <KEY> -m 3

=> The transfer of 1.3 GB of data took around one minute. After processing, send the results back:
# sqoop export --connect 'jdbc:sqlserver://<IP>;username=dbuser;password=dbpasswd;database=<DB>' --table=<table> --direct --export-dir /path/from/hdfs/dir
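
If the files on HDFS use a non-default delimiter, the export has to be told how to parse them. A hedged variation of the command above, assuming tab-separated fields (the delimiter is an assumption about your data):

# export tab-delimited text files from HDFS into the SQL Server table
sqoop export \
  --connect 'jdbc:sqlserver://<IP>;username=dbuser;password=dbpasswd;database=<DB>' \
  --table <table> \
  --export-dir /path/from/hdfs/dir \
  --input-fields-terminated-by '\t'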

You can use the same operations you know from Oracle or MySQL Sqoop scripts.
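
For instance, a free-form query import (one of the capabilities listed above) might look roughly like the sketch below; the query, split column, and paths are placeholders, and the Hive cases marked above would additionally use --hive-import:

# free-form query import; Sqoop requires the literal $CONDITIONS token in the WHERE
# clause when --query is used, so keep the query in single quotes
sqoop import \
  --connect 'jdbc:sqlserver://<IP>;username=dbuser;password=dbpasswd;database=<DB>' \
  --query 'SELECT <col1>, <col2> FROM <table> WHERE $CONDITIONS' \
  --split-by <col1> \
  --target-dir /path/to/hdfs/dir \
  -m 3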

[1] http://www.microsoft.com/download/en/details.aspx?id=27584
[2] http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=21599

Comments

  1. Hi,
    I am trying to import from SQL Server into HDFS, but I am getting this error:

    hadoop@ubuntu:~/sqoop-1.1.0/bin$ ./sqoop import --connect 'jdbc:sqlserver://192.168.230.1;username=xxx;password=xxxxx;database=HadoopTest' --table PersonInfo --target-dir /home/hadoop/hadoop-0.21.0/

    11/12/10 12:13:20 ERROR tool.BaseSqoopTool: Got error creating database manager: java.io.IOException: No manager for connect string: jdbc:sqlserver://192.168.230.1;username=xxx;password=xxxxx;database=HadoopTest
    at com.cloudera.sqoop.ConnFactory.getManager(ConnFactory.java:119)
    at com.cloudera.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:178)
    at com.cloudera.sqoop.tool.ImportTool.init(ImportTool.java:81)
    at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:411)
    at com.cloudera.sqoop.Sqoop.run(Sqoop.java:134)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
    at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:170)
    at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:196)
    at com.cloudera.sqoop.Sqoop.main(Sqoop.java:205)

    What is the problem that I am missing?
    My Hadoop version: hadoop-0.21.0
    Sqoop version: sqoop-1.1.0

    Please suggest a solution.
    Thanks.

  2. Is the driver installed and can Sqoop find it? Did install.sh run without an error?

  3. I followed all the steps but can't get Sqoop running. I am getting this error. Can you please tell me what is wrong?

    [hduser@master bin]$ ./sqoop-help
    Warning: $HADOOP_HOME is deprecated.

    Exception in thread "main" java.lang.NoClassDefFoundError: com/cloudera/sqoop/Sqoop
    Caused by: java.lang.ClassNotFoundException: com.cloudera.sqoop.Sqoop
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    Could not find the main class: com.cloudera.sqoop.Sqoop. Program will exit.

  4. Anonymous, 18 July 2012

    How did you install Sqoop? What does sqoop -version show?

  5. Untarred Sqoop to /usr/local/sqoop,
    downloaded the sqoop-sqlserver connector, copied it to the connectors folder,
    and ran install.sh.
    Copied hadoop-core-1.0.3.jar into the Sqoop lib,
    copied sqoop-sqlserver-1.0.jar and mysql-connector-java-5.1.21-bin.jar into the Sqoop lib,
    and set the environment variables:

    MSSQL_CONNECTOR_HOME=/usr/local/sqoop/sqoop-sqlserver-1.0/
    HADOOP_HOME=/usr/local/hadoop
    SQOOP_CONF_DIR=/usr/local/sqoop/conf
    SQOOP_HOME=/usr/local/sqoop
    HBASE_HOME=/usr/local/hbase-0.92.1/
    HADOOP_CLASSPATH=:/usr/local/sqoop/sqoop-1.4.1-incubating.jar

  6. Laura,

    I am new to Hadoop and Sqoop.
    Can you tell me the steps to install Hadoop and Sqoop on Ubuntu 12.04? I installed Hadoop 1.0.3 but am unable to install Sqoop.

  7. When I export to SQL Server, it causes the exception below:
    SQLServerException: incorrect syntax near ','

    What's wrong?

  8. Anonymous, 25 July 2012

    @Andy: Follow the instructions here:
    http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

    After you've got Hadoop running, download the latest Sqoop release from sqoop.apache.org.

  9. Anonymous, 25 July 2012

    @张伯坚:

    Hard to say without a trace or logs. Please post your question to user@sqoop.apache.org again, with the stack trace, query, and logs.

    Replies
    1. I have sent one, but it replied that I am not a subscriber.
      The exception is below, can you help me?
      stserhadoop01:~> sqoop export --connect 'jdbc:sqlserver://10.245.243.177;username=dtjh;password=dtjh123;database=bstestsys' --table tb_hd_test12e --export-dir tb_hd_test12
      12/07/25 16:34:58 INFO SqlServer.MSSQLServerManagerFactory: Using Microsoft's SQL Server - Hadoop Connector
      12/07/25 16:34:58 INFO tool.CodeGenTool: Beginning code generation
      12/07/25 16:34:59 INFO manager.SqlManager: Executing SQL statement: SELECT TOP 1 * FROM [tb_hd_test12e]
      12/07/25 16:34:59 INFO manager.SqlManager: Executing SQL statement: SELECT TOP 1 * FROM [tb_hd_test12e]
      12/07/25 16:34:59 INFO orm.CompilationManager: HADOOP_HOME is /myfile/hadoop/hd/hadoop-0.20.2/bin/..
      12/07/25 16:34:59 INFO orm.CompilationManager: Found hadoop core jar at: /myfile/hadoop/hd/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
      12/07/25 16:35:00 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-dtjh/compile/37ea8350ec6482d4d343d5e0b93000b4/tb_hd_test12e.jar
      12/07/25 16:35:00 INFO mapreduce.ExportJobBase: Beginning export of tb_hd_test12e
      12/07/25 16:35:00 INFO manager.SqlManager: Executing SQL statement: SELECT TOP 1 * FROM [tb_hd_test12e]
      12/07/25 16:35:00 INFO input.FileInputFormat: Total input paths to process : 4
      12/07/25 16:35:00 INFO input.FileInputFormat: Total input paths to process : 4
      12/07/25 16:35:00 INFO mapred.JobClient: Running job: job_201207241957_0072
      12/07/25 16:35:01 INFO mapred.JobClient: map 0% reduce 0%
      12/07/25 16:35:13 INFO mapred.JobClient: Task Id : attempt_201207241957_0072_m_000000_0, Status : FAILED
      java.io.IOException: com.microsoft.sqlserver.jdbc.SQLServerException: incorrect syntax near ','
      at com.cloudera.sqoop.mapreduce.AsyncSqlRecordWriter.close(AsyncSqlRecordWriter.java:189)
      at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:507)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:623)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
      at org.apache.hadoop.mapred.Child.main(Child.java:170)
      Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: incorrect syntax near ','
      at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:197)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1493)
      at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:390)
      at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:340)
      at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4575)
      at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1400)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:179)
      at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:154)
      at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeUpdate(SQLServerPreparedStatement.java:308)
      at com.cloudera.sqoop.mapreduce.AsyncSqlOutputFormat$AsyncSqlExecThread.run(AsyncSqlOutputFormat.java:233)

      12/07/25 16:35:13 WARN mapred.JobClient: Error reading task outputConnection refused
      12/07/25 16:35:13 WARN mapred.JobClient: Error reading task outputConnection refused

  10. How can we export to MSSQL with Sqoop using select statements?


