Connect to Sqoop via CLI
Overview
Sqoop provides a command line interface implemented as a bin/sqoop script. To interact with Sqoop via CLI, connect to a cluster host with a Sqoop client via SSH and run the desired Sqoop CLI command.
For example, check the version by running:
$ sqoop version
Output example:
2024-05-21 13:44:12,685 (main) [INFO - org.apache.sqoop.Sqoop.<init>(Sqoop.java:94)] Running Sqoop version: 1.4.7_arenadata2
Sqoop 1.4.7_arenadata2
Running sqoop help prints the list of all available Sqoop commands. To display help for a specific tool, run sqoop help <tool-name>, for example, sqoop help import.
The commands have the following syntax:
$ sqoop [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]
Where:
- COMMAND — a Sqoop CLI command;
- GENERIC_OPTIONS — generic Hadoop command-line arguments (an example follows this list);
- COMMAND_OPTIONS — the selected command’s options.
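Generic Hadoop options must be placed right after the command name, before the command options. A minimal sketch that passes a Hadoop configuration property with the -D generic option (the queue name and the placeholder connection values are illustrative assumptions, not values required by Sqoop):
$ sqoop import -D mapreduce.job.queuename=default --connect <jdbc-url> --username <username> --password <password> --table <table-name>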
Usage examples
Sqoop provides several tools. One of them is the import-all-tables command, which copies all tables of a database to HDFS.
Here’s an example command that copies data from the test_db PostgreSQL database:
$ sqoop import-all-tables --connect jdbc:postgresql://127.0.0.1:5432/test_db --username <username> --password <password>
Where username and password are the credentials of a database user.
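Passing the password directly on the command line leaves it in the shell history. As a safer alternative, Sqoop can prompt for the password at run time via the -P option; the --warehouse-dir option additionally sets the parent HDFS directory for the imported tables. A sketch that reuses the same connection (the /user/hdfs/imports path is an arbitrary example):
$ sqoop import-all-tables --connect jdbc:postgresql://127.0.0.1:5432/test_db --username <username> -P --warehouse-dir /user/hdfs/imports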
Sample lines from the output:
2024-05-22 17:30:07,477 (main) [INFO - org.apache.sqoop.manager.SqlManager.initOptionDefaults(SqlManager.java:98)] Using default fetchSize of 1000
2024-05-22 17:30:07,709 (main) [INFO - org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:91)] Beginning code generation
2024-05-22 17:30:07,718 (main) [INFO - org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:776)] Executing SQL statement: SELECT t.* FROM "test_db" AS t LIMIT 1
2024-05-22 17:30:07,747 (main) [INFO - org.apache.sqoop.orm.CompilationManager.findHadoopJars(CompilationManager.java:100)] HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
2024-05-22 17:30:07,678 (main) [INFO - org.apache.sqoop.orm.CompilationManager.jar(CompilationManager.java:362)] Writing jar file: /tmp/sqoop-hdfs/compile/9f4a1740d29b617ea7bd61b54d16b63b/test_db.jar
...
2024-05-22 17:30:07,680 (main) [INFO - org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:190)] Transferred 173 bytes in 19.1641 seconds (9.0273 bytes/sec)
2024-05-22 17:30:07,699 (main) [INFO - org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:192)] Retrieved 7 records.
To check whether the data has been copied correctly, run the hdfs dfs -ls <target-directory> command:
$ hdfs dfs -ls hdfs://adhdev/user/hdfs/test_db
The output should display files with the table data:
Found 6 items
-rw-r--r-- 3 hdfs hadoop 0 2024-05-22 17:29 hdfs://adhdev/user/hdfs/test_db/_SUCCESS
-rw-r--r-- 3 hdfs hadoop 17 2024-05-22 17:29 hdfs://adhdev/user/hdfs/test_db/part-m-00000
-rw-r--r-- 3 hdfs hadoop 13 2024-05-22 17:29 hdfs://adhdev/user/hdfs/test_db/part-m-00001
-rw-r--r-- 3 hdfs hadoop 22 2024-05-22 17:29 hdfs://adhdev/user/hdfs/test_db/part-m-00002
-rw-r--r-- 3 hdfs hadoop 17 2024-05-22 17:29 hdfs://adhdev/user/hdfs/test_db/part-m-00003
-rw-r--r-- 3 hdfs hadoop 16 2024-05-22 17:29 hdfs://adhdev/user/hdfs/test_db/part-m-00004
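To inspect the imported rows themselves, you can print one of the part files with hdfs dfs -cat; the path below matches the listing above:
$ hdfs dfs -cat hdfs://adhdev/user/hdfs/test_db/part-m-00000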
When connecting to a database via Sqoop, make sure that:
- The JDBC driver for the target database is available in the /var/lib/sqoop/ directory and Sqoop has read and write access to it.
- The user whose credentials are used to connect to the database has access to all DB tables.
- You run the command as a user who has write access to the destination directory in HDFS.
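A quick way to verify the first and the last of these checks is sketched below; the PostgreSQL driver file name is only an assumption, and /user/hdfs stands for the destination directory:
$ ls -l /var/lib/sqoop/postgresql-*.jar
$ hdfs dfs -ls -d /user/hdfs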