Connect to Sqoop via CLI

Overview

Sqoop provides a command line interface implemented as a bin/sqoop script. To interact with Sqoop via CLI, connect to a cluster host with a Sqoop client via SSH and run the desired Sqoop CLI command.

For example, check the version by running:

$ sqoop version

Output example:

2024-05-21 13:44:12,685 (main) [INFO - org.apache.sqoop.Sqoop.<init>(Sqoop.java:94)] Running Sqoop version: 1.4.7_arenadata2
Sqoop 1.4.7_arenadata2

Running sqoop help prints the list of all Sqoop commands. You can display help for a specific tool by entering sqoop help <tool-name>, for example, sqoop help import.

The commands have the following syntax:

$ sqoop [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]

Where:

  • COMMAND — a Sqoop CLI command;

  • GENERIC_OPTIONS — generic Hadoop command-line arguments;

  • COMMAND_OPTIONS — the selected command’s options.

Usage examples

Sqoop provides several tools, one of which is the import-all-tables command, which allows you to copy data from a database to HDFS.

Here’s a command example for copying data from the test_db PostgreSQL database:

$ sqoop import-all-tables --connect jdbc:postgresql://127.0.0.1:5432/test_db --username <username> --password <password>

Where username and password are the credentials of a database user.

Sample lines from the output:

2024-05-22 17:30:07,477 (main) [INFO - org.apache.sqoop.manager.SqlManager.initOptionDefaults(SqlManager.java:98)] Using default fetchSize of 1000
2024-05-22 17:30:07,709 (main) [INFO - org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:91)] Beginning code generation
2024-05-22 17:30:07,718 (main) [INFO - org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:776)] Executing SQL statement: SELECT t.* FROM "test_db" AS t LIMIT 1
2024-05-22 17:30:07,747 (main) [INFO - org.apache.sqoop.orm.CompilationManager.findHadoopJars(CompilationManager.java:100)] HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
2024-05-22 17:30:07,678 (main) [INFO - org.apache.sqoop.orm.CompilationManager.jar(CompilationManager.java:362)] Writing jar file: /tmp/sqoop-hdfs/compile/9f4a1740d29b617ea7bd61b54d16b63b/test_db.jar
...
2024-05-22 17:30:07,680 (main) [INFO - org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:190)] Transferred 173 bytes in 19.1641 seconds (9.0273 bytes/sec)
2024-05-22 17:30:07,699 (main) [INFO - org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:192)] Retrieved 7 records.

To check whether the data has been copied correctly, run the hdfs dfs -ls <target-directory> command:

$ hdfs dfs -ls hdfs://adhdev/user/hdfs/test_db

The output should display files with the table data:

Found 6 items
-rw-r--r--   3 hdfs hadoop          0 2024-05-22 17:29 hdfs://adhdev/user/hdfs/test_db/_SUCCESS
-rw-r--r--   3 hdfs hadoop         17 2024-05-22 17:29 hdfs://adhdev/user/hdfs/test_db/part-m-00000
-rw-r--r--   3 hdfs hadoop         13 2024-05-22 17:29 hdfs://adhdev/user/hdfs/test_db/part-m-00001
-rw-r--r--   3 hdfs hadoop         22 2024-05-22 17:29 hdfs://adhdev/user/hdfs/test_db/part-m-00002
-rw-r--r--   3 hdfs hadoop         17 2024-05-22 17:29 hdfs://adhdev/user/hdfs/test_db/part-m-00003
-rw-r--r--   3 hdfs hadoop         16 2024-05-22 17:29 hdfs://adhdev/user/hdfs/test_db/part-m-00004

When connecting to a database via Sqoop, make sure that:

  • The JDBC driver for the right database is available in the /var/lib/sqoop/ directory and Sqoop has read and write access to it.

  • The user, whose credentials used for connecting to the database, has access to all DB tables.

  • You are running the command as the user who has write access to the destination directory in HDFS.

Found a mistake? Seleсt text and press Ctrl+Enter to report it