import-all-tables

The import-all-tables tool imports a set of tables from an RDBMS to HDFS. The data from each table is stored in a separate directory in HDFS.

For the import-all-tables tool to be useful, the following conditions must be met:
- Each table must have a single-column primary key, or the --autoreset-to-one-mapper option must be used.
- You should import all columns of each table.
- You should not use a non-default splitting column, nor impose any conditions using the WHERE clause.
The tool usage is shown below.
$ sqoop import-all-tables <generic-args> <import-args>
$ sqoop-import-all-tables <generic-args> <import-args>
Although the generic Hadoop arguments must precede any import-all-tables arguments, the import-all-tables arguments can be specified in any order with respect to one another.
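For example, a minimal invocation might look like the following (the JDBC URI, database name, and username are placeholders for your own environment):

$ sqoop import-all-tables \
    --connect jdbc:mysql://db.example.com/corp \
    --username someuser \
    -P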
Common arguments

| Argument | Description |
|---|---|
| --connect <jdbc-uri> | Specifies the JDBC connection string |
| --connection-manager <class-name> | Specifies the connection manager class to use |
| --connection-param-file <filename> | Specifies an optional properties file that provides connection parameters |
| --driver <class-name> | Specifies the JDBC driver class to use |
| --hadoop-mapred-home <dir> | Overrides $HADOOP_MAPRED_HOME |
| --help | Prints usage instructions |
| --password-file | Sets the path to a file containing the authentication password |
| -P | Reads the password from the console |
| --password <password> | Specifies the authentication password |
| --username <username> | Specifies the authentication username |
| --verbose | Prints more information while working |
| --relaxed-isolation | Instructs Sqoop to use the read-uncommitted isolation level |
Import control arguments

| Argument | Description |
|---|---|
| --as-avrodatafile | Imports data as Avro Data Files |
| --as-sequencefile | Imports data to SequenceFiles |
| --as-textfile | Imports data as plain text (default) |
| --as-parquetfile | Imports data to Parquet files |
| --autoreset-to-one-mapper | Import should use one mapper if a table has no primary key and no split-by column is provided. Cannot be used with the --split-by option |
| --direct | Uses the direct connector for the database (if it exists) |
| --exclude-tables <tables> | Specifies a comma-separated list of tables to exclude from the import process |
| --inline-lob-limit <n> | Sets the maximum size for an inline LOB |
| -m,--num-mappers <n> | Specifies to use n map tasks to import in parallel |
| --warehouse-dir <dir> | Sets an HDFS parent directory for table destinations |
| -z,--compress | Enables compression |
| --compression-codec <c> | Specifies a Hadoop compression codec to use (default is gzip) |
These arguments behave in the same way as when used with the sqoop-import tool, but the --table, --split-by, --columns, and --where arguments are not allowed for import-all-tables. The --exclude-tables argument is intended for import-all-tables only.
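As a sketch, the following invocation excludes two tables and stores every remaining table under a common HDFS parent directory (the connection details, password file path, and table names are illustrative):

$ sqoop import-all-tables \
    --connect jdbc:mysql://db.example.com/corp \
    --username someuser \
    --password-file /user/someuser/sqoop.password \
    --exclude-tables audit_log,tmp_staging \
    --warehouse-dir /data/corp \
    --num-mappers 4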
Output line formatting arguments

| Argument | Description |
|---|---|
| --enclosed-by <char> | Sets a required field enclosing character |
| --escaped-by <char> | Sets an escape character |
| --fields-terminated-by <char> | Sets a field separator character |
| --lines-terminated-by <char> | Sets an end-of-line character |
| --mysql-delimiters | Uses the MySQL default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' |
| --optionally-enclosed-by <char> | Sets an optional field enclosing character |
Input parsing arguments

| Argument | Description |
|---|---|
| --input-enclosed-by <char> | Sets a required field enclosing character for input |
| --input-escaped-by <char> | Sets an input escape character |
| --input-fields-terminated-by <char> | Sets an input field separator |
| --input-lines-terminated-by <char> | Sets an input end-of-line character |
| --input-optionally-enclosed-by <char> | Sets an optional field enclosing character for input |
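For instance, assuming the same placeholder connection details as above, the example below imports all tables as plain text with tab-separated fields:

$ sqoop import-all-tables \
    --connect jdbc:mysql://db.example.com/corp \
    --username someuser \
    -P \
    --as-textfile \
    --fields-terminated-by '\t' \
    --lines-terminated-by '\n'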
Hive arguments

| Argument | Description |
|---|---|
| --create-hive-table | If set, the job fails if the target Hive table exists |
| --hive-home <dir> | Overrides $HIVE_HOME |
| --hive-import | Imports tables into Hive (uses Hive's default delimiters if none are set) |
| --hive-overwrite | Overwrites existing data in the Hive table |
| --hive-table <table-name> | Sets the table name to use when importing to Hive |
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive |
| --hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive |
| --hive-partition-key | Sets the Hive partition key |
| --hive-partition-value <v> | Sets the Hive partition value |
| --map-column-hive <map> | Overrides the default mapping from SQL data types to Hive data types. If you specify commas in this argument, use URL-encoded keys and values, for example, use DECIMAL(1%2C%201) instead of DECIMAL(1, 1) |
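A hypothetical invocation that loads every table directly into Hive, overwriting any existing data, might look like this (connection details are placeholders):

$ sqoop import-all-tables \
    --connect jdbc:mysql://db.example.com/corp \
    --username someuser \
    -P \
    --hive-import \
    --hive-overwrite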
Code generation arguments

| Argument | Description |
|---|---|
| --bindir <dir> | Sets the output directory for compiled objects |
| --jar-file <file> | Disables code generation; the provided JAR is used instead |
| --outdir <dir> | Sets the output directory for generated code |
| --package-name <name> | Puts auto-generated classes into the specified package |
The import-all-tables tool does not support the --class-name argument. You may, however, specify a package with --package-name in which all generated classes will be placed.
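For example, the package name below is only an illustration; any valid Java package name can be used:

$ sqoop import-all-tables \
    --connect jdbc:mysql://db.example.com/corp \
    --username someuser \
    -P \
    --package-name com.example.corp.model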