codegen

Mikhail Serov

The codegen tool generates Java classes that encapsulate and interpret imported records. The Java definition of a record is normally generated as part of the import process, but code generation can also be run separately. For example, if the Java source is lost, it can be recreated. You can also create new versions of a class that use different delimiters between fields, and so on.

The tool usage is shown below.

$ sqoop codegen <generic-args> <codegen-args>
$ sqoop-codegen <generic-args> <codegen-args>
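
As an illustration, the sketch below regenerates the record class for a single table. The connection string, the user name `dbuser`, and the table name `employees` are hypothetical, and `--table` comes from the upstream Sqoop user guide rather than from the argument tables on this page. The `run` helper is not part of Sqoop; it only prints the command while `DRY_RUN` is `1`, so the snippet can be tried without a cluster.

```shell
# Hypothetical helper (not part of Sqoop): prints the command when DRY_RUN=1,
# executes it otherwise.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Regenerate the record class for a hypothetical "employees" table.
# -P reads the password from the console instead of passing it as an argument.
run sqoop codegen \
  --connect jdbc:mysql://db.example.com/corp \
  --username dbuser \
  -P \
  --table employees
```

Setting `DRY_RUN=0` makes the same call execute Sqoop instead of printing the command.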

Common arguments
--connect <jdbc-uri>	Specifies the JDBC connection string
--connection-manager <class-name>	Specifies the connection manager class to use
--connection-param-file <filename>	Specifies optional properties file that provides connection parameters
--driver <class-name>	Specifies the JDBC driver class to use
--hadoop-mapred-home <dir>	Overrides `$HADOOP_MAPRED_HOME`
--help	Prints usage instructions
--password-file	Sets the path to a file containing the authentication password
-P	Reads the password from the console
--password <password>	Specifies the authentication password
--username <username>	Specifies the authentication username
--verbose	Prints more information while working
--relaxed-isolation	Instructs Sqoop to use the read-uncommitted isolation level

Code generation arguments
--bindir <dir>	Sets the output directory for compiled objects
--class-name <name>	Specifies a name for the generated class. This overrides `--package-name`. When combined with `--jar-file`, sets the input class
--jar-file <file>	Disables code generation; the provided JAR is used instead
--map-column-java <m>	Overrides the default mapping from SQL type to Java type for the columns listed in `<m>`
--outdir <dir>	Sets the output directory for generated code
--package-name <name>	Puts auto-generated classes into the specified package
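
The value passed to `--map-column-java` is a comma-separated list of `column=JavaType` pairs. A minimal sketch of assembling it follows, assuming hypothetical columns `id` and `value`. The `join_mappings` helper is illustrative and not a Sqoop command, and the directory and package names are placeholders.

```shell
# Illustrative helper (not part of Sqoop): joins column=JavaType pairs
# with commas, the list format that --map-column-java expects.
join_mappings() {
  joined=""
  for pair in "$@"; do
    joined="${joined:+$joined,}$pair"
  done
  printf '%s\n' "$joined"
}

# Hypothetical columns and Java types.
MAPPING=$(join_mappings id=String value=Integer)

# Print the code generation arguments that would be appended to
# "sqoop codegen"; the paths and the package name are placeholders.
printf '%s\n' "--outdir src/generated --bindir build/classes --package-name com.example.records --map-column-java $MAPPING"
```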

Output line formatting arguments
--enclosed-by <char>	Sets a required field enclosing character
--escaped-by <char>	Sets an escape character
--fields-terminated-by <char>	Sets a field separator character
--lines-terminated-by <char>	Sets an end-of-line character
--mysql-delimiters	Uses the MySQL default delimiter set: fields: `,`; lines: `\n`; escaped-by: `\`; optionally-enclosed-by: `'`
--optionally-enclosed-by <char>	Sets an optional field enclosing character

Input parsing arguments
--input-enclosed-by <char>	Sets a required field-enclosing character for input
--input-escaped-by <char>	Sets an input escape character
--input-fields-terminated-by <char>	Sets an input field separator
--input-lines-terminated-by <char>	Sets an input end-of-line character
--input-optionally-enclosed-by <char>	Sets an optional field-enclosing character for input

Hive arguments
--create-hive-table	If set, the job fails when the target Hive table already exists
--hive-home <dir>	Overrides `$HIVE_HOME`
--hive-import	Imports tables into Hive (uses Hive’s default delimiters if none are set)
--hive-overwrite	Overwrites existing data in the Hive table
--hive-table <table-name>	Sets the table name to use when importing to Hive
--hive-drop-import-delims	Drops `\n`, `\r`, and `\01` from string fields when importing to Hive
--hive-delims-replacement	Replaces `\n`, `\r`, and `\01` in string fields with a user-defined string when importing to Hive
--hive-partition-key	Sets the Hive partition key
--hive-partition-value <v>	Sets the Hive partition value
--map-column-hive <map>	Overrides the default mapping from SQL data types to Hive data types. If the argument value contains commas, use URL-encoded keys and values, for example, use `DECIMAL(1%2C%201)` instead of `DECIMAL(1, 1)`

If Hive arguments are provided to the codegen tool, Sqoop generates a file containing the HQL statements to create a table and load data.
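
The URL-encoding requirement for `--map-column-hive` can be sketched with a small helper. `encode_hive_type` is a hypothetical function, not part of Sqoop, and it encodes only the two characters that appear in the `DECIMAL(1, 1)` example (comma and space). The column name `price` is also an assumption.

```shell
# Hypothetical helper (not part of Sqoop): URL-encodes commas and spaces
# in a Hive type so that it can be used inside --map-column-hive.
encode_hive_type() {
  printf '%s\n' "$1" | sed -e 's/,/%2C/g' -e 's/ /%20/g'
}

HIVE_TYPE=$(encode_hive_type 'DECIMAL(1, 1)')

# Print the argument as it would be passed to Sqoop.
printf '%s\n' "--map-column-hive price=$HIVE_TYPE"
```

Other reserved characters would need their own substitutions; a general-purpose URL encoder is a safer choice for arbitrary type strings.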
