import

The import tool imports an individual table from an RDBMS to HDFS. Each row of the table is represented as a separate record in HDFS. Records can be stored as text files (one record per line) or in a binary format such as Avro, SequenceFile, or Parquet.

The tool usage is shown below.

$ sqoop import <generic-args> <import-args>
$ sqoop-import <generic-args> <import-args>

Although the generic Hadoop arguments must precede any import arguments, the import arguments can be specified in any order with respect to one another.
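As a minimal Python sketch of this ordering rule, the hypothetical helper build_sqoop_import below assembles an argument vector that could be passed to subprocess.run. It places Hadoop -D properties (a standard generic option) directly after the tool name and appends the import arguments afterwards. The job name, table, and directory are placeholders chosen for illustration.

```python
from typing import Dict, List


def build_sqoop_import(hadoop_props: Dict[str, str],
                       import_args: List[str]) -> List[str]:
    """Hypothetical helper: build argv for `sqoop import`."""
    argv = ["sqoop", "import"]
    # Generic Hadoop arguments must come first: one "-D key=value" per property.
    for key, value in hadoop_props.items():
        argv += ["-D", f"{key}={value}"]
    # Import arguments follow; their order relative to each other is free.
    argv += import_args
    return argv


argv = build_sqoop_import(
    {"mapreduce.job.name": "employees-import"},                   # placeholder
    ["--table", "EMPLOYEES", "--target-dir", "/data/employees"],  # placeholders
)
print(" ".join(argv))
```

On a host where Sqoop is installed, the list could be run with subprocess.run(argv, check=True).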

Common arguments

--connect <jdbc-uri>

Specifies the JDBC connection string

--connection-manager <class-name>

Specifies the connection manager class to use

--connection-param-file <filename>

Specifies an optional properties file that provides connection parameters

--driver <class-name>

Specifies the JDBC driver class to use

--hadoop-mapred-home <dir>

Overrides $HADOOP_MAPRED_HOME

--help

Prints usage instructions

--password-file

Sets the path to a file containing the authentication password

-P

Reads the password from the console

--password <password>

Specifies the authentication password

--username <username>

Specifies the authentication username

--verbose

Prints more information while working

--relaxed-isolation

Instructs Sqoop to use the read-uncommitted isolation level

Validation arguments

--validate

Enables validation of the copied data; supports single-table copy only

--validator <class-name>

Specifies the validator class to use

--validation-threshold <class-name>

Specifies the validation threshold class to use

--validation-failurehandler <class-name>

Specifies the validation failure handler class to use
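According to the Apache Sqoop user guide, when --validate is used without the other options, the defaults are a row-count validator (org.apache.sqoop.validation.RowCountValidator), an absolute threshold (AbsoluteValidationThreshold), and a handler that aborts the job (AbortOnFailureHandler). The Python sketch below only models that default combination conceptually; ValidationFailed and validate_row_counts are hypothetical names, whereas the options above expect fully qualified Java class names.

```python
class ValidationFailed(Exception):
    """Hypothetical stand-in for an aborted import."""


def validate_row_counts(source_rows: int, target_rows: int) -> None:
    """Row-count check with an absolute threshold and abort-on-failure handling."""
    # Absolute threshold: the counts must match exactly.
    if source_rows != target_rows:
        # Failure handling: abort instead of continuing silently.
        raise ValidationFailed(
            f"row count mismatch: source={source_rows}, target={target_rows}")


validate_row_counts(1000, 1000)  # matching counts pass silently
try:
    validate_row_counts(1000, 998)
except ValidationFailed as err:
    print("validation failed:", err)
```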

Import control arguments

--append

Appends data to an existing dataset in HDFS

--as-avrodatafile

Imports data as Avro data files

--as-sequencefile

Imports data as SequenceFiles

--as-textfile

Imports data as plain text (default)

--as-parquetfile

Imports data as Parquet files

--autoreset-to-one-mapper

Uses a single mapper if the table has no primary key and no split-by column is provided. Cannot be used with the --split-by <column-name> option

--boundary-query <statement>

Specifies a boundary query used for creating splits

--columns <col,col,col…>

Specifies columns to import from the table

--delete-target-dir

Deletes the import target directory if it exists

--direct

Uses the direct connector for the database (if one exists)

-e,--query <statement>

Imports the results of the <statement> query

--fetch-size <n>

Sets the number of entries to fetch from the database at once

--inline-lob-limit <n>

Sets the maximum size for an inline LOB

-m,--num-mappers <n>

Uses n map tasks to import in parallel

--null-string <null-string>

Sets the string to write for a null value in string columns

--null-non-string <null-string>

Sets the string to write for a null value in non-string columns

--split-by <column-name>

Specifies the table column used to split work units. Cannot be used with the --autoreset-to-one-mapper option

--split-limit <n>

Sets the upper limit on the size of each split. Applies only to integer and date columns; for date or timestamp fields, the limit is measured in seconds

--table <table-name>

Specifies the table to read

--target-dir <dir>

Sets the HDFS destination directory

--temporary-rootdir <dir>

Sets the HDFS directory for temporary files created during the import (overrides the default _sqoop)

--warehouse-dir <dir>

Sets the parent HDFS directory for the table destination

--where <where clause>

Specifies the WHERE clause to apply during the import

-z,--compress

Enables compression

--compression-codec <c>

Specifies a Hadoop compression codec to use (default is gzip)
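The options --split-by, -m/--num-mappers, --split-limit, --boundary-query, --autoreset-to-one-mapper, and -e/--query work together when an import runs in parallel. The Python sketch below is a simplified model, not Sqoop's implementation. choose_mappers applies the documented rule that a table without a primary key needs --split-by, -m 1, or --autoreset-to-one-mapper; plan_splits divides the MIN/MAX range that a boundary query would return into integer splits, capped by split_limit; and render_queries replaces the $CONDITIONS token, which Sqoop requires in a parallel free-form query, with a per-split predicate. All function names are hypothetical, and Sqoop's real splitters differ in details such as how upper bounds are expressed and how date or text columns are handled.

```python
from typing import List, Optional, Tuple

DEFAULT_MAPPERS = 4  # stock Sqoop default for -m/--num-mappers


def choose_mappers(has_primary_key: bool, split_by: Optional[str],
                   num_mappers: int = DEFAULT_MAPPERS,
                   autoreset_to_one_mapper: bool = False) -> int:
    """Decide how many map tasks to run (simplified model)."""
    if split_by and autoreset_to_one_mapper:
        raise ValueError("--split-by cannot be combined with --autoreset-to-one-mapper")
    if has_primary_key or split_by or num_mappers == 1:
        return num_mappers
    if autoreset_to_one_mapper:
        return 1  # nothing to split on, so fall back to a single mapper
    raise ValueError("no primary key: pass --split-by, -m 1, or --autoreset-to-one-mapper")


def plan_splits(lo: int, hi: int, num_mappers: int,
                split_limit: Optional[int] = None) -> List[Tuple[int, int]]:
    """Cut the inclusive range [lo, hi] of the split column into (start, end) pairs.

    lo and hi stand for the MIN/MAX values a boundary query would return.
    """
    if hi < lo:
        return []
    size = -(-(hi - lo + 1) // num_mappers)  # ceiling division
    if split_limit is not None:
        size = min(size, split_limit)  # --split-limit caps each split's width
    splits, start = [], lo
    while start <= hi:
        end = min(start + size - 1, hi)
        splits.append((start, end))
        start = end + 1
    return splits


def render_queries(query: str, split_col: str,
                   splits: List[Tuple[int, int]]) -> List[str]:
    """Give each split its own copy of a free-form query."""
    if "$CONDITIONS" not in query:
        raise ValueError("a free-form query must contain the $CONDITIONS token")
    return [query.replace("$CONDITIONS", f"{split_col} >= {a} AND {split_col} <= {b}")
            for a, b in splits]


mappers = choose_mappers(has_primary_key=False, split_by="id", num_mappers=2)
for q in render_queries("SELECT * FROM orders WHERE $CONDITIONS", "id",
                        plan_splits(1, 100, mappers, split_limit=30)):
    print(q)
```

With split_limit=30, the two requested mappers become four splits, mirroring the note that the number of splits changes when the limit applies. On a shell command line, the Sqoop user guide advises writing the token as \$CONDITIONS when the query is wrapped in double quotes, so the shell does not expand it.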

The --null-string and --null-non-string arguments are optional. If they are not specified, the string "null" is used.
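A minimal Python sketch of this substitution, assuming the default comma field separator: format_record is a hypothetical helper, not Sqoop code, that writes one row as a text line and replaces NULLs according to the column type.

```python
from typing import Optional, Sequence


def format_record(values: Sequence[Optional[object]],
                  string_columns: Sequence[bool],
                  null_string: str = "null",
                  null_non_string: str = "null",
                  field_sep: str = ",") -> str:
    """Render one row as a delimited text line, substituting NULLs."""
    fields = []
    for value, is_string in zip(values, string_columns):
        if value is None:
            # --null-string applies to string columns, --null-non-string to the rest.
            fields.append(null_string if is_string else null_non_string)
        else:
            fields.append(str(value))
    return field_sep.join(fields)


row = [42, None, None]        # e.g. INT id, VARCHAR name, DATE hired
kinds = [False, True, False]  # which columns are string-typed
print(format_record(row, kinds))
print(format_record(row, kinds, null_string="\\N", null_non_string="\\N"))
```

The \N variant is a common choice for data that Hive will read, because Hive's text format represents NULL as \N; the Sqoop user guide shows it passed on the command line as '\\N'.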

Output line formatting arguments

--enclosed-by <char>

Sets a required field enclosing character

--escaped-by <char>

Sets an escape character

--fields-terminated-by <char>

Sets a field separator character

--lines-terminated-by <char>

Sets an end-of-line character

--mysql-delimiters

Uses the MySQL default delimiter set: fields terminated by a comma (,), lines terminated by \n, escaped by \, and optionally enclosed by '

--optionally-enclosed-by <char>

Sets an optional field enclosing character
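The Python sketch below models how these options interact, as we understand Sqoop's field formatting: the escape character escapes itself; without an enclosing character, field and line delimiters inside a value are escaped; with one, the enclosing character is escaped, and optional enclosing is applied only to values that contain a delimiter. Delimiters, escape_and_enclose, and format_line are hypothetical names, and Sqoop's generated Java code may differ in edge cases.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Delimiters:
    fields: str = ","                  # --fields-terminated-by (default: comma)
    lines: str = "\n"                  # --lines-terminated-by (default: newline)
    escaped_by: Optional[str] = None   # --escaped-by
    enclosed_by: Optional[str] = None  # --enclosed-by / --optionally-enclosed-by
    enclose_required: bool = False     # True models --enclosed-by


# The delimiter set selected by --mysql-delimiters, as listed above.
MYSQL_DELIMITERS = Delimiters(",", "\n", "\\", "'", enclose_required=False)


def escape_and_enclose(field: str, d: Delimiters) -> str:
    text = field
    if d.escaped_by:
        # Escape the escape character itself first so it stays unambiguous.
        text = text.replace(d.escaped_by, d.escaped_by * 2)
    if not d.enclosed_by:
        if d.escaped_by:
            # Without an encloser, delimiters inside the value must be escaped.
            text = text.replace(d.fields, d.escaped_by + d.fields)
            text = text.replace(d.lines, d.escaped_by + d.lines)
        return text
    if d.escaped_by:
        # With an encloser, the enclosing character must always be escaped.
        text = text.replace(d.enclosed_by, d.escaped_by + d.enclosed_by)
    # Optional enclosing is applied only when the value contains a delimiter.
    needs_enclosing = d.enclose_required or d.fields in field or d.lines in field
    return f"{d.enclosed_by}{text}{d.enclosed_by}" if needs_enclosing else text


def format_line(values: Sequence[str], d: Delimiters) -> str:
    return d.fields.join(escape_and_enclose(v, d) for v in values) + d.lines


print(repr(format_line(["a", "b,c", "it's"], MYSQL_DELIMITERS)))
```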

Input parsing arguments

--input-enclosed-by <char>

Sets a required field enclosing character for input

--input-escaped-by <char>

Sets an input escape character

--input-fields-terminated-by <char>

Sets an input field separator

--input-lines-terminated-by <char>

Sets an input end-of-line character

--input-optionally-enclosed-by <char>

Sets an optional field enclosing character for input

When Sqoop imports data to HDFS, it generates a Java class that can reinterpret the text files that it creates when doing a delimited-format import. The delimiters are chosen with arguments such as --fields-terminated-by; this controls both how the data is written to disk, and how the generated parse() method reinterprets this data. The delimiters used by the parse() method can be chosen independently of the output arguments, by using --input-fields-terminated-by, and so on. This is useful, for example, to generate classes that can parse records created with one set of delimiters, and emit the records to a different set of files using a separate set of delimiters.
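A hypothetical Python sketch of that round trip: parse_record stands in for the generated parse() method and reads a record using the input delimiters, and convert then writes the parsed fields with a different field separator. This is not the generated Java code; it only illustrates that the input and output delimiter sets are independent. It assumes each record is passed without its line terminator and that the output separator never occurs inside a value.

```python
from typing import List, Optional


def parse_record(line: str, sep: str = ",", enclosed_by: Optional[str] = None,
                 escaped_by: Optional[str] = None) -> List[str]:
    """Split one record (without its line terminator) into fields."""
    fields: List[str] = []
    current: List[str] = []
    in_enclosure = False
    i = 0
    while i < len(line):
        ch = line[i]
        if escaped_by and ch == escaped_by and i + 1 < len(line):
            current.append(line[i + 1])  # keep the escaped character literally
            i += 2
            continue
        if enclosed_by and ch == enclosed_by:
            in_enclosure = not in_enclosure  # enclosing characters are dropped
        elif ch == sep and not in_enclosure:
            fields.append("".join(current))  # field boundary
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


def convert(line: str, input_sep: str, output_sep: str,
            enclosed_by: Optional[str] = None,
            escaped_by: Optional[str] = None) -> str:
    """Parse with the input delimiters, then emit with a different separator."""
    return output_sep.join(parse_record(line, input_sep, enclosed_by, escaped_by))


print(convert("1,'a,b',c\\'d", ",", "\t", enclosed_by="'", escaped_by="\\"))
```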

Hive arguments

--create-hive-table

If set, then the job fails if the target Hive table exists

--hive-home <dir>

Overrides $HIVE_HOME

--hive-import

Imports tables into Hive (uses Hive's default delimiters if none are set)

--hive-overwrite

Overwrites existing data in the Hive table

--hive-table <table-name>

Sets the table name to use when importing to Hive

--hive-drop-import-delims

Drops \n, \r, and \01 from string fields when importing to Hive

--hive-delims-replacement

Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive

--hive-partition-key

Sets the Hive partition key

--hive-partition-value <v>

Sets the Hive partition value

--map-column-hive <map>

Overrides the default mapping from SQL data types to Hive data types. If the mapping contains commas, URL-encode the keys and values; for example, use DECIMAL(1%2C%201) instead of DECIMAL(1, 1)
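Two of the Hive options can be illustrated with a short Python sketch; the function names are hypothetical. clean_for_hive models --hive-drop-import-delims (replacement=None) and --hive-delims-replacement (a replacement string). parse_column_map shows why commas inside --map-column-hive types must be URL-encoded: the option value is split on commas into column=type pairs before each part is decoded.

```python
import re
from typing import Dict, Optional
from urllib.parse import unquote

# Line breaks and the \x01 field delimiter break Hive's default text format.
HIVE_DELIMS = re.compile(r"[\n\r\x01]")


def clean_for_hive(value: str, replacement: Optional[str] = None) -> str:
    """replacement=None drops the characters; a string replaces each of them."""
    repl = replacement if replacement is not None else ""
    # A function replacement keeps backslashes in `repl` from being interpreted.
    return HIVE_DELIMS.sub(lambda _match: repl, value)


def parse_column_map(spec: str) -> Dict[str, str]:
    """Split a col=TYPE,col=TYPE mapping on commas, then URL-decode each part."""
    mapping = {}
    for pair in spec.split(","):
        column, _, hive_type = pair.partition("=")
        mapping[unquote(column)] = unquote(hive_type)
    return mapping


print(clean_for_hive("line one\nline two", " "))
print(parse_column_map("price=DECIMAL(10%2C%202),name=STRING"))
```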

HBase arguments

--column-family <family>

Sets the target column family for the import

--hbase-create-table

If specified, creates missing HBase tables

--hbase-row-key <col>

Specifies which input column to use as the row key. If the input table contains a composite key, then <col> must be a comma-separated list of composite key attributes

--hbase-table <table-name>

Specifies an HBase table to use as the target instead of HDFS

--hbase-bulkload

Enables bulk loading
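For a composite key, the Apache Sqoop user guide states that the HBase row key is built by joining the values of the key columns with an underscore. The hypothetical Python function below models that for a --hbase-row-key value such as dept,emp_id. Rejecting a null key component is an assumption of this sketch, not documented Sqoop behavior.

```python
from typing import Mapping


def hbase_row_key(row: Mapping[str, object], row_key_spec: str) -> str:
    """Build a row key from a --hbase-row-key value such as "dept,emp_id"."""
    columns = row_key_spec.split(",")
    values = [row.get(col) for col in columns]
    if any(v is None for v in values):
        # Assumption of this sketch: a null key component is treated as an error.
        raise ValueError(f"null value in row-key column(s) {columns}")
    # Composite keys: join the column values with an underscore.
    return "_".join(str(v) for v in values)


print(hbase_row_key({"dept": 10, "emp_id": 7, "name": "Ann"}, "dept,emp_id"))
```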

Code generation arguments

--bindir <dir>

Sets the output directory for compiled objects

--class-name <name>

Specifies a name for the generated class. This overrides --package-name. When combined with --jar-file, sets the input class

--jar-file <file>

Disables code generation; the provided JAR is used instead

--map-column-java <m>

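Overrides the default mapping from SQL type to Java type for the listed columns; <m> is a comma-separated list of column=JavaType pairs, for example, id=String,value=Integer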

--outdir <dir>

Sets the output directory for generated code

--package-name <name>

Puts auto-generated classes into the specified package
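According to the Apache Sqoop user guide, the generated class is named after the table by default, and --package-name places it in a package. The hypothetical Python function below models the precedence among --class-name, --package-name, and that default. It does not model any adjustments Sqoop may make to table names that are not valid Java identifiers.

```python
from typing import Optional


def generated_class_name(table: str, class_name: Optional[str] = None,
                         package_name: Optional[str] = None) -> str:
    """Resolve the name of the generated class (simplified model)."""
    if class_name:
        # --class-name takes precedence and overrides --package-name.
        return class_name
    if package_name:
        # --package-name only sets the package; the class is named after the table.
        return f"{package_name}.{table}"
    return table  # default: named after the table


print(generated_class_name("EMPLOYEES", package_name="com.example.gen"))
```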

Found a mistake? Seleсt text and press Ctrl+Enter to report it