merge

Mikhail Serov

The merge tool combines two datasets so that entries in the newer dataset overwrite the matching entries in the older one. For example, an incremental import run in last-modified mode generates multiple datasets in HDFS, each containing successively newer data. The merge tool "flattens" two such datasets into one, taking the newest available record for each primary key.
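The "newest record per key" behavior described above can be pictured with a small local sketch. This is only an illustration of the semantics, not how the merge tool works internally (the real tool runs a merge job on Hadoop). The CSV files, their contents, and the assumption that the first column is the merge key are all hypothetical.

```shell
#!/bin/sh
# Illustration only: hypothetical CSV files where column 1 is the merge key.
workdir=$(mktemp -d)

# Older dataset.
cat > "$workdir/old.csv" <<'EOF'
1,alice,2020-01-01
2,bob,2020-01-01
EOF

# Newer dataset: key 2 was updated, key 3 is new.
cat > "$workdir/new.csv" <<'EOF'
2,bob-updated,2021-06-01
3,carol,2021-06-01
EOF

# Read the newer file first and keep only the first row seen for each key,
# so a newer row wins over an older row with the same key.
awk -F, '!seen[$1]++' "$workdir/new.csv" "$workdir/old.csv" \
  | sort -t, -k1,1n > "$workdir/merged.csv"

cat "$workdir/merged.csv"
# Prints:
# 1,alice,2020-01-01
# 2,bob-updated,2021-06-01
# 3,carol,2021-06-01
```

Key 1 exists only in the older file and is carried over unchanged, key 2 takes its value from the newer file, and key 3 appears only in the newer file.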

The tool usage is shown below.

$ sqoop merge <generic-args> <merge-args>
$ sqoop-merge <generic-args> <merge-args>

Although the generic Hadoop arguments must precede any merge arguments, the merge arguments can be specified in any order with respect to one another.

Merge options
Argument	Description
--class-name <class>	Specifies the name of the record-specific class to use during the merge job
--jar-file <file>	Specifies the name of the JAR to load the record class from
--merge-key <col>	Specifies the name of a column to use as the merge key
--new-data <path>	Specifies the path of the newer dataset
--onto <path>	Specifies the path of the older dataset
--target-dir <path>	Specifies the target path for the output of the merge job
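As a rough sketch, the options above might be combined as follows. Every path, the JAR name, the class name, and the key column are hypothetical placeholders, and the `-D` property is shown only to illustrate a generic Hadoop argument placed before the merge arguments. The `sqoop_merge` wrapper and the `DRY_RUN` variable are not part of Sqoop; they exist here so the command can be inspected before it is submitted.

```shell
#!/bin/sh
# All values below are hypothetical; substitute your own.
NEW_DATA=/user/example/orders_new        # newer dataset (--new-data)
OLD_DATA=/user/example/orders_old        # older dataset (--onto)
TARGET_DIR=/user/example/orders_merged   # output of the merge job (--target-dir)
JAR_FILE=orders.jar                      # JAR assumed to contain the record class
CLASS_NAME=Orders                        # record-specific class inside that JAR
MERGE_KEY=id                             # column used as the merge key

# Helper (not part of Sqoop): prints the command by default.
# Set DRY_RUN=0 to actually invoke sqoop.
sqoop_merge() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo sqoop merge "$@"
  else
    sqoop merge "$@"
  fi
}

# The generic Hadoop argument (-D ...) comes first;
# the merge arguments that follow may appear in any order.
sqoop_merge \
  -D mapreduce.job.queuename=default \
  --new-data "$NEW_DATA" \
  --onto "$OLD_DATA" \
  --target-dir "$TARGET_DIR" \
  --jar-file "$JAR_FILE" \
  --class-name "$CLASS_NAME" \
  --merge-key "$MERGE_KEY"
```

Running the script as written only prints the assembled command. Running it with `DRY_RUN=0` on a machine where Sqoop is installed submits the merge.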
