distcp
Copies files and directories within a cluster or between clusters.
Usage:
$ mapred distcp <src> <dst> [args]
-append | Reuses existing data in destination files and appends new data to them if possible |
-async | Runs the |
-atomic | Instructs |
-bandwidth <arg> | Specifies the bandwidth per map (in MB/second) |
-blocksperchunk <arg> | The number of blocks per chunk. When specified, splits files into chunks to copy in parallel |
-copybuffersize | The size of the copy buffer to use (in bytes). Defaults to |
-delete | Deletes files existing in |
-diff <oldSnapshot> <newSnapshot> | Identifies the difference between source and target, and applies the diff to the target to bring it in sync with the source |
-f <urilist_uri> | Specifies a path to a file with a list of URIs to be copied |
-filelimit <n> | Limits the total number of files to copy to be <= |
-filters | The path to a file containing a list of pattern strings, one per line; paths that match a pattern are excluded from the copy |
-i | Ignores failures |
-log <path/to/logdir> | Saves logs to <path/to/logdir> |
-m | Defines the maximum number of simultaneous copies |
-numListstatusThreads | The number of threads to use for building file listings |
-overwrite | If provided, overwrites the destination |
-p <arg> | Preserves status (replication, block-size, user, group, permission, checksum-type, ACL, XATTR, timestamps). If |
-rdiff <newSnapshot> <oldSnapshot> | Identifies the changes on the target since |
-sizelimit <n> | Deprecated. Limits the total size to be <= |
-skipcrccheck | Defines whether to skip CRC checks for source and target paths |
-strategy <arg> | The copy strategy to be used in |
-tmp <path/to/dir> | An intermediate work path to be used for atomic commits |
-update | Updates the target, copying only missing files or directories |
-v | Logs additional info (path, size) in the SKIP/COPY log |
-xtrack <path> | Saves information about missing source files to the specified |
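To illustrate the file format that -filters expects, the sketch below writes a small filters file and prints a matching command. The file location and both patterns are hypothetical, and the patterns assume that each line is interpreted as a regular expression; check this against your distribution before relying on it.

```shell
#!/bin/sh
# Hypothetical location for the filters file.
FILTERS_FILE="${TMPDIR:-/tmp}/distcp-filters.txt"

# One pattern per line; paths that match a pattern are excluded from the copy.
# Assumption: patterns are regular expressions (illustrative values only).
cat > "$FILTERS_FILE" <<'EOF'
.*\.tmp
.*/_temporary/.*
EOF

# Build the command; options are placed before the source and destination.
CMD="mapred distcp -filters $FILTERS_FILE hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo"

# Print the command for review instead of running it.
echo "$CMD"
```

The printed command can then be run on a node where the mapred client is configured.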
Example:
$ mapred distcp hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo
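A slightly fuller invocation might combine several of the options listed above. The following is a minimal sketch, not a recommended configuration: the bandwidth value and the log directory are hypothetical, and the options are placed before the paths, as in the upstream Hadoop DistCp examples. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Source and destination reused from the basic example.
SRC="hdfs://nn1:8020/foo/bar"
DST="hdfs://nn2:8020/bar/foo"

# -update       copies only missing files or directories
# -bandwidth 10 limits each map to 10 MB/second (hypothetical value)
# -log          saves logs to the given directory (hypothetical path)
CMD="mapred distcp -update -bandwidth 10 -log /tmp/distcp-logs $SRC $DST"

# Print the command for review; set RUN=1 to execute it on a cluster node.
echo "$CMD"
if [ "${RUN:-0}" = "1" ]; then
    $CMD
fi
```

Printing the command first makes it easy to check the source and destination before any data is copied.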