Hadoop CLI
Overview
This article provides reference documentation for the Hadoop shell command-line tool.
NOTE
The `hadoop dfs` command is currently deprecated; use `hdfs dfs` instead.
All Hadoop commands and subprojects follow the same basic structure. The usage is as follows:
$ shellcommand [SHELL_OPTIONS] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]
Hadoop shell basic structure

Element | Description
---|---
`shellcommand` | The command of the project being invoked. For example, Hadoop Common uses `hadoop`, HDFS uses `hdfs`, and YARN uses `yarn`
`SHELL_OPTIONS` | Options that the shell processes before executing Java
`COMMAND` | The action to perform
`GENERIC_OPTIONS` | The common set of options supported by multiple commands
`COMMAND_OPTIONS` | Various command options for the Hadoop common subprojects
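The structure above can be illustrated with a hypothetical invocation (the command is printed rather than executed here, since running it requires a live cluster):

```shell
# Maps onto the usage line as:
#   shellcommand   SHELL_OPTIONS      COMMAND   COMMAND_OPTIONS
#   hdfs           --loglevel DEBUG   dfs       -ls /
example="hdfs --loglevel DEBUG dfs -ls /"
echo "$example"
```

Note that the shell option (`--loglevel`) comes before the subcommand, while the command options (`-ls /`) come after it, exactly as in the usage line.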
All the shell commands accept a common set of options. For some commands, these options are ignored. For example, passing `--hostnames` to a command that only executes on a single host has no effect.
Option | Description
---|---
`--buildpaths` | Enables developer versions of JARs
`--config confdir` | Overrides the default configuration directory. The default directory is `$HADOOP_HOME/etc/hadoop`
`--daemon mode` | If the command supports daemonization (e.g., `hdfs namenode`), executes in the appropriate mode. Supported modes are `start`, `stop`, and `status`. `status` returns an LSB-compliant result code. If no option is provided, commands that support daemonization run in the foreground. For commands that don't support daemonization, this option is ignored
`--debug` | Enables shell-level configuration debugging information
`--help` | Displays shell script usage information
`--hostnames` | When `--workers` is used, overrides the workers file with a space-delimited list of hostnames on which to execute a multi-host subcommand. If `--workers` is not used, this option is ignored
`--hosts` | When `--workers` is used, overrides the workers file with another file that contains a list of hostnames on which to execute a multi-host subcommand. If `--workers` is not used, this option is ignored
`--loglevel loglevel` | Overrides the log level. Valid log levels are FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. The default is INFO
`--workers` | If possible, executes this command on all hosts in the workers file
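As a sketch of how `--daemon` is typically used, the lifecycle of a NameNode daemon could look like the following (the commands are echoed rather than executed, since they assume a configured HDFS installation):

```shell
# Start, query, and stop a NameNode daemon via the --daemon shell option.
for mode in start status stop; do
  echo "hdfs --daemon $mode namenode"
done
```

The `status` step is useful in scripts because its LSB-compliant exit code can be checked to decide whether a restart is needed.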
Many subcommands share a common set of configuration options to alter their behavior.
Option | Description
---|---
`-archives <comma-separated list of archives>` | Specifies comma-separated archives to be extracted onto the compute machines. Applies only to a job
`-conf <configuration file>` | Specifies an application configuration file
`-D <property>=<value>` | Sets a value for a given property
`-files <comma-separated list of files>` | Specifies comma-separated files to be copied to the MapReduce cluster. Applies only to a job
`-fs <file:///> or <hdfs://namenode:port>` | Specifies the default file system URL to use. Overrides the `fs.defaultFS` property from the configuration
`-jt <local> or <resourcemanager:port>` | Specifies a ResourceManager. Applies only to a job
`-libjars <comma-separated list of jars>` | Specifies comma-separated JAR files to include in the classpath. Applies only to a job
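A job submission combining several generic options might look like the following sketch (the JAR name, class name, file names, and paths are hypothetical; the command is echoed rather than run, since it needs a cluster):

```shell
# Submit a job with a property override (-D) and a distributed-cache file (-files).
cmd="hadoop jar wordcount.jar WordCount -D mapreduce.job.reduces=2 -files cache.txt /in /out"
echo "$cmd"
```

Per the usage structure, the generic options follow the command (`jar` and its main class) and precede the job's own arguments.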
User commands
These commands are helpful for Hadoop cluster users.
Command | Description
---|---
`archive` | Creates a Hadoop archive
`checknative` | Checks the availability of the Hadoop native code
`CLASSNAME` | Runs an arbitrary Java class named CLASSNAME
`classpath` | Prints the classpath
`credential` | Manages credentials, passwords, and secrets
`distch` | Changes the ownership and permissions on files
`distcp` | Copies files or directories recursively
`dtutil` | Utility to fetch and manage Hadoop tokens
`envvars` | Displays computed Hadoop environment variables
`fs` | A synonym for `hdfs dfs`
`gridmix` | Benchmark tool for a Hadoop cluster
`jar` | Runs a JAR file
`jnipath` | Prints the computed java.library.path
`kerbname` | Converts the named principal via the auth_to_local rules to the Hadoop username
`kdiag` | Diagnoses Kerberos problems
`key` | Manages keys via the KeyProvider
`kms` | Runs the Key Management Server
`trace` | Views and modifies Hadoop tracing settings
`version` | Prints the version
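A few of these user commands in context, via a small helper that prefixes the launcher (the commands are echoed here; running them for real assumes an installed `hadoop` on the PATH):

```shell
# Echo, rather than execute, a few common read-only user commands.
run() { echo "hadoop $*"; }
run version      # print the Hadoop version
run checknative  # check native-code availability
run fs -ls /     # list the file system root (fs is a synonym for hdfs dfs)
```

These particular commands are safe to try first on a new installation, since none of them modify cluster state.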