Examples of configuring YARN HA

Configuration properties

You can tune most of the failover functionality using various configuration properties. Following is a list of the required and important ones.

The yarn-default.xml file carries a full list of knobs. See this file for more information including default values. See ResourceManager Restart for instructions on setting up the state-store.

Configuration parameter Description

hadoop.zk.address

A list of host:port pairs of the ZooKeeper servers. Used for both the HDFS state-store and embedded leader-election

yarn.resourcemanager.ha.enabled

Enable RM HA

yarn.resourcemanager.ha.rm-ids

A list of logical IDs for the RMs. e.g., rm1,rm2

yarn.resourcemanager.hostname.rm-id

For each rm-id, specifies the name of the host where the RM is installed. Alternately, you can set each of the RM’s service addresses

yarn.resourcemanager.address.rm-id

For each rm-id, specifies the resource manager host:port for clients to submit jobs. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id

yarn.resourcemanager.scheduler.address.rm-id

For each rm-id, specifies the scheduler host:port for ApplicationMasters to obtain resources. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id

yarn.resourcemanager.resource-tracker.address.rm-id

For each rm-id, specifies the host:port for NodeManagers to connect. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id

yarn.resourcemanager.admin.address.rm-id

For each rm-id, specifies the host:port for administrative commands. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id

yarn.resourcemanager.webapp.address.rm-id

For each rm-id, specifies the host:port of the RM web application. You do not need this if you set yarn.http.policy to HTTPS_ONLY. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id

yarn.resourcemanager.webapp.https.address.rm-id

For each rm-id, specifies the host:port of the RM HTTPS web application. You do not need this if you set yarn.http.policy to HTTP_ONLY. If set, overrides the hostname set in yarn.resourcemanager.hostname.rm-id

yarn.resourcemanager.ha.id

Identifies the RM in the ensemble. This is optional; however, if set, admins have to ensure that all the RMs have their own IDs in the config

yarn.resourcemanager.ha.automatic-failover.enabled

Enable automatic failover. By default, it is enabled, but the service is active only when HA is enabled

yarn.resourcemanager.ha.automatic-failover.embedded

Use embedded leader-elector to pick the active RM when automatic failover is enabled. By default, it is enabled, but the service is active only when HA is enabled

yarn.resourcemanager.cluster-id

Identifies the cluster. Used by the elector to ensure an RM doesn’t take over the active role in another cluster

yarn.client.failover-proxy-provider

The class to be used by clients, AMs, and NMs to failover to the active RM

yarn.client.failover-no-ha-proxy-provider

The class to be used by clients, AMs and NMs to failover to the active RM, when not running in HA mode

yarn.client.failover-max-attempts

The maximum number of times FailoverProxyProvider should attempt failover

yarn.client.failover-sleep-base-ms

The sleep base (in milliseconds) to be used for calculating the exponential delay between failover processes

yarn.client.failover-sleep-max-ms

The maximum sleep time (in milliseconds) between failover processes

yarn.client.failover-retries

The number of retries per attempt to connect to an RM

yarn.client.failover-retries-on-socket-timeouts

The number of retries per attempt to connect to an RM on socket timeouts

Sample configuration

Here is a sample of minimal setup for RM failover:

<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster1</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>master1:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>master2:8088</value>
</property>
<property>
  <name>hadoop.zk.address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>

Administrative commands

The yarn rmadmin utility has a few HA-specific command options to check the health and state of an RM and its transition to the active or standby mode. When HA is enabled, the commands take the service ID of an RM set by yarn.resourcemanager.ha.rm-ids as an argument.

Requesting the RM1 state:

$ yarn rmadmin -getServiceState rm1

The output:

active

Similar request for the RM2 state:

$ yarn rmadmin -getServiceState rm2
standby

If automatic failover is enabled, you cannot use manual transition command.

CAUTION
Although you can override this by the –forcemanual flag, use it with caution.

A transition attempt:

$ yarn rmadmin -transitionToStandby rm1

The request is denied:

Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@1d8299fd
Refusing to manually manage HA state, since it may cause
a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please
specify the forcemanual flag.

See additionally YARN CLI for frequently used YARN commands.

Found a mistake? Seleсt text and press Ctrl+Enter to report it