Examples of configuring YARN HA
Configuration properties
You can tune most of the failover functionality using configuration properties. The following table lists the required and most important ones.
The yarn-default.xml file contains the full list of properties along with their default values. See ResourceManager Restart for instructions on setting up the state-store.
Configuration parameter | Description |
---|---|
hadoop.zk.address | A list of host:port pairs of the ZooKeeper servers. Used for both the RM state-store and the embedded leader-election |
yarn.resourcemanager.ha.enabled | Enables RM HA |
yarn.resourcemanager.ha.rm-ids | A list of logical IDs for the RMs, e.g., rm1,rm2 |
yarn.resourcemanager.hostname.rm-id | For each rm-id, the hostname that the RM corresponds to |
yarn.resourcemanager.address.rm-id | For each rm-id, the host:port that clients use to submit jobs |
yarn.resourcemanager.scheduler.address.rm-id | For each rm-id, the scheduler host:port that ApplicationMasters use to obtain resources |
yarn.resourcemanager.resource-tracker.address.rm-id | For each rm-id, the host:port that NodeManagers connect to |
yarn.resourcemanager.admin.address.rm-id | For each rm-id, the host:port for administrative commands |
yarn.resourcemanager.webapp.address.rm-id | For each rm-id, the host:port of the RM web application (HTTP) |
yarn.resourcemanager.webapp.https.address.rm-id | For each rm-id, the host:port of the RM web application (HTTPS) |
yarn.resourcemanager.ha.id | Identifies the RM in the ensemble. This is optional; however, if set, admins have to ensure that all the RMs have their own IDs in the config |
yarn.resourcemanager.ha.automatic-failover.enabled | Enables automatic failover. By default, it is enabled, but the service is active only when HA is enabled |
yarn.resourcemanager.ha.automatic-failover.embedded | Uses the embedded leader-elector to pick the active RM when automatic failover is enabled. By default, it is enabled, but the service is active only when HA is enabled |
yarn.resourcemanager.cluster-id | Identifies the cluster. Used by the elector to ensure an RM doesn’t take over the active role in another cluster |
yarn.client.failover-proxy-provider | The class to be used by clients, AMs, and NMs to fail over to the active RM |
yarn.client.failover-no-ha-proxy-provider | The class to be used by clients, AMs, and NMs to fail over to the active RM when not running in HA mode |
yarn.client.failover-max-attempts | The maximum number of times FailoverProxyProvider should attempt failover |
yarn.client.failover-sleep-base-ms | The sleep base (in milliseconds) used to calculate the exponential delay between failovers |
yarn.client.failover-sleep-max-ms | The maximum sleep time (in milliseconds) between failovers |
yarn.client.failover-retries | The number of retries per attempt to connect to an RM |
yarn.client.failover-retries-on-socket-timeouts | The number of retries per attempt to connect to an RM on socket timeouts |
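To give a feel for how yarn.client.failover-sleep-base-ms and yarn.client.failover-sleep-max-ms interact, here is a rough sketch of a capped exponential delay. It assumes plain doubling per failover and ignores any randomization the real client may apply, so it is not Hadoop's actual implementation. The function name failover_delay_ms and the values in the usage note are hypothetical.

```shell
# Illustrative only: capped exponential backoff, NOT Hadoop's actual code.
# Usage: failover_delay_ms <base_ms> <max_ms> <failover_count>
failover_delay_ms() {
  base_ms="$1"
  max_ms="$2"
  attempt="$3"
  delay=$base_ms
  i=0
  # Double the delay once per previous failover, stopping early at the cap.
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$max_ms" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  # Never exceed the configured maximum.
  if [ "$delay" -gt "$max_ms" ]; then
    delay=$max_ms
  fi
  echo "$delay"
}
```

For example, with an assumed base of 500 ms and a maximum of 15000 ms, `failover_delay_ms 500 15000 3` prints 4000, and any count of 5 or more prints 15000.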
Sample configuration
Here is a sample minimal setup for RM failover:
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster1</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>master1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>master2</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>master1:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>master2:8088</value>
</property>
<property>
  <name>hadoop.zk.address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
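A quick sanity check for a configuration like the sample above might verify that every ID listed in yarn.resourcemanager.ha.rm-ids has a matching yarn.resourcemanager.hostname.rm-id entry. The following is a minimal sketch, not an official tool. The helper names get_prop and check_rm_hostnames are hypothetical, and the sketch assumes that each name tag and its value tag sit on one line each, as in the sample.

```shell
# Hypothetical helper: print the value of a property from a yarn-site.xml-style file.
# Assumes <name>...</name> and <value>...</value> each fit on a single line.
get_prop() {
  # $1 = path to the config file, $2 = property name
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-char <value> prefix and the 8-char </value> suffix.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Hypothetical check: every RM ID must have a hostname property.
check_rm_hostnames() {
  conf="$1"
  ids=$(get_prop "$conf" yarn.resourcemanager.ha.rm-ids)
  if [ -z "$ids" ]; then
    echo "yarn.resourcemanager.ha.rm-ids is not set" >&2
    return 1
  fi
  status=0
  for id in $(printf '%s' "$ids" | tr ',' ' '); do
    host=$(get_prop "$conf" "yarn.resourcemanager.hostname.$id")
    if [ -n "$host" ]; then
      echo "$id -> $host"
    else
      echo "yarn.resourcemanager.hostname.$id is missing" >&2
      status=1
    fi
  done
  return "$status"
}
```

Calling `check_rm_hostnames /path/to/yarn-site.xml` (the path is a placeholder) prints one `id -> host` line per RM and returns a non-zero status if any hostname is missing. If only the per-service addresses are set instead of hostnames, the check would need to be adapted.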
Administrative commands
The yarn rmadmin utility has a few HA-specific command options to check the health and state of an RM and to transition it to the active or standby mode. When HA is enabled, the commands take the service ID of an RM, as set by yarn.resourcemanager.ha.rm-ids, as an argument.
Requesting the RM1 state:
$ yarn rmadmin -getServiceState rm1
The output:
active
Similar request for the RM2 state:
$ yarn rmadmin -getServiceState rm2
The output:
standby
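The per-RM state checks above can be combined into a small helper that reports which RM is currently active. This is a minimal sketch: the function name find_active_rm is hypothetical, and it assumes that the command prints exactly active or standby on standard output, as in the examples above.

```shell
# Hypothetical helper: print the ID of the first RM that reports "active".
# Usage: find_active_rm rm1 rm2
find_active_rm() {
  for rm_id in "$@"; do
    # Query the state of one RM; errors (e.g. an unreachable RM) are ignored.
    state=$(yarn rmadmin -getServiceState "$rm_id" 2>/dev/null)
    if [ "$state" = "active" ]; then
      echo "$rm_id"
      return 0
    fi
  done
  # No RM reported the active state.
  return 1
}
```

With the sample configuration, `find_active_rm rm1 rm2` would print rm1 while RM1 holds the active role. A non-zero exit status means that none of the listed RMs reported itself as active.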
If automatic failover is enabled, you cannot use the manual transition commands.
CAUTION
You can override this restriction with the --forcemanual flag, but use it with caution.
A transition attempt:
$ yarn rmadmin -transitionToStandby rm1
The request is denied:
Automatic failover is enabled for org.apache.hadoop.yarn.client.RMHAServiceTarget@1d8299fd
Refusing to manually manage HA state, since it may cause a split-brain scenario or other incorrect state.
If you are very sure you know what you are doing, please specify the forcemanual flag.
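If a manual transition is really required while automatic failover is on, one way to reduce the risk of an accidental override is to wrap the forced command in a guard. The sketch below is only an illustration: the function name force_standby and the I_KNOW_WHAT_I_AM_DOING variable are hypothetical, and the guard does nothing to prevent a split-brain scenario once the command actually runs.

```shell
# Hypothetical guard around a forced manual transition to standby.
# Usage: I_KNOW_WHAT_I_AM_DOING=yes force_standby rm1
force_standby() {
  rm_id="$1"
  # Refuse unless the operator has opted in explicitly.
  if [ "${I_KNOW_WHAT_I_AM_DOING:-no}" != "yes" ]; then
    echo "Refusing: set I_KNOW_WHAT_I_AM_DOING=yes to force $rm_id to standby" >&2
    return 1
  fi
  # The forced transition bypasses the automatic-failover check.
  yarn rmadmin -transitionToStandby --forcemanual "$rm_id"
}
```

Calling `force_standby rm1` without the variable set only prints the refusal and returns a non-zero status, so the forced transition cannot be triggered by a stray command.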
See also YARN CLI for frequently used YARN commands.