Configure and use GPU on YARN

Elena Kostyuchenko

Contents

Overview
GPU configuration
Use GPU on YARN

Overview

For computationally intensive tasks, Hadoop can be configured to run YARN applications on GPU instead of CPU.

Prerequisites for using GPU on YARN:

An Nvidia GPU.
Nvidia drivers.
If Docker is used, nvidia-docker 1.0 has to be installed.
YARN uses Capacity scheduler.

GPU configuration

To configure the GPU support on YARN via ADCM:

On the Clusters page, select the desired cluster.
Navigate to Services and click at YARN.
Toggle the Show advanced option, find the GPU on YARN setting and set it to true.

GPU on YARN setting in ADCM
Find and click at the capacity-scheduler.xml property.

In the opened window, change the yarn.scheduler.capacity.resource-calculator property value to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator:

<configuration>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>

In the yarn-site.xml parameter group, edit the properties to create the desired configuration for your cluster that would account for the GPU usage. For example:

<configuration>
  <property>
    <name>yarn.nodemanager.resource-plugins</name>
    <value>yarn.io/gpu</value>
  </property>
  <property>
     <name>yarn.resource-types</name>
     <value>yarn.io/gpu</value>
  </property>
  <property>
     <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
     <value>auto</value>
  </property>
  <property>
     <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name>
     <value>/usr/bin</value>
  </property>
  <property>
     <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
     <value>true</value>
  </property>
  <property>
     <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
     <value>/sys/fs/cgroup</value>
  </property>
  <property>
     <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
     <value>yarn</value>
  </property>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>yarn</value>
  </property>
</configuration>

Find the container-executor.cfg template parameter and add the following values to the template:

yarn.nodemanager.linux-container-executor.group=yarn

#--Original container-exectuor.cfg Content--

[gpu]
module.enabled=true
[cgroups]
root=/sys/fs/cgroup
yarn-hierarchy=yarn

If you need to run GPU applications under Docker environment, add these additional properties to container-executor.cfg template:

[docker]
docker.allowed.devices=/dev/nvidiactl,/dev/nvidia-uvm,/dev/nvidia-uvm-tools,/dev/nvidia1,/dev/nvidia0 (1)
docker.allowed.volume-drivers (2)
docker.allowed.ro-mounts=nvidia_driver_375.66 (3)
docker.allowed.runtimes=nvidia (4)

1	Adds GPU devices to the Docker section. Enter the list of values separated by comma. You can find the list of devices using the `ls /dev/nvidia*` command.
2	Adds `nvidia-docker` to volume-driver whitelist.
3	Adds `nvidia_driver_<version>` to read-only mounts whitelist.
4	To use nvidia-docker-v2 as the GPU Docker plugin, add `nvidia` to runtimes whitelist.

Click Apply and confirm changes to YARN configuration by clicking Save.
In the Actions drop-down menu, select Restart, make sure the Apply configs from ADCM option is set to true and click Run. This will automatically update all the necessary configuration files on all hosts.

NOTE

In some cases, it might be required to share the NodeManager’s local directory to all users to avoid permission-related errors.

Depending on your cluster configuration, some of the parameters can be optional for using GPUs and can be set to default values.

GPU on YARN: optional parameters
Parameter	Description	Default value
yarn.nodemanager.resource-plugins	Set to `yarn.io/gpu` to enable GPU isolation module on the NodeManager side if necessary. By default, YARN automatically detects and configures GPUs	—
yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices	A list of comma-separated GPU devices in the following format: `GPU1_index:GPU1_minor_number, GPU2_index:GPU2_minor_number`. For example: `0:0,1:1,2:2`. If the default value is set, YARN will automatically discover GPU resources in the system. If it fails, you can use this property to manually specify the GPUs	auto
yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables	The path to the GPU discovery binary used by NodeManagers for gathering the information about GPUs if `yarn.nodemanager.resource.gpu.allowed-gpu-devices` is set to `auto`. For example: `/usr/local/bin/nvidia-smi`. If this value is empty, a NodeManager will try to locate the binary on its own	—
yarn.nodemanager.resource-plugins.gpu.docker-plugin	The Docker command plugin for GPU. By default, the Nvidia Docker v1.0 is used	nvidia-docker-v1
yarn.nodemanager.resource-plugins.gpu.docker-plugin.nvidia-docker-v1.endpoint	The Nvidia Docker plugin endpoint	http://localhost:3476/v1.0/docker/cli
yarn.nodemanager.linux-container-executor.cgroups.mount	Use the CGroup devices controller to do per-GPU device isolation	true

Use GPU on YARN

Here are the examples of running YARN applications using the yarn jar command in a YARN or Docker container.

YARN container
Docker container

To run an application in a YARN container, run:

$ yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
  -shell_command /usr/local/nvidia/bin/nvidia-smi \
  -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=2 \
  -num_containers 2

This command requests 2 containers, 3GB each, 1 vCore, and 2 GPUs.

To run an application in a Docker container, run:

$ yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
       -jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
       -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
       -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=<docker-image-name> \
       -shell_command nvidia-smi \
       -container_resources memory-mb=3072,vcores=1,yarn.io/gpu=2 \
       -num_containers 2

This command requests 2 containers, 3GB each, 1 vCore, and 2 GPUs.

Found a mistake? Seleсt text and press Ctrl+Enter to report it