Configure and use GPU on YARN
Overview
For computationally intensive tasks, Hadoop can be configured to run YARN applications on GPU instead of CPU.
Prerequisites for using GPU on YARN:
-
An Nvidia GPU.
-
Nvidia drivers.
-
If Docker is used, nvidia-docker 1.0 has to be installed.
-
YARN uses Capacity scheduler.
GPU configuration
To configure the GPU support on YARN via ADCM:
-
On the Clusters page, select the desired cluster.
-
Navigate to Services and click at YARN.
-
Toggle the Show advanced option, find the GPU on YARN setting and set it to
true
.GPU on YARN setting in ADCM -
Find and click at the capacity-scheduler.xml property.
-
In the opened window, change the
yarn.scheduler.capacity.resource-calculator
property value toorg.apache.hadoop.yarn.util.resource.DominantResourceCalculator
:<configuration> <property> <name>yarn.scheduler.capacity.resource-calculator</name> <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value> </property> </configuration>
-
In the yarn-site.xml parameter group, edit the properties to create the desired configuration for your cluster that would account for the GPU usage. For example:
<configuration> <property> <name>yarn.nodemanager.resource-plugins</name> <value>yarn.io/gpu</value> </property> <property> <name>yarn.resource-types</name> <value>yarn.io/gpu</value> </property> <property> <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name> <value>auto</value> </property> <property> <name>yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables</name> <value>/usr/bin</value> </property> <property> <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name> <value>true</value> </property> <property> <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name> <value>/sys/fs/cgroup</value> </property> <property> <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name> <value>yarn</value> </property> <property> <name>yarn.nodemanager.container-executor.class</name> <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value> </property> <property> <name>yarn.nodemanager.linux-container-executor.group</name> <value>yarn</value> </property> </configuration>
-
Find the container-executor.cfg template parameter and add the following values to the template:
yarn.nodemanager.linux-container-executor.group=yarn #--Original container-exectuor.cfg Content-- [gpu] module.enabled=true [cgroups] root=/sys/fs/cgroup yarn-hierarchy=yarn
If you need to run GPU applications under Docker environment, add these additional properties to container-executor.cfg template:
[docker] docker.allowed.devices=/dev/nvidiactl,/dev/nvidia-uvm,/dev/nvidia-uvm-tools,/dev/nvidia1,/dev/nvidia0 (1) docker.allowed.volume-drivers (2) docker.allowed.ro-mounts=nvidia_driver_375.66 (3) docker.allowed.runtimes=nvidia (4)
1 Adds GPU devices to the Docker section. Enter the list of values separated by comma. You can find the list of devices using the ls /dev/nvidia*
command.2 Adds nvidia-docker
to volume-driver whitelist.3 Adds nvidia_driver_<version>
to read-only mounts whitelist.4 To use nvidia-docker-v2 as the GPU Docker plugin, add nvidia
to runtimes whitelist. -
Click Apply and confirm changes to YARN configuration by clicking Save.
-
In the Actions drop-down menu, select Restart, make sure the Apply configs from ADCM option is set to
true
and click Run. This will automatically update all the necessary configuration files on all hosts.
NOTE
In some cases, it might be required to share the NodeManager’s local directory to all users to avoid permission-related errors. |
Depending on your cluster configuration, some of the parameters can be optional for using GPUs and can be set to default values.
Parameter | Description | Default value |
---|---|---|
yarn.nodemanager.resource-plugins |
Set to |
— |
yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices |
A list of comma-separated GPU devices in the following format: |
auto |
yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables |
The path to the GPU discovery binary used by NodeManagers for gathering the information about GPUs if |
— |
yarn.nodemanager.resource-plugins.gpu.docker-plugin |
The Docker command plugin for GPU. By default, the Nvidia Docker v1.0 is used |
nvidia-docker-v1 |
yarn.nodemanager.resource-plugins.gpu.docker-plugin.nvidia-docker-v1.endpoint |
The Nvidia Docker plugin endpoint |
http://localhost:3476/v1.0/docker/cli |
yarn.nodemanager.linux-container-executor.cgroups.mount |
Use the CGroup devices controller to do per-GPU device isolation |
true |
Use GPU on YARN
Here are the examples of running YARN applications using the yarn jar command in a YARN or Docker container.
To run an application in a YARN container, run:
$ yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
-jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
-shell_command /usr/local/nvidia/bin/nvidia-smi \
-container_resources memory-mb=3072,vcores=1,yarn.io/gpu=2 \
-num_containers 2
This command requests 2 containers, 3GB each, 1 vCore, and 2 GPUs.
To run an application in a Docker container, run:
$ yarn jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
-jar <path/to/hadoop-yarn-applications-distributedshell.jar> \
-shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
-shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=<docker-image-name> \
-shell_command nvidia-smi \
-container_resources memory-mb=3072,vcores=1,yarn.io/gpu=2 \
-num_containers 2
This command requests 2 containers, 3GB each, 1 vCore, and 2 GPUs.