CapacityScheduler is a pluggable scheduler that you can use in a Hadoop framework. It improves multi-tenancy of a shared cluster by allocating a certain capacity of the entire cluster to each organization using this cluster.
CapacityScheduler enables each organization to get its own queue with a portion of the cluster capacity allocated specifically for that queue.
In a shared cluster, security becomes very important. For each queue, there is an access control list (ACL) containing users that can submit applications to individual queues. It also ensures that users cannot view or modify applications run by other users in other queues. Also per-queue and system administrator roles are supported.
CapacityScheduler supports hierarchical queues. A queue is hierarchical if subqueues can be created within this queue. The portion of the cluster resources allocated to the queue can be further distributed among its subqueues. CapacityScheduler has a predefined queue called
root. All queues in the system are the children of the
An added benefit is that an organization can exceed its queue capacity and use more cluster resources than it was assigned, yet only if the extra capacity is available and not being used by others. This gives organizations more flexibility in a cost-effective manner.
To use CapacityScheduler in YARN, start with assigning the appropriate scheduler class in yarn-site.xml:
<property> <name>yarn.resourcemanager.scheduler.class</name> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value> </property>
To set up queues, edit the capacity-scheduler.xml configuration file:
yarn.scheduler.capacity.root.queuesproperty, specify a list of comma-separated top-level queues.
To set up subqueues in a queue defined as <queue-path>, in the
yarn.scheduler.capacity.<queue-path>.queuesproperty of this queue, specify a list of subqueues.
To configure queue capacity in percentage (%), use the
yarn.scheduler.capacity.<queue-path>.capacityproperty of that queue. The sum of capacities for all queues (at each level) must be equal to 100.
To set up the maximum capacity for a queue in percentage, use the
yarn.scheduler.capacity.<queue-path>.maximum-capacityproperty. This property limits the elasticity for applications in the queue. It defaults to
-1, which disables this limitation.
Let us configure two top-level queues (descending directly from
sales queue, there are two subqueues:
The hierarchy of queues in the CapacityScheduler configuration looks as follows:
<property> <name>yarn.scheduler.capacity.root.queues</name> <value>sales, finance</value> </property> <property> <name>yarn.scheduler.capacity.root.sales.queues</name> <value>apac,emea</value> </property>
To give 70% of the queue capacity to
sales and 30% to
finance, add the following settings:
<property> <name>yarn.scheduler.capacity.root.sales.capacity</name> <value>70</value> </property> <property> <name>yarn.scheduler.capacity.root.finance.capacity</name> <value>30</value> </property>
To allocate 65% of the
sales queue capacity to
apac and 35% to
emea, add the following settings:
<property> <name>yarn.scheduler.capacity.root.sales.apac.capacity</name> <value>65</value> </property> <property> <name>yarn.scheduler.capacity.root.sales.emea.capacity</name> <value>35</value> </property>
To ensure that the
sales queue doesn’t use more than 80% of the cluster resources (even if the resources are available), add the following:
<property> <name>yarn.scheduler.capacity.root.sales.maximum-capacity</name> <value>80</value> </property>
If a user, for example,
yarn supplies some applications without specifying a queue and there is no
default queue, then you must specify a default queue for that user. The following snippet makes the
marketing queue the default one for the
<property> <name>yarn.scheduler.capacity.queue-mappings</name> <value>u:yarn:marketing</value> </property>
In this configuration:
umeans that this a user property (
gmeans a property for a group);
yarnis the user name;
marketingis the name of the default queue for the specified user.