HDFS service management via ADCM

Overview

The ADCM UI provides actions to manage the HDFS service and its components. For information on how to run service actions, refer to ADH service actions.

The actions available for the HDFS service are listed below.

  • Start balancer: starts the HDFS Balancer.

  • Stop balancer: stops the HDFS Balancer.

  • Add/Remove components: opens the component-host mapping interface where you can add, remove, and distribute HDFS components.

  • DataNode Maintenance/Decommission: allows you to put DataNodes into maintenance mode, decommission them, or reinstate DataNodes that are under maintenance. The data of a decommissioned DataNode is replicated to other DataNodes, so decommissioning can be used to safely remove a DataNode or take it down for long-term maintenance. For short-term outages, use the maintenance mode: a DataNode in maintenance mode does not accept changes and does not replicate or delete blocks.

  • Check disk balancer: gets the current Disk balancer status from the specified DataNodes. To view the report, go to the Jobs page.

  • Report disk balancer: reports volume information from the specified DataNodes. To view the report, go to the Jobs page.

  • Start disk balancer: starts the Disk balancer.

  • Stop disk balancer: stops the Disk balancer.

  • Start mover: starts the Mover. When starting the Mover, enter the directories whose storage policies must be enforced.

  • Stop mover: stops the Mover.

  • Change internal nameservices: allows you to change the internal nameservices. The value must be alphanumeric and must not contain underscores.

  • Manage Ranger plugin: enables or disables the Ranger plugin for HDFS.

  • Check: runs service-specific tests to check the health of the service and its components.

  • Start: starts the service. When you run this action, the Apply configs from ADCM option is available. If it is set to true, all service configurations defined in ADCM are applied at service startup; otherwise, the service starts without applying configurations from ADCM.

  • Stop: stops the service.

  • Remove: removes the service from the cluster. Use this action to remove an already installed service. The delete control, in contrast, can be used to remove a non-mapped service (a service whose components have not been distributed among cluster hosts).

  • Restart: restarts the service. When you run this action, the Apply configs from ADCM option is available. If it is set to true, all service configurations defined in ADCM are applied during the restart; otherwise, the service restarts without applying configurations from ADCM.

The service supports the Rolling restart option, which allows you to restart its DataNode components one by one (or in batches) rather than all at once. This helps avoid service downtime during restarts and keeps the entire cluster operable.

The Rolling restart option has the following parameters:

  • batch_size — the number of DataNode components to be restarted in one iteration. Using batches is effective only if rack awareness is configured and there are at least two racks. It is recommended to set the batch size to a value less than or equal to the number of hosts per rack, so that only hosts belonging to a single rack get restarted at a time, while the others stay active. If rack awareness is not configured, it is recommended to set the batch size to 1.

  • batch_delay — the delay in seconds between restarting batches of components.

  • health_checks — indicates whether to perform health checks on the restarted components.

  • max_failed_batches_number — the maximum number of component batches allowed to fail during restart. Upon reaching this value, the Restart action fails.

HDFS components actions
  • Check: verifies whether all the component instances in the cluster work correctly.

  • Restart: restarts all the component instances in the cluster.

  • Start: starts all the component instances in the cluster.

  • Stop: stops all the component instances in the cluster.

Balancer

The Balancer helps manage the load across DataNodes in a cluster. You can start the Balancer whenever data is distributed unevenly between DataNodes, for example, after a new DataNode has been added. The Balancer stops when the load of every DataNode is within the acceptable threshold.

The threshold defines how much the load of an individual DataNode may diverge from the average load of the whole cluster, expressed as a percentage of disk space. For example, with a threshold of 10% and an average cluster usage of 60%, the Balancer tries to keep the disk usage of every DataNode between 50% and 70%.

After you select the Start balancer action, fill in the following fields in the window that appears (or leave empty to use the default values):

  • Threshold — a percentage value between 1 and 100. The default value is 10%. Smaller values produce a more balanced cluster, but the balancing takes longer. If the value is too small and the DataNodes' load keeps changing during balancing, the cluster may not be able to reach the balanced state.

  • Hosts to exclude — FQDNs of the hosts whose DataNodes should be ignored by the Balancer.

  • Hosts to include — FQDNs of the hosts whose DataNodes should be included in the balancing process. By default, all hosts are included.

  • Source hosts — FQDNs of the hosts whose DataNodes should be balanced in the first place. The Balancer moves blocks only from the specified DataNodes. By default, all hosts count as source hosts.

  • Idle iterations — the number of iterations the Balancer can remain idle before it stops. The default value is 5.

You can run the Balancer with additional parameters by using the balancer CLI command or by changing its parameters in the hdfs-site.xml configuration file.
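For reference, a comparable run from the command line might look like the following sketch (the threshold value and host names are placeholders to adapt to your cluster):

  # Balance the cluster so that each DataNode stays within 10% of the average usage,
  # moving blocks only from the listed source DataNodes
  hdfs balancer -threshold 10 \
      -source dn1.example.com,dn2.example.com \
      -exclude dn5.example.com \
      -idleiterations 5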

Disk balancer

The Disk balancer helps manage the load between data directories within a single DataNode. You can add data directories via the dfs.datanode.data.dir parameter of the HDFS configuration.
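For example, a DataNode with two data directories might have the following entry in hdfs-site.xml (the paths are placeholders):

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/1/hdfs,/data/2/hdfs</value>
  </property>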

You can run the Disk balancer with additional parameters by using the diskbalancer CLI command or by changing its parameters in the hdfs-site.xml configuration file.
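As a sketch, a typical Disk balancer cycle for a single DataNode from the command line might look like this (the host name and plan file path are placeholders; the actual plan file location is printed by the -plan command):

  # Build a plan that describes how to move data between the volumes of one DataNode
  hdfs diskbalancer -plan dn1.example.com
  # Execute the generated plan
  hdfs diskbalancer -execute <plan-file>.plan.json
  # Check the progress of the running plan
  hdfs diskbalancer -query dn1.example.com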

Mover

The Mover is a data migration tool that checks whether the data in the specified directory complies with its storage policy and, if it does not, moves the replicas to a different storage type to fulfill the storage policy requirement.

A changed data storage policy is not applied to existing data automatically. Use the mover action to ensure that the new storage policy is fulfilled.

You can run the Mover with additional parameters by using the mover CLI command or by changing its parameters in the hdfs-site.xml configuration file.
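For illustration, assigning a storage policy to a directory and then migrating its existing data from the command line might look like this (the path and policy name are placeholders):

  # Assign the COLD storage policy to a directory
  hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD
  # Move the existing replicas so that they satisfy the new policy
  hdfs mover -p /data/archive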

Internal nameservice

An internal nameservice is an additional (internal) name of an HDFS cluster that allows you to query another HDFS cluster from the current one, for example, to transfer data between clusters or to create tasks.

You can query any nameservice specified in the dfs.internal.nameservices parameter of the hdfs-site.xml configuration file. This cluster’s DataNodes will report to all the nameservices in this list.
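As an illustration, the related hdfs-site.xml properties for a cluster whose own nameservice is ns1, with ns2 pointing to another cluster, might look like this (the nameservice IDs are placeholders):

  <!-- All nameservices visible to this cluster -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <!-- Nameservices that belong to this cluster; its DataNodes report only to these -->
  <property>
    <name>dfs.internal.nameservices</name>
    <value>ns1</value>
  </property>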
