balancer

Runs the cluster balancing utility. An administrator can press Ctrl+C to stop the rebalancing process.

NOTE

The blockpool policy is stricter than the DataNode policy.

The usage is as follows:

$ hdfs balancer
    [-policy <policy>]
    [-threshold <threshold>]
    [-exclude [-f <hosts-file> | <comma-separated list of hosts>]]
    [-include [-f <hosts-file> | <comma-separated list of hosts>]]
    [-source [-f <hosts-file> | <comma-separated list of hosts>]]
    [-blockpools <comma-separated list of blockpool ids>]
    [-idleiterations <idleiterations>]
    [-runDuringUpgrade]
    [-asService]

Arguments

-policy <policy>

Possible values:

  • DataNode (default) — cluster is balanced if each DataNode is balanced;

  • Blockpool — cluster is balanced if each block pool in each DataNode is balanced.

-threshold <threshold>

A percentage of disk capacity. This overrides the default threshold.
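
To illustrate what the threshold means, the sketch below assumes the usual HDFS definition: a DataNode counts as balanced when its utilization differs from the average cluster utilization by no more than the threshold. The function name `is_balanced` and the sample percentages are hypothetical, and the check only approximates what the balancer does internally.

```shell
# is_balanced NODE_PCT CLUSTER_PCT THRESHOLD
# Prints "balanced" if |NODE_PCT - CLUSTER_PCT| <= THRESHOLD, otherwise "unbalanced".
# Assumption: this mirrors the per-DataNode check; it is not the actual implementation.
is_balanced() {
    awk -v n="$1" -v c="$2" -v t="$3" 'BEGIN {
        d = n - c
        if (d < 0) d = -d
        print ((d <= t) ? "balanced" : "unbalanced")
    }'
}

is_balanced 62 55 10   # node is 7 points above the cluster average
is_balanced 62 55 5    # same node, stricter threshold
```

With these sample values, the first call reports the node as balanced and the second as unbalanced, so a lower value such as `hdfs balancer -threshold 5` asks for a tighter balance.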

-exclude -f <hosts-file> | <comma-separated list of hosts>

Excludes the specified DataNodes from being balanced by the balancer

-include -f <hosts-file> | <comma-separated list of hosts>

Includes only the specified DataNodes to be balanced by the balancer

-source -f <hosts-file> | <comma-separated list of hosts>

Picks only the specified DataNodes as source nodes
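
The host-list options above accept either a file or an inline list. The following sketch shows both forms. The hostnames are hypothetical, and the one-hostname-per-line file layout is an assumption to verify against your HDFS version.

```shell
# Hypothetical DataNode hostnames; replace with hosts from your cluster.
# Assumption: the hosts file lists one hostname per line.
hosts_file="$(mktemp)"
printf '%s\n' dn1.example.com dn2.example.com > "$hosts_file"

# The leading "echo" only prints each command for review; remove it to run the balancer.
echo hdfs balancer -exclude -f "$hosts_file"
echo hdfs balancer -source dn3.example.com,dn4.example.com
```

A file is usually easier to maintain for long host lists, while the comma-separated form suits one-off runs.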

-blockpools <comma-separated list of blockpool ids>

Runs the balancer only on blockpools included in this list

-idleiterations <idleiterations>

The maximum number of idle iterations before the balancer exits. This overrides the default number of idle iterations.

-runDuringUpgrade

Specifies whether to run the balancer during an ongoing HDFS upgrade. This is usually not desired, since it will not affect used space on over-utilized machines.

-asService

Runs the balancer as a long-running service

-h, --help

Displays the tool usage and help information
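
A small wrapper like the following can help when experimenting with option combinations. It is only a sketch: the function name `run_balancer` and the `DRY_RUN` variable are hypothetical, and the option values are examples. By default it prints the command instead of running it.

```shell
# run_balancer OPTIONS...
# Prints the hdfs balancer command line; executes it only when DRY_RUN=0.
# DRY_RUN and run_balancer are illustrative names, not part of HDFS.
run_balancer() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        hdfs balancer "$@"
    else
        echo "hdfs balancer $*"
    fi
}

# Balance per block pool with a 5 percent threshold, exiting after 10 idle iterations.
run_balancer -policy Blockpool -threshold 5 -idleiterations 10
```

Setting `DRY_RUN=0` before the call starts the real balancer, which requires a configured HDFS client on the host.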
