
Runs a cluster balancing utility. An administrator can simply press Ctrl+C to stop the rebalancing process.


The blockpool policy is stricter than the DataNode policy.

The usage is as follows:

$ hdfs balancer
    [-policy <policy>]
    [-threshold <threshold>]
    [-exclude [-f <hosts-file> | <comma-separated list of hosts>]]
    [-include [-f <hosts-file> | <comma-separated list of hosts>]]
    [-source [-f <hosts-file> | <comma-separated list of hosts>]]
    [-blockpools <comma-separated list of blockpool ids>]
    [-idleiterations <idleiterations>]
    [-runDuringUpgrade]
    [-asService]
    [-h | --help]

-policy <policy>

Possible values:

  • DataNode (default) — cluster is balanced if each DataNode is balanced;

  • Blockpool — cluster is balanced if each block pool in each DataNode is balanced.

-threshold <threshold>

A percentage of disk capacity. This overrides the default threshold.
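
As a rough illustration of how the threshold is commonly interpreted, the sketch below treats a DataNode as balanced when its utilization is within `<threshold>` percentage points of the cluster-wide utilization. This interpretation comes from the general description of the HDFS balancer rather than from this page, so treat it as an assumption; the helper name `is_balanced` and the sample numbers are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of the hdfs CLI.
# Exits with status 0 if the node's utilization is within <threshold>
# percentage points of the cluster utilization, and 1 otherwise.
# Usage: is_balanced <node_util_pct> <cluster_util_pct> <threshold_pct>
is_balanced() {
    awk -v n="$1" -v c="$2" -v t="$3" 'BEGIN {
        d = n - c               # signed difference in percentage points
        if (d < 0) d = -d       # absolute value
        exit !(d <= t)          # status 0 means balanced
    }'
}
```

For example, with a cluster at 60% utilization and a threshold of 10, `is_balanced 65 60 10` succeeds, while `is_balanced 72 60 10` fails because the node is 12 points above the cluster average.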

-exclude -f <hosts-file> | <comma-separated list of hosts>

Excludes the specified DataNodes from being balanced by the balancer

-include -f <hosts-file> | <comma-separated list of hosts>

Includes only the specified DataNodes to be balanced by the balancer

-source -f <hosts-file> | <comma-separated list of hosts>

Picks only the specified DataNodes as source nodes
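
A minimal sketch of preparing a hosts file for `-exclude`, `-include`, or `-source`. It assumes the file lists one host per line; the helper name `write_hosts_file`, the file path, and the host names are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of the hdfs CLI.
# Writes each host argument on its own line of the given file.
# Usage: write_hosts_file <file> <host>...
write_hosts_file() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Example (requires a Hadoop installation, so it is left commented out):
#   write_hosts_file /tmp/exclude-hosts.txt dn1.example.com dn2.example.com
#   hdfs balancer -exclude -f /tmp/exclude-hosts.txt
#
# The same hosts can be passed inline as a comma-separated list instead:
#   hdfs balancer -exclude dn1.example.com,dn2.example.com
```

The file form is mainly convenient when the host list is long or generated by another tool.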

-blockpools <comma-separated list of blockpool ids>

Runs the balancer only on blockpools included in this list

-idleiterations <idleiterations>

The maximum number of idle iterations before the balancer exits. This overrides the default number of idle iterations.


-runDuringUpgrade

Specifies whether to run the balancer during an ongoing HDFS upgrade. This is usually not desired, since it will not affect used space on over-utilized machines.


-asService

Runs the balancer as a long-running service

-h, --help

Displays the tool usage and help information
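
The options above can be combined in a single invocation. The wrapper below is only a sketch: the function name `run_balancer`, the `DRY_RUN` variable, and the chosen values (`-threshold 5`, `-idleiterations 3`) are hypothetical and should be adapted to the cluster. With `DRY_RUN=1` it prints the command instead of executing it, which makes it possible to review the arguments first.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of the hdfs CLI.
# Prepends illustrative default options, then appends any extra arguments.
# Prints the command when DRY_RUN=1; otherwise executes it.
run_balancer() {
    set -- hdfs balancer -threshold 5 -idleiterations 3 "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For instance, `DRY_RUN=1 run_balancer -exclude dn1.example.com` prints the full command line without contacting the cluster. When the balancer is actually running in the foreground, it can still be stopped with Ctrl+C as described at the top of this page.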
