HDFS configuration parameters
To configure the service, use the following configuration parameters in ADCM.

Credential encryption

| Parameter | Description | Default value |
|---|---|---|
| Encryption enable | Enables or disables the credential encryption feature. When enabled, HDFS stores configuration passwords and credentials required for interacting with other services in encrypted form | false |
| Credential provider path | Path to a keystore file with secrets | jceks://file/etc/hadoop/conf/hadoop.jceks |
| Ranger plugin credential provider path | Path to a Ranger keystore file with secrets | jceks://file/etc/hadoop/conf/ranger-hdfs.jceks |
| Custom jceks | Set to true to use a custom JCEKS keystore file | false |
| Password file name | Name of the file in the service's classpath that stores passwords | hadoop_credstore_pass |
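
The Credential provider path value is a standard Hadoop JCEKS URI. In stock Hadoop, such a keystore is referenced via the hadoop.security.credential.provider.path property in core-site.xml; assuming the ADCM parameter maps to that property, the reference would look like the sketch below (not necessarily the exact rendering ADCM produces).

```xml
<!-- core-site.xml sketch: referencing a JCEKS credential store.
     Assumption: the ADCM "Credential provider path" parameter maps to the
     standard Hadoop property hadoop.security.credential.provider.path. -->
<configuration>
  <property>
    <name>hadoop.security.credential.provider.path</name>
    <value>jceks://file/etc/hadoop/conf/hadoop.jceks</value>
  </property>
</configuration>
```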

core-site.xml

| Parameter | Description | Default value |
|---|---|---|
| hadoop.http.cross-origin.enabled | Enables cross-origin support for all web services | true |
| hadoop.http.cross-origin.allowed-origins | Comma-separated list of origins that are allowed. Values prefixed with regex: are interpreted as regular expressions | * |
| hadoop.http.cross-origin.allowed-headers | Comma-separated list of allowed headers | X-Requested-With,Content-Type,Accept,Origin,WWW-Authenticate,Accept-Encoding,Transfer-Encoding |
| hadoop.http.cross-origin.allowed-methods | Comma-separated list of allowed methods | GET,PUT,POST,OPTIONS,HEAD,DELETE |
| hadoop.http.cross-origin.max-age | Number of seconds a pre-flighted request can be cached | 1800 |
| core_site.enable_cors.active | Enables CORS (Cross-Origin Resource Sharing) | true |
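
For reference, the CORS parameters above correspond to core-site.xml properties; the fragment below simply repeats the default values from the table and can be adjusted to your environment.

```xml
<!-- core-site.xml sketch: cross-origin (CORS) support for Hadoop web endpoints.
     Values mirror the defaults listed in the table above. -->
<configuration>
  <property>
    <name>hadoop.http.cross-origin.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.http.cross-origin.allowed-origins</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.http.cross-origin.allowed-methods</name>
    <value>GET,PUT,POST,OPTIONS,HEAD,DELETE</value>
  </property>
  <property>
    <name>hadoop.http.cross-origin.max-age</name>
    <value>1800</value>
  </property>
</configuration>
```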

hdfs-site.xml

| Parameter | Description | Default value |
|---|---|---|
| dfs.client.block.write.replace-datanode-on-failure.enable | If there is a DataNode/network failure in the write pipeline, DFSClient tries to remove the failed DataNode from the pipeline and continue writing with the remaining DataNodes. As a result, the number of DataNodes in the pipeline is decreased. This feature adds new DataNodes to the pipeline. This is a site-wide property to enable/disable the feature. When the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to set the policy to NEVER or disable this feature, since it may be impossible to find new DataNodes for replacement | true |
| dfs.client.block.write.replace-datanode-on-failure.policy | This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. Possible values are ALWAYS, NEVER, and DEFAULT | DEFAULT |
| dfs.client.block.write.replace-datanode-on-failure.best-effort | This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. When set to true, the client continues the write operation even if a new DataNode cannot be found to replace the failed one (best effort) | false |
| dfs.client.block.write.replace-datanode-on-failure.min-replication | Minimum number of replications needed not to fail the write pipeline if new DataNodes cannot be found to replace failed DataNodes (could be due to network failure) in the write pipeline. If the number of the remaining DataNodes in the write pipeline is greater than or equal to this property value, writing to the remaining nodes continues. Otherwise, an exception is thrown. If this is set to 0, an exception is thrown when a replacement cannot be found | 0 |
| dfs.balancer.dispatcherThreads | The size of the thread pool for the HDFS balancer block mover (dispatchExecutor) | 200 |
| dfs.balancer.movedWinWidth | Time window in milliseconds for the HDFS balancer to track blocks and their locations | 5400000 |
| dfs.balancer.moverThreads | The thread pool size for executing block moves (moverThreadAllocator) | 1000 |
| dfs.balancer.max-size-to-move | Maximum number of bytes that can be moved by the balancer in a single thread | 10737418240 |
| dfs.balancer.getBlocks.min-block-size | Minimum block size in bytes; smaller blocks are ignored when fetching a source block list | 10485760 |
| dfs.balancer.getBlocks.size | Total size in bytes of DataNode blocks to get when fetching a source block list | 2147483648 |
| dfs.balancer.block-move.timeout | Maximum amount of time for a block to move (in milliseconds). If set greater than 0, the balancer stops waiting for a block move to complete after this time | 0 |
| dfs.balancer.max-no-move-interval | If this amount of time has elapsed and no blocks have been moved out of a source DataNode, one more attempt is made to move blocks out of this DataNode in the current Balancer iteration | 60000 |
| dfs.balancer.max-iteration-time | Maximum amount of time an iteration can be run by the Balancer. After this time, the Balancer stops the iteration and re-evaluates the work needed to balance the cluster. The default value is 20 minutes | 1200000 |
| dfs.blocksize | The default block size for new files (in bytes). You can use the following suffixes to define size units (case insensitive): k, m, g, t, p, e, for example 128k, 512m, 1g | 134217728 |
| dfs.client.read.shortcircuit | Turns on short-circuit local reads | true |
| dfs.datanode.balance.max.concurrent.moves | Maximum number of threads for DataNode balancer pending moves. This value is reconfigurable via the dfsadmin -reconfig command | 50 |
| dfs.datanode.data.dir | Determines where on the local filesystem a DFS DataNode should store its blocks. If multiple directories are specified, then data is stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types ([SSD], [DISK], [ARCHIVE], [RAM_DISK]) for HDFS storage policies | /srv/hadoop-hdfs/data:DISK |
| dfs.disk.balancer.max.disk.throughputInMBperSec | Maximum disk bandwidth used by the disk balancer during reads from a source disk. The unit is MB/sec | 10 |
| dfs.disk.balancer.block.tolerance.percent | Specifies, in percent, when a copy step is considered good enough. For example, if set to 10, a copy step is considered complete once the moved data volume is within 10% of the planned value | 10 |
| dfs.disk.balancer.max.disk.errors | During a block move from a source to a destination disk, various errors might occur. This parameter defines how many errors to tolerate before declaring a move between 2 disks (or a step) failed | 5 |
| dfs.disk.balancer.plan.valid.interval | Maximum amount of time a disk balancer plan (a set of configurations that define the data volume to be redistributed between two disks) remains valid. This setting supports multiple time unit suffixes as described in dfs.heartbeat.interval. If no suffix is specified, milliseconds is assumed | 1d |
| dfs.disk.balancer.plan.threshold.percent | Defines a data storage threshold in percent at which disks start participating in data redistribution or balancing activities | 10 |
| dfs.domain.socket.path | Path to a UNIX domain socket that will be used for communication between the DataNode and local HDFS clients. If the string "_PORT" is present in this path, it is replaced with the TCP port of the DataNode | /var/lib/hadoop-hdfs/dn_socket |
| dfs.hosts | Names a file that contains a list of hosts allowed to connect to the NameNode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted | /etc/hadoop/conf/dfs.hosts |
| dfs.mover.movedWinWidth | Minimum time interval for a block to be moved to another location again (in milliseconds) | 5400000 |
| dfs.mover.moverThreads | Sets the balancer mover thread pool size | 1000 |
| dfs.mover.retry.max.attempts | Maximum number of retries before the mover considers the move failed | 10 |
| dfs.mover.max-no-move-interval | If this amount of time has elapsed and no block has been moved out of a source DataNode, one more attempt is made to move blocks out of this DataNode in the current mover iteration | 60000 |
| dfs.namenode.name.dir | Determines where on the local filesystem the DFS NameNode should store the name table (fsimage). If multiple directories are specified, then the name table is replicated in all of the directories for redundancy | /srv/hadoop-hdfs/name |
| dfs.namenode.checkpoint.dir | Determines where on the local filesystem the DFS Secondary NameNode should store the temporary images to merge. If multiple directories are specified, then the image is replicated in all of the directories for redundancy | /srv/hadoop-hdfs/checkpoint |
| dfs.namenode.hosts.provider.classname | The class that provides access to host files. org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager loads the plain-text files specified by dfs.hosts and dfs.hosts.exclude, while org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager loads a JSON file defined by dfs.hosts | org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager |
| dfs.namenode.rpc-bind-host | The actual address the RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.rpc-address. Setting it to 0.0.0.0 makes the NameNode listen on all interfaces | 0.0.0.0 |
| dfs.permissions.superusergroup | Name of the group of super-users. The value should be a single group name | hadoop |
| dfs.replication | The default block replication factor. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time | 3 |
| dfs.journalnode.http-address | The HTTP address of the JournalNode web UI | 0.0.0.0:8480 |
| dfs.journalnode.https-address | The HTTPS address of the JournalNode web UI | 0.0.0.0:8481 |
| dfs.journalnode.rpc-address | The RPC address of the JournalNode | 0.0.0.0:8485 |
| dfs.datanode.http.address | The address of the DataNode HTTP server | 0.0.0.0:9864 |
| dfs.datanode.https.address | The address of the DataNode HTTPS server | 0.0.0.0:9865 |
| dfs.datanode.address | The address of the DataNode for data transfer | 0.0.0.0:9866 |
| dfs.datanode.ipc.address | The IPC address of the DataNode | 0.0.0.0:9867 |
| dfs.namenode.http-address | The address and the base port to access the DFS NameNode web UI | 0.0.0.0:9870 |
| dfs.namenode.https-address | The secure HTTPS address of the NameNode | 0.0.0.0:9871 |
| dfs.ha.automatic-failover.enabled | Defines whether automatic failover is enabled | true |
| dfs.ha.fencing.methods | A list of scripts or Java classes used to fence the active NameNode during a failover | shell(/bin/true) |
| dfs.journalnode.edits.dir | The directory where journal edit files are stored | /srv/hadoop-hdfs/journalnode |
| dfs.namenode.shared.edits.dir | The directory on shared storage between the multiple NameNodes in an HA cluster. This directory is written by the active NameNode and read by the standby NameNode to keep the namespaces synchronized. This directory does not need to be listed in dfs.namenode.edits.dir | — |
| dfs.internal.nameservices | A unique nameservice identifier for a cluster or federation. For a single cluster, specify the name that will be used as an alias. For HDFS federation, specify all namespaces associated with this cluster, separated by commas. This option allows you to use an alias instead of an IP address or FQDN for some commands | — |
| dfs.block.access.token.enable | If set to true, access tokens are used as capabilities for accessing DataNodes. If set to false, no access tokens are checked on accessing DataNodes | false |
| dfs.namenode.kerberos.principal | The NameNode service principal. This is typically set to nn/_HOST@REALM. Each NameNode substitutes _HOST with its own fully qualified hostname at startup | nn/_HOST@REALM |
| dfs.namenode.keytab.file | The keytab file used by each NameNode daemon to log in as its service principal. The principal name is configured with dfs.namenode.kerberos.principal | /etc/security/keytabs/nn.service.keytab |
| dfs.namenode.kerberos.internal.spnego.principal | HTTP Kerberos principal name for the NameNode | HTTP/_HOST@REALM |
| dfs.web.authentication.kerberos.principal | Kerberos principal name for WebHDFS | HTTP/_HOST@REALM |
| dfs.web.authentication.kerberos.keytab | Kerberos keytab file for WebHDFS | /etc/security/keytabs/HTTP.service.keytab |
| dfs.journalnode.kerberos.principal | The JournalNode service principal. This is typically set to jn/_HOST@REALM. The JournalNode substitutes _HOST with its own fully qualified hostname at startup | jn/_HOST@REALM |
| dfs.journalnode.keytab.file | The keytab file used by each JournalNode daemon to log in as its service principal. The principal name is configured with dfs.journalnode.kerberos.principal | /etc/security/keytabs/jn.service.keytab |
| dfs.journalnode.kerberos.internal.spnego.principal | The server principal used by the JournalNode HTTP server for SPNEGO authentication when Kerberos security is enabled. This is typically set to HTTP/_HOST@REALM. By convention, the SPNEGO server principal begins with the prefix HTTP/ | HTTP/_HOST@REALM |
| dfs.datanode.data.dir.perm | Permissions for the directories on the local filesystem where the DFS DataNode stores its blocks. The permissions can either be octal or symbolic | 700 |
| dfs.datanode.kerberos.principal | The DataNode service principal. This is typically set to dn/_HOST@REALM.TLD | dn/_HOST@REALM.TLD |
| dfs.datanode.keytab.file | The keytab file used by each DataNode daemon to log in as its service principal. The principal name is configured with dfs.datanode.kerberos.principal | /etc/security/keytabs/dn.service.keytab |
| dfs.http.policy | Defines whether HTTPS (SSL) is supported on HDFS. This configures the HTTP endpoint for HDFS daemons. The following values are supported: HTTP_ONLY (service is provided only via HTTP), HTTPS_ONLY (service is provided only via HTTPS), and HTTP_AND_HTTPS (service is provided via both HTTP and HTTPS) | HTTP_ONLY |
| dfs.data.transfer.protection | A comma-separated list of SASL protection values used for secured connections to the DataNode when reading or writing block data. The possible values are: authentication (authentication only), integrity (authentication and integrity), and privacy (authentication, integrity, and privacy). If dfs.encrypt.data.transfer is set to true, it supersedes this setting and enforces that all connections use a specialized encrypted SASL handshake | — |
| dfs.encrypt.data.transfer | Defines whether or not actual block data that is read/written from/to HDFS should be encrypted on the wire. This only needs to be set on the NameNodes and DataNodes; clients deduce this automatically. It is possible to override this setting per connection by specifying custom logic via dfs.trustedchannel.resolver.class | false |
| dfs.encrypt.data.transfer.algorithm | This value may be set to either 3des or rc4 | 3des |
| dfs.encrypt.data.transfer.cipher.suites | This value can be either undefined or AES/CTR/NoPadding | — |
| dfs.encrypt.data.transfer.cipher.key.bitlength | The key bit length negotiated by DFSClient and DataNode for encryption. This value may be set to 128, 192, or 256 | 128 |
| ignore.secure.ports.for.testing | Allows skipping HTTPS requirements in the SASL mode | false |
| dfs.client.https.need-auth | Whether SSL client certificate authentication is required | false |
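
For illustration, a minimal hdfs-site.xml fragment combining the write-pipeline recovery settings described above, assuming a very small cluster (3 nodes or less) where replacement DataNodes cannot be found; treat the chosen values as a sketch, not a recommendation.

```xml
<!-- hdfs-site.xml sketch: write-pipeline recovery on a very small cluster.
     Illustrative values only; adjust to your environment. -->
<configuration>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>true</value>
  </property>
  <property>
    <!-- On clusters of 3 nodes or less, NEVER avoids pipeline failures caused
         by the lack of replacement DataNodes -->
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>NEVER</value>
  </property>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
    <value>false</value>
  </property>
</configuration>
```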

httpfs-site.xml

| Parameter | Description | Default value |
|---|---|---|
| httpfs.http.administrators | The ACL for the admins. This configuration is used to control who can access the default servlets for the HttpFS server. The value should be a comma-separated list of users and groups. The user list comes first and is separated by a space, followed by the group list, for example: "user1,user2 group1,group2". Both users and groups are optional. A special value of * grants access to all users and groups | * |
| hadoop.http.temp.dir | The HttpFS temp directory | ${hadoop.tmp.dir}/httpfs |
| httpfs.ssl.enabled | Defines whether SSL is enabled. The default is false, i.e. SSL is disabled | false |
| httpfs.hadoop.config.dir | The location of the Hadoop configuration directory | /etc/hadoop/conf |
| httpfs.hadoop.authentication.type | Defines the authentication mechanism used by HttpFS for its HTTP clients. Valid values are simple and kerberos | simple |
| httpfs.hadoop.authentication.kerberos.keytab | The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by HttpFS in the HTTP endpoint | /etc/security/keytabs/httpfs.service.keytab |
| httpfs.hadoop.authentication.kerberos.principal | The HTTP Kerberos principal used by HttpFS in the HTTP endpoint. The HTTP Kerberos principal MUST start with HTTP/ per the Kerberos HTTP SPNEGO specification | HTTP/${httpfs.hostname}@${kerberos.realm} |
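
As a sketch of how the HttpFS authentication parameters above fit together when switching from simple to Kerberos authentication; the ${httpfs.hostname} and ${kerberos.realm} placeholders come from the default values above and must resolve to your actual host and realm.

```xml
<!-- httpfs-site.xml sketch: Kerberos authentication for HttpFS.
     Placeholders must resolve to the actual host name and Kerberos realm. -->
<configuration>
  <property>
    <name>httpfs.hadoop.authentication.type</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>httpfs.hadoop.authentication.kerberos.principal</name>
    <value>HTTP/${httpfs.hostname}@${kerberos.realm}</value>
  </property>
  <property>
    <name>httpfs.hadoop.authentication.kerberos.keytab</name>
    <value>/etc/security/keytabs/httpfs.service.keytab</value>
  </property>
</configuration>
```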

ranger-hdfs-audit.xml

| Parameter | Description | Default value |
|---|---|---|
| xasecure.audit.destination.solr.batch.filespool.dir | Spool directory path | /srv/ranger/hdfs_plugin/audit_solr_spool |
| xasecure.audit.destination.solr.urls | A URL of the Solr server to store audit events. Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr | — |
| xasecure.audit.destination.solr.zookeepers | Specifies the ZooKeeper connection string for the Solr destination | — |
| xasecure.audit.destination.solr.force.use.inmemory.jaas.config | Whether to use an in-memory JAAS configuration file to connect to Solr | — |
| xasecure.audit.is.enabled | Enables Ranger audit | true |
| xasecure.audit.jaas.Client.loginModuleControlFlag | Specifies whether the success of the module is required, requisite, sufficient, or optional | — |
| xasecure.audit.jaas.Client.loginModuleName | Name of the authenticator class | — |
| xasecure.audit.jaas.Client.option.keyTab | Name of the keytab file to get the principal's secret key | — |
| xasecure.audit.jaas.Client.option.principal | Name of the principal to be used | — |
| xasecure.audit.jaas.Client.option.serviceName | Name of a user or a service that wants to log in | — |
| xasecure.audit.jaas.Client.option.storeKey | Set this to true if you want the keytab or the principal's key to be stored in the subject's private credentials | false |
| xasecure.audit.jaas.Client.option.useKeyTab | Set this to true if you want the module to get the principal's key from the keytab | false |
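
A hypothetical ranger-hdfs-audit.xml fragment showing how the audit destination parameters above could be combined to ship audit events to Solr; the Solr URL below is a placeholder invented for this sketch, not a value from this documentation.

```xml
<!-- ranger-hdfs-audit.xml sketch: sending HDFS plugin audit events to Solr.
     solr.example.com is a hypothetical placeholder. -->
<configuration>
  <property>
    <name>xasecure.audit.is.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>xasecure.audit.destination.solr.urls</name>
    <value>http://solr.example.com:6083/solr/ranger_audits</value>
  </property>
  <property>
    <name>xasecure.audit.destination.solr.batch.filespool.dir</name>
    <value>/srv/ranger/hdfs_plugin/audit_solr_spool</value>
  </property>
</configuration>
```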

ranger-hdfs-security.xml

| Parameter | Description | Default value |
|---|---|---|
| ranger.plugin.hdfs.policy.rest.url | The URL to Ranger Admin | — |
| ranger.plugin.hdfs.service.name | The name of the Ranger service containing policies for this instance | — |
| ranger.plugin.hdfs.policy.cache.dir | The directory where Ranger policies are cached after successful retrieval from the source | /srv/ranger/hdfs/policycache |
| ranger.plugin.hdfs.policy.pollIntervalMs | Defines how often to poll for changes in policies (in milliseconds) | 30000 |
| ranger.plugin.hdfs.policy.rest.client.connection.timeoutMs | The HDFS plugin RangerRestClient connection timeout (in milliseconds) | 120000 |
| ranger.plugin.hdfs.policy.rest.client.read.timeoutMs | The HDFS plugin RangerRestClient read timeout (in milliseconds) | 30000 |
| ranger.plugin.hdfs.policy.rest.ssl.config.file | Path to the RangerRestClient SSL config file for the HDFS plugin | /etc/hadoop/conf/ranger-hdfs-policymgr-ssl.xml |
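
A sketch of ranger-hdfs-security.xml with the plugin connection parameters from the table above; the Ranger Admin URL and service name are hypothetical placeholders.

```xml
<!-- ranger-hdfs-security.xml sketch: connecting the HDFS plugin to Ranger Admin.
     The URL and service name are hypothetical placeholders. -->
<configuration>
  <property>
    <name>ranger.plugin.hdfs.policy.rest.url</name>
    <value>http://ranger.example.com:6080</value>
  </property>
  <property>
    <name>ranger.plugin.hdfs.service.name</name>
    <value>adh_hdfs</value>
  </property>
  <property>
    <name>ranger.plugin.hdfs.policy.pollIntervalMs</name>
    <value>30000</value>
  </property>
</configuration>
```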

httpfs-env.sh

| Parameter | Description | Default value |
|---|---|---|
| Sources | A list of sources that will be written to httpfs-env.sh | — |
| HADOOP_CONF_DIR | Hadoop configuration directory | /etc/hadoop/conf |
| HADOOP_LOG_DIR | Location of the log directory | ${HTTPFS_LOG} |
| HADOOP_PID_DIR | Location of the PID file directory | ${HTTPFS_TEMP} |
| HTTPFS_SSL_ENABLED | Defines whether SSL is enabled for HttpFS | false |
| HTTPFS_SSL_KEYSTORE_FILE | Path to the keystore file | admin |
| HTTPFS_SSL_KEYSTORE_PASS | The password to access the keystore | admin |
| Final HTTPFS_ENV_OPTS | Final value of the HttpFS environment options written to httpfs-env.sh | — |

hadoop-env.sh

| Parameter | Description | Default value |
|---|---|---|
| Sources | A list of sources that will be written to hadoop-env.sh | — |
| HDFS_NAMENODE_OPTS | NameNode heap memory. Sets the initial (-Xms) and maximum (-Xmx) Java heap size and environment options for the NameNode | -Xms1G -Xmx8G |
| HDFS_DATANODE_OPTS | DataNode heap memory. Sets the initial (-Xms) and maximum (-Xmx) Java heap size and environment options for the DataNode | -Xms700m -Xmx8G |
| HDFS_HTTPFS_OPTS | HttpFS heap memory. Sets the initial (-Xms) and maximum (-Xmx) Java heap size and environment options for the HttpFS server | -Xms700m -Xmx8G |
| HDFS_JOURNALNODE_OPTS | JournalNode heap memory. Sets the initial (-Xms) and maximum (-Xmx) Java heap size and environment options for the JournalNode | -Xms700m -Xmx8G |
| HDFS_ZKFC_OPTS | ZKFC heap memory. Sets the initial (-Xms) and maximum (-Xmx) Java heap size and environment options for ZKFC | -Xms500m -Xmx8G |
| Final HADOOP_ENV_OPTS | Final value of the Hadoop environment options written to hadoop-env.sh | — |

ssl-server.xml

| Parameter | Description | Default value |
|---|---|---|
| ssl.server.truststore.location | The truststore to be used by NameNodes and DataNodes | — |
| ssl.server.truststore.password | The password to the truststore | — |
| ssl.server.truststore.type | The truststore file format | jks |
| ssl.server.truststore.reload.interval | The truststore reload check interval (in milliseconds) | 10000 |
| ssl.server.keystore.location | Path to the keystore file used by NameNodes and DataNodes | — |
| ssl.server.keystore.password | The password to the keystore | — |
| ssl.server.keystore.keypassword | The password to the key in the keystore | — |
| ssl.server.keystore.type | The keystore file format | — |
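
The ssl-server.xml parameters above typically end up in a fragment like the following; the paths and the password are hypothetical placeholders, and if the credential encryption feature described at the beginning of this page is enabled, the passwords are stored in encrypted form rather than as plain text.

```xml
<!-- ssl-server.xml sketch: TLS material for NameNode/DataNode HTTPS endpoints.
     Paths and password are hypothetical placeholders. -->
<configuration>
  <property>
    <name>ssl.server.keystore.location</name>
    <value>/etc/ssl/hadoop/server.jks</value>
  </property>
  <property>
    <name>ssl.server.keystore.password</name>
    <value>changeit</value>
  </property>
  <property>
    <name>ssl.server.truststore.location</name>
    <value>/etc/ssl/hadoop/truststore.jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.type</name>
    <value>jks</value>
  </property>
</configuration>
```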

DataNode administrative states

| State | Description |
|---|---|
| DECOMMISSIONED | When an administrator decommissions a DataNode, the DataNode is first transitioned into the DECOMMISSION_INPROGRESS state. After all blocks belonging to that DataNode have been fully replicated elsewhere based on each block's replication factor, the DataNode transitions to the DECOMMISSIONED state, after which it can be safely shut down |
| IN_MAINTENANCE | Sometimes administrators only need to take DataNodes down for minutes/hours to perform short-term repair/maintenance. For such scenarios, the HDFS block replication overhead incurred by decommissioning might not be necessary, and a light-weight process is desirable. That is what the maintenance state is used for. When an administrator puts a DataNode in the maintenance state, the DataNode is first transitioned to the ENTERING_MAINTENANCE state. After the blocks kept on that DataNode are minimally replicated elsewhere, the DataNode transitions to the IN_MAINTENANCE state |

Other

| Parameter | Description | Default value |
|---|---|---|
| Additional nameservices | Additional (internal) names for an HDFS cluster that allow querying another HDFS cluster from the current one | — |
| Custom core-site.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the core-site.xml configuration file | — |
| Custom hdfs-site.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the hdfs-site.xml configuration file | — |
| Custom httpfs-site.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the httpfs-site.xml configuration file | — |
| Ranger plugin enabled | Defines whether the Ranger plugin is enabled | — |
| Custom ranger-hdfs-audit.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ranger-hdfs-audit.xml configuration file | — |
| Custom ranger-hdfs-security.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ranger-hdfs-security.xml configuration file | — |
| Custom ranger-hdfs-policymgr-ssl.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ranger-hdfs-policymgr-ssl.xml configuration file | — |
| Custom httpfs-env.sh | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the httpfs-env.sh configuration file | — |
| Custom hadoop-env.sh | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the hadoop-env.sh configuration file | — |
| Custom ssl-server.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ssl-server.xml configuration file | — |
| Custom ssl-client.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ssl-client.xml configuration file | — |
| Topology script | The topology script used in HDFS | — |
| Topology data | An optional text file that maps host names to rack numbers for the topology script. Stored at /etc/hadoop/conf/topology.data | — |
| Custom log4j.properties | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the log4j.properties configuration file | — |
| Custom httpfs-log4j.properties | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the httpfs-log4j.properties configuration file | — |

DataNode monitoring

| Parameter | Description | Default value |
|---|---|---|
| Java agent path | Path to the JMX Prometheus Java agent | /usr/lib/adh-utils/jmx/jmx_prometheus_javaagent.jar |
| Prometheus metrics port | Port on which HDFS DataNode metrics are exposed in the Prometheus format | 9202 |
| Mapping config path | Path to the metrics mapping configuration file | /etc/hadoop/conf/jmx_hdfs_datanode_metric_config.yml |
| Mapping config | Metrics mapping configuration file | — |

JournalNode monitoring

| Parameter | Description | Default value |
|---|---|---|
| Java agent path | Path to the JMX Prometheus Java agent | /usr/lib/adh-utils/jmx/jmx_prometheus_javaagent.jar |
| Prometheus metrics port | Port on which HDFS JournalNode metrics are exposed in the Prometheus format | 9203 |
| Mapping config path | Path to the metrics mapping configuration file | /etc/hadoop/conf/jmx_hdfs_journalnode_metric_config.yml |
| Mapping config | Metrics mapping configuration file | — |

NameNode monitoring

| Parameter | Description | Default value |
|---|---|---|
| Java agent path | Path to the JMX Prometheus Java agent | /usr/lib/adh-utils/jmx/jmx_prometheus_javaagent.jar |
| Prometheus metrics port | Port on which HDFS NameNode metrics are exposed in the Prometheus format | 9201 |
| Mapping config path | Path to the metrics mapping configuration file | /etc/hadoop/conf/jmx_hdfs_namenode_metric_config.yml |
| Mapping config | Metrics mapping configuration file | — |