HDFS configuration parameters
To configure the service, use the following configuration parameters in ADCM.

Credential encryption

| Parameter | Description | Default value |
|---|---|---|
| Encryption enable | Enables or disables the credential encryption feature. When enabled, HDFS stores configuration passwords and credentials required for interacting with other services in encrypted form | false |
| Credential provider path | Path to a keystore file with secrets | jceks://file/etc/hadoop/conf/hadoop.jceks |
| Ranger plugin credential provider path | Path to a Ranger keystore file with secrets | jceks://file/etc/hadoop/conf/ranger-hdfs.jceks |
| Custom jceks | Set to true to use a custom JCEKS keystore file | false |
| Password file name | Name of the file in the service's classpath that stores passwords | hadoop_credstore_pass |
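
The Credential provider path value is a standard Hadoop JCEKS URI. In stock Hadoop, such a keystore is referenced via the hadoop.security.credential.provider.path property in core-site.xml; assuming the ADCM parameter maps to that property, the reference would look like the sketch below (not necessarily the exact rendering ADCM produces).

```xml
<!-- core-site.xml sketch: referencing a JCEKS credential store.
     Assumption: the ADCM "Credential provider path" parameter maps to the
     standard Hadoop property hadoop.security.credential.provider.path. -->
<configuration>
  <property>
    <name>hadoop.security.credential.provider.path</name>
    <value>jceks://file/etc/hadoop/conf/hadoop.jceks</value>
  </property>
</configuration>
```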

core-site.xml

| Parameter | Description | Default value |
|---|---|---|
| hadoop.http.cross-origin.enabled | Enables cross-origin support for all web services | true |
| hadoop.http.cross-origin.allowed-origins | Comma-separated list of origins that are allowed. Values prefixed with regex: are interpreted as regular expressions | * |
| hadoop.http.cross-origin.allowed-headers | Comma-separated list of allowed headers | X-Requested-With,Content-Type,Accept,Origin,WWW-Authenticate,Accept-Encoding,Transfer-Encoding |
| hadoop.http.cross-origin.allowed-methods | Comma-separated list of allowed methods | GET,PUT,POST,OPTIONS,HEAD,DELETE |
| hadoop.http.cross-origin.max-age | Number of seconds a pre-flighted request can be cached | 1800 |
| core_site.enable_cors.active | Enables CORS (Cross-Origin Resource Sharing) | true |
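
For reference, the CORS parameters above correspond to core-site.xml properties; the fragment below simply repeats the default values from the table and can be adjusted to your environment.

```xml
<!-- core-site.xml sketch: cross-origin (CORS) support for Hadoop web endpoints.
     Values mirror the defaults listed in the table above. -->
<configuration>
  <property>
    <name>hadoop.http.cross-origin.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.http.cross-origin.allowed-origins</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.http.cross-origin.allowed-methods</name>
    <value>GET,PUT,POST,OPTIONS,HEAD,DELETE</value>
  </property>
  <property>
    <name>hadoop.http.cross-origin.max-age</name>
    <value>1800</value>
  </property>
</configuration>
```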

hdfs-site.xml

| Parameter | Description | Default value |
|---|---|---|
| dfs.client.block.write.replace-datanode-on-failure.enable | If there is a DataNode/network failure in the write pipeline, DFSClient tries to remove the failed DataNode from the pipeline and continue writing with the remaining DataNodes. As a result, the number of DataNodes in the pipeline is decreased. This feature adds new DataNodes to the pipeline. This is a site-wide property to enable/disable the feature. When the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to set the policy to NEVER or disable this feature, since it may be impossible to find new DataNodes for replacement | true |
| dfs.client.block.write.replace-datanode-on-failure.policy | This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. Possible values are ALWAYS, NEVER, and DEFAULT | DEFAULT |
| dfs.client.block.write.replace-datanode-on-failure.best-effort | This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. When set to true, the client continues the write operation even if a new DataNode cannot be found to replace the failed one (best effort) | false |
| dfs.client.block.write.replace-datanode-on-failure.min-replication | Minimum number of replications needed not to fail the write pipeline if new DataNodes cannot be found to replace failed DataNodes (could be due to network failure) in the write pipeline. If the number of the remaining DataNodes in the write pipeline is greater than or equal to this property value, writing to the remaining nodes continues. Otherwise, an exception is thrown. If this is set to 0, an exception is thrown when a replacement cannot be found | 0 |
| dfs.balancer.dispatcherThreads | The size of the thread pool for the HDFS balancer block mover (dispatchExecutor) | 200 |
| dfs.balancer.movedWinWidth | Time window in milliseconds for the HDFS balancer to track blocks and their locations | 5400000 |
| dfs.balancer.moverThreads | The thread pool size for executing block moves (moverThreadAllocator) | 1000 |
| dfs.balancer.max-size-to-move | Maximum number of bytes that can be moved by the balancer in a single thread | 10737418240 |
| dfs.balancer.getBlocks.min-block-size | Minimum block size in bytes; smaller blocks are ignored when fetching a source block list | 10485760 |
| dfs.balancer.getBlocks.size | Total size in bytes of DataNode blocks to get when fetching a source block list | 2147483648 |
| dfs.balancer.block-move.timeout | Maximum amount of time for a block to move (in milliseconds). If set greater than 0, the balancer stops waiting for a block move to complete after this time | 0 |
| dfs.balancer.max-no-move-interval | If this amount of time has elapsed and no blocks have been moved out of a source DataNode, one more attempt is made to move blocks out of this DataNode in the current Balancer iteration | 60000 |
| dfs.balancer.max-iteration-time | Maximum amount of time an iteration can be run by the Balancer. After this time, the Balancer stops the iteration and re-evaluates the work needed to balance the cluster. The default value is 20 minutes | 1200000 |
| dfs.blocksize | The default block size for new files (in bytes). You can use the following suffixes to define size units (case insensitive): k, m, g, t, p, e, for example 128k, 512m, 1g | 134217728 |
| dfs.client.read.shortcircuit | Turns on short-circuit local reads | true |
| dfs.datanode.balance.max.concurrent.moves | Maximum number of threads for DataNode balancer pending moves. This value is reconfigurable via the dfsadmin -reconfig command | 50 |
| dfs.datanode.data.dir | Determines where on the local filesystem a DFS DataNode should store its blocks. If multiple directories are specified, then data is stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types ([SSD], [DISK], [ARCHIVE], [RAM_DISK]) for HDFS storage policies | /srv/hadoop-hdfs/data:DISK |
| dfs.disk.balancer.max.disk.throughputInMBperSec | Maximum disk bandwidth used by the disk balancer during reads from a source disk. The unit is MB/sec | 10 |
| dfs.disk.balancer.block.tolerance.percent | Specifies, in percent, when a copy step is considered good enough. For example, if set to 10, a copy step is considered complete once the moved data volume is within 10% of the planned value | 10 |
| dfs.disk.balancer.max.disk.errors | During a block move from a source to a destination disk, various errors might occur. This parameter defines how many errors to tolerate before declaring a move between 2 disks (or a step) failed | 5 |
| dfs.disk.balancer.plan.valid.interval | Maximum amount of time a disk balancer plan (a set of configurations that define the data volume to be redistributed between two disks) remains valid. This setting supports multiple time unit suffixes as described in dfs.heartbeat.interval. If no suffix is specified, milliseconds is assumed | 1d |
| dfs.disk.balancer.plan.threshold.percent | Defines a data storage threshold in percent at which disks start participating in data redistribution or balancing activities | 10 |
| dfs.domain.socket.path | Path to a UNIX domain socket that will be used for communication between the DataNode and local HDFS clients. If the string "_PORT" is present in this path, it is replaced with the TCP port of the DataNode | /var/lib/hadoop-hdfs/dn_socket |
| dfs.hosts | Names a file that contains a list of hosts allowed to connect to the NameNode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted | /etc/hadoop/conf/dfs.hosts |
| dfs.mover.movedWinWidth | Minimum time interval for a block to be moved to another location again (in milliseconds) | 5400000 |
| dfs.mover.moverThreads | Sets the balancer mover thread pool size | 1000 |
| dfs.mover.retry.max.attempts | Maximum number of retries before the mover considers the move failed | 10 |
| dfs.mover.max-no-move-interval | If this amount of time has elapsed and no block has been moved out of a source DataNode, one more attempt is made to move blocks out of this DataNode in the current mover iteration | 60000 |
| dfs.namenode.name.dir | Determines where on the local filesystem the DFS NameNode should store the name table (fsimage). If multiple directories are specified, then the name table is replicated in all of the directories for redundancy | /srv/hadoop-hdfs/name |
| dfs.namenode.checkpoint.dir | Determines where on the local filesystem the DFS Secondary NameNode should store the temporary images to merge. If multiple directories are specified, then the image is replicated in all of the directories for redundancy | /srv/hadoop-hdfs/checkpoint |
| dfs.namenode.hosts.provider.classname | The class that provides access to host files. org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager loads the plain-text files specified by dfs.hosts and dfs.hosts.exclude, while org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager loads a JSON file defined by dfs.hosts | org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager |
| dfs.namenode.rpc-bind-host | The actual address the RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.rpc-address. Setting it to 0.0.0.0 makes the NameNode listen on all interfaces | 0.0.0.0 |
| dfs.permissions.superusergroup | Name of the group of super-users. The value should be a single group name | hadoop |
| dfs.replication | The default block replication factor. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time | 3 |
| dfs.journalnode.http-address | The HTTP address of the JournalNode web UI | 0.0.0.0:8480 |
| dfs.journalnode.https-address | The HTTPS address of the JournalNode web UI | 0.0.0.0:8481 |
| dfs.journalnode.rpc-address | The RPC address of the JournalNode | 0.0.0.0:8485 |
| dfs.datanode.http.address | The address of the DataNode HTTP server | 0.0.0.0:9864 |
| dfs.datanode.https.address | The address of the DataNode HTTPS server | 0.0.0.0:9865 |
| dfs.datanode.address | The address of the DataNode for data transfer | 0.0.0.0:9866 |
| dfs.datanode.ipc.address | The IPC address of the DataNode | 0.0.0.0:9867 |
| dfs.namenode.http-address | The address and the base port to access the DFS NameNode web UI | 0.0.0.0:9870 |
| dfs.namenode.https-address | The secure HTTPS address of the NameNode | 0.0.0.0:9871 |
| dfs.ha.automatic-failover.enabled | Defines whether automatic failover is enabled | true |
| dfs.ha.fencing.methods | A list of scripts or Java classes used to fence the active NameNode during a failover | shell(/bin/true) |
| dfs.journalnode.edits.dir | The directory where journal edit files are stored | /srv/hadoop-hdfs/journalnode |
| dfs.namenode.shared.edits.dir | The directory on shared storage between the multiple NameNodes in an HA cluster. This directory is written by the active NameNode and read by the standby NameNode to keep the namespaces synchronized. This directory does not need to be listed in dfs.namenode.edits.dir | — |
| dfs.internal.nameservices | A unique nameservice identifier for a cluster or federation. For a single cluster, specify the name that will be used as an alias. For HDFS federation, specify all namespaces associated with this cluster, separated by commas. This option allows you to use an alias instead of an IP address or FQDN for some commands | — |
| dfs.block.access.token.enable | If set to true, access tokens are used as capabilities for accessing DataNodes. If set to false, no access tokens are checked on accessing DataNodes | false |
| dfs.namenode.kerberos.principal | The NameNode service principal. This is typically set to nn/_HOST@REALM. Each NameNode substitutes _HOST with its own fully qualified hostname at startup | nn/_HOST@REALM |
| dfs.namenode.keytab.file | The keytab file used by each NameNode daemon to log in as its service principal. The principal name is configured with dfs.namenode.kerberos.principal | /etc/security/keytabs/nn.service.keytab |
| dfs.namenode.kerberos.internal.spnego.principal | HTTP Kerberos principal name for the NameNode | HTTP/_HOST@REALM |
| dfs.web.authentication.kerberos.principal | Kerberos principal name for WebHDFS | HTTP/_HOST@REALM |
| dfs.web.authentication.kerberos.keytab | Kerberos keytab file for WebHDFS | /etc/security/keytabs/HTTP.service.keytab |
| dfs.journalnode.kerberos.principal | The JournalNode service principal. This is typically set to jn/_HOST@REALM. The JournalNode substitutes _HOST with its own fully qualified hostname at startup | jn/_HOST@REALM |
| dfs.journalnode.keytab.file | The keytab file used by each JournalNode daemon to log in as its service principal. The principal name is configured with dfs.journalnode.kerberos.principal | /etc/security/keytabs/jn.service.keytab |
| dfs.journalnode.kerberos.internal.spnego.principal | The server principal used by the JournalNode HTTP server for SPNEGO authentication when Kerberos security is enabled. This is typically set to HTTP/_HOST@REALM. By convention, the SPNEGO server principal begins with the prefix HTTP/ | HTTP/_HOST@REALM |
| dfs.datanode.data.dir.perm | Permissions for the directories on the local filesystem where the DFS DataNode stores its blocks. The permissions can either be octal or symbolic | 700 |
| dfs.datanode.kerberos.principal | The DataNode service principal. This is typically set to dn/_HOST@REALM.TLD | dn/_HOST@REALM.TLD |
| dfs.datanode.keytab.file | The keytab file used by each DataNode daemon to log in as its service principal. The principal name is configured with dfs.datanode.kerberos.principal | /etc/security/keytabs/dn.service.keytab |
| dfs.http.policy | Defines whether HTTPS (SSL) is supported on HDFS. This configures the HTTP endpoint for HDFS daemons. The following values are supported: HTTP_ONLY (service is provided only via HTTP), HTTPS_ONLY (service is provided only via HTTPS), and HTTP_AND_HTTPS (service is provided via both HTTP and HTTPS) | HTTP_ONLY |
| dfs.data.transfer.protection | A comma-separated list of SASL protection values used for secured connections to the DataNode when reading or writing block data. The possible values are: authentication (authentication only), integrity (authentication and integrity), and privacy (authentication, integrity, and privacy). If dfs.encrypt.data.transfer is set to true, it supersedes this setting and enforces that all connections use a specialized encrypted SASL handshake | — |
| dfs.encrypt.data.transfer | Defines whether or not actual block data that is read/written from/to HDFS should be encrypted on the wire. This only needs to be set on the NameNodes and DataNodes; clients deduce this automatically. It is possible to override this setting per connection by specifying custom logic via dfs.trustedchannel.resolver.class | false |
| dfs.encrypt.data.transfer.algorithm | This value may be set to either 3des or rc4 | 3des |
| dfs.encrypt.data.transfer.cipher.suites | This value can be either undefined or AES/CTR/NoPadding | — |
| dfs.encrypt.data.transfer.cipher.key.bitlength | The key bit length negotiated by DFSClient and DataNode for encryption. This value may be set to 128, 192, or 256 | 128 |
| ignore.secure.ports.for.testing | Allows skipping HTTPS requirements in the SASL mode | false |
| dfs.client.https.need-auth | Whether SSL client certificate authentication is required | false |
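
For illustration, a minimal hdfs-site.xml fragment combining the write-pipeline recovery settings described above, assuming a very small cluster (3 nodes or less) where replacement DataNodes cannot be found; treat the chosen values as a sketch, not a recommendation.

```xml
<!-- hdfs-site.xml sketch: write-pipeline recovery on a very small cluster.
     Illustrative values only; adjust to your environment. -->
<configuration>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>true</value>
  </property>
  <property>
    <!-- On clusters of 3 nodes or less, NEVER avoids pipeline failures caused
         by the lack of replacement DataNodes -->
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>NEVER</value>
  </property>
  <property>
    <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
    <value>false</value>
  </property>
</configuration>
```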

httpfs-site.xml

| Parameter | Description | Default value |
|---|---|---|
| httpfs.http.administrators | The ACL for the admins. This configuration is used to control who can access the default servlets for the HttpFS server. The value should be a comma-separated list of users and groups. The user list comes first and is separated by a space, followed by the group list, for example: "user1,user2 group1,group2". Both users and groups are optional. A special value of * grants access to all users and groups | * |
| hadoop.http.temp.dir | The HttpFS temp directory | ${hadoop.tmp.dir}/httpfs |
| httpfs.ssl.enabled | Defines whether SSL is enabled. The default is false, i.e. SSL is disabled | false |
| httpfs.hadoop.config.dir | The location of the Hadoop configuration directory | /etc/hadoop/conf |
| httpfs.hadoop.authentication.type | Defines the authentication mechanism used by HttpFS for its HTTP clients. Valid values are simple and kerberos | simple |
| httpfs.hadoop.authentication.kerberos.keytab | The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by HttpFS in the HTTP endpoint | /etc/security/keytabs/httpfs.service.keytab |
| httpfs.hadoop.authentication.kerberos.principal | The HTTP Kerberos principal used by HttpFS in the HTTP endpoint. The HTTP Kerberos principal MUST start with HTTP/ per the Kerberos HTTP SPNEGO specification | HTTP/${httpfs.hostname}@${kerberos.realm} |
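
As a sketch of how the HttpFS authentication parameters above fit together when switching from simple to Kerberos authentication; the ${httpfs.hostname} and ${kerberos.realm} placeholders come from the default values above and must resolve to your actual host and realm.

```xml
<!-- httpfs-site.xml sketch: Kerberos authentication for HttpFS.
     Placeholders must resolve to the actual host name and Kerberos realm. -->
<configuration>
  <property>
    <name>httpfs.hadoop.authentication.type</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>httpfs.hadoop.authentication.kerberos.principal</name>
    <value>HTTP/${httpfs.hostname}@${kerberos.realm}</value>
  </property>
  <property>
    <name>httpfs.hadoop.authentication.kerberos.keytab</name>
    <value>/etc/security/keytabs/httpfs.service.keytab</value>
  </property>
</configuration>
```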

ranger-hdfs-audit.xml

| Parameter | Description | Default value |
|---|---|---|
| xasecure.audit.destination.solr.batch.filespool.dir | Spool directory path | /srv/ranger/hdfs_plugin/audit_solr_spool |
| xasecure.audit.destination.solr.urls | A URL of the Solr server to store audit events. Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr | — |
| xasecure.audit.destination.solr.zookeepers | Specifies the ZooKeeper connection string for the Solr destination | — |
| xasecure.audit.destination.solr.force.use.inmemory.jaas.config | Whether to use an in-memory JAAS configuration file to connect to Solr | — |
| xasecure.audit.is.enabled | Enables Ranger audit | true |
| xasecure.audit.jaas.Client.loginModuleControlFlag | Specifies whether the success of the module is required, requisite, sufficient, or optional | — |
| xasecure.audit.jaas.Client.loginModuleName | Name of the authenticator class | — |
| xasecure.audit.jaas.Client.option.keyTab | Name of the keytab file to get the principal's secret key | — |
| xasecure.audit.jaas.Client.option.principal | Name of the principal to be used | — |
| xasecure.audit.jaas.Client.option.serviceName | Name of a user or a service that wants to log in | — |
| xasecure.audit.jaas.Client.option.storeKey | Set this to true if you want the keytab or the principal's key to be stored in the subject's private credentials | false |
| xasecure.audit.jaas.Client.option.useKeyTab | Set this to true if you want the module to get the principal's key from the keytab | false |
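
A hypothetical ranger-hdfs-audit.xml fragment showing how the audit destination parameters above could be combined to ship audit events to Solr; the Solr URL below is a placeholder invented for this sketch, not a value from this documentation.

```xml
<!-- ranger-hdfs-audit.xml sketch: sending HDFS plugin audit events to Solr.
     solr.example.com is a hypothetical placeholder. -->
<configuration>
  <property>
    <name>xasecure.audit.is.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>xasecure.audit.destination.solr.urls</name>
    <value>http://solr.example.com:6083/solr/ranger_audits</value>
  </property>
  <property>
    <name>xasecure.audit.destination.solr.batch.filespool.dir</name>
    <value>/srv/ranger/hdfs_plugin/audit_solr_spool</value>
  </property>
</configuration>
```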

ranger-hdfs-security.xml

| Parameter | Description | Default value |
|---|---|---|
| ranger.plugin.hdfs.policy.rest.url | The URL to Ranger Admin | — |
| ranger.plugin.hdfs.service.name | The name of the Ranger service containing policies for this instance | — |
| ranger.plugin.hdfs.policy.cache.dir | The directory where Ranger policies are cached after successful retrieval from the source | /srv/ranger/hdfs/policycache |
| ranger.plugin.hdfs.policy.pollIntervalMs | Defines how often to poll for changes in policies (in milliseconds) | 30000 |
| ranger.plugin.hdfs.policy.rest.client.connection.timeoutMs | The HDFS plugin RangerRestClient connection timeout (in milliseconds) | 120000 |
| ranger.plugin.hdfs.policy.rest.client.read.timeoutMs | The HDFS plugin RangerRestClient read timeout (in milliseconds) | 30000 |
| ranger.plugin.hdfs.policy.rest.ssl.config.file | Path to the RangerRestClient SSL config file for the HDFS plugin | /etc/hadoop/conf/ranger-hdfs-policymgr-ssl.xml |
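
A sketch of ranger-hdfs-security.xml with the plugin connection parameters from the table above; the Ranger Admin URL and service name are hypothetical placeholders.

```xml
<!-- ranger-hdfs-security.xml sketch: connecting the HDFS plugin to Ranger Admin.
     The URL and service name are hypothetical placeholders. -->
<configuration>
  <property>
    <name>ranger.plugin.hdfs.policy.rest.url</name>
    <value>http://ranger.example.com:6080</value>
  </property>
  <property>
    <name>ranger.plugin.hdfs.service.name</name>
    <value>adh_hdfs</value>
  </property>
  <property>
    <name>ranger.plugin.hdfs.policy.pollIntervalMs</name>
    <value>30000</value>
  </property>
</configuration>
```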

httpfs-env.sh

| Parameter | Description | Default value |
|---|---|---|
| Sources | A list of sources that will be written to httpfs-env.sh | — |
| HADOOP_CONF_DIR | Hadoop configuration directory | /etc/hadoop/conf |
| HADOOP_LOG_DIR | Location of the log directory | ${HTTPFS_LOG} |
| HADOOP_PID_DIR | Location of the PID file directory | ${HTTPFS_TEMP} |
| HTTPFS_SSL_ENABLED | Defines whether SSL is enabled for HttpFS | false |
| HTTPFS_SSL_KEYSTORE_FILE | Path to the keystore file | admin |
| HTTPFS_SSL_KEYSTORE_PASS | The password to access the keystore | admin |
| Final HTTPFS_ENV_OPTS | Final value of the HttpFS environment options written to httpfs-env.sh | — |

hadoop-env.sh

| Parameter | Description | Default value |
|---|---|---|
| Sources | A list of sources that will be written to hadoop-env.sh | — |
| HDFS_NAMENODE_OPTS | NameNode heap memory. Sets the initial (-Xms) and maximum (-Xmx) Java heap size and environment options for the NameNode | -Xms1G -Xmx8G |
| HDFS_DATANODE_OPTS | DataNode heap memory. Sets the initial (-Xms) and maximum (-Xmx) Java heap size and environment options for the DataNode | -Xms700m -Xmx8G |
| HDFS_HTTPFS_OPTS | HttpFS heap memory. Sets the initial (-Xms) and maximum (-Xmx) Java heap size and environment options for the HttpFS server | -Xms700m -Xmx8G |
| HDFS_JOURNALNODE_OPTS | JournalNode heap memory. Sets the initial (-Xms) and maximum (-Xmx) Java heap size and environment options for the JournalNode | -Xms700m -Xmx8G |
| HDFS_ZKFC_OPTS | ZKFC heap memory. Sets the initial (-Xms) and maximum (-Xmx) Java heap size and environment options for ZKFC | -Xms500m -Xmx8G |
| Final HADOOP_ENV_OPTS | Final value of the Hadoop environment options written to hadoop-env.sh | — |

ssl-server.xml

| Parameter | Description | Default value |
|---|---|---|
| ssl.server.truststore.location | The truststore to be used by NameNodes and DataNodes | — |
| ssl.server.truststore.password | The password to the truststore | — |
| ssl.server.truststore.type | The truststore file format | jks |
| ssl.server.truststore.reload.interval | The truststore reload check interval (in milliseconds) | 10000 |
| ssl.server.keystore.location | Path to the keystore file used by NameNodes and DataNodes | — |
| ssl.server.keystore.password | The password to the keystore | — |
| ssl.server.keystore.keypassword | The password to the key in the keystore | — |
| ssl.server.keystore.type | The keystore file format | — |
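
The ssl-server.xml parameters above typically end up in a fragment like the following; the paths and the password are hypothetical placeholders, and if the credential encryption feature described at the beginning of this page is enabled, the passwords are stored in encrypted form rather than as plain text.

```xml
<!-- ssl-server.xml sketch: TLS material for NameNode/DataNode HTTPS endpoints.
     Paths and password are hypothetical placeholders. -->
<configuration>
  <property>
    <name>ssl.server.keystore.location</name>
    <value>/etc/ssl/hadoop/server.jks</value>
  </property>
  <property>
    <name>ssl.server.keystore.password</name>
    <value>changeit</value>
  </property>
  <property>
    <name>ssl.server.truststore.location</name>
    <value>/etc/ssl/hadoop/truststore.jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.type</name>
    <value>jks</value>
  </property>
</configuration>
```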

DataNode administrative states

| State | Description |
|---|---|
| DECOMMISSIONED | When an administrator decommissions a DataNode, the DataNode is first transitioned into the DECOMMISSION_INPROGRESS state. After all blocks belonging to that DataNode have been fully replicated elsewhere based on each block's replication factor, the DataNode transitions to the DECOMMISSIONED state, after which it can be safely shut down |
| IN_MAINTENANCE | Sometimes administrators only need to take DataNodes down for minutes/hours to perform short-term repair/maintenance. For such scenarios, the HDFS block replication overhead incurred by decommissioning might not be necessary, and a light-weight process is desirable. That is what the maintenance state is used for. When an administrator puts a DataNode in the maintenance state, the DataNode is first transitioned to the ENTERING_MAINTENANCE state. After the blocks kept on that DataNode are minimally replicated elsewhere, the DataNode transitions to the IN_MAINTENANCE state |

Other

| Parameter | Description | Default value |
|---|---|---|
| Additional nameservices | Additional (internal) names for an HDFS cluster that allow querying another HDFS cluster from the current one | — |
| Custom core-site.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the core-site.xml configuration file | — |
| Custom hdfs-site.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the hdfs-site.xml configuration file | — |
| Custom httpfs-site.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the httpfs-site.xml configuration file | — |
| Ranger plugin enabled | Defines whether the Ranger plugin is enabled | — |
| Custom ranger-hdfs-audit.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ranger-hdfs-audit.xml configuration file | — |
| Custom ranger-hdfs-security.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ranger-hdfs-security.xml configuration file | — |
| Custom ranger-hdfs-policymgr-ssl.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ranger-hdfs-policymgr-ssl.xml configuration file | — |
| Custom httpfs-env.sh | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the httpfs-env.sh configuration file | — |
| Custom hadoop-env.sh | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the hadoop-env.sh configuration file | — |
| Custom ssl-server.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ssl-server.xml configuration file | — |
| Custom ssl-client.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ssl-client.xml configuration file | — |
| Topology script | The topology script used in HDFS | — |
| Topology data | An optional text file that maps host names to rack numbers for the topology script. Stored at /etc/hadoop/conf/topology.data | — |
| Custom log4j.properties | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the log4j.properties configuration file | — |
| Custom httpfs-log4j.properties | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the httpfs-log4j.properties configuration file | — |

DataNode monitoring

| Parameter | Description | Default value |
|---|---|---|
| Java agent path | Path to the JMX Prometheus Java agent | /usr/lib/adh-utils/jmx/jmx_prometheus_javaagent.jar |
| Prometheus metrics port | Port on which HDFS DataNode metrics are exposed in the Prometheus format | 9202 |
| Mapping config path | Path to the metrics mapping configuration file | /etc/hadoop/conf/jmx_hdfs_datanode_metric_config.yml |
| Mapping config | Metrics mapping configuration file | — |

JournalNode monitoring

| Parameter | Description | Default value |
|---|---|---|
| Java agent path | Path to the JMX Prometheus Java agent | /usr/lib/adh-utils/jmx/jmx_prometheus_javaagent.jar |
| Prometheus metrics port | Port on which HDFS JournalNode metrics are exposed in the Prometheus format | 9203 |
| Mapping config path | Path to the metrics mapping configuration file | /etc/hadoop/conf/jmx_hdfs_journalnode_metric_config.yml |
| Mapping config | Metrics mapping configuration file | — |

NameNode monitoring

| Parameter | Description | Default value |
|---|---|---|
| Java agent path | Path to the JMX Prometheus Java agent | /usr/lib/adh-utils/jmx/jmx_prometheus_javaagent.jar |
| Prometheus metrics port | Port on which HDFS NameNode metrics are exposed in the Prometheus format | 9201 |
| Mapping config path | Path to the metrics mapping configuration file | /etc/hadoop/conf/jmx_hdfs_namenode_metric_config.yml |
| Mapping config | Metrics mapping configuration file | — |