Configuration parameters

This topic describes the parameters that can be configured for ADH services via ADCM. To learn about the configuration process, refer to the relevant articles: Online installation and Offline installation.

NOTE
Some of the parameters become visible in the ADCM UI only after the Advanced flag is set.

Airflow

Airflow environment
Parameter Description Default value

airflow_dir

The Airflow home directory

/srv/airflow/home

db_dir

The location of Metastore DB

/srv/airflow/metastore

airflow.cfg
Parameter Description Default value

db_user

The user to connect to Metadata DB

airflow

db_password

The password to connect to Metadata DB

 — 

db_root_password

The root password to connect to Metadata DB

 — 

db_port

The port to connect to Metadata DB

3307

server_port

The port to run the web server

8080

flower_port

The port that Celery Flower runs on

5555

worker_port

When you start an Airflow Worker, Airflow starts a tiny web server subprocess to serve the Worker's local log files to the main Airflow web server, which then builds pages and sends them to users. This parameter defines the port on which the logs are served. The port must be free and accessible from the main web server so that it can connect to the Workers

8793

redis_port

The port for running Redis

6379

fernet_key

The secret key to save connection passwords in the database

 — 

security

Defines which security module to use. For example, kerberos

 — 

keytab

The path to the keytab file

 — 

reinit_frequency

Sets the ticket renewal frequency

3600

principal

The Kerberos principal

ssl_active

Defines if SSL is active for Airflow

false

web_server_ssl_cert

The path to SSL certificate

/etc/ssl/certs/host_cert.cert

web_server_ssl_key

The path to SSL certificate key

/etc/ssl/host_cert.key
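
For reference, the parameters above map to keys in airflow.cfg. The following sketch shows how the Kerberos and web server SSL settings might look in that file; the section names assume a standard Airflow layout, and the keytab path, principal, and realm are illustrative only.

  [core]
  # security comes from the "security" parameter above
  security = kerberos

  [kerberos]
  # Illustrative keytab path and principal
  keytab = /etc/security/keytabs/airflow.service.keytab
  principal = airflow/_HOST@EXAMPLE.COM
  reinit_frequency = 3600

  [webserver]
  web_server_port = 8080
  web_server_ssl_cert = /etc/ssl/certs/host_cert.cert
  web_server_ssl_key = /etc/ssl/host_cert.key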

Logging level

Specifies the logging level for Airflow activity

INFO

Logging level for Flask-appbuilder UI

Specifies the logging level for Flask-appbuilder UI

WARNING

cfg_properties_template

The Jinja template to initialize environment variables for Airflow

External database
Parameter Description Default value

Database type

The external database type. Possible values: PostgreSQL, MySQL/MariaDB

MySQL/MariaDB

Hostname

The external database host

 — 

Custom port

The external database port

 — 

Airflow database name

The external database name

airflow

flink-conf.yaml
Parameter Description Default value

jobmanager.rpc.port

The RPC port through which the JobManager is reachable. In the high availability mode, this value is ignored and the port number to connect to JobManager is generated by ZooKeeper

6123

taskmanager.network.bind-policy

The automatic address binding policy used by the TaskManager

name

parallelism.default

The system-wide default parallelism level for all execution environments

1

taskmanager.numberOfTaskSlots

The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline

1

taskmanager.heap.size

The heap size for the TaskManager JVM

1024m

jobmanager.heap.size

The heap size for the JobManager JVM

1024m

security.kerberos.login.use-ticket-cache

Indicates whether to read from the Kerberos ticket cache

false

security.kerberos.login.keytab

The absolute path to the Kerberos keytab file that stores user credentials

 — 

security.kerberos.login.principal

Flink Kerberos principal

 — 

security.kerberos.login.contexts

A comma-separated list of login contexts to provide the Kerberos credentials to

 — 

security.ssl.rest.enabled

Turns on SSL for external communication via REST endpoints

false

security.ssl.rest.keystore

The Java keystore file with SSL key and certificate to be used by Flink’s external REST endpoints

 — 

security.ssl.rest.truststore

The truststore file containing public CA certificates to verify the peer for Flink’s external REST endpoints

 — 

security.ssl.rest.keystore-password

The secret to decrypt the keystore file for Flink external REST endpoints

 — 

security.ssl.rest.truststore-password

The password to decrypt the truststore for Flink’s external REST endpoints

 — 

security.ssl.rest.key-password

The secret to decrypt the key in the keystore for Flink’s external REST endpoints

 — 
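
As a sketch, the REST SSL parameters above would appear in flink-conf.yaml roughly as follows; the keystore and truststore paths and the password placeholders are illustrative, not defaults.

  security.ssl.rest.enabled: true
  security.ssl.rest.keystore: /etc/flink/ssl/rest.keystore        # placeholder path
  security.ssl.rest.truststore: /etc/flink/ssl/rest.truststore    # placeholder path
  security.ssl.rest.keystore-password: <keystore_password>
  security.ssl.rest.key-password: <key_password>
  security.ssl.rest.truststore-password: <truststore_password>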

Logging level

Defines the logging level for Flink activity

INFO

high-availability

Defines the High Availability (HA) mode used for cluster execution

 — 

high-availability.zookeeper.quorum

The ZooKeeper quorum to use when running Flink in the HA mode with ZooKeeper

 — 

high-availability.storageDir

A file system path (URI) where Flink persists metadata in the HA mode

 — 

high-availability.zookeeper.path.root

The root path for Flink ZNode in Zookeeper

/flink

high-availability.cluster-id

The ID of the Flink cluster used to separate multiple Flink clusters from each other

 — 

zookeeper.sasl.disable

Defines whether SASL authentication with ZooKeeper is disabled

false
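
For example, a minimal ZooKeeper-based HA setup combines the parameters above in flink-conf.yaml roughly as shown below; the host names and storage path are illustrative.

  high-availability: zookeeper
  high-availability.zookeeper.quorum: zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
  high-availability.storageDir: hdfs:///flink/ha/
  high-availability.zookeeper.path.root: /flink
  high-availability.cluster-id: /adh_flink_cluster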

Other
Parameter Description Default value

Custom flink-conf.yaml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file flink-conf.yaml

 — 

log4j.properties

The contents of the log4j.properties configuration file

log4j-cli.properties

The contents of the log4j-cli.properties configuration file

HBase

hbase-site.xml
Parameter Description Default value

hbase.balancer.period

The time period to run the Region balancer in Master

300000

hbase.client.pause

General client pause value. Used mostly as the value to wait before retrying a failed get, region lookup, etc. See hbase.client.retries.number for a description of how this pause works with retries

100

hbase.client.max.perregion.tasks

The maximum number of concurrent mutation tasks the Client will maintain to a single Region. That is, if hbase.client.max.perregion.tasks writes are already in progress for this Region, new puts won't be sent to this Region until some writes finish

1

hbase.client.max.perserver.tasks

The maximum number of concurrent mutation tasks a single HTable instance will send to a single Region Server

2

hbase.client.max.total.tasks

The maximum number of concurrent mutation tasks a single HTable instance will send to the cluster

100

hbase.client.retries.number

The maximum number of retries. It is used as the maximum for all retryable operations, such as getting a cell value, starting a row update, etc. The retry interval is a rough function based on hbase.client.pause. See the RETRY_BACKOFF constant for how the backoff ramps up. Change this setting and hbase.client.pause to suit your workload

15
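
As an illustration of how the two client settings above are tuned together, the following hbase-site.xml fragment shortens the pause and lowers the retry count; the values are examples, not recommendations.

  <property>
    <name>hbase.client.pause</name>
    <value>50</value>
  </property>
  <property>
    <name>hbase.client.retries.number</name>
    <value>10</value>
  </property>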

hbase.client.scanner.timeout.period

The Client scanner lease period in milliseconds

60000

hbase.cluster.distributed

The cluster mode. Possible values are: false — for standalone mode and pseudo-distributed setups with managed ZooKeeper; true — for fully-distributed mode with an unmanaged ZooKeeper Quorum. If false, startup runs all HBase and ZooKeeper daemons together in one JVM; if true — one JVM instance per daemon

true

hbase.hregion.majorcompaction

The time interval between Major compactions in milliseconds. Set to 0 to disable time-based automatic Major compactions. User-requested and size-based Major compactions will still run. This value is multiplied by hbase.hregion.majorcompaction.jitter to cause compaction to start at a somewhat-random time during a given time frame

604800000

hbase.hregion.max.filesize

The maximum file size. If the total size of a Region's HFiles grows to exceed this value, the Region is split in two. The check works in one of two ways: either the Region is split when any single store size exceeds the threshold, or when the overall Region size exceeds the threshold. Which check is used is configured by hbase.hregion.split.overallfiles

10737418240

hbase.hstore.blockingStoreFiles

If more than this number of StoreFiles exist in any Store (one StoreFile is written per flush of MemStore), updates are blocked for this Region until a compaction is completed, or until hbase.hstore.blockingWaitTime is exceeded

16

hbase.hstore.blockingWaitTime

The time for which a Region blocks updates after reaching the StoreFile limit defined by hbase.hstore.blockingStoreFiles. After this time elapses, the Region stops blocking updates even if a compaction has not been completed

90000

hbase.hstore.compaction.max

The maximum number of StoreFiles that will be selected for a single Minor compaction, regardless of the number of eligible StoreFiles. Effectively, the value of hbase.hstore.compaction.max controls the time it takes for a single compaction to complete. Setting it larger means that more StoreFiles are included in a compaction. For most cases, the default value is appropriate

10

hbase.hstore.compaction.min

The minimum number of StoreFiles that must be eligible for compaction before compaction can run. The goal of tuning hbase.hstore.compaction.min is to avoid a situation with too many tiny StoreFiles to compact. Setting this value to 2 would cause a Minor compaction each time you have two StoreFiles in a Store, and this is probably not appropriate. If you set this value too high, all the other values will need to be adjusted accordingly. For most cases, the default value is appropriate. In the previous versions of HBase, the parameter hbase.hstore.compaction.min was called hbase.hstore.compactionThreshold

3

hbase.hstore.compaction.min.size

A StoreFile smaller than this size is always eligible for Minor compaction. StoreFiles of this size or larger are evaluated by hbase.hstore.compaction.ratio to determine if they are eligible. Because this limit represents the "automatic include" limit for all StoreFiles smaller than this value, it may need to be reduced in write-heavy environments where many files in the 1-2 MB range are being flushed: every StoreFile is targeted for compaction, and the resulting StoreFiles may still be under the minimum size and require further compaction. If this parameter is lowered, the ratio check is triggered more quickly. This addressed some issues seen in earlier versions of HBase, but changing this parameter is no longer necessary in most situations

134217728

hbase.hstore.compaction.ratio

For Minor compaction, this ratio is used to determine whether a given StoreFile that is larger than hbase.hstore.compaction.min.size is eligible for compaction. Its effect is to limit compaction of large StoreFiles. The value of hbase.hstore.compaction.ratio is expressed as a floating-point decimal

1.2F

hbase.hstore.compaction.ratio.offpeak

The compaction ratio used during off-peak compactions if the off-peak hours are also configured. Expressed as a floating-point decimal. This allows for more aggressive (or less aggressive, if you set it lower than hbase.hstore.compaction.ratio) compaction during a given time period. The value is ignored if off-peak is disabled (default). This works the same as hbase.hstore.compaction.ratio

5.0F

hbase.hstore.compactionThreshold

If more than this number of StoreFiles exists in any Store (one StoreFile is written per flush of MemStore), a compaction is run to rewrite all StoreFiles into a single StoreFile. Larger values delay the compaction, but when compaction does occur, it takes longer to complete

3

hbase.hstore.flusher.count

The number of flush threads. With fewer threads, the MemStore flushes will be queued. With more threads, the flushes will be executed in parallel, increasing the load on HDFS, and potentially causing more compactions

2

hbase.hstore.time.to.purge.deletes

The amount of time to delay purging of delete markers with future timestamps. If unset or set to 0, all the delete markers, including those with future timestamps, are purged during the next Major compaction. Otherwise, a delete marker is kept until the Major compaction that occurs after the marker timestamp plus the value of this setting (in milliseconds)

0

hbase.master.ipc.address

The address the HMaster RPC server binds to

0.0.0.0

hbase.normalizer.period

The period at which the Region normalizer runs on Master (in milliseconds)

300000

hbase.regionserver.compaction.enabled

Enables/disables compactions by setting true/false. You can further switch compactions dynamically with the compaction_switch shell command

true

hbase.regionserver.ipc.address

The address the Region Server RPC server binds to

0.0.0.0

hbase.regionserver.regionSplitLimit

The limit for the number of Regions, after which no more Region splitting should take place. This is not a hard limit for the number of Regions, but acts as a guideline for the Region Server to stop splitting after a certain limit

1000

hbase.rootdir

The directory shared by Region Servers and into which HBase persists its data. The URL should be fully qualified to include the filesystem scheme. For example, to specify the HDFS directory /hbase where the HDFS instance NameNode is running at namenode.example.org on port 9000, set this value to hdfs://namenode.example.org:9000/hbase

 — 

hbase.zookeeper.quorum

A comma-separated list of servers in the ZooKeeper ensemble, for example: host1.mydomain.com,host2.mydomain.com,host3.mydomain.com. By default, this is set to localhost for local and pseudo-distributed modes of operation. For a fully-distributed setup, this should be set to a full list of ZooKeeper ensemble servers. If HBASE_MANAGES_ZK is set in hbase-env.sh, this is the list of servers on which HBase will start/stop ZooKeeper as part of cluster start/stop. Client-side, the list of ensemble members is put together with the hbase.zookeeper.property.clientPort config and passed to the ZooKeeper constructor as the connection string parameter

 — 
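
Putting the examples from the entries above into hbase-site.xml, a fully-distributed setup might look like the following sketch; the host names repeat the illustrative ones used in the descriptions.

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>host1.mydomain.com,host2.mydomain.com,host3.mydomain.com</value>
  </property>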

zookeeper.session.timeout

The ZooKeeper session timeout in milliseconds. It is used in two different ways. First, this value is processed by the ZooKeeper Client that HBase uses to connect to the ensemble. It is also used by HBase when it starts a ZooKeeper Server (in that case the timeout is passed as the maxSessionTimeout). See the ZooKeeper documentation for more details. For example, if an HBase Region Server connects to a ZooKeeper ensemble that is also managed by HBase, then the session timeout will be the one specified by this configuration. But a Region Server that connects to an ensemble managed with a different configuration will be subjected to the maxSessionTimeout of that ensemble. So, even though HBase might propose using 90 seconds, the ensemble can have a lower max timeout, and it will take precedence. The current default maxSessionTimeout that ZooKeeper ships with is 40 seconds, which is lower than the HBase default

90000

zookeeper.znode.parent

The root znode for HBase in ZooKeeper. All of the HBase ZooKeeper files configured with a relative path will go under this node. By default, all of the HBase ZooKeeper file paths are configured with a relative path, so they will all go under this directory unless changed

/hbase

hbase.rest.port

The port used by HBase Rest Servers

60080

hbase.zookeeper.property.authProvider.1

Specifies the ZooKeeper authentication method

hbase.security.authentication

Set the value to true to run HBase RPC with strong authentication

false

hbase.security.authentication.ui

Enables Kerberos authentication to HBase web UI with SPNEGO

 — 

hbase.security.authentication.spnego.kerberos.principal

The Kerberos principal for SPNEGO authentication

 — 

hbase.security.authentication.spnego.kerberos.keytab

The path to the Kerberos keytab file with principals to be used for SPNEGO authentication

 — 

hbase.security.authorization

Set the value to true to run HBase RPC with strong authorization

false

hbase.master.kerberos.principal

The Kerberos principal used to run the HMaster process

 — 

hbase.master.keytab.file

Full path to the Kerberos keytab file to use for logging in the configured HMaster server principal

 — 

hbase.regionserver.kerberos.principal

The Kerberos principal name that should be used to run the HRegionServer process

 — 

hbase.regionserver.keytab.file

Full path to the Kerberos keytab file to use for logging in the configured HRegionServer server principal

 — 

hbase.rest.authentication.type

REST Gateway Kerberos authentication type

 — 

hbase.rest.authentication.kerberos.principal

REST Gateway Kerberos principal

 — 

hbase.rest.authentication.kerberos.keytab

REST Gateway Kerberos keytab

 — 

hbase.thrift.keytab.file

Thrift Kerberos keytab

 — 

hbase.rest.keytab.file

HBase REST gateway Kerberos keytab

 — 

hbase.rest.kerberos.principal

HBase REST gateway Kerberos principal

 — 

hbase.thrift.kerberos.principal

Thrift Kerberos principal

 — 

hbase.thrift.security.qop

Defines authentication, integrity, and confidentiality checking. Supported values:

  • auth-conf — authentication, integrity, and confidentiality checking;

  • auth-int — authentication and integrity checking;

  • auth — authentication checking only.

 — 
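
For example, to require authentication, integrity, and confidentiality checking for Thrift clients, the property would be set in hbase-site.xml as in the sketch below; pick the value that matches your security requirements.

  <property>
    <name>hbase.thrift.security.qop</name>
    <value>auth-conf</value>
  </property>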

phoenix.queryserver.keytab.file

The path to the Kerberos keytab file

 — 

phoenix.queryserver.kerberos.principal

The Kerberos principal to use when authenticating. If phoenix.queryserver.kerberos.http.principal is not defined, the principal specified here will also be used to authenticate SPNEGO connections and to connect to HBase

 — 

phoenix.queryserver.kerberos.keytab

The full path to the Kerberos keytab file to use for logging in as the configured Phoenix Query Server principal

 — 

phoenix.queryserver.http.keytab.file

The keytab file to use for authenticating SPNEGO connections. This configuration must be specified if phoenix.queryserver.kerberos.http.principal is configured. phoenix.queryserver.keytab.file will be used if this property is undefined

 — 

phoenix.queryserver.http.kerberos.principal

The Kerberos principal to use when authenticating SPNEGO connections. phoenix.queryserver.kerberos.principal will be used if this property is undefined

phoenix.queryserver.kerberos.http.principal

Deprecated, use phoenix.queryserver.http.kerberos.principal instead

 — 

hbase.ssl.enabled

Defines whether SSL is enabled for web UIs

false

hadoop.ssl.enabled

Defines whether SSL is enabled for Hadoop RPC

false

ssl.server.keystore.location

The path to the keystore file

 — 

ssl.server.keystore.password

The password to the keystore

 — 

ssl.server.truststore.location

The path to the truststore to be used

 — 

ssl.server.truststore.password

The password to the truststore

 — 

ssl.server.keystore.keypassword

The password to the key in the keystore

 — 

hbase.rest.ssl.enabled

Defines whether SSL is enabled for HBase REST server

false

hbase.rest.ssl.keystore.store

The path to the keystore used by HBase REST server

 — 

hbase.rest.ssl.keystore.password

The password to the keystore

 — 

hbase.rest.ssl.keystore.keypassword

The password to the key in the keystore

 — 

HBASE heap memory settings
Parameter Description Default value

HBASE Regionserver Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HBase Region server

-Xms700m -Xmx9G

HBASE Master Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HBase Master

-Xms700m -Xmx9G

Phoenix Queryserver Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Phoenix Query server

-Xms700m -Xmx8G

HBASE Thrift2 server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HBase Thrift2 server

-Xms700m -Xmx8G

HBASE Rest server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HBase Rest server

-Xms200m -Xmx8G

ranger-hbase-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

The spool directory path

/srv/ranger/hdfs_plugin/audit_solr_spool

xasecure.audit.destination.solr.urls

Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr

 — 

xasecure.audit.destination.solr.zookeepers

Specifies the ZooKeeper connection string for the Solr destination

 — 

xasecure.audit.destination.solr.force.use.inmemory.jaas.config

Uses in-memory JAAS configuration file to connect to Solr

 — 

xasecure.audit.is.enabled

Enables Ranger audit

true

xasecure.audit.jaas.Client.loginModuleControlFlag

Specifies whether the success of the module is required, requisite, sufficient, or optional

 — 

xasecure.audit.jaas.Client.loginModuleName

The name of the authenticator class

 — 

xasecure.audit.jaas.Client.option.keyTab

The name of the keytab file to get the principal’s secret key

 — 

xasecure.audit.jaas.Client.option.principal

The name of the principal to be used

 — 

xasecure.audit.jaas.Client.option.serviceName

Represents a user or a service that wants to log in

 — 

xasecure.audit.jaas.Client.option.storeKey

Set this to true if you want the keytab or the principal’s key to be stored in the subject’s private credentials

false

xasecure.audit.jaas.Client.option.useKeyTab

Set this to true if you want the module to get the principal’s key from the keytab

false

ranger-hbase-security.xml
Parameter Description Default value

ranger.plugin.hbase.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.hbase.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.hbase.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/hbase/policycache

ranger.plugin.hbase.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.hbase.policy.rest.client.connection.timeoutMs

The HBase Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.hbase.policy.rest.client.read.timeoutMs

The HBase Plugin RangerRestClient read timeout (in milliseconds)

30000

ranger.plugin.hbase.policy.rest.ssl.config.file

The path to the RangerRestClient SSL config file for HBase plugin

/etc/hbase/conf/ranger-hbase-policymgr-ssl.xml
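
A minimal ranger-hbase-security.xml sketch combining the parameters above might look as follows; the Ranger Admin URL and service name are placeholders specific to a deployment.

  <property>
    <name>ranger.plugin.hbase.policy.rest.url</name>
    <value>http://ranger-admin.example.com:6080</value>
  </property>
  <property>
    <name>ranger.plugin.hbase.service.name</name>
    <value>adh_hbase</value>
  </property>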

ranger-hbase-policymgr-ssl.xml
Parameter Description Default value

xasecure.policymgr.clientssl.keystore

The path to the keystore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.credential.file

The path to the keystore credentials file

/etc/hbase/conf/ranger-hbase.jceks

xasecure.policymgr.clientssl.truststore.credential.file

The path to the truststore credentials file

/etc/hbase/conf/ranger-hbase.jceks

xasecure.policymgr.clientssl.truststore

The path to the truststore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.password

The password to the keystore file

 — 

xasecure.policymgr.clientssl.truststore.password

The password to the truststore file

 — 

Other
Parameter Description Default value

Custom hbase-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hbase-site.xml

 — 

Custom hbase-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hbase-env.sh

 — 

Ranger plugin enabled

Whether or not Ranger plugin is enabled

false

Custom ranger-hbase-audit.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hbase-audit.xml

 — 

Custom ranger-hbase-security.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hbase-security.xml

 — 

Custom ranger-hbase-policymgr-ssl.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hbase-policymgr-ssl.xml

 — 

Custom log4j.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file log4j.properties

Custom hadoop-metrics2-hbase.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hadoop-metrics2-hbase.properties

HDFS

core-site.xml
Parameter Description Default value

fs.defaultFS

The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI authority is used to determine the host, port, etc. for a filesystem

 — 

fs.trash.checkpoint.interval

The number of minutes between trash checkpoints. Should be smaller than or equal to fs.trash.interval. Every time the checkpointer runs, it creates a new checkpoint out of the current trash and removes checkpoints created more than fs.trash.interval minutes ago

60

fs.trash.interval

The number of minutes after which a trash checkpoint gets deleted. If set to 0, the trash feature is disabled

1440
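
As a core-site.xml sketch, the two trash parameters above combine with fs.defaultFS as shown below; the NameNode address is a placeholder.

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn1.example.com:8020</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>60</value>
  </property>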

hadoop.tmp.dir

The base for other temporary directories

/tmp/hadoop-${user.name}

hadoop.zk.address

A comma-separated list of <Host>:<Port> pairs. Each corresponds to a ZooKeeper server to be used by the Resource Manager for storing Resource Manager state

 — 

io.file.buffer.size

The buffer size for sequence files. The size of this buffer should probably be a multiple of the hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations

131072

net.topology.script.file.name

The name of the script that should be invoked to resolve DNS names to NetworkTopology names. For example, the script could take host.foo.bar as an argument and return /rack1 as the output

 — 

ha.zookeeper.quorum

A list of ZooKeeper Server addresses, separated by commas, that are to be used by the ZKFailoverController in automatic failover

 — 

ipc.client.fallback-to-simple-auth-allowed

When a client is configured to attempt a secure connection, but attempts to connect to an insecure server, that server may instruct the client to switch to SASL SIMPLE (unsecured) authentication. This setting controls whether or not the client will accept this instruction from the server. When set to false (default), the client does not allow the fallback to SIMPLE authentication and will abort the connection

false

hadoop.security.authentication

Defines the authentication type. Possible values: simple — no authentication, kerberos — enables the authentication by Kerberos

simple

hadoop.security.authorization

Enables RPC service-level authorization

false

hadoop.rpc.protection

Specifies RPC protection. Possible values:

  • authentication — authentication only;

  • integrity — performs the integrity check in addition to authentication;

  • privacy — encrypts the data in addition to integrity.

authentication

hadoop.security.auth_to_local

The value is a string containing new line characters. See Kerberos documentation for more information about the format

 — 
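
To illustrate the rule format, the sketch below maps HDFS service principals to the hdfs user and falls back to the default translation. It is a hedged example based on the standard Hadoop auth_to_local rule syntax, with REALM.TLD as a placeholder realm.

  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>
  RULE:[2:$1/$2@$0]([ndj]n/.*@REALM.TLD)s/.*/hdfs/
  DEFAULT
    </value>
  </property>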

hadoop.http.authentication.type

Defines authentication used for the HTTP web-consoles. The supported values are: simple, kerberos, [AUTHENTICATION_HANDLER-CLASSNAME]

simple

hadoop.http.authentication.kerberos.principal

Indicates the Kerberos principal to be used for the HTTP endpoint when using kerberos authentication. The principal short name must be HTTP per the Kerberos HTTP SPNEGO specification

HTTP/localhost@$LOCALHOST

hadoop.http.authentication.kerberos.keytab

The location of the keytab file with the credentials for the Kerberos principal used for the HTTP endpoint

/etc/security/keytabs/HTTP.service.keytab

ha.zookeeper.acl

ACLs for all znodes

 — 

hadoop.http.filter.initializers

Add to this property the org.apache.hadoop.security.AuthenticationFilterInitializer initializer class

 — 

hadoop.http.authentication.signature.secret.file

The signature secret file for signing the authentication tokens. If not set, a random secret is generated during startup. The same secret should be used for all nodes in the cluster (JobTracker, NameNode, DataNode, and TaskTracker). This file should be readable only by the Unix user running the daemons

/etc/security/http_secret

hadoop.http.authentication.cookie.domain

The domain to use for the HTTP cookie that stores the authentication token. In order for authentication to work properly across all nodes in the cluster, the domain must be correctly set. There is no default value; in that case the HTTP cookie will have no domain and will work only with the hostname that issued it

 — 

hadoop.ssl.require.client.cert

Defines whether client certificates are required

false

hadoop.ssl.hostname.verifier

The host name verifier to provide for HttpsURLConnections. Valid values are: DEFAULT, STRICT, STRICT_IE6, DEFAULT_AND_LOCALHOST, and ALLOW_ALL

DEFAULT

hadoop.ssl.keystores.factory.class

The KeyStoresFactory implementation to use

org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory

hadoop.ssl.server.conf

A resource file from which the SSL server keystore information will be extracted. This file is looked up in the classpath; typically it should be located in the Hadoop conf/ directory

ssl-server.xml

hadoop.ssl.client.conf

A resource file from which the SSL client keystore information will be extracted. This file is looked up in the classpath; typically it should be located in the Hadoop conf/ directory

ssl-client.xml

User managed hadoop.security.auth_to_local

Disable automatic generation of hadoop.security.auth_to_local

false

hdfs-site.xml
Parameter Description Default value

dfs.client.block.write.replace-datanode-on-failure.enable

If there is a DataNode/network failure in the write pipeline, DFSClient will try to remove the failed DataNode from the pipeline and then continue writing with the remaining DataNodes. As a result, the number of DataNodes in the pipeline is decreased. This feature adds new DataNodes to the pipeline. This is a site-wide property to enable/disable the feature. When the cluster size is extremely small, e.g. 3 nodes or fewer, cluster administrators may want to set the policy to NEVER in the default configuration file or disable this feature. Otherwise, users may experience an unusually high rate of pipeline failures, since it is impossible to find new DataNodes for replacement. See also dfs.client.block.write.replace-datanode-on-failure.policy

true

dfs.client.block.write.replace-datanode-on-failure.policy

This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. Possible values:

  • ALWAYS. Always adds a new DataNode, when an existing DataNode is removed.

  • NEVER. Never adds a new DataNode.

  • DEFAULT. Let r be the replication number. Let n be the number of existing DataNodes. Add a new DataNode only, if r is greater than or equal to 3 and either:

    1. floor(r/2) is greater than or equal to n;

    2. r is greater than n and the block is hflushed/appended.

DEFAULT

dfs.client.block.write.replace-datanode-on-failure.best-effort

This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. Best effort means that the client will try to replace a failed DataNode in the write pipeline (provided that the policy is satisfied); however, it continues the write operation even if the DataNode replacement fails. Suppose the DataNode replacement fails: if this property is false, an exception is thrown and the write fails; if true, the write is resumed with the remaining DataNodes. Note that setting this property to true allows writing to a pipeline with a smaller number of DataNodes. As a result, it increases the probability of data loss

false

dfs.client.block.write.replace-datanode-on-failure.min-replication

The minimum number of replications needed not to fail the write pipeline if new DataNodes cannot be found to replace failed DataNodes (which could be due to a network failure) in the write pipeline. If the number of the remaining DataNodes in the write pipeline is greater than or equal to this property value, writing continues to the remaining nodes; otherwise, an exception is thrown. If this is set to 0, an exception will be thrown when a replacement cannot be found. See also dfs.client.block.write.replace-datanode-on-failure.policy

0

dfs.balancer.dispatcherThreads

The size of the thread pool for the HDFS balancer block mover — dispatchExecutor

200

dfs.balancer.movedWinWidth

The time window in milliseconds for the HDFS balancer to track blocks and their locations

5400000

dfs.balancer.moverThreads

The thread pool size for executing block moves — moverThreadAllocator

1000

dfs.balancer.max-size-to-move

The maximum number of bytes that can be moved by the balancer in a single thread

10737418240

dfs.balancer.getBlocks.min-block-size

The minimum block threshold size in bytes to ignore when fetching a source block list

10485760

dfs.balancer.getBlocks.size

The total size in bytes of DataNode blocks to get, when fetching a source block list

2147483648

dfs.balancer.block-move.timeout

The maximum amount of time for a block to move (in milliseconds). If set greater than 0, the balancer will stop waiting for a block move to complete after this time. In typical clusters, a 3-5 minute timeout is reasonable. If the timeout occurs for a large proportion of block moves, this value needs to be increased. It could also be that too much work is dispatched and many nodes are constantly exceeding the bandwidth limit as a result; in that case, other balancer parameters might need to be adjusted. It is disabled (0) by default

0

dfs.balancer.max-no-move-interval

If this specified amount of time has elapsed and no blocks have been moved out of a source DataNode, one more attempt will be made to move blocks out of this DataNode in the current Balancer iteration

60000

dfs.balancer.max-iteration-time

The maximum amount of time an iteration can be run by the Balancer. After this time the Balancer will stop the iteration, and re-evaluate the work needed to be done to balance the cluster. The default value is 20 minutes

1200000

dfs.blocksize

The default block size for new files (in bytes). You can use the following suffixes to define size units (case insensitive): k (kilo), m (mega), g (giga), t (tera), p (peta), e (exa). For example, 128k, 512m, 1g, etc. You can also specify the block size in bytes (such as 134217728 for 128 MB)

134217728

dfs.client.read.shortcircuit

Turns on short-circuit local reads

true

dfs.datanode.balance.max.concurrent.moves

The maximum number of threads for DataNode balancer pending moves. This value is reconfigurable via the dfsadmin -reconfig command

50

dfs.datanode.data.dir

Determines where on the local filesystem a DFS DataNode should store its blocks. If multiple directories are specified, data will be stored in all named directories, typically on different devices. The directories should be tagged with the corresponding storage types (SSD/DISK/ARCHIVE/RAM_DISK) for HDFS storage policies. The default storage type is DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if local filesystem permissions allow

/srv/hadoop-hdfs/data:DISK

dfs.disk.balancer.max.disk.throughputInMBperSec

The maximum disk bandwidth used by the disk balancer during reads from a source disk. The unit is MB/sec

10

dfs.disk.balancer.block.tolerance.percent

Specifies a tolerance, in percent, at which a copy step is considered good enough. For example, if set to 10, then getting within 10% of the target value is considered good enough. In other words, if a move operation is 20 GB in size and 18 GB (20 * (1 - 10%)) can be moved, the entire operation is considered successful

10

dfs.disk.balancer.max.disk.errors

During a block move from a source to a destination disk, various errors can occur. This parameter defines how many errors to tolerate before declaring that a move between two disks (a step) has failed

5

dfs.disk.balancer.plan.valid.interval

The maximum amount of time a disk balancer plan (a set of configurations that define the data volume to be redistributed between two disks) remains valid. This setting supports multiple time unit suffixes as described in dfs.heartbeat.interval. If no suffix is specified, then milliseconds are assumed

1d

dfs.disk.balancer.plan.threshold.percent

Defines a data storage threshold, in percent, at which disks start participating in data redistribution or balancing activities

10

dfs.domain.socket.path

The path to a UNIX domain socket that will be used for communication between the DataNode and local HDFS clients. If the string _PORT is present in this path, it will be replaced by the TCP port of the DataNode. The parameter is optional

/var/lib/hadoop-hdfs/dn_socket

dfs.hosts

Names a file that contains a list of hosts allowed to connect to the NameNode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted

/etc/hadoop/conf/dfs.hosts

dfs.mover.movedWinWidth

The minimum time interval for a block to be moved to another location again (in milliseconds)

5400000

dfs.mover.moverThreads

Sets the balancer mover thread pool size

1000

dfs.mover.retry.max.attempts

The maximum number of retries before the mover considers the move as failed

10

dfs.mover.max-no-move-interval

If this specified amount of time has elapsed and no block has been moved out of a source DataNode, one more attempt will be made to move blocks out of this DataNode in the current mover iteration

60000

dfs.namenode.name.dir

Determines where on the local filesystem the DFS name node should store the name table (fsimage). If multiple directories are specified, then the name table is replicated in all of the directories, for redundancy

/srv/hadoop-hdfs/name

dfs.namenode.checkpoint.dir

Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If multiple directories are specified, then the image is replicated in all of the directories for redundancy

/srv/hadoop-hdfs/checkpoint

dfs.namenode.hosts.provider.classname

The class that provides access to host files. org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager is used by default; it loads the files specified by dfs.hosts and dfs.hosts.exclude. If org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager is used, it loads the JSON file defined in dfs.hosts. Changing the class name requires a NameNode restart; dfsadmin -refreshNodes only refreshes the configuration files used by the class

org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager

dfs.namenode.rpc-bind-host

The actual address the RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.rpc-address. It can also be specified per NameNode or name service for HA/Federation. This is useful for making the NameNode listen on all interfaces by setting it to 0.0.0.0

0.0.0.0

dfs.permissions.superusergroup

The name of the group of super-users. The value should be a single group name

hadoop

dfs.replication

The default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time

3
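
For example, an hdfs-site.xml fragment setting a 128 MB block size (using a size suffix, as described for dfs.blocksize above) and keeping the default replication factor could look like this sketch.

  <property>
    <name>dfs.blocksize</name>
    <value>128m</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>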

dfs.journalnode.http-address

The HTTP address of the JournalNode web UI

0.0.0.0:8480

dfs.journalnode.https-address

The HTTPS address of the JournalNode web UI

0.0.0.0:8481

dfs.journalnode.rpc-address

The RPC address of the JournalNode

0.0.0.0:8485

dfs.datanode.http.address

The address of the DataNode HTTP server

0.0.0.0:9864

dfs.datanode.https.address

The address of the DataNode HTTPS server

0.0.0.0:9865

dfs.datanode.address

The address of the DataNode for data transfer

0.0.0.0:9866

dfs.datanode.ipc.address

The IPC address of the DataNode

0.0.0.0:9867

dfs.namenode.http-address

The address and the base port to access the dfs NameNode web UI

0.0.0.0:9870

dfs.namenode.https-address

The secure HTTPS address of the NameNode

0.0.0.0:9871

dfs.ha.automatic-failover.enabled

Defines whether automatic failover is enabled

true

dfs.ha.fencing.methods

A list of scripts or Java classes that will be used to fence the Active NameNode during a failover

shell(/bin/true)

dfs.journalnode.edits.dir

The directory where to store journal edit files

/srv/hadoop-hdfs/journalnode

dfs.namenode.shared.edits.dir

The directory on shared storage between the multiple NameNodes in an HA cluster. This directory will be written by the active and read by the standby in order to keep the namespaces synchronized. This directory does not need to be listed in dfs.namenode.edits.dir. It should be left empty in a non-HA cluster

 — 

dfs.internal.nameservices

A comma-separated list of nameservices that belong to this cluster. The value must be alphanumeric without underscores. The DataNode will report to all the nameservices in this list

 — 

dfs.block.access.token.enable

If set to true, access tokens are used as capabilities for accessing DataNodes. If set to false, no access tokens are checked on accessing DataNodes

false

dfs.namenode.kerberos.principal

The NameNode service principal. This is typically set to nn/_HOST@REALM.TLD. Each NameNode will substitute _HOST with its own fully qualified hostname during the startup. The _HOST placeholder allows using the same configuration setting on both NameNodes in an HA setup

nn/_HOST@REALM

dfs.namenode.keytab.file

The keytab file used by each NameNode daemon to login as its service principal. The principal name is configured with dfs.namenode.kerberos.principal

/etc/security/keytabs/nn.service.keytab

dfs.namenode.kerberos.internal.spnego.principal

HTTP Kerberos principal name for the NameNode

HTTP/_HOST@REALM

dfs.web.authentication.kerberos.principal

Kerberos principal name for the WebHDFS

HTTP/_HOST@REALM

dfs.web.authentication.kerberos.keytab

Kerberos keytab file for WebHDFS

/etc/security/keytabs/HTTP.service.keytab

dfs.journalnode.kerberos.principal

The JournalNode service principal. This is typically set to jn/_HOST@REALM.TLD. Each JournalNode will substitute _HOST with its own fully qualified hostname at startup. The _HOST placeholder allows using the same configuration setting on all JournalNodes

jn/_HOST@REALM

dfs.journalnode.keytab.file

The keytab file used by each JournalNode daemon to login as its service principal. The principal name is configured with dfs.journalnode.kerberos.principal

/etc/security/keytabs/jn.service.keytab

dfs.journalnode.kerberos.internal.spnego.principal

The server principal used by the JournalNode HTTP Server for SPNEGO authentication when Kerberos security is enabled. This is typically set to HTTP/_HOST@REALM.TLD. The SPNEGO server principal begins with the prefix HTTP/ by convention. If the value is *, the web server will attempt to login with every principal specified in the keytab file dfs.web.authentication.kerberos.keytab. For most deployments this can be set to ${dfs.web.authentication.kerberos.principal}, that is, to use the value of dfs.web.authentication.kerberos.principal

HTTP/_HOST@REALM

dfs.datanode.data.dir.perm

Permissions for the directories on the local filesystem where the DFS DataNode stores its blocks. The permissions can either be octal or symbolic

700

dfs.datanode.kerberos.principal

The DataNode service principal. This is typically set to dn/_HOST@REALM.TLD. Each DataNode will substitute _HOST with its own fully qualified host name at startup. The _HOST placeholder allows using the same configuration setting on all DataNodes

dn/_HOST@REALM.TLD

dfs.datanode.keytab.file

The keytab file used by each DataNode daemon to login as its service principal. The principal name is configured with dfs.datanode.kerberos.principal

/etc/security/keytabs/dn.service.keytab

dfs.http.policy

Defines if HTTPS (SSL) is supported on HDFS. This configures the HTTP endpoint for HDFS daemons. The following values are supported: HTTP_ONLY — the service is provided only via http; HTTPS_ONLY — the service is provided only via https; HTTP_AND_HTTPS — the service is provided both via http and https

HTTP_ONLY

dfs.data.transfer.protection

A comma-separated list of SASL protection values used for secured connections to the DataNode when reading or writing block data. The possible values are:

  • authentication — provides only authentication; no integrity or privacy;

  • integrity — authentication and integrity are enabled;

  • privacy — authentication, integrity and privacy are enabled.

If dfs.encrypt.data.transfer=true, then it supersedes the setting for dfs.data.transfer.protection and enforces that all connections must use a specialized encrypted SASL handshake. This property is ignored for connections to a DataNode listening on a privileged port. In this case, it is assumed that the use of a privileged port establishes sufficient trust

 — 
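
For example, to require the strongest SASL protection for DataNode data transfer, the property could be set as in the sketch below; in practice this is combined with Kerberos authentication and block access tokens.

  <property>
    <name>dfs.data.transfer.protection</name>
    <value>privacy</value>
  </property>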

dfs.encrypt.data.transfer

Defines whether or not actual block data that is read/written from/to HDFS should be encrypted on the wire. This only needs to be set on the NameNodes and DataNodes, clients will deduce this automatically. It is possible to override this setting per connection by specifying custom logic via dfs.trustedchannel.resolver.class

false

dfs.encrypt.data.transfer.algorithm

This value may be set to either 3des or rc4. If nothing is set, then the configured JCE default on the system is used (usually 3DES). It is widely believed that 3DES is more secure, but RC4 is substantially faster. Note that if AES is supported by both the client and server, then this encryption algorithm will only be used to initially transfer keys for AES

3des

dfs.encrypt.data.transfer.cipher.suites

This value can be either undefined or AES/CTR/NoPadding. If defined, then dfs.encrypt.data.transfer uses the specified cipher suite for data encryption. If not defined, then only the algorithm specified in dfs.encrypt.data.transfer.algorithm is used

 — 

dfs.encrypt.data.transfer.cipher.key.bitlength

The key bitlength negotiated by dfsclient and datanode for encryption. This value may be set to either 128, 192, or 256

128

ignore.secure.ports.for.testing

Allows skipping HTTPS requirements in the SASL mode

false

dfs.client.https.need-auth

Whether SSL client certificate authentication is required

false

httpfs-site.xml
Parameter Description Default value

httpfs.http.administrators

The ACL for the admins. This configuration is used to control who can access the default servlets for the HttpFS server. The value should be a comma-separated list of users and groups. The user list comes first and is separated from the group list by a space, for example: user1,user2 group1,group2. Both users and groups are optional, so you can define only users, only groups, or both. Note that if only groups are defined, the list must start with a leading space. Using the asterisk grants access to all users and groups

*
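
Following the format described above, an httpfs-site.xml sketch that grants admin access to two users and one group (note the space separating the user list from the group list) might look as follows; the names are placeholders.

  <property>
    <name>httpfs.http.administrators</name>
    <value>hdfs,admin hadoop</value>
  </property>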

hadoop.http.temp.dir

The HttpFS temp directory

${hadoop.tmp.dir}/httpfs

httpfs.ssl.enabled

Defines whether SSL is enabled. The default is false (disabled)

false

httpfs.hadoop.config.dir

The location of the Hadoop configuration directory

/etc/hadoop/conf

httpfs.hadoop.authentication.type

Defines the authentication mechanism used by httpfs for its HTTP clients. Valid values are simple and kerberos. If simple is used, clients must specify the username with the user.name query string parameter. If kerberos is used, HTTP clients must use HTTP SPNEGO or delegation tokens

simple

httpfs.hadoop.authentication.kerberos.keytab

The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by httpfs in the HTTP endpoint. httpfs.authentication.kerberos.keytab is deprecated. Instead, use hadoop.http.authentication.kerberos.keytab

/etc/security/keytabs/httpfs.service.keytab

httpfs.hadoop.authentication.kerberos.principal

The HTTP Kerberos principal used by HttpFS in the HTTP endpoint. The HTTP Kerberos principal MUST start with HTTP/ as per Kerberos HTTP SPNEGO specification. httpfs.authentication.kerberos.principal is deprecated. Instead, use hadoop.http.authentication.kerberos.principal

HTTP/${httpfs.hostname}@${kerberos.realm}

ranger-hdfs-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

The spool directory path

/srv/ranger/hdfs_plugin/audit_solr_spool

xasecure.audit.destination.solr.urls

Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr

 — 

xasecure.audit.destination.solr.zookeepers

Specifies the ZooKeeper connection string for the Solr destination

 — 

xasecure.audit.destination.solr.force.use.inmemory.jaas.config

Uses in-memory JAAS configuration file to connect to Solr

 — 

xasecure.audit.is.enabled

Enables Ranger audit

true

xasecure.audit.jaas.Client.loginModuleControlFlag

Specifies whether the success of the module is required, requisite, sufficient, or optional

 — 

xasecure.audit.jaas.Client.loginModuleName

The name of the authenticator class

 — 

xasecure.audit.jaas.Client.option.keyTab

The name of the keytab file to get the principal’s secret key

 — 

xasecure.audit.jaas.Client.option.principal

The name of the principal to be used

 — 

xasecure.audit.jaas.Client.option.serviceName

Represents a user or a service that wants to log in

 — 

xasecure.audit.jaas.Client.option.storeKey

Set this to true if you want the keytab or the principal’s key to be stored in the subject’s private credentials

false

xasecure.audit.jaas.Client.option.useKeyTab

Set this to true if you want the module to get the principal’s key from the keytab

false

ranger-hdfs-security.xml
Parameter Description Default value

ranger.plugin.hdfs.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.hdfs.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.hdfs.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/hdfs/policycache

ranger.plugin.hdfs.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.hdfs.policy.rest.client.connection.timeoutMs

The HDFS Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.hdfs.policy.rest.client.read.timeoutMs

The HDFS Plugin RangerRestClient read timeout (in milliseconds)

30000

ranger.plugin.hdfs.policy.rest.ssl.config.file

The path to the RangerRestClient SSL config file for the HDFS plugin

/etc/hadoop/conf/ranger-hdfs-policymgr-ssl.xml

httpfs-env.sh
Parameter Description Default value

HADOOP_CONF_DIR

Hadoop configuration directory

/etc/hadoop/conf

HADOOP_LOG_DIR

Location of the log directory

${HTTPFS_LOG}

HADOOP_PID_DIR

PID file directory location

${HTTPFS_TEMP}

HTTPFS_SSL_ENABLED

Defines if SSL is enabled for httpfs

false

HTTPFS_SSL_KEYSTORE_FILE

The path to the keystore file

admin

HTTPFS_SSL_KEYSTORE_PASS

The password to access the keystore

admin

HDFS heap memory settings
Parameter Description Default value

NameNode Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for NameNode

-Xms1G -Xmx8G

DataNode Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for DataNode

-Xms700m -Xmx8G

HttpFS Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for httpfs

-Xms700m -Xmx8G

JournalNode Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for JournalNode

-Xms700m -Xmx8G

ZKFC Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for ZKFC

-Xms500m -Xmx8G

ssl-server.xml
Parameter Description Default value

ssl.server.truststore.location

The truststore to be used by NameNodes and DataNodes

 — 

ssl.server.truststore.password

The password to the truststore

 — 

ssl.server.truststore.type

The truststore file format

jks

ssl.server.truststore.reload.interval

The truststore reload check interval (in milliseconds)

10000

ssl.server.keystore.location

The path to the keystore file used by NameNodes and DataNodes

 — 

ssl.server.keystore.password

The password to the keystore

 — 

ssl.server.keystore.keypassword

The password to the key in the keystore

 — 

ssl.server.keystore.type

The keystore file format

 — 
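
A hedged ssl-server.xml sketch combining the parameters above; all paths and passwords are placeholders.

  <property>
    <name>ssl.server.keystore.location</name>
    <value>/etc/ssl/server.keystore.jks</value>
  </property>
  <property>
    <name>ssl.server.keystore.password</name>
    <value>keystore_password</value>
  </property>
  <property>
    <name>ssl.server.keystore.keypassword</name>
    <value>key_password</value>
  </property>
  <property>
    <name>ssl.server.truststore.location</name>
    <value>/etc/ssl/truststore.jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.password</name>
    <value>truststore_password</value>
  </property>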

ssl-client.xml
Parameter Description Default value

ssl.client.truststore.location

The truststore to be used by NameNodes and DataNodes

 — 

ssl.client.truststore.password

The password to the truststore

 — 

ssl.client.truststore.type

The truststore file format

jks

ssl.client.truststore.reload.interval

The truststore reload check interval (in milliseconds)

10000

ssl.client.keystore.location

The path to the keystore file used by NameNodes and DataNodes

 — 

ssl.client.keystore.password

The password to the keystore

 — 

ssl.client.keystore.keypassword

The password to the key in the keystore

 — 

ssl.client.keystore.type

The keystore file format

 — 

Lists of decommissioned and in maintenance hosts
Parameter Description Default value

DECOMMISSIONED

When an administrator decommissions a DataNode, the DataNode is first transitioned into the DECOMMISSION_INPROGRESS state. After all blocks belonging to that DataNode are fully replicated elsewhere based on each block's replication factor, the DataNode is transitioned to the DECOMMISSIONED state. After that, the administrator can shut down the node to perform long-term repair and maintenance that could take days or weeks. After the machine has been repaired, it can be recommissioned back to the cluster

 — 

IN_MAINTENANCE

Sometimes administrators only need to take DataNodes down for minutes or hours to perform short-term repair or maintenance. For such scenarios, the HDFS block replication overhead incurred by decommissioning might not be necessary, and a light-weight process is desirable. That is what the maintenance state is used for. When an administrator puts a DataNode in the maintenance state, the DataNode is first transitioned to the ENTERING_MAINTENANCE state. As soon as all blocks belonging to that DataNode are minimally replicated elsewhere, the DataNode is transitioned to the IN_MAINTENANCE state. After the maintenance has completed, the administrator can take the DataNode out of the maintenance state. In addition, the maintenance state supports a timeout that allows administrators to configure the maximum duration for which a DataNode is allowed to stay in the maintenance state. After the timeout, the DataNode is transitioned out of the maintenance state automatically by HDFS without human intervention

 — 
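
When dfs.namenode.hosts.provider.classname is set to CombinedHostFileManager (the default above), both states are declared in the JSON hosts file referenced by dfs.hosts. The following is a hedged sketch with hypothetical host names, modeled on the upstream HDFS DataNode administration format; verify the exact field names against your Hadoop version.

  [
    { "hostName": "dn1.example.com" },
    { "hostName": "dn2.example.com", "adminState": "DECOMMISSIONED" },
    { "hostName": "dn3.example.com", "adminState": "IN_MAINTENANCE",
      "maintenanceExpireTimeInMS": 1720000000000 }
  ]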

Other
Parameter Description Default value

Custom core-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file core-site.xml

 — 

Custom hdfs-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hdfs-site.xml

 — 

Custom httpfs-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file httpfs-site.xml

 — 

Ranger plugin enabled

Whether or not Ranger plugin is enabled

 — 

Custom ranger-hdfs-audit.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hdfs-audit.xml

 — 

Custom ranger-hdfs-security.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hdfs-security.xml

 — 

Custom ranger-hdfs-policymgr-ssl.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hdfs-policymgr-ssl.xml

 — 

Custom httpfs-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file httpfs-env.sh

 — 

Custom ssl-server.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ssl-server.xml

 — 

Custom ssl-client.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ssl-client.xml

 — 

Topology script

The topology script used in HDFS

 — 

Topology data

An optional text file that maps host names to rack numbers for the topology script. Stored at /etc/hadoop/conf/topology.data

 — 
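
For illustration, a topology data file is a plain-text mapping of host names or IP addresses to rack paths; the hosts and rack names below are hypothetical:

  datanode1.example.com /dc1/rack1
  datanode2.example.com /dc1/rack2
  10.0.0.15 /dc1/rack2

The topology script configured above is expected to read this mapping and print the rack path for each host name or IP address passed to it as an argument; hosts missing from the file are commonly mapped to a default rack such as /default-rack.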

Custom log4j.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file log4j.properties

Custom httpfs-log4j.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file httpfs-log4j.properties

Hive

External database
Parameter Description Default value

Database type

The type of the external database server. Possible values: MySQL, PostgreSQL

MySQL

Hostname

The external database hostname

 — 

Custom port

The external database port. Leave empty for using the default port

 — 

Hive database name

The external database name

hive

hive-env.sh
Parameter Description Default value

HADOOP_CLASSPATH

A colon-delimited list of directories, files, or wildcard locations that include all necessary classes

/etc/tez/conf/:/usr/lib/tez/:/usr/lib/tez/lib/

HIVE_HOME

The Hive home directory

/usr/lib/hive

METASTORE_PORT

The Hive Metastore port

9083

Hive heap memory settings
Parameter Description Default value

HiveServer2 Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HiveServer2

-Xms256m -Xmx256m

Hive Metastore Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Hive Metastore

-Xms256m -Xmx256m

hive-site.xml
Parameter Description Default value

hive.cbo.enable

When set to true, enables the cost-based optimizer that uses the Calcite framework

true

hive.compute.query.using.stats

When set to true, Hive will answer a few queries like min, max, and count(1) purely using statistics stored in the Metastore. For basic statistics collection, set the configuration property hive.stats.autogather to true. For more advanced statistics collection, run the ANALYZE TABLE queries

false

hive.execution.engine

Selects the execution engine. Supported values: mr (MapReduce), tez (Tez execution, for Hadoop 2 onward), and spark (Spark execution, for Hive 1.1.0 onward)

Tez

hive.log.explain.output

When enabled, logs the EXPLAIN EXTENDED output for the query at the log4j INFO level and in the HiveServer2 web UI (Drilldown → Query Plan). Starting with Hive 3.1.0, this configuration property only logs via log4j at the INFO level. To log the EXPLAIN EXTENDED output in WebUI/Drilldown/Query Plan in Hive 3.1.0 and later, use hive.server2.webui.explain.output

true

hive.metastore.event.db.notification.api.auth

Defines whether the Metastore should perform the authorization against database notification related APIs such as get_next_notification. If set to true, then only the superusers in proxy settings have the permission

false

hive.metastore.uris

The Metastore URI used to access metadata in a remote metastore setup. For a remote metastore, specify the Thrift metastore server URI: thrift://<hostname>:<port>, where <hostname> is the name or IP address of the Thrift metastore server and <port> is the port on which the Thrift server listens

 — 
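
For example, a value following this template could look like thrift://hive-metastore.example.com:9083, where the host name is hypothetical and 9083 is the default Metastore port (see METASTORE_PORT above).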

hive.metastore.warehouse.dir

The absolute HDFS path of the default database for the warehouse, which is local to the cluster

/apps/hive/warehouse

hive.server2.enable.doAs

Impersonate the connected user

false

hive.stats.fetch.column.stats

Annotation of the operator tree with statistics information requires column statistics. Column statistics are fetched from the Metastore. Fetching column statistics for each needed column can be expensive when the number of columns is high. This flag can be used to disable fetching of column statistics from the Metastore

 — 

hive.tez.container.size

By default, Tez will spawn containers of the size of a mapper. This parameter can be used to overwrite the default value

 — 

hive.support.concurrency

Defines whether Hive should support concurrency or not. A ZooKeeper instance must be up and running for the default Hive Lock Manager to support read/write locks

false

hive.txn.manager

Set this to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive transactions. The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides no transactions

 — 
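
As the descriptions of hive.support.concurrency and hive.txn.manager indicate, these values are normally changed together. A minimal sketch of the settings needed to turn on Hive transactions (the same values can also be supplied via Custom hive-site.xml):

  hive.support.concurrency=true
  hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

A running ZooKeeper quorum is also required, because the default Hive Lock Manager relies on it for read/write locks.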

javax.jdo.option.ConnectionUserName

The metastore database user name

APP

javax.jdo.option.ConnectionPassword

The password for the metastore user name

 — 

javax.jdo.option.ConnectionURL

The JDBC connection URI used to access the data stored in the local Metastore setup. Use the following connection URI: jdbc:<data store type>://<node name>:<port>/<database name> where:

  • <node name> is the host name or IP address of the data store;

  • <data store type> is the type of the data store;

  • <port> is the port on which the data store listens for remote procedure calls (RPC);

  • <database name> is the name of the database.

For example, the following URI specifies a local metastore that uses MySQL as a data store: jdbc:mysql://hostname23:3306/metastore

 — 

hive.server2.transport.mode

Sets the transport mode

tcp

hive.server2.thrift.http.port

The port number for Thrift Server2 to listen on

10001

hive.server2.thrift.http.path

The HTTP endpoint of the Thrift Server2 service

cliservice

hive.server2.authentication.kerberos.principal

Hive server Kerberos principal

hive/_HOST@EXAMPLE.COM

hive.server2.authentication.kerberos.keytab

The path to the Kerberos keytab file containing the Hive server service principal

/etc/security/keytabs/hive.service.keytab

hive.server2.authentication.spnego.principal

The SPNEGO Kerberos principal

HTTP/_HOST@EXAMPLE.COM

hive.server2.webui.spnego.principal

The SPNEGO Kerberos principal to access Web UI

 — 

hive.server2.webui.spnego.keytab

The SPNEGO Kerberos keytab file to access Web UI

 — 

hive.server2.webui.use.spnego

Defines whether to use Kerberos SPNEGO for Web UI access

false

hive.server2.authentication.spnego.keytab

The path to the keytab file containing the SPNEGO principal

/etc/security/keytabs/HTTP.service.keytab

hive.server2.authentication

Sets the authentication mode

NONE

hive.metastore.sasl.enabled

If true, the Metastore Thrift interface will be secured with SASL. Clients must authenticate with Kerberos

false

hive.metastore.kerberos.principal

The service principal for the metastore Thrift server. The _HOST token will be automatically replaced with the appropriate host name

hive/_HOST@EXAMPLE.COM

hive.metastore.kerberos.keytab.file

The path to the Kerberos keytab file containing the metastore Thrift server’s service principal

/etc/security/keytabs/hive.service.keytab

hive.server2.use.SSL

Defines whether to use SSL for HiveServer2

false

hive.server2.keystore.path

The keystore to be used by Hive

 — 

hive.server2.keystore.password

The password to the Hive keystore

 — 

hive.server2.truststore.path

The truststore to be used by Hive

 — 

hive.server2.webui.use.ssl

Defines whether to use SSL for the Hive web UI

false

hive.server2.webui.keystore.path

The path to the keystore file used to access the Hive web UI

 — 

hive.server2.webui.keystore.password

The password to the keystore file used to access the Hive web UI

 — 

hive.server2.support.dynamic.service.discovery

Defines whether to support dynamic service discovery via ZooKeeper

false

hive.zookeeper.quorum

A comma-separated list of ZooKeeper servers (<host>:<port>) running in the cluster

zookeeper:2181

hive.server2.zookeeper.namespace

Specifies the root namespace on ZooKeeper

hiveserver2
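
When dynamic service discovery is enabled, clients connect through ZooKeeper instead of a specific HiveServer2 host. A hedged sketch of a Beeline/JDBC connection string built from the two parameters above (the quorum and namespace values shown are the defaults from this table):

  jdbc:hive2://zookeeper:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2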

ranger-hive-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

The spool directory path

/srv/ranger/hdfs_plugin/audit_solr_spool

xasecure.audit.destination.solr.urls

Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr

 — 

xasecure.audit.destination.solr.zookeepers

Specifies the ZooKeeper connection string for the Solr destination

 — 

xasecure.audit.destination.solr.force.use.inmemory.jaas.config

Uses in-memory JAAS configuration file to connect to Solr

 — 

xasecure.audit.is.enabled

Enables Ranger audit

true

xasecure.audit.jaas.Client.loginModuleControlFlag

Specifies whether the success of the module is required, requisite, sufficient, or optional

 — 

xasecure.audit.jaas.Client.loginModuleName

The name of the authenticator class

 — 

xasecure.audit.jaas.Client.option.keyTab

The name of the keytab file to get the principal’s secret key

 — 

xasecure.audit.jaas.Client.option.principal

The name of the principal to be used

 — 

xasecure.audit.jaas.Client.option.serviceName

Represents a user or a service that wants to log in

 — 

xasecure.audit.jaas.Client.option.storeKey

Set this to true if you want the keytab or the principal’s key to be stored in the subject’s private credentials

false

xasecure.audit.jaas.Client.option.useKeyTab

Set this to true if you want the module to get the principal’s key from the keytab

false
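
Taken together, the xasecure.audit.jaas.Client.* parameters describe a JAAS login configuration that the plugin uses to write audit records to a Kerberized Solr. A hedged sketch of a typical combination; the module class, control flag, keytab, and principal values below are illustrative assumptions, not ADH defaults:

  xasecure.audit.jaas.Client.loginModuleName=com.sun.security.auth.module.Krb5LoginModule
  xasecure.audit.jaas.Client.loginModuleControlFlag=required
  xasecure.audit.jaas.Client.option.useKeyTab=true
  xasecure.audit.jaas.Client.option.storeKey=true
  xasecure.audit.jaas.Client.option.keyTab=/etc/security/keytabs/hive.service.keytab
  xasecure.audit.jaas.Client.option.principal=hive/_HOST@EXAMPLE.COM
  xasecure.audit.jaas.Client.option.serviceName=solr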

ranger-hive-security.xml
Parameter Description Default value

ranger.plugin.hive.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.hive.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.hive.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/hive/policycache

ranger.plugin.hive.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.hive.policy.rest.client.connection.timeoutMs

The Hive Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.hive.policy.rest.client.read.timeoutMs

The Hive Plugin RangerRestClient read timeout (in milliseconds)

30000

xasecure.hive.update.xapolicies.on.grant.revoke

Controls Hive Ranger policy update from SQL Grant/Revoke commands

true

ranger.plugin.hive.policy.rest.ssl.config.file

The path to the RangerRestClient SSL config file for the Hive plugin

/etc/hive/conf/ranger-hive-policymgr-ssl.xml

ranger-hive-policymgr-ssl.xml
Parameter Description Default value

xasecure.policymgr.clientssl.keystore

The path to the keystore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.credential.file

The path to the keystore credentials file

/etc/hive/conf/ranger-hive.jceks

xasecure.policymgr.clientssl.truststore.credential.file

The path to the truststore credentials file

/etc/hive/conf/ranger-hive.jceks

xasecure.policymgr.clientssl.truststore

The path to the truststore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.password

The password to the keystore file

 — 

xasecure.policymgr.clientssl.truststore.password

The password to the truststore file

 — 

tez-site.xml
Parameter Description Default value

tez.am.resource.memory.mb

The amount of memory in MB, that YARN will allocate to the Tez Application Master. The size increases with the size of the DAG

 — 

tez.history.logging.service.class

Enables Tez to use the Timeline Server for History Logging

org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService

tez.lib.uris

HDFS paths containing the Tez JAR files

${fs.defaultFS}/apps/tez/tez-0.9.2.tar.gz

tez.task.resource.memory.mb

The amount of memory used by launched tasks in TEZ containers. Usually this value is set in the DAG

 — 

tez.tez-ui.history-url.base

The URL where the Tez UI is hosted

 — 

tez.use.cluster.hadoop-libs

Specifies whether Tez will use the cluster Hadoop libraries

true

nginx.conf
Parameter Description Default value

ssl_certificate

The path to the SSL certificate for NGINX

/etc/ssl/certs/host_cert.cert

ssl_certificate_key

The path to the SSL certificate key for NGINX

/etc/ssl/host_cert.key

Other
Parameter Description Default value

ACID Transactions

Defines whether to enable ACID transactions

false

Custom hive-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hive-site.xml

 — 

Custom hive-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hive-env.sh

 — 

Ranger plugin enabled

Whether or not Ranger plugin is enabled

false

Custom ranger-hive-audit.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hive-audit.xml

 — 

Custom ranger-hive-security.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hive-security.xml

 — 

Custom ranger-hive-policymgr-ssl.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hive-policymgr-ssl.xml

 — 

Custom tez-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file tez-site.xml

 — 

MySQL

root user
Parameter Description Default value

Password

The root password

 — 

Solr

solr-env.sh
Parameter Description Default value

SOLR_HOME

The location for index data and configs

/srv/solr/server

SOLR_AUTH_TYPE

Specifies the authentication type for Solr

 — 

SOLR_AUTHENTICATION_OPTS

Solr authentication options

 — 

GC_TUNE

JVM parameters for Solr

-XX:-UseLargePages

SOLR_SSL_KEY_STORE

The path to the Solr keystore file (.jks)

 — 

SOLR_SSL_KEY_STORE_PASSWORD

The password to the Solr keystore file

 — 

SOLR_SSL_TRUST_STORE

The path to the Solr truststore file (.jks)

 — 

SOLR_SSL_TRUST_STORE_PASSWORD

The password to the Solr truststore file

 — 

SOLR_SSL_NEED_CLIENT_AUTH

Defines if client authentication is enabled

false

SOLR_SSL_WANT_CLIENT_AUTH

Enables client authentication but does not require it

false

SOLR_SSL_CLIENT_HOSTNAME_VERIFICATION

Defines whether to enable hostname verification

false
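
For illustration, enabling SSL for Solr usually means setting the keystore and truststore variables above together in solr-env.sh; the paths and passwords below are placeholders, not defaults:

  SOLR_SSL_KEY_STORE=/etc/solr/ssl/solr-keystore.jks
  SOLR_SSL_KEY_STORE_PASSWORD=<keystore password>
  SOLR_SSL_TRUST_STORE=/etc/solr/ssl/solr-truststore.jks
  SOLR_SSL_TRUST_STORE_PASSWORD=<truststore password>
  SOLR_SSL_NEED_CLIENT_AUTH=false
  SOLR_SSL_WANT_CLIENT_AUTH=false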

SOLR_HOST

Specifies the host name of the Solr server

 — 

External zookeeper
Parameter Description Default value

ZK_HOST

Comma-separated locations of all servers in the ensemble and the ports on which they communicate. You can put ZooKeeper chroot at the end of your ZK_HOST connection string. For example, host1.mydomain.com:2181,host2.mydomain.com:2181,host3.mydomain.com:2181/solr

 — 

Solr server heap memory settings
Parameter Description Default value

Solr Server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Solr Server

-Xms512m -Xmx512m

ranger-solr-audit.xml
Parameter Description Default value

xasecure.audit.solr.solr_url

A path to a Solr collection to store audit logs

 — 

xasecure.audit.solr.async.max.queue.size

The maximum size of internal queue used for storing audit logs

1

xasecure.audit.solr.async.max.flush.interval.ms

The maximum time interval between flushes to disk (in milliseconds)

100

ranger-solr-security.xml
Parameter Description Default value

ranger.plugin.solr.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.solr.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.solr.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/yarn/policycache

ranger.plugin.solr.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.solr.policy.rest.client.connection.timeoutMs

The Solr Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.solr.policy.rest.client.read.timeoutMs

The Solr Plugin RangerRestClient read timeout (in milliseconds)

30000

ranger-solr-policymgr-ssl.xml
Parameter Description Default value

xasecure.policymgr.clientssl.keystore

The path to the keystore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.credential.file

The path to the keystore credentials file

/etc/solr/conf/ranger-solr.jceks

xasecure.policymgr.clientssl.truststore.credential.file

The path to the truststore credentials file

/etc/solr/conf/ranger-solr.jceks

xasecure.policymgr.clientssl.truststore

The path to the truststore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.password

The password to the keystore file

 — 

xasecure.policymgr.clientssl.truststore.password

The password to the truststore file

 — 

Other
Parameter Description Default value

solr.xml

The content of solr.xml

Custom solr-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file solr-env.sh

 — 

Ranger plugin enabled

Enables the Ranger plugin

false

Spark

Common
Parameter Description Default value

Dynamic allocation (spark.dynamicAllocation.enabled)

Defines whether to use dynamic resource allocation that scales the number of executors, registered with this application, up and down, based on the workload

false

spark-defaults.conf
Parameter Description Default value

spark.yarn.archive

The archive containing the required Spark JARs for distribution to the YARN cache. If set, this configuration replaces spark.yarn.jars and the archive is used in all the application containers. The archive should contain JAR files in its root directory. The archive can also be hosted on HDFS to speed up file distribution

hdfs:///apps/spark/spark-yarn-archive.tgz

spark.master

The cluster manager to connect to

yarn

spark.dynamicAllocation.enabled

Defines whether to use dynamic resource allocation that scales the number of executors, registered with this application, up and down, based on the workload

false

spark.shuffle.service.enabled

Enables the external shuffle service. This service preserves the shuffle files written by executors so that executors can be safely removed, or so that shuffle fetches can continue in the event of executor failure. The external shuffle service must be set up in order to enable it

false
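
The two parameters above are usually enabled together: dynamic allocation relies on the external shuffle service to keep shuffle files available while executors are being removed. A minimal spark-defaults.conf sketch (the idle timeouts shown are the defaults listed further below):

  spark.dynamicAllocation.enabled true
  spark.shuffle.service.enabled true
  spark.dynamicAllocation.executorIdleTimeout 120s
  spark.dynamicAllocation.cachedExecutorIdleTimeout 600s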

spark.eventLog.enabled

Defines whether to log Spark events, useful for reconstructing the Web UI after the application has finished

false

spark.eventLog.dir

The base directory where Spark events are logged, if spark.eventLog.enabled=true. Within this base directory, Spark creates a sub-directory for each application, and logs the events specific to the application in this directory. You may want to set this to a unified location like an HDFS directory so history files can be read by the History Server

hdfs:///var/log/spark/apps

spark.serializer

The class to use for serializing objects that will be sent over the network or need to be cached in serialized form. The default of Java serialization works with any Serializable Java object but is quite slow, so we recommend using org.apache.spark.serializer.KryoSerializer and configuring Kryo serialization when speed is necessary. Can be any subclass of org.apache.spark.Serializer

org.apache.spark.serializer.KryoSerializer

spark.dynamicAllocation.executorIdleTimeout

If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation

120s

spark.dynamicAllocation.cachedExecutorIdleTimeout

If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation

600s

spark.history.provider

The name of the class that implements the application history backend. Currently there is only one implementation provided with Spark that looks for application logs stored in the file system

org.apache.spark.deploy.history.FsHistoryProvider

spark.history.fs.cleaner.enabled

Specifies whether the History Server should periodically clean up event logs from storage

true

spark.history.store.path

A local directory where to cache application history data. If set, the History Server will store application data on disk instead of keeping it in memory. The data written to disk will be re-used in case of the History Server restart

/var/log/spark/history

spark.ssl.enabled

Defines whether to use SSL for Spark

false

spark.ssl.protocol

TLS protocol to be used. The protocol must be supported by JVM

TLSv1.2

spark.ssl.ui.port

The port on which the SSL service will listen

4040

spark.ssl.historyServer.port

The port to access History Server web UI

18082

spark.ssl.keyPassword

The password to the private key in the key store

 — 

spark.ssl.keyStore

The path to the keystore file

 — 

spark.ssl.keyStoreType

The type of the keystore

JKS

spark.ssl.trustStorePassword

The password to the truststore used by Spark

 — 

spark.ssl.trustStore

The path to the truststore file

 — 

spark.ssl.trustStoreType

The type of the truststore

JKS
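
A hedged sketch of the SSL parameters above combined in spark-defaults.conf; the store paths and passwords are placeholders:

  spark.ssl.enabled true
  spark.ssl.protocol TLSv1.2
  spark.ssl.keyStore /etc/ssl/spark/keystore.jks
  spark.ssl.keyPassword <key password>
  spark.ssl.trustStore /etc/ssl/spark/truststore.jks
  spark.ssl.trustStorePassword <truststore password>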

spark.history.kerberos.enabled

Indicates whether the History Server should use Kerberos to login. This is required if the History Server is accessing HDFS files on a secure Hadoop cluster

false

spark.acls.enable

Enables Spark ACL

false

spark.modify.acls

Defines who has access to modify a running Spark application

spark,hdfs

spark.modify.acls.groups

A comma-separated list of user groups that have modify access to the Spark application

spark,hdfs

spark.history.ui.acls.enable

Specifies whether ACLs should be checked to authorize users viewing the applications in the History Server. If enabled, access control checks are performed regardless of what the individual applications had set for spark.ui.acls.enable. If disabled, no access control checks are made for any application UIs available through the History Server

false

spark.history.ui.admin.acls

A comma-separated list of users that have view access to all the Spark applications in History Server

spark,hdfs,dr.who

spark.history.ui.admin.acls.groups

A comma-separated list of groups that have view access to all the Spark applications in History Server

spark,hdfs,dr.who

spark.ui.view.acls

A comma-separated list of users that have view access to the Spark application. By default, only the user that started the Spark job has view access. Using * as a value means that any user can have view access to this Spark job

spark,hdfs,dr.who

spark.ui.view.acls.groups

A comma-separated list of groups that have view access to the Spark web UI to view the Spark Job details. This can be used if you have a set of administrators or developers or users who can monitor the Spark job submitted. Using * in the list means any user in any group can view the Spark job details on the Spark web UI. The user groups are obtained from the instance of the groups mapping provider specified by spark.user.groups.mapping

spark,hdfs,dr.who

Spark heap memory settings
Parameter Description Default value

Spark History Server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Spark History Server

1G

Spark Thrift Server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Spark Thrift Server

1G

Livy Server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Livy Server

-Xms300m -Xmx4G

livy.conf
Parameter Description Default value

livy.server.host

The host address to start the Livy server. By default, Livy will bind to all network interfaces

0.0.0.0

livy.server.port

The port to run the Livy server

8998

livy.spark.master

The Spark master to use for Livy sessions

yarn-cluster

livy.impersonation.enabled

Defines if Livy should impersonate users when creating a new session

true

livy.server.csrf-protection.enabled

Defines whether to enable the csrf protection. If enabled, clients should add the X-Requested-By HTTP header for POST/DELETE/PUT/PATCH HTTP methods

true

livy.repl.enable-hive-context

Defines whether to enable HiveContext in the Livy interpreter. If set to true, hive-site.xml and the Livy server classpath will be detected on user request automatically

true

livy.server.recovery.mode

Sets the recovery mode for Livy

recovery

livy.server.recovery.state-store

Defines where Livy should store the state for recovery

filesystem

livy.server.recovery.state-store.url

For the filesystem state store, the path of the state store directory. Do not use a filesystem that does not support atomic rename (for example, S3). For example: file:///tmp/livy or hdfs:///. For ZooKeeper, specify the address to the ZooKeeper servers. For example: host1:port1,host2:port2

/livy-recovery

livy.server.auth.type

Sets the Livy authentication type

 — 

livy.server.access_control.enabled

Defines whether to enable the access control for a Livy server. If set to true, then all the incoming requests will be checked if the requested user has permission

false

livy.server.access_control.users

Users allowed to access Livy. By default, any user can access Livy. To limit access, list all permitted users separated by commas

livy,hdfs,spark

livy.superusers

A comma-separated list of users that have permission to change other users' submitted sessions, for example, by submitting statements or deleting the session

livy,hdfs,spark

livy.keystore

A path to the keystore file. The path can be absolute or relative to the directory in which the process is started

 — 

livy.keystore.password

The password to access the keystore

 — 

livy.key-password

The password to access the key in the keystore

 — 

Other
Parameter Description Default value

Custom spark-defaults.conf

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file spark-defaults.conf

 — 

spark-env.sh

Enter the contents for the spark-env.sh file that is used to initialize environment variables on worker nodes

spark-env.sh

Custom livy.conf

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file livy.conf

 — 

livy-env.sh

Enter the contents for the livy-env.sh file that is used to prepare the environment for Livy startup

livy-env.sh

thriftserver-env.sh

Enter the contents for the thriftserver-env.sh file that is used to prepare the environment for Thrift server startup

thriftserver-env.sh

spark-history-env.sh

Enter the contents for the spark-history-env.sh file that is used to prepare the environment for History Server startup

spark-history-env.sh

Spark3

Common
Parameter Description Default value

Dynamic allocation (spark.dynamicAllocation.enabled)

Defines whether to use dynamic resource allocation that scales the number of executors, registered with this application, up and down, based on the workload

false

spark-defaults.conf
Parameter Description Default value

spark.yarn.archive

The archive containing all the required Spark JARs for distribution to the YARN cache. If set, this configuration replaces spark.yarn.jars and the archive is used in all the application containers. The archive should contain JAR files in its root directory. The archive can also be hosted on HDFS to speed up file distribution

hdfs:///apps/spark/spark3-yarn-archive.tgz

spark.master

The cluster manager to connect to

yarn

spark.dynamicAllocation.enabled

Defines whether to use dynamic resource allocation that scales the number of executors, registered with this application, up and down, based on the workload

false

spark.shuffle.service.enabled

Enables the external shuffle service. This service preserves the shuffle files written by executors so that executors can be safely removed, or so that shuffle fetches can continue in the event of executor failure. The external shuffle service must be set up in order to enable it

false

spark.eventLog.enabled

Defines whether to log Spark events, useful for reconstructing the Web UI after the application has finished

false

spark.eventLog.dir

The base directory where Spark events are logged, if spark.eventLog.enabled=true. Within this base directory, Spark creates a sub-directory for each application, and logs the events specific to the application in this directory. You may want to set this to a unified location like an HDFS directory so history files can be read by the History Server

hdfs:///var/log/spark/apps

spark.dynamicAllocation.executorIdleTimeout

If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation

120s

spark.dynamicAllocation.cachedExecutorIdleTimeout

If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation

600s

spark.history.provider

The name of the class that implements the application history backend. Currently there is only one implementation provided with Spark that looks for application logs stored in the file system

org.apache.spark.deploy.history.FsHistoryProvider

spark.history.fs.cleaner.enabled

Specifies whether the History Server should periodically clean up event logs from storage

true

spark.history.store.path

A local directory where to cache application history data. If set, the History Server will store application data on disk instead of keeping it in memory. The data written to disk will be re-used in case of the History Server restart

/var/log/spark/history

spark.history.kerberos.enabled

Indicates whether the History Server should use Kerberos to login. This is required if the History Server is accessing HDFS files on a secure Hadoop cluster

false

spark.acls.enable

Enables Spark ACL

spark,hdfs

spark.modify.acls

Defines who has access to modify a running Spark application

spark,hdfs

spark.modify.acls.groups

A comma-separated list of user groups that have modify access to the Spark application

spark,hdfs

spark.history.ui.acls.enable

Specifies whether ACLs should be checked to authorize users viewing the applications in the History Server. If enabled, access control checks are performed regardless of what the individual applications had set for spark.ui.acls.enable. If disabled, no access control checks are made for any application UIs available through the History Server

false

spark.history.ui.admin.acls

A comma-separated list of users that have view access to all the Spark applications in History Server

spark,hdfs,dr.who

spark.history.ui.admin.acls.groups

A comma-separated list of groups that have view access to all the Spark applications in History Server

spark,hdfs,dr.who

spark.ui.view.acls

A comma-separated list of users that have view access to the Spark application. By default, only the user that started the Spark job has view access. Using * as a value means that any user can have view access to this Spark job

spark,hdfs,dr.who

spark.ui.view.acls.groups

A comma-separated list of groups that have view access to the Spark web UI to view the Spark Job details. This can be used if you have a set of administrators or developers or users who can monitor the Spark job submitted. Using * in the list means any user in any group can view the Spark job details on the Spark web UI. The user groups are obtained from the instance of the groups mapping provider specified by spark.user.groups.mapping

spark,hdfs,dr.who

Other
Parameter Description Default value

Custom spark-defaults.conf

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file spark-defaults.conf

 — 

spark-env.sh

Enter the contents for the spark-env.sh file that is used to initialize environment variables on worker nodes

spark-env.sh

Sqoop

sqoop-site.xml
Parameter Description Default value

sqoop.metastore.client.autoconnect.url

The connection string to use when connecting to a job-management metastore. If not set, uses ~/.sqoop/

 — 
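
For illustration, a shared Sqoop metastore is addressed with an HSQLDB JDBC URL of the form jdbc:hsqldb:hsql://<host>:<port>/sqoop. A hypothetical example using the default metastore port from this table:

  jdbc:hsqldb:hsql://metastore-host.example.com:16100/sqoop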

sqoop.metastore.server.location

The path to the shared metastore database files. If not set, uses ~/.sqoop/

/srv/sqoop/metastore.db

sqoop.metastore.server.port

The port that this metastore should listen on

16100

sqoop-metastore-env.sh
Parameter Description Default value

HADOOP_OPTS

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Sqoop

-Xms800M -Xmx10G

Other
Parameter Description Default value

Custom sqoop-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file sqoop-site.xml

 — 

Custom sqoop-metastore-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file sqoop-metastore-env.sh

 — 

YARN

mapred-site.xml
Parameter Description Default value

mapreduce.application.classpath

The CLASSPATH for MapReduce applications. A comma-separated list of CLASSPATH entries. If mapreduce.application.framework is set, then this must specify the appropriate CLASSPATH for that archive, and the name of the archive must be present in the CLASSPATH. If mapreduce.app-submission.cross-platform is false, platform-specific environment variable expansion syntax would be used to construct the default CLASSPATH entries. If mapreduce.app-submission.cross-platform is true, platform-agnostic default CLASSPATH for MapReduce applications would be used:

{{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/*, {{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/lib/*

Parameter expansion marker will be replaced by NodeManager on container launch, based on the underlying OS accordingly

/etc/hadoop/conf/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*

mapreduce.cluster.local.dir

The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk I/O. Directories that do not exist are ignored

/srv/hadoop-yarn/mr-local

mapreduce.framework.name

The runtime framework for executing MapReduce jobs. Can be one of local, classic, or yarn

yarn

mapreduce.jobhistory.address

MapReduce JobHistory Server IPC (<host>:<port>)

 — 

mapreduce.jobhistory.bind-host

Setting the value to 0.0.0.0 will cause the MapReduce daemons to listen on all addresses and interfaces of the hosts in the cluster

0.0.0.0

mapreduce.jobhistory.webapp.address

MapReduce JobHistory Server Web UI (<host>:<port>)

 — 

mapreduce.map.env

Environment variables for the map task processes added by a user, specified as a comma separated list. Example: VAR1=value1,VAR2=value2

HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

mapreduce.reduce.env

Environment variables for the reduce task processes added by a user, specified as a comma separated list. Example: VAR1=value1,VAR2=value2

HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

yarn.app.mapreduce.am.env

Environment variables for the MapReduce App Master processes added by a user. Examples:

  • A=foo. This sets the environment variable A to foo.

  • B=$B:c. This inherits the tasktracker B environment variable.

HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

yarn.app.mapreduce.am.staging-dir

The staging directory used while submitting jobs

/user

mapreduce.jobhistory.keytab

The location of the Kerberos keytab file for the MapReduce JobHistory Server

/etc/security/keytabs/mapreduce-historyserver.service.keytab

mapreduce.jobhistory.principal

Kerberos principal name for the MapReduce JobHistory Server

mapreduce-historyserver/_HOST@REALM

mapreduce.jobhistory.http.policy

Configures the HTTP endpoint for JobHistoryServer web UI. The following values are supported:

  • HTTP_ONLY — provides service only via HTTP;

  • HTTPS_ONLY — provides service only via HTTPS.

HTTP_ONLY

mapreduce.jobhistory.webapp.https.address

The HTTPS address where MapReduce JobHistory Server WebApp is running

0.0.0.0:19890

mapreduce.shuffle.ssl.enabled

Defines whether to use SSL for the Shuffle HTTP endpoints

false

ranger-yarn-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

The spool directory path

/srv/ranger/hdfs_plugin/audit_solr_spool

xasecure.audit.destination.solr.urls

Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr

 — 

xasecure.audit.destination.solr.zookeepers

Specifies the ZooKeeper connection string for the Solr destination

 — 

xasecure.audit.destination.solr.force.use.inmemory.jaas.config

Uses in-memory JAAS configuration file to connect to Solr

 — 

xasecure.audit.is.enabled

Enables Ranger audit

true

xasecure.audit.jaas.Client.loginModuleControlFlag

Specifies whether the success of the module is required, requisite, sufficient, or optional

 — 

xasecure.audit.jaas.Client.loginModuleName

The name of the authenticator class

 — 

xasecure.audit.jaas.Client.option.keyTab

The name of the keytab file to get the principal’s secret key

 — 

xasecure.audit.jaas.Client.option.principal

The name of the principal to be used

 — 

xasecure.audit.jaas.Client.option.serviceName

Represents a user or a service that wants to log in

 — 

xasecure.audit.jaas.Client.option.storeKey

Set this to true if you want the keytab or the principal’s key to be stored in the subject’s private credentials

false

xasecure.audit.jaas.Client.option.useKeyTab

Set this to true if you want the module to get the principal’s key from the keytab

false

ranger-yarn-security.xml
Parameter Description Default value

ranger.plugin.yarn.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.yarn.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.yarn.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/yarn/policycache

ranger.plugin.yarn.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.yarn.policy.rest.client.connection.timeoutMs

The YARN Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.yarn.policy.rest.client.read.timeoutMs

The YARN Plugin RangerRestClient read timeout (in milliseconds)

30000

ranger.add-yarn-authorization

Set true to use only Ranger ACLs (i.e. ignore YARN ACLs)

false

ranger.plugin.yarn.policy.rest.ssl.config.file

The path to the RangerRestClient SSL config file for the YARN plugin

/etc/yarn/conf/ranger-yarn-policymgr-ssl.xml

yarn-site.xml
Parameter Description Default value

yarn.application.classpath

The CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries. When this value is empty, the following default CLASSPATH for YARN applications would be used.

  • For Linux:

    $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, $HADOOP_YARN_HOME/share/hadoop/yarn/*, $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
  • For Windows:

    %HADOOP_CONF_DIR%, %HADOOP_COMMON_HOME%/share/hadoop/common/*, %HADOOP_COMMON_HOME%/share/hadoop/common/lib/*, %HADOOP_HDFS_HOME%/share/hadoop/hdfs/*, %HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*, %HADOOP_YARN_HOME%/share/hadoop/yarn/*, %HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*

/etc/hadoop/conf/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*

yarn.cluster.max-application-priority

Defines the maximum application priority in a cluster. Leaf queue-level priority: the administrator can set a default priority for each leaf queue. The queue default priority is used for any application submitted without a specified priority. $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml is the configuration file for queue-level priority

0

yarn.log.server.url

The URL for log aggregation Server

 — 

yarn.log-aggregation-enable

Whether to enable log aggregation. Log aggregation collects logs from each container and moves them onto a file system, for example HDFS, after the application processing completes. Users can configure the yarn.nodemanager.remote-app-log-dir and yarn.nodemanager.remote-app-log-dir-suffix properties to determine where these logs are moved. Users can access the logs via the Application Timeline Server

true

yarn.log-aggregation.retain-seconds

Defines how long to keep aggregation logs before deleting them. The value of -1 disables log deletion. Be careful: setting this value too small will spam the NameNode

172800

yarn.nodemanager.local-dirs

The list of directories in which to store localized files. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this

/srv/hadoop-yarn/nm-local

yarn.node-labels.enabled

Enables node labels feature

true

yarn.node-labels.fs-store.root-dir

The URI for NodeLabelManager. The default value is /tmp/hadoop-yarn-${user}/node-labels/ in the local filesystem

hdfs:///system/yarn/node-labels

yarn.timeline-service.bind-host

The actual address the server will bind to. If this optional address is set, the RPC and Webapp servers will bind to this address and the port specified in yarn.timeline-service.address and yarn.timeline-service.webapp.address, respectively. This is most useful for making the service listen to all interfaces by setting it to 0.0.0.0

0.0.0.0

yarn.timeline-service.leveldb-timeline-store.path

The store file name for the leveldb Timeline store

/srv/hadoop-yarn/leveldb-timeline-store

yarn.nodemanager.address

The address of the container manager in the NodeManager

0.0.0.0:8041

yarn.nodemanager.aux-services

A comma-separated list of services, where service name should only contain a-zA-Z0-9_ and cannot start with numbers

mapreduce_shuffle,spark2_shuffle,spark_shuffle

yarn.nodemanager.aux-services.mapreduce_shuffle.class

The auxiliary service class to use

org.apache.hadoop.mapred.ShuffleHandler

yarn.nodemanager.aux-services.spark2_shuffle.class

The class name of YarnShuffleService — an external shuffle service for Spark 2 on YARN

org.apache.spark.network.yarn.YarnShuffleService

yarn.nodemanager.aux-services.spark2_shuffle.classpath

The path to YarnShuffleService — an external shuffle service for Spark 2 on YARN

/usr/lib/spark/yarn/lib/*

yarn.nodemanager.aux-services.spark_shuffle.class

The class name of YarnShuffleService — an external shuffle service for Spark 3 on YARN

org.apache.spark.network.yarn.YarnShuffleService

yarn.nodemanager.aux-services.spark_shuffle.classpath

The path to YarnShuffleService — an external shuffle service for Spark 3 on YARN

/usr/lib/spark3/yarn/lib/*

yarn.nodemanager.recovery.enabled

Enables the NodeManager to recover after starting

true

yarn.nodemanager.recovery.dir

The local filesystem directory in which the NodeManager will store state when recovery is enabled

/srv/hadoop-yarn/nm-recovery

yarn.nodemanager.remote-app-log-dir

Defines a directory for logs aggregation

/logs

yarn.nodemanager.resource-plugins

Enables additional discovery/isolation of resources on the NodeManager. By default, this parameter is empty. Acceptable values: yarn.io/gpu, yarn.io/fpga

 — 

yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables

When yarn.nodemanager.resource.gpu.allowed-gpu-devices=auto, the YARN NodeManager needs to run a GPU discovery binary (currently only nvidia-smi is supported) to get GPU-related information. When the value is empty (default), the YARN NodeManager will try to locate the discovery executable itself. An example of the config value is: /usr/local/bin/nvidia-smi

/usr/bin/nvidia-smi

yarn.nodemanager.resource.detect-hardware-capabilities

Enables auto-detection of node capabilities such as memory and CPU

true

yarn.nodemanager.vmem-check-enabled

Whether virtual memory limits will be enforced for containers

false

yarn.resource-types

The resource types to be used for scheduling. Use resource-types.xml to specify details about the individual resource types

 — 

yarn.resourcemanager.bind-host

The actual address the server will bind to. If this optional address is set, the RPC and Webapp servers will bind to this address and the port specified in yarn.resourcemanager.address and yarn.resourcemanager.webapp.address, respectively. This is most useful for making the Resource Manager listen to all interfaces by setting it to 0.0.0.0

0.0.0.0

yarn.resourcemanager.cluster-id

The name of the cluster. In the High Availability mode, this parameter is used to ensure that Resource Manager participates in leader election for this cluster and ensures that it does not affect other clusters

 — 

yarn.resource-types.memory-mb.increment-allocation

The FairScheduler grants memory equal to increments of this value. If you submit a task with a resource request which is not a multiple of memory-mb.increment-allocation, the request will be rounded up to the nearest increment

1024
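
For example, with the default increment of 1024 MB, a container request for 1500 MB would be rounded up to 2048 MB.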

yarn.resource-types.vcores.increment-allocation

The FairScheduler grants vcores in increments of this value. If you submit a task with a resource request that is not a multiple of vcores.increment-allocation, the request will be rounded up to the nearest increment

1

yarn.resourcemanager.ha.enabled

Enables Resource Manager High Availability. When enabled:

  • The Resource Manager starts in the Standby mode by default, and transitions to the Active mode when prompted to.

  • The nodes in the Resource Manager ensemble are listed in yarn.resourcemanager.ha.rm-ids.

  • The id of each Resource Manager either comes from yarn.resourcemanager.ha.id, if yarn.resourcemanager.ha.id is explicitly specified, or can be figured out by matching yarn.resourcemanager.address.{id} with local address.

  • The actual physical addresses come from the configs of the pattern {rpc-config}.{id}.

false

yarn.resourcemanager.ha.rm-ids

The list of Resource Manager nodes in the cluster when the High Availability is enabled. See description of yarn.resourcemanager.ha.enabled for full details on how this is used

 — 
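
A hedged sketch of a two-node Resource Manager High Availability setup built from the parameters above and the per-id address pattern mentioned in the yarn.resourcemanager.ha.enabled description; the host names and cluster id are hypothetical:

  yarn.resourcemanager.ha.enabled=true
  yarn.resourcemanager.ha.rm-ids=rm1,rm2
  yarn.resourcemanager.hostname.rm1=rm-host1.example.com
  yarn.resourcemanager.hostname.rm2=rm-host2.example.com
  yarn.resourcemanager.cluster-id=adh-yarn-cluster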

yarn.resourcemanager.hostname

The host name of the Resource Manager

 — 

yarn.resourcemanager.leveldb-state-store.path

The local path where the Resource Manager state will be stored when using org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore as the value for yarn.resourcemanager.store.class

/srv/hadoop-yarn/leveldb-state-store

yarn.resourcemanager.monitor.capacity.queue-management.monitoring-interval

The time between invocations of this QueueManagementDynamicEditPolicy policy (in milliseconds)

1500

yarn.resourcemanager.reservation-system.enable

Enables the ReservationSystem in the ResourceManager

false

yarn.resourcemanager.reservation-system.planfollower.time-step

The frequency of the PlanFollower timer (in milliseconds). A large value is expected

1000

Resource scheduler

The type of a pluggable scheduler for Hadoop. Available values: CapacityScheduler and FairScheduler. CapacityScheduler allows for multiple-tenants to securely share a large cluster such that their applications are allocated resources in a timely manner under constraints of allocated capacities. FairScheduler allows YARN applications to share resources in large clusters fairly

CapacityScheduler

yarn.resourcemanager.scheduler.monitor.enable

Enables a set of periodic monitors (specified in yarn.resourcemanager.scheduler.monitor.policies) that affect the Scheduler

false

yarn.resourcemanager.scheduler.monitor.policies

The list of SchedulingEditPolicy classes that interact with the Scheduler. A particular module may be incompatible with the Scheduler, other policies, or a configuration of either

org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy

yarn.resourcemanager.monitor.capacity.preemption.observe_only

If set to true, run the policy but do not affect the cluster with preemption and kill events

false

yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval

The time between invocations of this ProportionalCapacityPreemptionPolicy policy (in milliseconds)

3000

yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill

The time between requesting a preemption from an application and killing the container (in milliseconds)

15000

yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round

The maximum percentage of resources preempted in a single round. By controlling this value, one can throttle the pace at which containers are reclaimed from the cluster. After computing the total desired preemption, the policy scales it back within this limit

0.1

yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity

The maximum amount of resources above the target capacity ignored for preemption. This defines a deadzone around the target capacity that helps to prevent thrashing and oscillations around the computed target balance. High values would slow the time to capacity, and (absent natural completions) it might prevent convergence to guaranteed capacity

0.1

yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor

Given a computed preemption target, account for containers naturally expiring and preempt only this percentage of the delta. This determines the rate of geometric convergence into the deadzone (max_ignored_over_capacity). For example, a termination factor of 0.5 will reclaim almost 95% of resources within 5 * max_wait_before_kill, even absent natural termination

0.2

yarn.resourcemanager.nodes.exclude-path

The path to the file with nodes to exclude

/etc/hadoop/conf/exclude-path.xml

yarn.resourcemanager.nodes.include-path

The path to the file with nodes to include

/etc/hadoop/conf/include-path

yarn.resourcemanager.recovery.enabled

Enables Resource Manager to recover state after starting. If set to true, then yarn.resourcemanager.store.class must be specified

true

yarn.resourcemanager.store.class

The class to use as the persistent store. If org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore is used, the store is implicitly fenced, meaning a single Resource Manager is able to use the store at any point in time. More details on this implicit fencing, along with setting up appropriate ACLs, are discussed under yarn.resourcemanager.zk-state-store.root-node.acl

 — 

yarn.resourcemanager.system-metrics-publisher.enabled

Controls whether the Resource Manager publishes YARN system metrics to the Timeline Server

true

yarn.scheduler.fair.user-as-default-queue

Defines whether to use the username associated with the allocation as the default queue name if a queue name is not specified. If this is set to false or unset, all jobs share a default queue named default. Defaults to true. If a queue placement policy is given in the allocations file, this property is ignored

true

yarn.scheduler.fair.preemption

Defines whether to use preemption

false

yarn.scheduler.fair.preemption.cluster-utilization-threshold

The utilization threshold after which the preemption kicks in. The utilization is computed as the maximum ratio of usage to capacity among all resources

0.8f

yarn.scheduler.fair.sizebasedweight

Defines whether to assign shares to individual apps based on their size, rather than providing an equal share to all apps regardless of size. When set to true, apps are weighted by the natural logarithm of one plus the app total requested memory, divided by the natural logarithm of 2

false

yarn.scheduler.fair.assignmultiple

Defines whether to allow multiple container assignments in one heartbeat

false

yarn.scheduler.fair.dynamic.max.assign

If assignmultiple is true, this parameter specifies whether to dynamically determine the amount of resources that can be assigned in one heartbeat. When turned on, about half of the non-allocated resources on the node are allocated to containers in a single heartbeat

true

yarn.scheduler.fair.max.assign

If assignmultiple is true, the maximum amount of containers that can be assigned in one heartbeat. Defaults to -1, which sets no limit

-1

yarn.scheduler.fair.locality.threshold.node

For applications that request containers on particular nodes, this parameter defines the number of scheduling opportunities since the last container assignment to wait before accepting a placement on another node. Expressed as a floating number between 0 and 1, which, as a fraction of the cluster size, is the number of scheduling opportunities to pass up. The default value of -1.0 means not to pass up any scheduling opportunities

-1.0

yarn.scheduler.fair.locality.threshold.rack

For applications that request containers on particular racks, the number of scheduling opportunities since the last container assignment to wait before accepting a placement on another rack. Expressed as a floating-point number between 0 and 1, which, as a fraction of the cluster size, is the number of scheduling opportunities to pass up. The default value of -1.0 means not to pass up any scheduling opportunities

-1.0

yarn.scheduler.fair.allow-undeclared-pools

If set to true, new queues can be created at application submission time, whether because they are specified as the application queue by the submitter or because they are placed there by the user-as-default-queue property. If set to false, any time an app would be placed in a queue that is not specified in the allocations file, it is placed in the default queue instead. Defaults to true. If a queue placement policy is given in the allocations file, this property is ignored

true

yarn.scheduler.fair.update-interval-ms

The time interval at which to lock the scheduler and recalculate fair shares, recalculate demand, and check whether anything is due for preemption

500

yarn.scheduler.minimum-allocation-mb

The minimum allocation for every container request at the Resource Manager (in MB). Memory requests lower than this will throw InvalidResourceRequestException

1024

yarn.scheduler.maximum-allocation-mb

The maximum allocation for every container request at the Resource Manager (in MB). Memory requests higher than this will throw InvalidResourceRequestException

4096

yarn.scheduler.minimum-allocation-vcores

The minimum allocation for every container request at the Resource Manager, in terms of virtual CPU cores. Requests lower than this will throw InvalidResourceRequestException

1

yarn.scheduler.maximum-allocation-vcores

The maximum allocation for every container request at the Resource Manager, in terms of virtual CPU cores. Requests higher than this will throw InvalidResourceRequestException

2

yarn.timeline-service.enabled

On the server side, this parameter indicates whether the Timeline service is enabled. On the client side, it can be used to indicate whether the client wants to use the Timeline service. If this parameter is enabled on the client side along with security, the YARN client tries to fetch delegation tokens for the Timeline Server

true

yarn.timeline-service.hostname

The hostname of the Timeline service Web application

 — 

yarn.timeline-service.http-cross-origin.enabled

Enables cross origin support (CORS) for Timeline Server

true

yarn.webapp.ui2.enable

On the server side, this parameter indicates whether the new YARN UI v2 is enabled

true

yarn.resourcemanager.proxy-user-privileges.enabled

If set to true, ResourceManager will have proxy-user privileges. For example: in a secure cluster, YARN requires the user hdfs delegation-tokens to do localization and log-aggregation on behalf of the user. If this is set to true, ResourceManager is able to request new hdfs delegation tokens on behalf of the user. This is needed by long-running-services, because the hdfs tokens will eventually expire and YARN requires new valid tokens to do localization and log-aggregation. Note that to enable this use case, the corresponding HDFS NameNode must have ResourceManager configured as a proxy-user so that ResourceManager can itself ask for new tokens on behalf of the user when tokens are past their max-life-time

false

yarn.resourcemanager.webapp.spnego-principal

The Kerberos principal to be used for SPNEGO filter for the Resource Manager web UI

HTTP/_HOST@REALM

yarn.resourcemanager.webapp.spnego-keytab-file

The Kerberos keytab file to be used for SPNEGO filter for the Resource Manager web UI

/etc/security/keytabs/HTTP.service.keytab

yarn.nodemanager.linux-container-executor.group

The UNIX group that the linux-container-executor should run as

yarn

yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled

A flag to enable override of the default Kerberos authentication filter with the RM authentication filter to allow authentication using delegation tokens (fallback to Kerberos if the tokens are missing). Only applicable when the http authentication type is kerberos

false

yarn.resourcemanager.principal

The Kerberos principal for the Resource Manager

yarn-resourcemanager/_HOST@REALM

yarn.resourcemanager.keytab

The keytab for the Resource Manager

/etc/security/keytabs/yarn-resourcemanager.service.keytab

yarn.resourcemanager.webapp.https.address

The https address of the Resource Manager web application. If only a host is provided as the value, the webapp will be served on a random port

${yarn.resourcemanager.hostname}:8090

yarn.nodemanager.principal

The Kerberos principal for the NodeManager

yarn-nodemanager/_HOST@REALM

yarn.nodemanager.keytab

Keytab for NodeManager

/etc/security/keytabs/yarn-nodemanager.service.keytab

yarn.nodemanager.webapp.spnego-principal

The Kerberos principal to be used for SPNEGO filter for the NodeManager web interface

HTTP/_HOST@REALM

yarn.nodemanager.webapp.spnego-keytab-file

The Kerberos keytab file to be used for SPNEGO filter for the NodeManager web interface

/etc/security/keytabs/HTTP.service.keytab

yarn.nodemanager.webapp.cross-origin.enabled

A flag to enable cross-origin (CORS) support in the NodeManager. This flag requires the CORS filter initializer to be added to the filter initializers list in core-site.xml

false

yarn.nodemanager.webapp.https.address

The HTTPS address of the NodeManager web application

0.0.0.0:8044

yarn.timeline-service.http-authentication.type

Defines the authentication used for the Timeline Server HTTP endpoint. Supported values are: simple, kerberos, #AUTHENTICATION_HANDLER_CLASSNAME#

simple

yarn.timeline-service.http-authentication.simple.anonymous.allowed

Indicates if anonymous requests are allowed by the Timeline Server when using simple authentication

true

yarn.timeline-service.http-authentication.kerberos.keytab

The Kerberos keytab to be used for the Timeline Server (Collector/Reader) HTTP endpoint

/etc/security/keytabs/HTTP.service.keytab

yarn.timeline-service.http-authentication.kerberos.principal

The Kerberos principal to be used for the Timeline Server (Collector/Reader) HTTP endpoint

HTTP/_HOST@REALM

yarn.timeline-service.principal

The Kerberos principal for the timeline reader. The NodeManager principal is used for the timeline collector, since the collector runs as an auxiliary service inside the NodeManager

yarn/_HOST@REALM

yarn.timeline-service.keytab

The Kerberos keytab for the timeline reader. The NodeManager keytab is used for the timeline collector, since the collector runs as an auxiliary service inside the NodeManager

/etc/security/keytabs/yarn.service.keytab

yarn.timeline-service.delegation.key.update-interval

The update interval for delegation keys

86400000

yarn.timeline-service.delegation.token.renew-interval

The renewal interval for delegation tokens

86400000

yarn.timeline-service.delegation.token.max-lifetime

The maximum token lifetime

86400000

yarn.timeline-service.client.best-effort

Defines whether a failure to obtain a delegation token should be considered an application failure (false), or whether the client should attempt to continue publishing information without it (true)

false

yarn.timeline-service.webapp.https.address

The HTTPS address of the Timeline service web application

${yarn.timeline-service.hostname}:8190

yarn.http.policy

Configures the HTTP endpoint for YARN daemons. The following values are supported:

  • HTTP_ONLY — provides service only via HTTP;

  • HTTPS_ONLY — provides service only via HTTPS.

HTTP_ONLY

yarn.nodemanager.container-executor.class

The name of the container-executor Java class

org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
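
For reference, a minimal sketch of how two of the scheduler parameters above would appear if set directly in yarn-site.xml; the values shown are the defaults from this table and are used purely for illustration:

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>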

container-executor.cfg
Parameter Description Default value

banned.users

A comma-separated list of users who cannot run applications

bin

min.user.id

The minimum user ID allowed to run containers. Prevents system accounts (other super users) with lower UIDs from running applications

500
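
Put together, the parameters above form a plain key=value file. A container-executor.cfg sketch built from the defaults in this table (the group value mirrors yarn.nodemanager.linux-container-executor.group from the yarn-site.xml table and is shown only as an assumption of a matching setup):

  yarn.nodemanager.linux-container-executor.group=yarn
  banned.users=bin
  min.user.id=500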

YARN heap memory settings
Parameter Description Default value

ResourceManager Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Resource Manager

-Xms1G -Xmx8G

NodeManager Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for NodeManager

 — 

Timelineserver Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Timeline server

-Xms700m -Xmx8G

History server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for History server

-Xms700m -Xmx8G
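
The heap values above are ordinary JVM options. As an illustration only, assuming they are passed to the daemon via yarn-env.sh (the exact variable that ADCM writes them into is not specified here), a common convention is:

  # Hypothetical yarn-env.sh fragment; the variable name is an assumption
  export YARN_RESOURCEMANAGER_OPTS="-Xms1G -Xmx8G"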

Lists of decommissioned hosts
Parameter Description Default value

DECOMMISSIONED

The list of hosts in the DECOMMISSIONED state

 — 

ranger-yarn-policymgr-ssl.xml
Parameter Description Default value

xasecure.policymgr.clientssl.keystore

The path to the keystore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.credential.file

The path to the keystore credentials file

/etc/yarn/conf/ranger-yarn.jceks

xasecure.policymgr.clientssl.truststore.credential.file

The path to the truststore credentials file

/etc/yarn/conf/ranger-yarn.jceks

xasecure.policymgr.clientssl.truststore

The path to the truststore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.password

The password to the keystore file

 — 

xasecure.policymgr.clientssl.truststore.password

The password to the truststore file

 — 
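
As a sketch, the keystore and truststore settings above are plain property entries in ranger-yarn-policymgr-ssl.xml; the keystore path below is a made-up placeholder, while the credential file value is the default from this table:

  <property>
    <name>xasecure.policymgr.clientssl.keystore</name>
    <value>/etc/yarn/conf/ranger-plugin-keystore.jks</value>
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
    <value>/etc/yarn/conf/ranger-yarn.jceks</value>
  </property>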

Other
Parameter Description Default value

GPU on YARN

Defines whether to use GPU on YARN

false

capacity-scheduler.xml

The content of capacity-scheduler.xml, which is used by CapacityScheduler

fair-scheduler.xml

The content of fair-scheduler.xml, which is used by FairScheduler

Custom mapred-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file mapred-site.xml

 — 

Ranger plugin enabled

Defines whether the Ranger plugin is enabled

false

Custom yarn-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file yarn-site.xml

 — 

Custom ranger-yarn-audit.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-yarn-audit.xml

 — 

Custom ranger-yarn-security.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-yarn-security.xml

 — 

Custom ranger-yarn-policymgr-ssl.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-yarn-policymgr-ssl.xml

 — 
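
Custom parameters defined in these sections are written to the corresponding XML file as ordinary property entries. A minimal sketch for Custom yarn-site.xml, using a made-up property name purely for illustration:

  <property>
    <name>yarn.example.custom.flag</name>
    <value>true</value>
  </property>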

Zeppelin

zeppelin-site.xml
Parameter Description Default value

zeppelin.dep.localrepo

The local repository for the dependency loader

/srv/zeppelin/local-repo

zeppelin.server.port

The server port

8180

zeppelin.server.kerberos.principal

The principal name to load from the keytab

 — 

zeppelin.server.kerberos.keytab

The path to the keytab file

 — 

zeppelin.shell.auth.type

Sets the authentication type. Possible values are SIMPLE and KERBEROS

 — 

zeppelin.shell.principal

The principal name to load from the keytab

 — 

zeppelin.shell.keytab.location

The path to the keytab file

 — 

zeppelin.jdbc.auth.type

Sets the authentication type. Possible values are SIMPLE and KERBEROS

 — 

zeppelin.jdbc.keytab.location

The path to the keytab file

 — 

zeppelin.jdbc.principal

The principal name to load from the keytab

 — 

zeppelin.jdbc.auth.kerberos.proxy.enable

When the KERBEROS authentication type is used, this parameter enables or disables using a proxy with the login user to get the connection

true

spark.yarn.keytab

The full path to the file that contains the keytab for the principal. This keytab will be copied to the node running the YARN Application Master via the Secure Distributed Cache, for renewing the login tickets and the delegation tokens periodically

 — 

spark.yarn.principal

The principal to be used to log in to the KDC while running on secure HDFS

 — 

zeppelin.livy.keytab

The path to the keytab file

 — 

zeppelin.livy.principal

The principal name to load from the keytab

 — 

zeppelin.server.ssl.port

The port number for SSL communication

8180

zeppelin.ssl

Defines whether to use SSL

false

zeppelin.ssl.keystore.path

The path to the keystore used by Zeppelin

 — 

zeppelin.ssl.keystore.password

The password to access the keystore file

 — 

zeppelin.ssl.truststore.path

The path to the truststore used by Zeppelin

 — 

zeppelin.ssl.truststore.password

The password to access the truststore file

 — 
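
A sketch of how the SSL parameters above look inside zeppelin-site.xml; the keystore path is a placeholder:

  <property>
    <name>zeppelin.ssl</name>
    <value>true</value>
  </property>
  <property>
    <name>zeppelin.ssl.keystore.path</name>
    <value>/path/to/keystore.jks</value>
  </property>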

Zeppelin server heap memory settings
Parameter Description Default value

Zeppelin Server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Zeppelin Server

-Xms700m -Xmx1024m

Shiro Simple username/password auth
Parameter Description Default value

Users/password map

A map of type <username: password,role>. For example, <myUser1: password1,role1>

 — 
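
The <username: password,role> map above corresponds to the [users] section of shiro.ini. A sketch with the example values from this table:

  [users]
  myUser1 = password1, role1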

Shiro LDAP auth
Parameter Description Default value

ldapRealm

Extends the Apache Shiro provider to allow for LDAP searches and to provide group membership to the authorization provider

org.apache.zeppelin.realm.LdapRealm

ldapRealm.contextFactory.authenticationMechanism

Specifies the authentication mechanism used by the LDAP service

simple

ldapRealm.contextFactory.url

The URL of the source LDAP. For example, ldap://ldap.example.com:389

 — 

ldapRealm.userDnTemplate

Optional. Zeppelin uses this value to construct the UserDN for the authentication bind. Specify the UserDN where the first attribute is {0}, indicating the attribute that matches the user login token. For example: uid={0},ou=people,dc=hadoop,dc=apache,dc=org

 — 

ldapRealm.pagingSize

Sets the LDAP paging size

100

ldapRealm.authorizationEnabled

Enables authorization for Shiro ldapRealm

true

ldapRealm.contextFactory.systemAuthenticationMechanism

Defines the authentication mechanism to use for the Shiro ldapRealm context factory. Possible values are simple and digest-md5

simple

ldapRealm.userLowerCase

Forces the username returned from LDAP to lowercase

true

ldapRealm.memberAttributeValueTemplate

The attribute that identifies a user in the group. For example: cn={0},ou=people,dc=hadoop,dc=apache,dc=org

 — 

ldapRealm.searchBase

The starting DN in the LDAP DIT for the search. Only entries under the specified subtree are searched. For example: dc=hadoop,dc=apache,dc=org

 — 

ldapRealm.userSearchBase

Search base for user bind DN. Defaults to the value of ldapRealm.searchBase if no value is defined. If ldapRealm.userSearchAttributeName is defined, also define a value for either ldapRealm.searchBase or ldapRealm.userSearchBase

 — 

ldapRealm.groupSearchBase

Search base used to search for groups. Defaults to the value of ldapRealm.searchBase. Only set if ldapRealm.authorizationEnabled=true

 — 

ldapRealm.groupObjectClass

Set the value to the objectClass that identifies group entries in LDAP

groupofnames

ldapRealm.userSearchAttributeName

Specify the attribute that corresponds to the user login token. This attribute is used with the search results to compute the UserDN for the authentication bind

sAMAccountName

ldapRealm.memberAttribute

Set the value to the attribute that defines group membership. When the value is memberUrl, found groups are treated as dynamic groups

member

ldapRealm.userSearchScope

Defines the user search scope. Possible values are subtree, one, base

subtree

ldapRealm.groupSearchScope

Defines the group search scope. Possible values are subtree, one, base

subtree

ldapRealm.contextFactory.systemUsername

Set to the LDAP service account that Zeppelin uses for LDAP searches. If required, specify the full account UserDN. For example: uid=guest,ou=people,dc=hadoop,dc=apache,dc=org. This account requires read permission to the search base DN

 — 

ldapRealm.contextFactory.systemPassword

Sets the password for systemUsername. This password is added to the keystore using Hadoop credentials

 — 

ldapRealm.groupSearchEnableMatchingRuleInChain

Enables support for nested groups using the LDAP_MATCHING_RULE_IN_CHAIN operator

true

ldapRealm.rolesByGroup

Optional mapping from physical groups to logical application roles. For example: "LDN_USERS":"user_role", "NYK_USERS":"user_role", "HKG_USERS":"user_role", "GLOBAL_ADMIN":"admin_role"

 — 

ldapRealm.allowedRolesForAuthentication

Optional list of roles that are allowed to authenticate. If not specified, all groups are allowed to authenticate (login). This changes nothing for url-specific permissions that will continue to work as specified in [urls]. For example: "admin_role,user_role"

 — 

ldapRealm.permissionsByRole

Optional. Sets permissions by role. For example: 'user_role = :ToDoItemsJdo::*, :ToDoItem::*; admin_role = *'

 — 

securityManager.realms

Specifies a list of Apache Shiro Realms

$ldapRealm
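
Taken together, the ldapRealm parameters above make up the [main] section of shiro.ini. A minimal sketch, assuming a hypothetical LDAP server at ldap://ldap.example.com:389 and the example DNs from this table:

  [main]
  ldapRealm = org.apache.zeppelin.realm.LdapRealm
  ldapRealm.contextFactory.url = ldap://ldap.example.com:389
  ldapRealm.contextFactory.authenticationMechanism = simple
  ldapRealm.userDnTemplate = uid={0},ou=people,dc=hadoop,dc=apache,dc=org
  ldapRealm.searchBase = dc=hadoop,dc=apache,dc=org
  ldapRealm.authorizationEnabled = true
  securityManager.realms = $ldapRealm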

Additional configuration Shiro.ini
Parameter Description Default value

Additional main section in shiro.ini

Allows adding extra key/value pairs to the main section of the shiro.ini file

 — 

Additional roles section in shiro.ini

Allows adding extra key/value pairs to the roles section of the shiro.ini file

 — 

Additional urls section in shiro.ini

Allows adding extra key/value pairs to the urls section of the shiro.ini file

 — 
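
As an illustration of what such key/value pairs look like, a sketch of entries for the roles and urls sections; the role name and URL patterns are placeholders:

  [roles]
  admin_role = *

  [urls]
  /api/version = anon
  /** = authc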

Other
Parameter Description Default value

Custom zeppelin-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file zeppelin-site.xml

 — 

Custom zeppelin-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file zeppelin-env.sh

Custom log4j.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file log4j.properties
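
For example, a Custom log4j.properties entry that raises the logging level for a single package might look like the following sketch; the package name is illustrative:

  log4j.logger.org.apache.zeppelin.interpreter=DEBUG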

ZooKeeper

Main
Parameter Description Default value

connect

The ZooKeeper connection string used by other services or clusters. It is generated automatically

 — 

dataDir

The location where ZooKeeper stores the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database

/var/lib/zookeeper

zoo.cfg
Parameter Description Default value

clientPort

The port to listen on for client connections, that is, the port that clients attempt to connect to

2181

tickTime

The basic time unit used by ZooKeeper (in milliseconds). It is used for heartbeats. The minimum session timeout will be twice the tickTime

2000

initLimit

The timeout (in ticks) that ZooKeeper uses to limit how long the servers in the quorum have to connect and sync to the leader

5

syncLimit

Limits how far out of date a server can be from the leader (in ticks)

2

maxClientCnxns

Limits the number of active connections from a single host, identified by IP address, to a single ZooKeeper server

0

autopurge.snapRetainCount

When enabled, the ZooKeeper auto-purge feature retains the autopurge.snapRetainCount most recent snapshots and the corresponding transaction logs in dataDir and dataLogDir, respectively, and deletes the rest. The minimum value is 3

3

autopurge.purgeInterval

The time interval (in hours) at which the purge task is triggered. Set to a positive integer (1 and above) to enable auto-purging

24

Add key,value

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file zoo.cfg

 — 
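
Assembled from the defaults above (and the dataDir value from the Main table), a zoo.cfg sketch; server entries depend on the actual hosts and are omitted:

  tickTime=2000
  initLimit=5
  syncLimit=2
  clientPort=2181
  maxClientCnxns=0
  autopurge.snapRetainCount=3
  autopurge.purgeInterval=24
  dataDir=/var/lib/zookeeper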

zookeeper-env.sh
Parameter Description Default value

ZOO_LOG_DIR

The directory to store logs

/var/log/zookeeper

ZOOPIDFILE

The directory to store the ZooKeeper process ID

/var/run/zookeeper/zookeeper_server.pid

SERVER_JVMFLAGS

Used to set additional JVM parameters, for example, those related to garbage collection

-Xmx1024m

JAVA

The path to the Java executable

$JAVA_HOME/bin/java

ZOO_LOG4J_PROP

Sets the log4j logging level and defines which log appenders to turn on. Enabling the CONSOLE appender directs logs to stdout. Enabling the ROLLINGFILE appender writes logs to the zookeeper.log file, which is then rotated and expired

INFO, CONSOLE, ROLLINGFILE
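
A zookeeper-env.sh sketch built from the defaults above; treating the variables as exported shell variables is an assumption about how they are consumed:

  export ZOO_LOG_DIR=/var/log/zookeeper
  export ZOOPIDFILE=/var/run/zookeeper/zookeeper_server.pid
  export SERVER_JVMFLAGS="-Xmx1024m"
  export JAVA=$JAVA_HOME/bin/java
  export ZOO_LOG4J_PROP="INFO, CONSOLE, ROLLINGFILE"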
