Configuration parameters

This topic describes the parameters that can be configured for ADH services via ADCM. To learn about the configuration process, refer to the relevant articles: Online installation and Offline installation.

NOTE
Some of the parameters become visible in the ADCM UI only after the Show advanced flag is set.

Airflow

Airflow environment
Parameter Description Default value

airflow_dir

The Airflow home directory

/srv/airflow/home

db_dir

The location of Metastore DB

/srv/airflow/metastore

airflow.cfg
Parameter Description Default value

db_user

The user to connect to Metadata DB

airflow

db_password

The password to connect to Metadata DB

 — 

db_root_password

The root password to connect to Metadata DB

 — 

db_port

The port to connect to Metadata DB

3307

server_port

The port to run the web server

8080

flower_port

The port that Celery Flower runs on

5555

worker_port

When you start an Airflow Worker, Airflow starts a tiny web server subprocess to serve the Worker's local log files to the main Airflow web server, which then builds pages and sends them to users. This parameter defines the port on which the logs are served. The port must be free and accessible from the main web server so that it can connect to the Workers

8793

redis_port

The port for running Redis

6379

fernet_key

The secret key to save connection passwords in the database

 — 

security

Defines which security module to use. For example, kerberos

 — 

keytab

The path to the keytab file

 — 

reinit_frequency

Sets the ticket renewal frequency

3600

principal

The Kerberos principal

ssl_active

Defines if SSL is active for Airflow

false

web_server_ssl_cert

The path to the SSL certificate

/etc/ssl/certs/host_cert.cert

web_server_ssl_key

The path to the SSL certificate key

/etc/ssl/host_cert.key

Logging level

Specifies the logging level for Airflow activity

INFO

Logging level for Flask-appbuilder UI

Specifies the logging level for Flask-appbuilder UI

WARNING

cfg_properties_template

The Jinja template to initialize environment variables for Airflow
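
For reference, the Kerberos-related values from the table above map to keys in the generated airflow.cfg. A minimal sketch, assuming illustrative placeholder values for the keytab path and principal (they are not ADH defaults):

  [core]
  security = kerberos

  [kerberos]
  keytab = /etc/security/keytabs/airflow.service.keytab
  reinit_frequency = 3600
  principal = airflow/_HOST@EXAMPLE.COM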

External database
Parameter Description Default value

Database type

The external database type. Possible values: PostgreSQL, MySQL/MariaDB

MySQL/MariaDB

Hostname

The external database host

 — 

Custom port

The external database port

 — 

Airflow database name

The external database name

airflow

flink-conf.yaml
Parameter Description Default value

jobmanager.rpc.port

The RPC port through which the JobManager is reachable. In the high availability mode, this value is ignored and the port number to connect to JobManager is generated by ZooKeeper

6123

sql-gateway.endpoint.rest.port

A port to connect to the SQL Gateway service

8083

taskmanager.network.bind-policy

The automatic address binding policy used by the TaskManager

name

parallelism.default

The system-wide default parallelism level for all execution environments

1

taskmanager.numberOfTaskSlots

The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline

1

taskmanager.heap.size

The heap size for the TaskManager JVM

1024m

jobmanager.heap.size

The heap size for the JobManager JVM

1024m

security.kerberos.login.use-ticket-cache

Indicates whether to read from the Kerberos ticket cache

false

security.kerberos.login.keytab

The absolute path to the Kerberos keytab file that stores user credentials

 — 

security.kerberos.login.principal

Flink Kerberos principal

 — 

security.kerberos.login.contexts

A comma-separated list of login contexts to provide the Kerberos credentials to

 — 

security.ssl.rest.enabled

Turns on SSL for external communication via REST endpoints

false

security.ssl.rest.keystore

The Java keystore file with SSL key and certificate to be used by Flink’s external REST endpoints

 — 

security.ssl.rest.truststore

The truststore file containing public CA certificates to verify the peer for Flink’s external REST endpoints

 — 

security.ssl.rest.keystore-password

The secret to decrypt the keystore file for Flink external REST endpoints

 — 

security.ssl.rest.truststore-password

The password to decrypt the truststore for Flink’s external REST endpoints

 — 

security.ssl.rest.key-password

The secret to decrypt the key in the keystore for Flink’s external REST endpoints

 — 

Logging level

Defines the logging level for Flink activity

INFO

high-availability

Defines the High Availability (HA) mode used for cluster execution

 — 

high-availability.zookeeper.quorum

The ZooKeeper quorum to use when running Flink in the HA mode with ZooKeeper

 — 

high-availability.storageDir

A file system path (URI) where Flink persists metadata in the HA mode

 — 

high-availability.zookeeper.path.root

The root path for Flink ZNode in Zookeeper

/flink

high-availability.cluster-id

The ID of the Flink cluster used to separate multiple Flink clusters from each other

 — 

sql-gateway.session.check-interval

The check interval to detect idle sessions. A value <= 0 disables the checks

1 min

sql-gateway.session.idle-timeout

The timeout to close a session if no successful connection was made during this interval. A value <= 0 never closes the sessions

10 min

sql-gateway.session.max-num

The maximum number of sessions to run simultaneously

1000000

sql-gateway.worker.keepalive-time

The time to keep an idle worker thread alive. When the worker thread count exceeds sql-gateway.worker.threads.min, excessive threads are killed after this time interval

5 min

sql-gateway.worker.threads.max

The maximum number of worker threads on the SQL Gateway server

500

sql-gateway.worker.threads.min

The minimum number of worker threads. If the current number of worker threads is less than this value, the worker threads are not deleted automatically

500

zookeeper.sasl.disable

Defines whether SASL authentication is disabled in ZooKeeper

false
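
To illustrate how the HA-related keys above fit together, here is a minimal flink-conf.yaml sketch; the host names, the HDFS path, and the cluster ID are placeholders, not defaults:

  high-availability: zookeeper
  high-availability.zookeeper.quorum: zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
  high-availability.storageDir: hdfs:///flink/ha/
  high-availability.zookeeper.path.root: /flink
  high-availability.cluster-id: /cluster_one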

Other
Parameter Description Default value

Custom flink-conf.yaml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file flink-conf.yaml

 — 

log4j.properties

The contents of the log4j.properties configuration file

log4j-cli.properties

The contents of the log4j-cli.properties configuration file

HBase

hbase-site.xml
Parameter Description Default value

hbase.balancer.period

The period at which the Region balancer runs in Master (in milliseconds)

300000

hbase.client.pause

General client pause value. Used mostly as the time to wait before retrying a failed get, region lookup, and so on. See hbase.client.retries.number for a description of how this pause works with retries

100

hbase.client.max.perregion.tasks

The maximum number of concurrent mutation tasks the Client will maintain to a single Region. That is, if hbase.client.max.perregion.tasks writes are already in progress for this Region, new puts won't be sent to this Region until some of the writes finish

1

hbase.client.max.perserver.tasks

The maximum number of concurrent mutation tasks a single HTable instance will send to a single Region Server

2

hbase.client.max.total.tasks

The maximum number of concurrent mutation tasks a single HTable instance will send to the cluster

100

hbase.client.retries.number

The maximum number of retries. It is used as the maximum for all retryable operations, such as getting a cell value, starting a row update, and so on. The retry interval is a rough function based on hbase.client.pause. See the constant RETRY_BACKOFF for how the backoff ramps up. Change this setting and hbase.client.pause to suit your workload

15

hbase.client.scanner.timeout.period

The Client scanner lease period in milliseconds

60000

hbase.cluster.distributed

The cluster mode. Possible values are: false — for standalone mode and pseudo-distributed setups with managed ZooKeeper; true — for fully-distributed mode with an unmanaged ZooKeeper Quorum. If false, startup runs all HBase and ZooKeeper daemons together in one JVM; if true — one JVM instance per daemon

true

hbase.hregion.majorcompaction

The time interval between Major compactions in milliseconds. Set to 0 to disable time-based automatic Major compactions. User-requested and size-based Major compactions will still run. This value is multiplied by hbase.hregion.majorcompaction.jitter to cause compaction to start at a somewhat-random time during a given time frame

604800000

hbase.hregion.max.filesize

The maximum file size. If the total size of a Region's HFiles grows to exceed this value, the Region is split in two. This option can work in two ways: split when any store size exceeds the threshold, or split when the overall Region size exceeds the threshold. The behavior is selected via hbase.hregion.split.overallfiles

10737418240

hbase.hstore.blockingStoreFiles

If more than this number of StoreFiles exists in any Store (one StoreFile is written per flush of MemStore), updates are blocked for this Region, until a compaction is completed, or until hbase.hstore.blockingWaitTime is exceeded

16

hbase.hstore.blockingWaitTime

The time for which a Region will block updates after reaching the StoreFile limit defined by hbase.hstore.blockingStoreFiles. After this time has elapsed, the Region will stop blocking updates, even if a compaction has not been completed

90000

hbase.hstore.compaction.max

The maximum number of StoreFiles that will be selected for a single Minor compaction, regardless of the number of eligible StoreFiles. Effectively, the value of hbase.hstore.compaction.max controls the time it takes for a single compaction to complete. Setting it larger means that more StoreFiles are included in a compaction. For most cases, the default value is appropriate

10

hbase.hstore.compaction.min

The minimum number of StoreFiles that must be eligible for compaction before compaction can run. The goal of tuning hbase.hstore.compaction.min is to avoid a situation with too many tiny StoreFiles to compact. Setting this value to 2 would cause a Minor compaction each time you have two StoreFiles in a Store, and this is probably not appropriate. If you set this value too high, all the other values will need to be adjusted accordingly. For most cases, the default value is appropriate. In the previous versions of HBase, the parameter hbase.hstore.compaction.min was called hbase.hstore.compactionThreshold

3

hbase.hstore.compaction.min.size

A StoreFile smaller than this size will always be eligible for Minor compaction. StoreFiles of this size or larger are evaluated by hbase.hstore.compaction.ratio to determine whether they are eligible. Because this limit represents the "automatic include" limit for all StoreFiles smaller than this value, it may need to be reduced in write-heavy environments where many files in the 1-2 MB range are being flushed, because every StoreFile will be targeted for compaction and the resulting StoreFiles may still be under the minimum size and require further compaction. If this parameter is lowered, the ratio check is triggered more quickly. This addressed some issues seen in earlier versions of HBase, but changing this parameter is no longer necessary in most situations

134217728

hbase.hstore.compaction.ratio

For Minor compaction, this ratio is used to determine whether a given StoreFile that is larger than hbase.hstore.compaction.min.size is eligible for compaction. Its effect is to limit compaction of large StoreFiles. The value of hbase.hstore.compaction.ratio is expressed as a floating-point decimal

1.2F

hbase.hstore.compaction.ratio.offpeak

The compaction ratio used during off-peak compactions if the off-peak hours are also configured. Expressed as a floating-point decimal. This allows for more aggressive (or less aggressive, if you set it lower than hbase.hstore.compaction.ratio) compaction during a given time period. The value is ignored if off-peak is disabled (default). This works the same as hbase.hstore.compaction.ratio

5.0F

hbase.hstore.compactionThreshold

If more than this number of StoreFiles exists in any Store (one StoreFile is written per flush of MemStore), a compaction is run to rewrite all StoreFiles into a single StoreFile. Larger values delay the compaction, but when compaction does occur, it takes longer to complete

3

hbase.hstore.flusher.count

The number of flush threads. With fewer threads, the MemStore flushes will be queued. With more threads, the flushes will be executed in parallel, increasing the load on HDFS, and potentially causing more compactions

2

hbase.hstore.time.to.purge.deletes

The amount of time to delay purging of delete markers with future timestamps. If unset or set to 0, all the delete markers, including those with future timestamps, are purged during the next Major compaction. Otherwise, a delete marker is kept until the Major compaction that occurs after the marker timestamp plus the value of this setting (in milliseconds)

0

hbase.master.ipc.address

The bind address of the HMaster RPC server

0.0.0.0

hbase.normalizer.period

The period at which the Region normalizer runs on Master (in milliseconds)

300000

hbase.regionserver.compaction.enabled

Enables/disables compactions by setting true/false. You can further switch compactions dynamically with the compaction_switch shell command

true

hbase.regionserver.ipc.address

The bind address of the Region Server RPC server

0.0.0.0

hbase.regionserver.regionSplitLimit

The limit for the number of Regions, after which no more Region splitting should take place. This is not a hard limit for the number of Regions, but acts as a guideline for the Region Server to stop splitting after a certain limit

1000

hbase.rootdir

The directory shared by Region Servers and into which HBase persists. The URL should be fully-qualified to include the filesystem scheme. For example, to specify the HDFS directory /hbase where the HDFS instance NameNode is running at namenode.example.org on port 9000, set this value to: hdfs://namenode.example.org:9000/hbase

 — 

hbase.zookeeper.quorum

A comma-separated list of servers in the ZooKeeper ensemble. For example, host1.mydomain.com,host2.mydomain.com,host3.mydomain.com. By default, this is set to localhost for local and pseudo-distributed modes of operation. For a fully-distributed setup, this should be set to a full list of ZooKeeper ensemble servers. If HBASE_MANAGES_ZK is set in hbase-env.sh, this is the list of servers, which HBase will start/stop ZooKeeper on, as part of cluster start/stop. Client-side, the list of ensemble members is put together with the hbase.zookeeper.property.clientPort config and is passed to the ZooKeeper constructor as the connection string parameter

 — 

zookeeper.session.timeout

The ZooKeeper session timeout in milliseconds. It is used in two different ways. First, this value is processed by the ZooKeeper Client that HBase uses to connect to the ensemble. It is also used by HBase when it starts a ZooKeeper Server (in that case the timeout is passed as the maxSessionTimeout). See more details in the ZooKeeper documentation. For example, if an HBase Region Server connects to a ZooKeeper ensemble that is also managed by HBase, then the session timeout will be the one specified by this configuration. But a Region Server that connects to an ensemble managed with a different configuration will be subjected to the maxSessionTimeout of that ensemble. So, even though HBase might propose using 90 seconds, the ensemble can have a maximum timeout lower than this, and it will take precedence. The current default maxSessionTimeout that ZooKeeper ships with is 40 seconds, which is lower than the HBase default

90000

zookeeper.znode.parent

The root znode for HBase in ZooKeeper. All of the HBase ZooKeeper files configured with a relative path will go under this node. By default, all of the HBase ZooKeeper file paths are configured with a relative path, so they will all go under this directory unless changed

/hbase

hbase.rest.port

The port used by HBase Rest Servers

60080

hbase.zookeeper.property.authProvider.1

Specifies the ZooKeeper authentication method

hbase.security.authentication

Set the value to true to run HBase RPC with strong authentication

false

hbase.security.authentication.ui

Enables Kerberos authentication to HBase web UI with SPNEGO

 — 

hbase.security.authentication.spnego.kerberos.principal

The Kerberos principal for SPNEGO authentication

 — 

hbase.security.authentication.spnego.kerberos.keytab

The path to the Kerberos keytab file with principals to be used for SPNEGO authentication

 — 

hbase.security.authorization

Set the value to true to run HBase RPC with strong authorization

false

hbase.master.kerberos.principal

The Kerberos principal used to run the HMaster process

 — 

hbase.master.keytab.file

Full path to the Kerberos keytab file to use for logging in the configured HMaster server principal

 — 

hbase.regionserver.kerberos.principal

The Kerberos principal name that should be used to run the HRegionServer process

 — 

hbase.regionserver.keytab.file

Full path to the Kerberos keytab file to use for logging in the configured HRegionServer server principal

 — 

hbase.rest.authentication.type

REST Gateway Kerberos authentication type

 — 

hbase.rest.authentication.kerberos.principal

REST Gateway Kerberos principal

 — 

hbase.rest.authentication.kerberos.keytab

REST Gateway Kerberos keytab

 — 

hbase.thrift.keytab.file

Thrift Kerberos keytab

 — 

hbase.rest.keytab.file

HBase REST gateway Kerberos keytab

 — 

hbase.rest.kerberos.principal

HBase REST gateway Kerberos principal

 — 

hbase.thrift.kerberos.principal

Thrift Kerberos principal

 — 

hbase.thrift.security.qop

Defines authentication, integrity, and confidentiality checking. Supported values:

  • auth-conf — authentication, integrity, and confidentiality checking;

  • auth-int — authentication and integrity checking;

  • auth — authentication checking only.

 — 

phoenix.queryserver.keytab.file

The path to the Kerberos keytab file

 — 

phoenix.queryserver.kerberos.principal

The Kerberos principal to use when authenticating. If phoenix.queryserver.kerberos.http.principal is not defined, this principal will also be used both to authenticate SPNEGO connections and to connect to HBase

 — 

phoenix.queryserver.kerberos.keytab

The full path to the Kerberos keytab file to use for logging in the configured HMaster server principal

 — 

phoenix.queryserver.http.keytab.file

The keytab file to use for authenticating SPNEGO connections. This configuration must be specified if phoenix.queryserver.kerberos.http.principal is configured. phoenix.queryserver.keytab.file will be used if this property is undefined

 — 

phoenix.queryserver.http.kerberos.principal

The Kerberos principal to use when authenticating SPNEGO connections. phoenix.queryserver.kerberos.principal will be used if this property is undefined

phoenix.queryserver.kerberos.http.principal

Deprecated, use phoenix.queryserver.http.kerberos.principal instead

 — 

hbase.ssl.enabled

Defines whether SSL is enabled for web UIs

false

hadoop.ssl.enabled

Defines whether SSL is enabled for Hadoop RPC

false

ssl.server.keystore.location

The path to the keystore file

 — 

ssl.server.keystore.password

The password to the keystore

 — 

ssl.server.truststore.location

The path to the truststore to be used

 — 

ssl.server.truststore.password

The password to the truststore

 — 

ssl.server.keystore.keypassword

The password to the key in the keystore

 — 

hbase.rest.ssl.enabled

Defines whether SSL is enabled for HBase REST server

false

hbase.rest.ssl.keystore.store

The path to the keystore used by HBase REST server

 — 

hbase.rest.ssl.keystore.password

The password to the keystore

 — 

hbase.rest.ssl.keystore.keypassword

The password to the key in the keystore

 — 
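
As an illustration of the hbase-site.xml format, the following sketch sets the two core parameters from this table, reusing the example values given in their descriptions (namenode.example.org and host1-3.mydomain.com are placeholders):

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>host1.mydomain.com,host2.mydomain.com,host3.mydomain.com</value>
  </property>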

HBASE heap memory settings
Parameter Description Default value

HBASE Regionserver Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HBase Region server

-Xms700m -Xmx9G

HBASE Master Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HBase Master

-Xms700m -Xmx9G

Phoenix Queryserver Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Phoenix Query server

-Xms700m -Xmx8G

HBASE Thrift2 server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HBase Thrift2 server

-Xms700m -Xmx8G

HBASE Rest server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HBase Rest server

-Xms200m -Xmx8G

ranger-hbase-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

The spool directory path

/srv/ranger/hdfs_plugin/audit_solr_spool

xasecure.audit.destination.solr.urls

Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr

 — 

xasecure.audit.destination.solr.zookeepers

Specifies the ZooKeeper connection string for the Solr destination

 — 

xasecure.audit.destination.solr.force.use.inmemory.jaas.config

Uses in-memory JAAS configuration file to connect to Solr

 — 

xasecure.audit.is.enabled

Enables Ranger audit

true

xasecure.audit.jaas.Client.loginModuleControlFlag

Specifies whether the success of the module is required, requisite, sufficient, or optional

 — 

xasecure.audit.jaas.Client.loginModuleName

The name of the authenticator class

 — 

xasecure.audit.jaas.Client.option.keyTab

The name of the keytab file to get the principal’s secret key

 — 

xasecure.audit.jaas.Client.option.principal

The name of the principal to be used

 — 

xasecure.audit.jaas.Client.option.serviceName

Represents a user or a service that wants to log in

 — 

xasecure.audit.jaas.Client.option.storeKey

Set this to true if you want the keytab or the principal’s key to be stored in the subject’s private credentials

false

xasecure.audit.jaas.Client.option.useKeyTab

Set this to true if you want the module to get the principal’s key from the keytab

false

ranger-hbase-security.xml
Parameter Description Default value

ranger.plugin.hbase.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.hbase.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.hbase.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/hbase/policycache

ranger.plugin.hbase.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.hbase.policy.rest.client.connection.timeoutMs

The HBase Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.hbase.policy.rest.client.read.timeoutMs

The HBase Plugin RangerRestClient read timeout (in milliseconds)

30000

ranger.plugin.hbase.policy.rest.ssl.config.file

The path to the RangerRestClient SSL config file for HBase plugin

/etc/hbase/conf/ranger-hbase-policymgr-ssl.xml

ranger-hbase-policymgr-ssl.xml
Parameter Description Default value

xasecure.policymgr.clientssl.keystore

The path to the keystore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.credential.file

The path to the keystore credentials file

/etc/hbase/conf/ranger-hbase.jceks

xasecure.policymgr.clientssl.truststore.credential.file

The path to the truststore credentials file

/etc/hbase/conf/ranger-hbase.jceks

xasecure.policymgr.clientssl.truststore

The path to the truststore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.password

The password to the keystore file

 — 

xasecure.policymgr.clientssl.truststore.password

The password to the truststore file

 — 

Other
Parameter Description Default value

Custom hbase-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hbase-site.xml

 — 

Custom hbase-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hbase-env.sh

 — 

Ranger plugin enabled

Whether or not Ranger plugin is enabled

false

Custom ranger-hbase-audit.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hbase-audit.xml

 — 

Custom ranger-hbase-security.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hbase-security.xml

 — 

Custom ranger-hbase-policymgr-ssl.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hbase-policymgr-ssl.xml

 — 

Custom log4j.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file log4j.properties

Custom hadoop-metrics2-hbase.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hadoop-metrics2-hbase.properties

HDFS

core-site.xml
Parameter Description Default value

fs.defaultFS

The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI authority is used to determine the host, port, etc. for a filesystem

 — 

fs.trash.checkpoint.interval

The number of minutes between trash checkpoints. Should be smaller than or equal to fs.trash.interval. Every time the checkpointer runs, it creates a new checkpoint out of the current trash contents and removes checkpoints created more than fs.trash.interval minutes ago

60

fs.trash.interval

The number of minutes after which a checkpoint gets deleted. If set to 0, the trash feature is disabled

1440

hadoop.tmp.dir

The base for other temporary directories

/tmp/hadoop-${user.name}

hadoop.zk.address

A comma-separated list of <Host>:<Port> pairs. Each corresponds to a ZooKeeper server to be used by the Resource Manager for storing Resource Manager state

 — 

io.file.buffer.size

The buffer size for sequence files. The size of this buffer should probably be a multiple of the hardware page size (4096 on Intel x86); it determines how much data is buffered during read and write operations

131072

net.topology.script.file.name

The name of the script that should be invoked to resolve DNS names to NetworkTopology names. Example: the script would take host.foo.bar as an argument and return /rack1 as the output

 — 

ha.zookeeper.quorum

A list of ZooKeeper Server addresses, separated by commas, that are to be used by the ZKFailoverController in automatic failover

 — 

ipc.client.fallback-to-simple-auth-allowed

When a client is configured to attempt a secure connection but attempts to connect to an insecure server, that server may instruct the client to switch to SASL SIMPLE (insecure) authentication. This setting controls whether or not the client will accept this instruction from the server. When set to false (default), the client does not allow the fallback to SIMPLE authentication and will abort the connection

false

hadoop.security.authentication

Defines the authentication type. Possible values: simple — no authentication, kerberos — enables the authentication by Kerberos

simple

hadoop.security.authorization

Enables RPC service-level authorization

false

hadoop.rpc.protection

Specifies RPC protection. Possible values:

  • authentication — authentication only;

  • integrity — performs the integrity check in addition to authentication;

  • privacy — encrypts the data in addition to integrity.

authentication

hadoop.security.auth_to_local

The value is a string containing new line characters. See the Kerberos documentation for more information about the format (an example fragment is shown after this table)

 — 

hadoop.http.authentication.type

Defines authentication used for the HTTP web-consoles. The supported values are: simple, kerberos, [AUTHENTICATION_HANDLER-CLASSNAME]

simple

hadoop.http.authentication.kerberos.principal

Indicates the Kerberos principal to be used for the HTTP endpoint when using kerberos authentication. The principal short name must be HTTP per the Kerberos HTTP SPNEGO specification

HTTP/localhost@$LOCALHOST

hadoop.http.authentication.kerberos.keytab

The location of the keytab file with the credentials for the Kerberos principal used for the HTTP endpoint

/etc/security/keytabs/HTTP.service.keytab

ha.zookeeper.acl

ACLs for all znodes

 — 

hadoop.http.filter.initializers

Add to this property the org.apache.hadoop.security.AuthenticationFilterInitializer initializer class

 — 

hadoop.http.authentication.signature.secret.file

The signature secret file for signing the authentication tokens. If not set, a random secret is generated during the startup. The same secret should be used for all nodes in the cluster: JobTracker, NameNode, DataNode, and TaskTracker. This file should be readable only by the Unix user running the daemons

/etc/security/http_secret

hadoop.http.authentication.cookie.domain

The domain to use for the HTTP cookie that stores the authentication token. In order for authentication to work properly across all nodes in the cluster, the domain must be correctly set. There is no default value; without a domain, the HTTP cookie works only with the hostname that issued it

 — 

hadoop.ssl.require.client.cert

Defines whether client certificates are required

false

hadoop.ssl.hostname.verifier

The host name verifier to provide for HttpsURLConnections. Valid values are: DEFAULT, STRICT, STRICT_IE6, DEFAULT_AND_LOCALHOST, and ALLOW_ALL

DEFAULT

hadoop.ssl.keystores.factory.class

The KeyStoresFactory implementation to use

org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory

hadoop.ssl.server.conf

A resource file from which the SSL server keystore information will be extracted. This file is looked up in the classpath; typically, it should be located in the Hadoop conf/ directory

ssl-server.xml

hadoop.ssl.client.conf

A resource file from which the SSL client keystore information will be extracted. This file is looked up in the classpath; typically, it should be located in the Hadoop conf/ directory

ssl-client.xml

User managed hadoop.security.auth_to_local

Disables automatic generation of hadoop.security.auth_to_local

false
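
As an example of the hadoop.security.auth_to_local format referenced above, a core-site.xml fragment might look like the following sketch. EXAMPLE.COM is a placeholder realm; the rule maps principals such as nn/host@EXAMPLE.COM to the hdfs local user, and DEFAULT keeps the standard translation:

  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>
      RULE:[2:$1@$0](nn@.*EXAMPLE\.COM)s/.*/hdfs/
      DEFAULT
    </value>
  </property>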

hdfs-site.xml
Parameter Description Default value

dfs.client.block.write.replace-datanode-on-failure.enable

If there is a DataNode/network failure in the write pipeline, DFSClient will try to remove the failed DataNode from the pipeline and then continue writing with the remaining DataNodes. As a result, the number of DataNodes in the pipeline is decreased. The feature is to add new DataNodes to the pipeline. This is a site-wide property to enable/disable the feature. When the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to set the policy to NEVER in the default configuration file or disable this feature. Otherwise, users may experience an unusually high rate of pipeline failures since it is impossible to find new DataNodes for replacement. See also dfs.client.block.write.replace-datanode-on-failure.policy

true

dfs.client.block.write.replace-datanode-on-failure.policy

This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. Possible values:

  • ALWAYS. Always adds a new DataNode, when an existing DataNode is removed.

  • NEVER. Never adds a new DataNode.

  • DEFAULT. Let r be the replication number. Let n be the number of existing DataNodes. Add a new DataNode only, if r is greater than or equal to 3 and either:

    1. floor(r/2) is greater than or equal to n;

    2. r is greater than n and the block is hflushed/appended.

DEFAULT

dfs.client.block.write.replace-datanode-on-failure.best-effort

This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. Best effort means that the client will try to replace a failed DataNode in the write pipeline (provided that the policy is satisfied), but continues the write operation even if the DataNode replacement fails. Suppose the DataNode replacement fails: false — an exception is thrown and the write fails; true — the write is resumed with the remaining DataNodes. Note that setting this property to true allows writing to a pipeline with a smaller number of DataNodes. As a result, it increases the probability of data loss

false

dfs.client.block.write.replace-datanode-on-failure.min-replication

The minimum number of replications needed not to fail the write pipeline if new DataNodes cannot be found to replace failed DataNodes (for example, due to a network failure) in the write pipeline. If the number of the remaining DataNodes in the write pipeline is greater than or equal to this value, writing continues to the remaining nodes. Otherwise, an exception is thrown. If this is set to 0, an exception will be thrown when a replacement cannot be found. See also dfs.client.block.write.replace-datanode-on-failure.policy

0

dfs.balancer.dispatcherThreads

The size of the thread pool for the HDFS balancer block mover — dispatchExecutor

200

dfs.balancer.movedWinWidth

The time window in milliseconds for the HDFS balancer to track blocks and their locations

5400000

dfs.balancer.moverThreads

The thread pool size for executing block moves — moverThreadAllocator

1000

dfs.balancer.max-size-to-move

The maximum number of bytes that can be moved by the balancer in a single thread

10737418240

dfs.balancer.getBlocks.min-block-size

The minimum block threshold size in bytes to ignore when fetching a source block list

10485760

dfs.balancer.getBlocks.size

The total size in bytes of DataNode blocks to get, when fetching a source block list

2147483648

dfs.balancer.block-move.timeout

The maximum amount of time for a block to move (in milliseconds). If set greater than 0, the balancer will stop waiting for a block move completion after this time. In typical clusters, a 3-5 minute timeout is reasonable. If the timeout occurs for a large proportion of block moves, this value needs to be increased. It could also be that too much work is dispatched and many nodes are constantly exceeding the bandwidth limit as a result. In that case, other balancer parameters might need to be adjusted. It is disabled (0) by default

0

dfs.balancer.max-no-move-interval

If this specified amount of time has elapsed and no blocks have been moved out of a source DataNode, one more attempt will be made to move blocks out of this DataNode in the current Balancer iteration

60000

dfs.balancer.max-iteration-time

The maximum amount of time an iteration can be run by the Balancer. After this time the Balancer will stop the iteration, and re-evaluate the work needed to be done to balance the cluster. The default value is 20 minutes

1200000

dfs.blocksize

The default block size for new files (in bytes). You can use the following suffixes to define size units (case insensitive): k (kilo), m (mega), g (giga), t (tera), p (peta), e (exa). For example, 128k, 512m, 1g, etc. You can also specify the block size in bytes (such as 134217728 for 128 MB). A sample hdfs-site.xml fragment is shown after this table

134217728

dfs.client.read.shortcircuit

Turns on short-circuit local reads

true

dfs.datanode.balance.max.concurrent.moves

The maximum number of threads for DataNode balancer pending moves. This value is reconfigurable via the dfsadmin -reconfig command

50

dfs.datanode.data.dir

Determines where on the local filesystem a DFS data node should store its blocks. If multiple directories are specified, then data will be stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types (SSD/DISK/ARCHIVE/RAM_DISK) for HDFS storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if the local filesystem permissions allow

/srv/hadoop-hdfs/data:DISK

dfs.disk.balancer.max.disk.throughputInMBperSec

The maximum disk bandwidth, used by the disk balancer during reads from a source disk. The unit is MB/sec

10

dfs.disk.balancer.block.tolerance.percent

The parameter specifies when a good enough value is reached for any copy step (in percent). For example, if set to 10, then getting within 10% of the target value is considered good enough. In other words, if a move operation is 20 GB in size and 18 GB (20 * (1 - 10%)) can be moved, the entire operation is considered successful

10

dfs.disk.balancer.max.disk.errors

During a block move from a source to destination disk, there might be various errors. This parameter defines how many errors to tolerate before declaring a move between 2 disks (or a step) has failed

5

dfs.disk.balancer.plan.valid.interval

The maximum amount of time a disk balancer plan (a set of configurations that define the data volume to be redistributed between two disks) remains valid. This setting supports multiple time unit suffixes as described in dfs.heartbeat.interval. If no suffix is specified, then milliseconds are assumed

1d

dfs.disk.balancer.plan.threshold.percent

Defines a data storage threshold in percents at which disks start participating in data redistribution or balancing activities

10

dfs.domain.socket.path

The path to a UNIX domain socket that will be used for communication between the DataNode and local HDFS clients. If the string _PORT is present in this path, it will be replaced by the TCP port of the DataNode. The parameter is optional

/var/lib/hadoop-hdfs/dn_socket

dfs.hosts

Names a file that contains a list of hosts allowed to connect to the NameNode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted

/etc/hadoop/conf/dfs.hosts

dfs.mover.movedWinWidth

The minimum time interval for a block to be moved to another location again (in milliseconds)

5400000

dfs.mover.moverThreads

Sets the balancer mover thread pool size

1000

dfs.mover.retry.max.attempts

The maximum number of retries before the mover considers the move as failed

10

dfs.mover.max-no-move-interval

If this specified amount of time has elapsed and no block has been moved out of a source DataNode, one more attempt will be made to move blocks out of this DataNode in the current mover iteration

60000

dfs.namenode.name.dir

Determines where on the local filesystem the DFS name node should store the name table (fsimage). If multiple directories are specified, then the name table is replicated in all of the directories, for redundancy

/srv/hadoop-hdfs/name

dfs.namenode.checkpoint.dir

Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If multiple directories are specified, then the image is replicated in all of the directories for redundancy

/srv/hadoop-hdfs/checkpoint

dfs.namenode.hosts.provider.classname

The class that provides access to host files. By default, org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager is used, which loads files specified by dfs.hosts and dfs.hosts.exclude. If org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager is used, it loads the JSON file defined in dfs.hosts. To change the class name, a NameNode restart is required. dfsadmin -refreshNodes only refreshes the configuration files used by the class

org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager

dfs.namenode.rpc-bind-host

The actual address the RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.rpc-address. It can also be specified per NameNode or per name service for HA/Federation. This is useful for making the NameNode listen on all interfaces by setting it to 0.0.0.0

0.0.0.0

dfs.permissions.superusergroup

The name of the group of super-users. The value should be a single group name

hadoop

dfs.replication

The default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time

3

dfs.journalnode.http-address

The HTTP address of the JournalNode web UI

0.0.0.0:8480

dfs.journalnode.https-address

The HTTPS address of the JournalNode web UI

0.0.0.0:8481

dfs.journalnode.rpc-address

The RPC address of the JournalNode

0.0.0.0:8485

dfs.datanode.http.address

The address of the DataNode HTTP server

0.0.0.0:9864

dfs.datanode.https.address

The address of the DataNode HTTPS server

0.0.0.0:9865

dfs.datanode.address

The address of the DataNode for data transfer

0.0.0.0:9866

dfs.datanode.ipc.address

The IPC address of the DataNode

0.0.0.0:9867

dfs.namenode.http-address

The address and the base port to access the dfs NameNode web UI

0.0.0.0:9870

dfs.namenode.https-address

The secure HTTPS address of the NameNode

0.0.0.0:9871

dfs.ha.automatic-failover.enabled

Defines whether automatic failover is enabled

true

dfs.ha.fencing.methods

A list of scripts or Java classes that will be used to fence the Active NameNode during a failover

shell(/bin/true)

dfs.journalnode.edits.dir

The directory where to store journal edit files

/srv/hadoop-hdfs/journalnode

dfs.namenode.shared.edits.dir

The directory on shared storage between the multiple NameNodes in an HA cluster. This directory will be written by the active and read by the standby in order to keep the namespaces synchronized. This directory does not need to be listed in dfs.namenode.edits.dir. It should be left empty in a non-HA cluster

 — 

dfs.internal.nameservices

A unique nameservices identifier for a cluster or federation. For a single cluster, specify the name that will be used as an alias. For HDFS federation, specify, separated by commas, all namespaces associated with this cluster. This option allows you to use an alias instead of an IP address or FQDN for some commands, for example: hdfs dfs -ls hdfs://<dfs.internal.nameservices>. The value must be alphanumeric without underscores

 — 

dfs.block.access.token.enable

If set to true, access tokens are used as capabilities for accessing DataNodes. If set to false, no access tokens are checked on accessing DataNodes

false

dfs.namenode.kerberos.principal

The NameNode service principal. This is typically set to nn/_HOST@REALM.TLD. Each NameNode will substitute _HOST with its own fully qualified hostname during the startup. The _HOST placeholder allows using the same configuration setting on both NameNodes in an HA setup

nn/_HOST@REALM

dfs.namenode.keytab.file

The keytab file used by each NameNode daemon to login as its service principal. The principal name is configured with dfs.namenode.kerberos.principal

/etc/security/keytabs/nn.service.keytab

dfs.namenode.kerberos.internal.spnego.principal

HTTP Kerberos principal name for the NameNode

HTTP/_HOST@REALM

dfs.web.authentication.kerberos.principal

Kerberos principal name for the WebHDFS

HTTP/_HOST@REALM

dfs.web.authentication.kerberos.keytab

Kerberos keytab file for WebHDFS

/etc/security/keytabs/HTTP.service.keytab

dfs.journalnode.kerberos.principal

The JournalNode service principal. This is typically set to jn/_HOST@REALM.TLD. Each JournalNode will substitute _HOST with its own fully qualified hostname at startup. The _HOST placeholder allows using the same configuration setting on all JournalNodes

jn/_HOST@REALM

dfs.journalnode.keytab.file

The keytab file used by each JournalNode daemon to login as its service principal. The principal name is configured with dfs.journalnode.kerberos.principal

/etc/security/keytabs/jn.service.keytab

dfs.journalnode.kerberos.internal.spnego.principal

The server principal used by the JournalNode HTTP Server for SPNEGO authentication when Kerberos security is enabled. This is typically set to HTTP/_HOST@REALM.TLD. The SPNEGO server principal begins with the prefix HTTP/ by convention. If the value is *, the web server will attempt to login with every principal specified in the keytab file dfs.web.authentication.kerberos.keytab. For most deployments this can be set to ${dfs.web.authentication.kerberos.principal}, that is, to use the value of dfs.web.authentication.kerberos.principal

HTTP/_HOST@REALM

dfs.datanode.data.dir.perm

Permissions for the directories on the local filesystem where the DFS DataNode stores its blocks. The permissions can either be octal or symbolic

700

dfs.datanode.kerberos.principal

The DataNode service principal. This is typically set to dn/_HOST@REALM.TLD. Each DataNode will substitute _HOST with its own fully qualified host name at startup. The _HOST placeholder allows using the same configuration setting on all DataNodes

dn/_HOST@REALM.TLD

dfs.datanode.keytab.file

The keytab file used by each DataNode daemon to login as its service principal. The principal name is configured with dfs.datanode.kerberos.principal

/etc/security/keytabs/dn.service.keytab

dfs.http.policy

Defines if HTTPS (SSL) is supported on HDFS. This configures the HTTP endpoint for HDFS daemons. The following values are supported: HTTP_ONLY — the service is provided only via http; HTTPS_ONLY — the service is provided only via https; HTTP_AND_HTTPS — the service is provided both via http and https

HTTP_ONLY

dfs.data.transfer.protection

A comma-separated list of SASL protection values used for secured connections to the DataNode when reading or writing block data. The possible values are:

  • authentication — provides only authentication; no integrity or privacy;

  • integrity — authentication and integrity are enabled;

  • privacy — authentication, integrity and privacy are enabled.

If dfs.encrypt.data.transfer=true, then it supersedes the setting for dfs.data.transfer.protection and enforces that all connections must use a specialized encrypted SASL handshake. This property is ignored for connections to a DataNode listening on a privileged port. In this case, it is assumed that the use of a privileged port establishes sufficient trust

 — 

dfs.encrypt.data.transfer

Defines whether or not actual block data that is read/written from/to HDFS should be encrypted on the wire. This only needs to be set on the NameNodes and DataNodes, clients will deduce this automatically. It is possible to override this setting per connection by specifying custom logic via dfs.trustedchannel.resolver.class

false

dfs.encrypt.data.transfer.algorithm

This value may be set to either 3des or rc4. If nothing is set, then the configured JCE default on the system is used (usually 3DES). It is widely believed that 3DES is more secure, but RC4 is substantially faster. Note that if AES is supported by both the client and server, then this encryption algorithm will only be used to initially transfer keys for AES

3des

dfs.encrypt.data.transfer.cipher.suites

This value can be either undefined or AES/CTR/NoPadding. If defined, then dfs.encrypt.data.transfer uses the specified cipher suite for data encryption. If not defined, then only the algorithm specified in dfs.encrypt.data.transfer.algorithm is used

 — 

dfs.encrypt.data.transfer.cipher.key.bitlength

The key bitlength negotiated by dfsclient and datanode for encryption. This value may be set to either 128, 192, or 256

128

ignore.secure.ports.for.testing

Allows skipping HTTPS requirements in the SASL mode

false

dfs.client.https.need-auth

Whether SSL client certificate authentication is required

false
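
As a simple illustration of the size-suffix form described for dfs.blocksize, an hdfs-site.xml fragment could look like the sketch below; the values are examples, not recommendations:

  <property>
    <name>dfs.blocksize</name>
    <value>128m</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>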

httpfs-site.xml
Parameter Description Default value

httpfs.http.administrators

The ACL for the admins. This configuration is used to control who can access the default servlets for the HttpFS server. The value should be a comma-separated list of users and groups. The user list comes first and is separated by a space, followed by the group list, for example: user1,user2 group1,group2. Both users and groups are optional, so you can define only users, only groups, or both. Note that a leading space is always required before the groups list. Using the asterisk grants access to all users and groups

*

hadoop.http.temp.dir

The HttpFS temp directory

${hadoop.tmp.dir}/httpfs

httpfs.ssl.enabled

Defines whether SSL is enabled. The default is false (disabled)

false

httpfs.hadoop.config.dir

The location of the Hadoop configuration directory

/etc/hadoop/conf

httpfs.hadoop.authentication.type

Defines the authentication mechanism used by httpfs for its HTTP clients. Valid values are simple and kerberos. If simple is used, clients must specify the username with the user.name query string parameter. If kerberos is used, HTTP clients must use HTTP SPNEGO or delegation tokens

simple

httpfs.hadoop.authentication.kerberos.keytab

The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by httpfs in the HTTP endpoint. httpfs.authentication.kerberos.keytab is deprecated. Instead, use hadoop.http.authentication.kerberos.keytab

/etc/security/keytabs/httpfs.service.keytab

httpfs.hadoop.authentication.kerberos.principal

The HTTP Kerberos principal used by HttpFS in the HTTP endpoint. The HTTP Kerberos principal MUST start with HTTP/ as per Kerberos HTTP SPNEGO specification. httpfs.authentication.kerberos.principal is deprecated. Instead, use hadoop.http.authentication.kerberos.principal

HTTP/${httpfs.hostname}@${kerberos.realm}

ranger-hdfs-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

The spool directory path

/srv/ranger/hdfs_plugin/audit_solr_spool

xasecure.audit.destination.solr.urls

Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr

 — 

xasecure.audit.destination.solr.zookeepers

Specifies the ZooKeeper connection string for the Solr destination

 — 

xasecure.audit.destination.solr.force.use.inmemory.jaas.config

Uses in-memory JAAS configuration file to connect to Solr

 — 

xasecure.audit.is.enabled

Enables Ranger audit

true

xasecure.audit.jaas.Client.loginModuleControlFlag

Specifies whether the success of the module is required, requisite, sufficient, or optional

 — 

xasecure.audit.jaas.Client.loginModuleName

The name of the authenticator class

 — 

xasecure.audit.jaas.Client.option.keyTab

The name of the keytab file to get the principal’s secret key

 — 

xasecure.audit.jaas.Client.option.principal

The name of the principal to be used

 — 

xasecure.audit.jaas.Client.option.serviceName

Represents a user or a service that wants to log in

 — 

xasecure.audit.jaas.Client.option.storeKey

Set this to true if you want the keytab or the principal’s key to be stored in the subject’s private credentials

false

xasecure.audit.jaas.Client.option.useKeyTab

Set this to true if you want the module to get the principal’s key from the keytab

false

ranger-hdfs-security.xml
Parameter Description Default value

ranger.plugin.hdfs.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.hdfs.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.hdfs.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/hdfs/policycache

ranger.plugin.hdfs.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.hdfs.policy.rest.client.connection.timeoutMs

The HDFS Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.hdfs.policy.rest.client.read.timeoutMs

The HDFS Plugin RangerRestClient read timeout (in milliseconds)

30000

ranger.plugin.hdfs.policy.rest.ssl.config.file

The path to the RangerRestClient SSL config file for the HDFS plugin

/etc/hadoop/conf/ranger-hdfs-policymgr-ssl.xml

httpfs-env.sh
Parameter Description Default value

HADOOP_CONF_DIR

Hadoop configuration directory

/etc/hadoop/conf

HADOOP_LOG_DIR

Location of the log directory

${HTTPFS_LOG}

HADOOP_PID_DIR

PID file directory location

${HTTPFS_TEMP}

HTTPFS_SSL_ENABLED

Defines if SSL is enabled for httpfs

false

HTTPFS_SSL_KEYSTORE_FILE

The path to the keystore file

admin

HTTPFS_SSL_KEYSTORE_PASS

The password to access the keystore

admin

Hadoop options
Parameter Description Default value

HDFS_NAMENODE_OPTS

NameNode Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for the NameNode

-Xms1G -Xmx8G

HDFS_DATANODE_OPTS

DataNode Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for the DataNode

-Xms700m -Xmx8G

HDFS_HTTPFS_OPTS

HttpFS Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for the httpfs server

-Xms700m -Xmx8G

HDFS_JOURNALNODE_OPTS

JournalNode Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for the JournalNode

-Xms700m -Xmx8G

HDFS_ZKFC_OPTS

ZKFC Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for ZKFC

-Xms500m -Xmx8G

ssl-server.xml
Parameter Description Default value

ssl.server.truststore.location

The truststore to be used by NameNodes and DataNodes

 — 

ssl.server.truststore.password

The password to the truststore

 — 

ssl.server.truststore.type

The truststore file format

jks

ssl.server.truststore.reload.interval

The truststore reload check interval (in milliseconds)

10000

ssl.server.keystore.location

The path to the keystore file used by NameNodes and DataNodes

 — 

ssl.server.keystore.password

The password to the keystore

 — 

ssl.server.keystore.keypassword

The password to the key in the keystore

 — 

ssl.server.keystore.type

The keystore file format

 — 

ssl-client.xml
Parameter Description Default value

ssl.client.truststore.location

The truststore to be used by NameNodes and DataNodes

 — 

ssl.client.truststore.password

The password to the truststore

 — 

ssl.client.truststore.type

The truststore file format

jks

ssl.client.truststore.reload.interval

The truststore reload check interval (in milliseconds)

10000

ssl.client.keystore.location

The path to the keystore file used by NameNodes and DataNodes

 — 

ssl.client.keystore.password

The password to the keystore

 — 

ssl.client.keystore.keypassword

The password to the key in the keystore

 — 

ssl.client.keystore.type

The keystore file format

 — 

Lists of decommissioned and in maintenance hosts
Parameter Description Default value

DECOMMISSIONED

When an administrator decommissions a DataNode, the DataNode will first be transitioned into the DECOMMISSION_INPROGRESS state. After all blocks belonging to that DataNode are fully replicated elsewhere based on each block's replication factor, the DataNode will be transitioned to the DECOMMISSIONED state. After that, the administrator can shut down the node to perform long-term repair and maintenance that could take days or weeks. After the machine has been repaired, it can be recommissioned back to the cluster

 — 

IN_MAINTENANCE

Sometimes administrators only need to take DataNodes down for minutes or hours to perform short-term repair or maintenance. For such scenarios, the HDFS block replication overhead incurred by decommissioning might not be necessary, and a lightweight process is desirable. That is what the maintenance state is used for. When an administrator puts a DataNode in the maintenance state, the DataNode will first be transitioned to the ENTERING_MAINTENANCE state. As long as all blocks belonging to that DataNode are minimally replicated elsewhere, the DataNode will immediately be transitioned to the IN_MAINTENANCE state. After the maintenance has completed, the administrator can take the DataNode out of the maintenance state. In addition, the maintenance state supports a timeout that allows administrators to configure the maximum duration for which a DataNode is allowed to stay in this state. After the timeout, the DataNode will be transitioned out of the maintenance state automatically by HDFS without human intervention

 — 
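
If dfs.namenode.hosts.provider.classname is set to CombinedHostFileManager (see the hdfs-site.xml table above), these administrative states can be declared in the JSON hosts file referenced by dfs.hosts. A minimal sketch with placeholder host names (the exact layout may differ between Hadoop versions):

  [
    { "hostName": "dn1.example.com" },
    { "hostName": "dn2.example.com", "adminState": "DECOMMISSIONED" },
    { "hostName": "dn3.example.com", "adminState": "IN_MAINTENANCE" }
  ]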

Other
Parameter Description Default value

Custom core-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file core-site.xml

 — 

Custom hdfs-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hdfs-site.xml

 — 

Custom httpfs-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file httpfs-site.xml

 — 

Ranger plugin enabled

Whether or not Ranger plugin is enabled

 — 

Custom ranger-hdfs-audit.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hdfs-audit.xml

 — 

Custom ranger-hdfs-security.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hdfs-security.xml

 — 

Custom ranger-hdfs-policymgr-ssl.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hdfs-policymgr-ssl.xml

 — 

Custom httpfs-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file httpfs-env.sh

 — 

Custom ssl-server.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ssl-server.xml

 — 

Custom ssl-client.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ssl-client.xml

 — 

Topology script

The topology script used in HDFS

 — 

Topology data

An optional text file that maps host names to racks for the topology script (see the example after this table). Stored in /etc/hadoop/conf/topology.data

 — 

Custom log4j.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file log4j.properties

Custom httpfs-log4j.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file httpfs-log4j.properties
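
As an illustration of the Topology data format mentioned above, a topology.data file typically maps each host (by name or IP address) to a rack path, one entry per line; the exact format is interpreted by the topology script configured above, and the hosts and racks below are hypothetical:

dn1.example.com /dc1/rack1
dn2.example.com /dc1/rack1
dn3.example.com /dc1/rack2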

Hive

hive-env.sh
Parameter Description Default value

HADOOP_CLASSPATH

A colon-delimited list of directories, files, or wildcard locations that include all necessary classes

/etc/tez/conf/:/usr/lib/tez/:/usr/lib/tez/lib/

HIVE_HOME

The Hive home directory

/usr/lib/hive

METASTORE_PORT

The Hive Metastore port

9083

Hive heap memory settings
Parameter Description Default value

HiveServer2 Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HiveServer2

-Xms256m -Xmx256m

Hive Metastore Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Hive Metastore

-Xms256m -Xmx256m

hive-site.xml
Parameter Description Default value

hive.cbo.enable

When set to true, enables the cost-based optimizer that uses the Calcite framework

true

hive.compute.query.using.stats

When set to true, Hive will answer a few queries like min, max, and count(1) purely using statistics stored in the Metastore. For basic statistics collection, set the configuration property hive.stats.autogather to true. For more advanced statistics collection, run the ANALYZE TABLE queries

false

hive.execution.engine

Selects the execution engine. Supported values: mr (MapReduce; the built-in Hive default), tez (Tez execution, for Hadoop 2 onwards), spark (Spark execution, available as of Hive 1.1.0)

Tez

hive.log.explain.output

When enabled, logs the EXPLAIN EXTENDED output for the query at the log4j INFO level and in the HiveServer2 web UI (Drilldown → Query Plan). Starting with Hive 3.1.0, this property only logs the output at the log4j INFO level. To show the EXPLAIN EXTENDED output in the web UI (Drilldown → Query Plan) in Hive 3.1.0 and later, use hive.server2.webui.explain.output

true

hive.metastore.event.db.notification.api.auth

Defines whether the Metastore should perform the authorization against database notification related APIs such as get_next_notification. If set to true, then only the superusers in proxy settings have the permission

false

hive.metastore.uris

The Metastore URI used to access metadata in a remote metastore setup. For a remote metastore, specify the Thrift metastore server URI: thrift://<hostname>:<port>, where <hostname> is the host name or IP address of the Thrift metastore server and <port> is the port on which the Thrift server listens

 — 

hive.metastore.warehouse.dir

The absolute HDFS path of the default database for the warehouse that is local to the cluster

/apps/hive/warehouse

hive.server2.enable.doAs

Defines whether HiveServer2 impersonates the connected user

false

hive.stats.fetch.column.stats

Annotation of the operator tree with statistics information requires column statistics. Column statistics are fetched from the Metastore. Fetching column statistics for each needed column can be expensive, when the number of columns is high. This flag can be used to disable fetching of column statistics from the Metastore

 — 

hive.tez.container.size

By default, Tez spawns containers of the size of a mapper. This parameter can be used to override the default value

 — 

hive.support.concurrency

Defines whether Hive should support concurrency or not. A ZooKeeper instance must be up and running for the default Hive Lock Manager to support read/write locks

false

hive.txn.manager

Set this to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive transactions. The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides no transactions

 — 

javax.jdo.option.ConnectionUserName

The metastore database user name

APP

javax.jdo.option.ConnectionPassword

The password for the metastore user name

 — 

javax.jdo.option.ConnectionURL

The JDBC connection URI used to access the data stored in the local Metastore setup. Use the following connection URI: jdbc:<data store type>://<node name>:<port>/<database name> where:

  • <data store type> is the type of the data store;

  • <node name> is the host name or IP address of the data store;

  • <port> is the port on which the data store listens for remote procedure calls (RPC);

  • <database name> is the name of the database.

For example, the following URI specifies a local metastore that uses MySQL as a data store: jdbc:mysql://hostname23:3306/metastore

jdbc:mysql://{{ groups['mysql.master'][0] | d(omit) }}:3306/hive

javax.jdo.option.ConnectionDriverName

The JDBC driver class name used to access Hive Metastore

com.mysql.jdbc.Driver

hive.server2.transport.mode

Sets the transport mode

tcp

hive.server2.thrift.http.port

The port number for Thrift Server2 to listen on

10001

hive.server2.thrift.http.path

The HTTP endpoint of the Thrift Server2 service

cliservice

hive.server2.authentication.kerberos.principal

Hive server Kerberos principal

hive/_HOST@EXAMPLE.COM

hive.server2.authentication.kerberos.keytab

The path to the Kerberos keytab file containing the Hive server service principal

/etc/security/keytabs/hive.service.keytab

hive.server2.authentication.spnego.principal

The SPNEGO Kerberos principal

HTTP/_HOST@EXAMPLE.COM

hive.server2.webui.spnego.principal

The SPNEGO Kerberos principal to access Web UI

 — 

hive.server2.webui.spnego.keytab

The SPNEGO Kerberos keytab file to access Web UI

 — 

hive.server2.webui.use.spnego

Defines whether to use Kerberos SPNEGO for Web UI access

false

hive.server2.authentication.spnego.keytab

The path to the keytab file containing the SPNEGO service principal

/etc/security/keytabs/HTTP.service.keytab

hive.server2.authentication

Sets the authentication mode

NONE

hive.metastore.sasl.enabled

If true, the Metastore Thrift interface will be secured with SASL. Clients must authenticate with Kerberos

false

hive.metastore.kerberos.principal

The service principal for the metastore Thrift server. The _HOST token will be automatically replaced with the appropriate host name

hive/_HOST@EXAMPLE.COM

hive.metastore.kerberos.keytab.file

The path to the Kerberos keytab file containing the metastore Thrift server’s service principal

/etc/security/keytabs/hive.service.keytab

hive.server2.use.SSL

Defines whether to use SSL for HiveServer2

false

hive.server2.keystore.path

The keystore to be used by Hive

 — 

hive.server2.keystore.password

The password to the Hive keystore

 — 

hive.server2.truststore.path

The truststore to be used by Hive

 — 

hive.server2.webui.use.ssl

Defines whether to use SSL for the Hive web UI

false

hive.server2.webui.keystore.path

The path to the keystore file used to access the Hive web UI

 — 

hive.server2.webui.keystore.password

The password to the keystore file used to access the Hive web UI

 — 

hive.server2.support.dynamic.service.discovery

Defines whether to support dynamic service discovery via ZooKeeper

false

hive.zookeeper.quorum

A comma-separated list of ZooKeeper servers (<host>:<port>) running in the cluster

zookeeper:2181

hive.server2.zookeeper.namespace

Specifies the root namespace on ZooKeeper

hiveserver2
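
For illustration only, enabling HiveServer2 dynamic service discovery typically combines the three parameters above in hive-site.xml; the ZooKeeper quorum below is a placeholder:

<property>
  <name>hive.server2.support.dynamic.service.discovery</name>
  <value>true</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
<property>
  <name>hive.server2.zookeeper.namespace</name>
  <value>hiveserver2</value>
</property>

With such settings, JDBC clients can typically connect with a URL of the form jdbc:hive2://<zk quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2.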

ranger-hive-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

The spool directory path

/srv/ranger/hdfs_plugin/audit_solr_spool

xasecure.audit.destination.solr.urls

The Solr URL to which audit logs are sent. Leave this property empty or set it to NONE when using ZooKeeper to connect to Solr

 — 

xasecure.audit.destination.solr.zookeepers

Specifies the ZooKeeper connection string for the Solr destination

 — 

xasecure.audit.destination.solr.force.use.inmemory.jaas.config

Uses in-memory JAAS configuration file to connect to Solr

 — 

xasecure.audit.is.enabled

Enables Ranger audit

true

xasecure.audit.jaas.Client.loginModuleControlFlag

Specifies whether the success of the module is required, requisite, sufficient, or optional

 — 

xasecure.audit.jaas.Client.loginModuleName

The name of the authenticator class

 — 

xasecure.audit.jaas.Client.option.keyTab

The name of the keytab file to get the principal’s secret key

 — 

xasecure.audit.jaas.Client.option.principal

The name of the principal to be used

 — 

xasecure.audit.jaas.Client.option.serviceName

Represents a user or a service that wants to log in

 — 

xasecure.audit.jaas.Client.option.storeKey

Set this to true if you want the keytab or the principal’s key to be stored in the subject’s private credentials

false

xasecure.audit.jaas.Client.option.useKeyTab

Set this to true if you want the module to get the principal’s key from the keytab

false

ranger-hive-security.xml
Parameter Description Default value

ranger.plugin.hive.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.hive.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.hive.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/hive/policycache

ranger.plugin.hive.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.hive.policy.rest.client.connection.timeoutMs

The Hive Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.hive.policy.rest.client.read.timeoutMs

The Hive Plugin RangerRestClient read timeout (in milliseconds)

30000

xasecure.hive.update.xapolicies.on.grant.revoke

Controls Hive Ranger policy update from SQL Grant/Revoke commands

true

ranger.plugin.hive.policy.rest.ssl.config.file

The path to the RangerRestClient SSL config file for the Hive plugin

/etc/hive/conf/ranger-hive-policymgr-ssl.xml
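A minimal ranger-hive-security.xml sketch using the parameters above; the Ranger Admin URL and service name are placeholders:

<property>
  <name>ranger.plugin.hive.policy.rest.url</name>
  <value>https://ranger.example.com:6182</value>
</property>
<property>
  <name>ranger.plugin.hive.service.name</name>
  <value>adh_hive</value>
</property>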

ranger-hive-policymgr-ssl.xml
Parameter Description Default value

xasecure.policymgr.clientssl.keystore

The path to the keystore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.credential.file

The path to the keystore credentials file

/etc/hive/conf/ranger-hive.jceks

xasecure.policymgr.clientssl.truststore.credential.file

The path to the truststore credentials file

/etc/hive/conf/ranger-hive.jceks

xasecure.policymgr.clientssl.truststore

The path to the truststore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.password

The password to the keystore file

 — 

xasecure.policymgr.clientssl.truststore.password

The password to the truststore file

 — 

tez-site.xml
Parameter Description Default value

tez.am.resource.memory.mb

The amount of memory, in MB, that YARN allocates to the Tez Application Master. The size increases with the size of the DAG

 — 

tez.history.logging.service.class

Enables Tez to use the Timeline Server for History Logging

org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService

tez.lib.uris

HDFS paths containing the Tez JAR files

${fs.defaultFS}/apps/tez/tez-0.9.2.tar.gz

tez.task.resource.memory.mb

The amount of memory used by launched tasks in Tez containers. Usually this value is set in the DAG

 — 

tez.tez-ui.history-url.base

The URL where the Tez UI is hosted

 — 

tez.use.cluster.hadoop-libs

Specifies whether Tez uses the cluster Hadoop libraries

true

nginx.conf
Parameter Description Default value

ssl_certificate

The path to the SSL certificate for NGINX

/etc/ssl/certs/host_cert.cert

ssl_certificate_key

The path to the SSL certificate key for NGINX

/etc/ssl/host_cert.key
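
For reference, these values map to the standard NGINX ssl_* directives. A hedged sketch of the relevant server block (the listen port is a placeholder; ADCM generates the actual configuration):

server {
    listen              443 ssl;
    ssl_certificate     /etc/ssl/certs/host_cert.cert;
    ssl_certificate_key /etc/ssl/host_cert.key;
}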

Other
Parameter Description Default value

ACID Transactions

Defines whether to enable ACID transactions

false

Database type

The type of the external database used for Hive Metastore

mysql

Custom hive-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hive-site.xml

 — 

Custom hive-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hive-env.sh

 — 

Ranger plugin enabled

Whether or not Ranger plugin is enabled

false

Custom ranger-hive-audit.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hive-audit.xml

 — 

Custom ranger-hive-security.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hive-security.xml

 — 

Custom ranger-hive-policymgr-ssl.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hive-policymgr-ssl.xml

 — 

Custom tez-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file tez-site.xml

 — 

Impala

Parameter Description Default value

impala-env.sh

The contents of the impala-env.sh file that contains Impala environment settings

The Impala Daemon component
impalastore.conf
Parameter Description Default value

hostname

The hostname to use for the Impala daemon. If Kerberos is enabled, it is also used as a part of the Kerberos principal. If this option is not set, the system default is used

 — 

beeswax_port

The port on which Impala daemons serve Beeswax client requests

21000

fe_port

The frontend port of the Impala daemon

21000

be_port

Internal use only. Impala daemons use this port for Thrift-based communication with each other

22000

krpc_port

Internal use only. Impala daemons use this port for KRPC-based communication with each other

27000

hs2_port

The port on which Impala daemons serve HiveServer2 client requests

21050

hs2_http_port

The port used by client applications to transmit commands and receive results over HTTP via the HiveServer2 protocol

28000

enable_webserver

Enables or disables the Impala daemon web server. Its Web UI contains information about configuration settings, running and completed queries, and associated resource usage for them. It is primarily used for diagnosing query problems that can be traced to a particular node

True

webserver_require_spnego

Enables the Kerberos authentication for Hadoop HTTP web consoles for all roles of this service using the SPNEGO protocol. Use this option only if Kerberos is enabled for the HDFS service

False

webserver_port

The port where the Impala daemon web server is running

25000

catalog_service_host

The host where the Impala Catalog Service component is running

 — 

catalog_service_port

The port on which the Impala Catalog Service component listens

26000

state_store_host

The host where the Impala Statestore component is running

 — 

state_store_port

The port on which the Impala Statestore component is running

24000

state_store_subscriber_port

The port where StateStoreSubscriberService is running. StateStoreSubscriberService listens on this port for updates from the Statestore daemon

23030

scratch_dirs

The directory where Impala daemons write data to free up memory during large sorts, joins, aggregations, and other operations. The files are removed when the operation finishes. The amount of written data can potentially be large

/srv/impala/

log_dir

The directory where an Impala daemon places its log files

/var/log/impala/impalad/

log_filename

The prefix of the log file name. The full path is <log_dir>/<log_filename>

impalad

max_log_files

The number of log files that are kept for each severity level (INFO, WARNING, ERROR, and FATAL) before older log files are removed. The number should be greater than 1 so that at least the current log file remains open. If set to 0, all log files are retained and log rotation is disabled

10

audit_event_log_dir

The directory in which Impala daemon audit event log files are written if the Impala Audit Event Generation property is enabled

/var/log/impala/impalad/audit

minidump_path

The directory for storing Impala daemon Breakpad dumps

/var/log/impala-minidumps

lineage_event_log_dir

The directory in which the Impala daemon generates its lineage log files if the Impala Lineage Generation property is enabled

/var/log/impala/impalad/lineage

local_library_dir

The local directory into which an Impala daemon copies user-defined function (UDF) libraries from HDFS

/usr/lib/impala/udfs

max_lineage_log_file_size

The maximum size (in entries) of the Impala daemon lineage log file. When the size is exceeded, a new file is created

5000

max_audit_event_log_file_size

The maximum size (in queries) of the Impala Daemon audit event log file. When the size is exceeded, a new file is created

5000

fe_service_threads

The maximum number of concurrent client connections allowed. The parameter determines how many queries can run simultaneously. When more clients try to connect to Impala, the later arriving clients have to wait until previous clients disconnect. Setting the fe_service_threads value too high could negatively impact query latency

64

mem_limit

The memory limit (in bytes) for an Impala daemon, enforced by the daemon itself. This limit does not include memory consumed by the daemon’s embedded JVM. The Impala daemon uses up to this amount of memory for query processing, cached data, network buffers, background operations, and so on. If the limit is exceeded, queries are killed until memory usage falls below the limit

1473249280

idle_query_timeout

The time in seconds after which an idle query (no processing work is done and no updates are received from the client) is cancelled. If set to 0, idle queries are never expired

0

idle_session_timeout

The time in seconds after which Impala closes an idle session and cancels all running queries. If set to 0, idle sessions never expire

0

max_result_cache_size

The maximum number of query results a client can request to be cached on a per-query basis to support restarting fetches. This option guards against unreasonably large result caches. Requests exceeding this maximum are rejected

100000

max_cached_file_handles

The maximum number of cached HDFS file handles. Caching HDFS file handles reduces the number of new file handles opened and thus reduces the load on the HDFS NameNode. Each cached file handle consumes a small amount of memory. If set to 0, file handle caching is disabled

20000

unused_file_handle_timeout_sec

The maximum time in seconds during which an unused HDFS file handle remains in the HDFS file handle cache. When the underlying file for a cached file handle is deleted, the disk space may not be freed until the cached file handle is removed from the cache. This timeout allows the disk space occupied by deleted files to be freed in a predictable period of time. If set to 0, unused cached HDFS file handles are not removed

21600

statestore_subscriber_timeout_seconds

The timeout in seconds for Impala Daemon and Catalog Server connections to Statestore

30

default_query_options

A list of key/value pairs representing additional query options to pass to the Impala Daemon command line, separated by commas

default_file_format=parquet,default_transactional_type=none

load_auth_to_local_rules

If checked (True) and Kerberos is enabled for Impala, Impala uses the auth_to_local option from hadoop.security.auth_to_local rules of the HDFS configuration

True

catalog_topic_mode

The granularity of on-demand metadata fetches between the Impala Daemon coordinator and Impala Catalog Service. See Metadata management

minimal

use_local_catalog

Allows coordinators to cache metadata from Impala Catalog Service. If this is set to True, coordinators pull metadata as needed from catalogd and cache it locally. The cached metadata is automatically removed under memory pressure or after an expiration time. See Metadata management

True

abort_on_failed_audit_event

Specifies whether to shut down Impala if there is a problem with recording an audit event

False

max_minidumps

The maximum number of Breakpad dump files stored by the Impala daemon. A negative value or 0 is interpreted as an unlimited number

9

authorized_proxy_user_config

Specifies the set of authorized proxy users (the users who can impersonate other users during authorization), and users who they are allowed to impersonate. The example of syntax for the option is: authenticated_user1=delegated_user1,delegated_user2;authenticated_user2=*. See Configuring Impala delegation for clients. The list can contain short usernames or * to indicate all users

knox=*;zeppelin=*

queue_wait_timeout_ms

The maximum amount of time (in milliseconds) that a request waits to be admitted before timing out. Must be a positive integer

60000

disk_spill_encryption

Specifies whether to encrypt and verify the integrity of all data spilled to the disk as part of a query

False

abort_on_config_error

Specifies whether to abort Impala startup if there are incorrect configs or Impala is running on unsupported hardware

True

kerberos_reinit_interval

The number of minutes between reestablishing the ticket with the Kerberos server

60

principal

The service Kerberos principal

 — 

keytab_file

The service Kerberos keytab file

 — 

ssl_server_certificate

The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

ssl_private_key

The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The file must be in the PEM format

 — 

ssl_client_ca_certificate

The path to the certificate, in the PEM format, used to confirm the authenticity of SSL/TLS servers that the Impala daemons can connect to. Since the Impala daemons connect to each other, it should also include the CA certificate used to sign all the SSL/TLS certificates. SSL/TLS between Impala daemons cannot be enabled without this parameter

 — 

webserver_certificate_file

The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when the Impala daemon web server operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

webserver_private_key_file

The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when the Impala daemon web server operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

ssl_minimum_version

The minimum version of TLS

TLSv1.2

Others
Parameter Description Default value

log4j.properties

Apache Log4j utility settings

log.threshold=INFO
main.logger=FA
impala.root.logger=DEBUG,FA
log4j.rootLogger=DEBUG,FA
log.dir=/var/log/impala/impalad
max.log.file.size=200MB
log4j.appender.FA=org.apache.log4j.FileAppender
log4j.appender.FA.File=/var/log/impalad/impalad.INFO
log4j.appender.FA.layout=org.apache.log4j.PatternLayout
log4j.appender.FA.layout.ConversionPattern=%p%d{MMdd HH:mm:ss.SSS'000'} %t %c] %m%n
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

Enable custom ulimits

Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the table below

[Manager]
DefaultLimitCPU=
DefaultLimitFSIZE=
DefaultLimitDATA=
DefaultLimitSTACK=
DefaultLimitCORE=
DefaultLimitRSS=
DefaultLimitNOFILE=
DefaultLimitAS=
DefaultLimitNPROC=
DefaultLimitMEMLOCK=
DefaultLimitLOCKS=
DefaultLimitSIGPENDING=
DefaultLimitMSGQUEUE=
DefaultLimitNICE=
DefaultLimitRTPRIO=
DefaultLimitRTTIME=
Ulimit settings
Parameter Description Corresponding option of the ulimit command in CentOS

DefaultLimitCPU

A limit in seconds on the amount of CPU time that a process can consume

cpu time ( -t)

DefaultLimitFSIZE

The maximum size of files that a process can create, in 512-byte blocks

file size ( -f)

DefaultLimitDATA

The maximum size of a process’s data segment, in kilobytes

data seg size ( -d)

DefaultLimitSTACK

The maximum stack size allocated to a process, in kilobytes

stack size ( -s)

DefaultLimitCORE

The maximum size of a core dump file allowed for a process, in 512-byte blocks

core file size ( -c)

DefaultLimitRSS

The maximum resident set size, in kilobytes

max memory size ( -m)

DefaultLimitNOFILE

The maximum number of open file descriptors allowed for the process

open files ( -n)

DefaultLimitAS

The maximum size of the process virtual memory (address space), in kilobytes

virtual memory ( -v)

DefaultLimitNPROC

The maximum number of processes

max user processes ( -u)

DefaultLimitMEMLOCK

The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used

max locked memory ( -l)

DefaultLimitLOCKS

The maximum number of files locked by a process

file locks ( -x)

DefaultLimitSIGPENDING

The maximum number of signals that are pending for delivery to the calling thread

pending signals ( -i)

DefaultLimitMSGQUEUE

The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages

POSIX message queues ( -q)

DefaultLimitNICE

The maximum NICE priority level that can be assigned to a process

scheduling priority ( -e)

DefaultLimitRTPRIO

The maximum real-time scheduling priority level

real-time priority ( -r)

DefaultLimitRTTIME

A limit, in microseconds, on the amount of CPU time that a process scheduled under a real-time scheduling policy may consume without making a blocking system call

 — 

The Impala Statestore component
statestore.conf
Parameter Description Default value

hostname

The hostname to use for the Statestore daemon. If Kerberos is enabled, it is also used as a part of the Kerberos principal. If this option is not set, the system default is used

 — 

state_store_host

The host where the Impala Statestore component is running

 — 

state_store_port

The port on which the Impala Statestore component is running

24000

catalog_service_host

The host where the Impala Catalog Service component is running

 — 

catalog_service_port

The port on which the Impala Catalog Service component listens

26000

enable_webserver

Enables or disables the Statestore daemon web server. Its Web UI contains information about memory usage, configuration settings, and ongoing health checks performed by Statestore

True

webserver_require_spnego

Enables the Kerberos authentication for Hadoop HTTP web consoles for all roles of this service using the SPNEGO protocol. Use this option only if Kerberos is enabled for the HDFS service

False

webserver_port

The port on which the Statestore web server is running

25010

log_dir

The directory where the Statestore daemon places its log files

/var/log/impala/statestored/

log_filename

The prefix of the log file name. The full path is <log_dir>/<log_filename>

statestored

max_log_files

The number of log files that are kept for each severity level (INFO, WARNING, ERROR, and FATAL) before older log files are removed. The number should be greater than 1 so that at least the current log file remains open. If set to 0, all log files are retained and log rotation is disabled

10

minidump_path

The directory for storing Statestore daemon Breakpad dumps

/var/log/impala-minidumps

max_minidumps

The maximum number of Breakpad dump files stored by Statestore daemon. A negative value or 0 is interpreted as an unlimited number

9

state_store_num_server_worker_threads

The number of worker threads for the thread manager of the Statestore Thrift server

4

state_store_pending_task_count_max

The maximum number of tasks allowed to be pending by the thread manager of the Statestore Thrift server. The 0 value allows an infinite number of pending tasks

0

kerberos_reinit_interval

The number of minutes between reestablishing the ticket with the Kerberos server

60

principal

The service Kerberos principal

 — 

keytab_file

The service Kerberos keytab file

 — 

ssl_server_certificate

The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

ssl_private_key

The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The file must be in the PEM format

 — 

ssl_client_ca_certificate

The path to the certificate, in the PEM format, used to confirm the authenticity of SSL/TLS servers that the Impala daemons can connect to. Since the Impala daemons connect to each other, it should also include the CA certificate used to sign all the SSL/TLS certificates. SSL/TLS between Impala daemons cannot be enabled without this parameter

 — 

webserver_certificate_file

The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when the Statestore web server operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

webserver_private_key_file

The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when the Statestore web server operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

ssl_minimum_version

The minimum version of TLS

TLSv1.2

Others
Parameter Description Default value

Custom statestore.conf

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file statestore.conf

 — 

Enable custom ulimits

Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the table below

[Manager]
DefaultLimitCPU=
DefaultLimitFSIZE=
DefaultLimitDATA=
DefaultLimitSTACK=
DefaultLimitCORE=
DefaultLimitRSS=
DefaultLimitNOFILE=
DefaultLimitAS=
DefaultLimitNPROC=
DefaultLimitMEMLOCK=
DefaultLimitLOCKS=
DefaultLimitSIGPENDING=
DefaultLimitMSGQUEUE=
DefaultLimitNICE=
DefaultLimitRTPRIO=
DefaultLimitRTTIME=
Ulimit settings
Parameter Description Corresponding option of the ulimit command in CentOS

DefaultLimitCPU

A limit in seconds on the amount of CPU time that a process can consume

cpu time ( -t)

DefaultLimitFSIZE

The maximum size of files that a process can create, in 512-byte blocks

file size ( -f)

DefaultLimitDATA

The maximum size of a process’s data segment, in kilobytes

data seg size ( -d)

DefaultLimitSTACK

The maximum stack size allocated to a process, in kilobytes

stack size ( -s)

DefaultLimitCORE

The maximum size of a core dump file allowed for a process, in 512-byte blocks

core file size ( -c)

DefaultLimitRSS

The maximum resident set size, in kilobytes

max memory size ( -m)

DefaultLimitNOFILE

The maximum number of open file descriptors allowed for the process

open files ( -n)

DefaultLimitAS

The maximum size of the process virtual memory (address space), in kilobytes

virtual memory ( -v)

DefaultLimitNPROC

The maximum number of processes

max user processes ( -u)

DefaultLimitMEMLOCK

The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used

max locked memory ( -l)

DefaultLimitLOCKS

The maximum number of files locked by a process

file locks ( -x)

DefaultLimitSIGPENDING

The maximum number of signals that are pending for delivery to the calling thread

pending signals ( -i)

DefaultLimitMSGQUEUE

The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages

POSIX message queues ( -q)

DefaultLimitNICE

The maximum NICE priority level that can be assigned to a process

scheduling priority ( -e)

DefaultLimitRTPRIO

The maximum real-time scheduling priority level

real-time priority ( -r)

DefaultLimitRTTIME

A limit, in microseconds, on the amount of CPU time that a process scheduled under a real-time scheduling policy may consume without making a blocking system call

 — 

The Impala Catalog Service component
catalogstore.conf
Parameter Description Default value

hostname

The hostname to use for the Catalog Service daemon. If Kerberos is enabled, it is also used as a part of the Kerberos principal. If this option is not set, the system default is used

 — 

state_store_host

The host where the Impala Statestore component is running

 — 

state_store_port

The port on which the Impala Statestore component is running

24000

catalog_service_host

The host where the Impala Catalog Service component is running

 — 

catalog_service_port

The port on which the Impala Catalog Service component listens

26000

enable_webserver

Enables or disables the Catalog Service web server. Its Web UI includes information about the databases, tables, and other objects managed by Impala, in addition to the resource usage and configuration settings of the Catalog Service

True

webserver_require_spnego

Enables the Kerberos authentication for Hadoop HTTP web consoles for all roles of this service using the SPNEGO protocol. Use this option only if Kerberos is enabled for the HDFS service

False

webserver_port

The port on which the Catalog Service web server is running

25020

log_dir

The directory where the Catalog Service daemon places its log files

/var/log/impala/catalogd/

log_filename

The prefix of the log file name. The full path is <log_dir>/<log_filename>

catalogd

max_log_files

The number of log files that are kept for each severity level (INFO, WARNING, ERROR, and FATAL) before older log files are removed. The number should be greater than 1 so that at least the current log file remains open. If set to 0, all log files are retained and log rotation is disabled

10

minidump_path

The directory for storing the Catalog Service daemon Breakpad dumps

/var/log/impala-minidumps

max_minidumps

The maximum number of Breakpad dump files stored by Catalog Service. A negative value or 0 is interpreted as an unlimited number

9

hms_event_polling_interval_s

When this parameter is set to a positive integer, Catalog Service fetches new notifications from Hive Metastore at the specified interval in seconds. If hms_event_polling_interval_s is set to 0, the automatic metadata invalidation and updates are disabled. See Metadata management

2

load_auth_to_local_rules

If checked (True) and Kerberos is enabled for Impala, Impala uses the auth_to_local option from hadoop.security.auth_to_local rules of the HDFS configuration

True

load_catalog_in_background

If it is set to True, the metadata is loaded in the background, even if that metadata is not required for any query. If False, the metadata is loaded when it is referenced for the first time

False

catalog_topic_mode

The granularity of on-demand metadata fetches between the Impala Daemon coordinator and Impala Catalog Service. See Metadata management

minimal

statestore_subscriber_timeout_seconds

The timeout in seconds for Impala Daemon and Catalog Server connections to Statestore

30

state_store_subscriber_port

The port where StateStoreSubscriberService is running. StateStoreSubscriberService listens on this port for updates from the Statestore daemon

23020

kerberos_reinit_interval

The number of minutes between reestablishing the ticket with the Kerberos server

60

principal

The service Kerberos principal

 — 

keytab_file

The service Kerberos keytab file

 — 

ssl_server_certificate

The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

ssl_private_key

The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The file must be in the PEM format

 — 

ssl_client_ca_certificate

The path to the certificate, in the PEM format, used to confirm the authenticity of SSL/TLS servers that the Impala daemons can connect to. Since the Impala daemons connect to each other, it should also include the CA certificate used to sign all the SSL/TLS certificates. SSL/TLS between Impala daemons cannot be enabled without this parameter

 — 

webserver_certificate_file

The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when the Catalog Service web server operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

webserver_private_key_file

The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when the Catalog Service web server operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

ssl_minimum_version

The minimum version of TLS

TLSv1.2

Others
Parameter Description Default value

Custom catalogstore.conf

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file catalogstore.conf

 — 

Enable custom ulimits

Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the table below

[Manager]
DefaultLimitCPU=
DefaultLimitFSIZE=
DefaultLimitDATA=
DefaultLimitSTACK=
DefaultLimitCORE=
DefaultLimitRSS=
DefaultLimitNOFILE=
DefaultLimitAS=
DefaultLimitNPROC=
DefaultLimitMEMLOCK=
DefaultLimitLOCKS=
DefaultLimitSIGPENDING=
DefaultLimitMSGQUEUE=
DefaultLimitNICE=
DefaultLimitRTPRIO=
DefaultLimitRTTIME=
Ulimit settings
Parameter Description Corresponding option of the ulimit command in CentOS

DefaultLimitCPU

A limit in seconds on the amount of CPU time that a process can consume

cpu time ( -t)

DefaultLimitFSIZE

The maximum size of files that a process can create, in 512-byte blocks

file size ( -f)

DefaultLimitDATA

The maximum size of a process’s data segment, in kilobytes

data seg size ( -d)

DefaultLimitSTACK

The maximum stack size allocated to a process, in kilobytes

stack size ( -s)

DefaultLimitCORE

The maximum size of a core dump file allowed for a process, in 512-byte blocks

core file size ( -c)

DefaultLimitRSS

The maximum resident set size, in kilobytes

max memory size ( -m)

DefaultLimitNOFILE

The maximum number of open file descriptors allowed for the process

open files ( -n)

DefaultLimitAS

The maximum size of the process virtual memory (address space), in kilobytes

virtual memory ( -v)

DefaultLimitNPROC

The maximum number of processes

max user processes ( -u)

DefaultLimitMEMLOCK

The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used

max locked memory ( -l)

DefaultLimitLOCKS

The maximum number of files locked by a process

file locks ( -x)

DefaultLimitSIGPENDING

The maximum number of signals that are pending for delivery to the calling thread

pending signals ( -i)

DefaultLimitMSGQUEUE

The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages

POSIX message queues ( -q)

DefaultLimitNICE

The maximum NICE priority level that can be assigned to a process

scheduling priority ( -e)

DefaultLimitRTPRIO

The maximum real-time scheduling priority level

real-time priority ( -r)

DefaultLimitRTTIME

A limit, in microseconds, on the amount of CPU time that a process scheduled under a real-time scheduling policy may consume without making a blocking system call

 — 

Kyuubi

The Kyuubi Server component
kyuubi-defaults.conf
Parameter Description Default value

kyuubi.frontend.rest.bind.port

Port on which the REST frontend service runs

10099

kyuubi.frontend.thrift.binary.bind.port

Port on which the Thrift frontend service runs via a binary protocol

10099

kyuubi.frontend.thrift.http.bind.port

Port on which the Thrift frontend service runs via HTTP

10010

kyuubi.frontend.thrift.http.path

The path component of the URL endpoint for the HTTP version of Thrift

cliservice

kyuubi.engine.share.level

An engine share level. Possible values: CONNECTION (one engine per connection), USER (one engine per user), GROUP (one engine per group), SERVER (one engine per server)

USER

kyuubi.engine.type

An engine type supported by Kyuubi. Possible values: SPARK_SQL, FLINK_SQL, TRINO, HIVE_SQL, JDBC

SPARK_SQL

kyuubi.operation.language

Programming language used to interpret inputs. Possible values: SQL, SCALA, PYTHON

SQL

kyuubi.frontend.protocols

A comma-separated list for supported frontend protocols. Possible values: THRIFT_BINARY, THRIFT_HTTP, REST

THRIFT_BINARY

kyuubi.frontend.thrift.binary.ssl.disallowed.protocols

Forbidden SSL versions for Thrift binary frontend

SSLv2,SSLv3,TLSv1.1

kyuubi.frontend.thrift.http.ssl.protocol.blacklist

Forbidden SSL versions for Thrift HTTP frontend

SSLv2,SSLv3,TLSv1.1

kyuubi.ha.addresses

External Kyuubi instance addresses

<hostname_1>:2181, …, <hostname_N>:2181

kyuubi.ha.namespace

The root directory for the service to deploy its instance URI

kyuubi

kyuubi.metadata.store.jdbc.database.type

A database type for the server metadata store. Possible values: SQLITE, MYSQL, POSTGRESQL

POSTGRESQL

kyuubi.metadata.store.jdbc.url

A JDBC URL for the server metadata store

jdbc:postgresql://{{ groups['adpg.adpg'][0] | d(omit) }}:5432/kyuubi

kyuubi.metadata.store.jdbc.driver

A JDBC driver classname for the server metadata store

org.postgresql.Driver

kyuubi.metadata.store.jdbc.user

A username for the server metadata store

kyuubi

kyuubi.metadata.store.jdbc.password

A password for the server metadata store

 — 

kyuubi.frontend.thrift.binary.ssl.enabled

Indicates whether to use the SSL encryption in the Thrift binary mode

false

kyuubi.frontend.thrift.http.use.SSL

Indicates whether to use the SSL encryption in the Thrift HTTP mode

false

kyuubi.frontend.ssl.keystore.type

Type of the SSL certificate keystore

 — 

kyuubi.frontend.ssl.keystore.path

Path to the SSL certificate keystore

 — 

kyuubi.frontend.ssl.keystore.password

Password for the SSL certificate keystore

 — 

kyuubi.frontend.thrift.http.ssl.keystore.path

Path to the SSL certificate keystore

 — 

kyuubi.frontend.thrift.http.ssl.keystore.password

Password for the SSL certificate keystore

 — 

kyuubi.authentication

Authentication type. Possible values: NONE, KERBEROS

NONE

kyuubi.ha.zookeeper.acl.enabled

Indicates whether the ZooKeeper ensemble is kerberized

false

kyuubi.ha.zookeeper.auth.type

ZooKeeper authentication type. Possible values: NONE, KERBEROS

NONE

kyuubi.ha.zookeeper.auth.principal

Kerberos principal name used for ZooKeeper authentication

 — 

kyuubi.ha.zookeeper.auth.keytab

Path to Kyuubi Server’s keytab used for ZooKeeper authentication

 — 

kyuubi.kinit.principal

Name of the Kerberos principal

 — 

kyuubi.kinit.keytab

Path to Kyuubi Server’s keytab

 — 

kyuubi.spnego.principal

Name of the SPNego service principal. Set only if using SPNego in authentication

 — 

kyuubi.spnego.keytab

Path to the SPNego service keytab. Set only if using SPNego in authentication

 — 

kyuubi.engine.hive.java.options

Extra Java options for the Hive query engine

 — 
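
As a sketch only, a kyuubi-defaults.conf fragment combining several of the parameters above; the values and host names are illustrative, not recommendations:

kyuubi.frontend.protocols    THRIFT_BINARY,REST
kyuubi.engine.type           SPARK_SQL
kyuubi.engine.share.level    USER
kyuubi.ha.addresses          zk1.example.com:2181,zk2.example.com:2181
kyuubi.ha.namespace          kyuubi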

kyuubi-env.conf
Parameter Description Default value

KYUUBI_HOME

Kyuubi home directory

/usr/lib/kyuubi

KYUUBI_CONF_DIR

Directory that stores Kyuubi configurations

/etc/kyuubi/conf

KYUUBI_LOG_DIR

Kyuubi server log directory

/var/log/kyuubi

KYUUBI_PID_DIR

Directory that stores the Kyuubi instance .pid-file

/var/run/kyuubi

KYUUBI_ADDITIONAL_CLASSPATH

Path to a directory with additional SSM libraries

/usr/lib/ssm/lib/smart*

HADOOP_HOME

Hadoop home directory

/usr/lib/hadoop

HADOOP_LIB_DIR

Directory that stores Hadoop libraries

${HADOOP_HOME}/lib

KYUUBI_JAVA_OPTS

Java parameters for Kyuubi

-Djava.library.path=${HADOOP_LIB_DIR}/native/ -Djava.io.tmpdir={{ cluster.config.java_tmpdir | d('/tmp') }}

HADOOP_CLASSPATH

A common $HADOOP_CLASSPATH variable value followed by KYUUBI_ADDITIONAL_CLASSPATH

$HADOOP_CLASSPATH:/usr/lib/ssm/lib/smart*

HADOOP_CONF_DIR

Directory that stores Hadoop configurations

/etc/hadoop/conf

SPARK_HOME

Spark home directory

/usr/lib/spark3

SPARK_CONF_DIR

Directory that stores Spark configurations

/etc/spark3/conf

FLINK_HOME

Flink home directory

/usr/lib/flink

FLINK_CONF_DIR

Directory that stores Flink configurations

/etc/flink/conf

FLINK_HADOOP_CLASSPATH

Additional Hadoop .jar files required to use the Kyuubi Flink engine

$(hadoop classpath):/usr/lib/ssm/lib/smart*

HIVE_HOME

Hive home directory

/usr/lib/hive

HIVE_CONF_DIR

Directory that stores Hive configurations

/etc/hive/conf

HIVE_HADOOP_CLASSPATH

Additional Hadoop .jar files required to use the Kyuubi Hive engine

$(hadoop classpath):/etc/tez/conf/:/usr/lib/tez/*:/usr/lib/tez/lib/*:/usr/lib/ssm/lib/smart*

MySQL

root user
Parameter Description Default value

Password

The root password

 — 

Solr

solr-env.sh
Parameter Description Default value

SOLR_HOME

The location for index data and configs

/srv/solr/server

SOLR_AUTH_TYPE

Specifies the authentication type for Solr

 — 

SOLR_AUTHENTICATION_OPTS

Solr authentication options

 — 

GC_TUNE

JVM parameters for Solr

-XX:-UseLargePages

SOLR_SSL_KEY_STORE

The path to the Solr keystore file (.jks)

 — 

SOLR_SSL_KEY_STORE_PASSWORD

The password to the Solr keystore file

 — 

SOLR_SSL_TRUST_STORE

The path to the Solr truststore file (.jks)

 — 

SOLR_SSL_TRUST_STORE_PASSWORD

The password to the Solr truststore file

 — 

SOLR_SSL_NEED_CLIENT_AUTH

Defines whether client authentication is required

false

SOLR_SSL_WANT_CLIENT_AUTH

Allows clients to authenticate but does not require it

false

SOLR_SSL_CLIENT_HOSTNAME_VERIFICATION

Defines whether to enable hostname verification

false

SOLR_HOST

Specifies the host name of the Solr server

 — 
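
A hedged solr-env.sh sketch enabling SSL with the variables above; the store paths and passwords are placeholders:

SOLR_SSL_KEY_STORE=/etc/solr/conf/solr-keystore.jks
SOLR_SSL_KEY_STORE_PASSWORD=changeit
SOLR_SSL_TRUST_STORE=/etc/solr/conf/solr-truststore.jks
SOLR_SSL_TRUST_STORE_PASSWORD=changeit
SOLR_SSL_NEED_CLIENT_AUTH=false
SOLR_SSL_WANT_CLIENT_AUTH=false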

External zookeeper
Parameter Description Default value

ZK_HOST

Comma-separated locations of all servers in the ensemble and the ports on which they communicate. You can put ZooKeeper chroot at the end of your ZK_HOST connection string. For example, host1.mydomain.com:2181,host2.mydomain.com:2181,host3.mydomain.com:2181/solr

 — 

Solr server heap memory settings
Parameter Description Default value

Solr Server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Solr Server

-Xms512m -Xmx512m

ranger-solr-audit.xml
Parameter Description Default value

xasecure.audit.solr.solr_url

A path to a Solr collection to store audit logs

 — 

xasecure.audit.solr.async.max.queue.size

The maximum size of internal queue used for storing audit logs

1

xasecure.audit.solr.async.max.flush.interval.ms

The maximum time interval between flushes to disk (in milliseconds)

100

ranger-solr-security.xml
Parameter Description Default value

ranger.plugin.solr.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.solr.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.solr.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/yarn/policycache

ranger.plugin.solr.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.solr.policy.rest.client.connection.timeoutMs

The Solr Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.solr.policy.rest.client.read.timeoutMs

The Solr Plugin RangerRestClient read timeout (in milliseconds)

30000

ranger-solr-policymgr-ssl.xml
Parameter Description Default value

xasecure.policymgr.clientssl.keystore

The path to the keystore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.credential.file

The path to the keystore credentials file

/etc/solr/conf/ranger-solr.jceks

xasecure.policymgr.clientssl.truststore.credential.file

The path to the truststore credentials file

/etc/solr/conf/ranger-solr.jceks

xasecure.policymgr.clientssl.truststore

The path to the truststore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.password

The password to the keystore file

 — 

xasecure.policymgr.clientssl.truststore.password

The password to the truststore file

 — 

Other
Parameter Description Default value

solr.xml

The content of solr.xml

Custom solr-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file solr-env.sh

 — 

Ranger plugin enabled

Enables the Ranger plugin

false

Spark

Common
Parameter Description Default value

Dynamic allocation (spark.dynamicAllocation.enabled)

Defines whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload

false

spark-defaults.conf
Parameter Description Default value

spark.yarn.archive

The archive containing needed Spark JARs for distribution to the YARN cache. If set, this configuration replaces spark.yarn.jars and the archive is used in all the application containers. The archive should contain JAR files in its root directory. The archive can also be hosted on HDFS to speed up file distribution

hdfs:///apps/spark/spark-yarn-archive.tgz

spark.master

The cluster manager to connect to

yarn

spark.yarn.historyServer.address

Spark History server address

 — 

spark.dynamicAllocation.enabled

Defines whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload

false

spark.shuffle.service.enabled

Enables the external shuffle service. This service preserves the shuffle files written by executors so that executors can be safely removed, or so that shuffle fetches can continue in the event of executor failure. The external shuffle service must be set up in order to enable it

false

spark.eventLog.enabled

Defines whether to log Spark events, useful for reconstructing the Web UI after the application has finished

true

spark.eventLog.dir

The base directory where Spark events are logged, if spark.eventLog.enabled=true. Within this base directory, Spark creates a sub-directory for each application, and logs the events specific to the application in this directory. You may want to set this to a unified location like an HDFS directory so history files can be read by the History Server

hdfs:///var/log/spark/apps

spark.serializer

The class to use for serializing objects that will be sent over the network or need to be cached in serialized form. The default of Java serialization works with any Serializable Java object but is quite slow, so we recommend using org.apache.spark.serializer.KryoSerializer and configuring Kryo serialization when speed is necessary. Can be any subclass of org.apache.spark.Serializer

org.apache.spark.serializer.KryoSerializer

spark.dynamicAllocation.executorIdleTimeout

If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation

120s

spark.dynamicAllocation.cachedExecutorIdleTimeout

If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation

600s

spark.history.provider

The name of the class that implements the application history backend. Currently there is only one implementation provided with Spark that looks for application logs stored in the file system

org.apache.spark.deploy.history.FsHistoryProvider

spark.history.fs.cleaner.enabled

Specifies whether the History Server should periodically clean up event logs from storage

true

spark.history.store.path

A local directory where application history data is cached. If set, the History Server will store application data on disk instead of keeping it in memory. The data written to disk will be re-used if the History Server restarts

/var/log/spark/history

spark.driver.extraClassPath

Extra classpath entries to prepend to the classpath of the driver

/usr/lib/hive/lib/hive-shims-scheduler.jar:/usr/lib/hadoop-yarn/hadoop-yarn-server-resourcemanager.jar

spark.history.ui.port

The port number of the History Server web UI

18082

spark.history.fs.logDirectory

The log directory of the History Server

hdfs:///var/log/spark/apps

spark.sql.hive.metastore.jars

The location of the JARs that should be used to instantiate HiveMetastoreClient

/usr/lib/hive/lib/*

spark.sql.hive.metastore.version

The Hive Metastore version

3.0.0

spark.driver.extraLibraryPath

The path to extra native libraries for driver

/usr/lib/hadoop/lib/native/

spark.yarn.am.extraLibraryPath

The path to extra native libraries for Application Master

/usr/lib/hadoop/lib/native/

spark.executor.extraLibraryPath

The path to extra native libraries for Executor

/usr/lib/hadoop/lib/native/

spark.yarn.appMasterEnv.HIVE_CONF_DIR

A directory on the Application Master with Hive configs required for running Hive in the cluster mode

/etc/spark/conf

spark.yarn.historyServer.allowTracking

Allows using the Spark History Server as the tracking UI even if the web UI is disabled for a job

True

spark.ssl.enabled

Defines whether to use SSL for Spark

false

spark.ssl.protocol

TLS protocol to be used. The protocol must be supported by JVM

TLSv1.2

spark.ssl.ui.port

The port on which the SSL service listens

4040

spark.ssl.historyServer.port

The port to access History Server web UI

18082

spark.ssl.keyPassword

The password to the private key in the key store

 — 

spark.ssl.keyStore

The path to the keystore file

 — 

spark.ssl.keyStoreType

The type of the keystore

JKS

spark.ssl.trustStorePassword

The password to the truststore used by Spark

 — 

spark.ssl.trustStore

The path to the truststore file

 — 

spark.ssl.trustStoreType

The type of the truststore

JKS

spark.history.kerberos.enabled

Indicates whether the History Server should use Kerberos to login. This is required if the History Server is accessing HDFS files on a secure Hadoop cluster

false

spark.acls.enable

Enables Spark ACL

false

spark.modify.acls

Defines who has access to modify a running Spark application

spark,hdfs

spark.modify.acls.groups

A comma-separated list of user groups that have modify access to the Spark application

spark,hdfs

spark.history.ui.acls.enable

Specifies whether ACLs should be checked to authorize users viewing the applications in the History Server. If enabled, access control checks are performed regardless of what the individual applications had set for spark.ui.acls.enable. If disabled, no access control checks are made for any application UIs available through the History Server

false

spark.history.ui.admin.acls

A comma-separated list of users that have view access to all the Spark applications in History Server

spark,hdfs,dr.who

spark.history.ui.admin.acls.groups

A comma-separated list of groups that have view access to all the Spark applications in History Server

spark,hdfs,dr.who

spark.ui.view.acls

A comma-separated list of users that have view access to the Spark application. By default, only the user that started the Spark job has view access. Using * as a value means that any user can have view access to this Spark job

spark,hdfs,dr.who

spark.ui.view.acls.groups

A comma-separated list of groups that have view access to the Spark web UI to view the Spark Job details. This can be used if you have a set of administrators or developers or users who can monitor the Spark job submitted. Using * in the list means any user in any group can view the Spark job details on the Spark web UI. The user groups are obtained from the instance of the groups mapping provider specified by spark.user.groups.mapping

spark,hdfs,dr.who
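
The keys above use the standard spark-defaults.conf syntax: one property per line, with the key and the value separated by whitespace. A minimal illustrative fragment, assuming a hypothetical keystore path (example values only, not recommended settings):

    spark.history.ui.port          18082
    spark.history.fs.logDirectory  hdfs:///var/log/spark/apps
    spark.ssl.enabled              true
    spark.ssl.keyStore             /etc/ssl/spark/keystore.jks
    spark.ssl.keyStoreType         JKS
    spark.ui.view.acls             spark,hdfs,dr.who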

Spark heap memory settings
Parameter Description Default value

Spark History Server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Spark History Server

1G

Spark Thrift Server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Spark Thrift Server

1G

Livy Server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Livy Server

-Xms300m -Xmx4G

livy.conf
Parameter Description Default value

livy.server.host

The host address to start the Livy server. By default, Livy will bind to all network interfaces

0.0.0.0

livy.server.port

The port to run the Livy server

8998

livy.spark.master

The Spark master to use for Livy sessions

yarn-cluster

livy.impersonation.enabled

Defines if Livy should impersonate users when creating a new session

false

livy.server.csrf-protection.enabled

Defines whether to enable the CSRF protection. If enabled, clients should add the X-Requested-By HTTP header for POST/DELETE/PUT/PATCH HTTP methods

true

livy.repl.enable-hive-context

Defines whether to enable HiveContext in the Livy interpreter. If set to true, hive-site.xml is detected automatically on the user request and added to the Livy server classpath

true

livy.server.recovery.mode

Sets the recovery mode for Livy

recovery

livy.server.recovery.state-store

Defines where Livy should store the state for recovery

filesystem

livy.server.recovery.state-store.url

For the filesystem state store, the path of the state store directory. Do not use a filesystem that does not support atomic rename like S3. For example: file:///tmp/livy or hdfs:///. For ZooKeeper, specify the address to the ZooKeeper servers. For example: host1:port1,host2:port2

/livy-recovery

livy.server.auth.type

Sets the Livy authentication type

 — 

livy.server.access_control.enabled

Defines whether to enable the access control for a Livy server. If set to true, then all the incoming requests will be checked if the requested user has permission

false

livy.server.access_control.users

Users allowed to access Livy. By default, any user is allowed to access Livy. To limit access, list all the permitted users separated by commas

livy,hdfs,spark

livy.superusers

A list of comma-separated users that have the permissions to change other users' submitted sessions, for example, submitting statements, deleting the session, and so on

livy,hdfs,spark

livy.keystore

A path to the keystore file. The path can be absolute or relative to the directory in which the process is started

 — 

livy.keystore.password

The password to access the keystore

 — 

livy.key-password

The password to access the key in the keystore

 — 

livy.server.thrift.ssl.protocol.blacklist

The list of banned TLS protocols

SSLv2,SSLv3,TLSv1,TLSv1.1
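
These properties follow the livy.conf syntax of one key = value pair per line. A short illustrative fragment based on the parameters described above (example values only):

    livy.server.port = 8998
    livy.spark.master = yarn-cluster
    livy.server.recovery.mode = recovery
    livy.server.recovery.state-store = filesystem
    livy.server.recovery.state-store.url = /livy-recovery
    livy.server.access_control.enabled = false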

Other
Parameter Description Default value

Custom spark-defaults.conf

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file spark-defaults.conf

 — 

spark-env.sh

Enter the contents for the spark-env.sh file that is used to initialize environment variables on worker nodes

spark-env.sh

Custom livy.conf

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file livy.conf

 — 

livy-env.sh

Enter the contents for the livy-env.sh file that is used to prepare the environment for Livy startup

livy-env.sh

thriftserver-env.sh

Enter the contents for the thriftserver-env.sh file that is used to prepare the environment for Thrift server startup

thriftserver-env.sh

spark-history-env.sh

Enter the contents for the spark-history-env.sh file that is used to prepare the environment for History Server startup

spark-history-env.sh
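
The *-env.sh files listed above are plain Bash scripts that are sourced before the corresponding daemon starts. An illustrative spark-env.sh fragment (the values are examples, not defaults shipped with the service):

    # Example spark-env.sh contents; adjust the paths to your environment
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export SPARK_LOG_DIR=/var/log/spark
    export SPARK_DAEMON_MEMORY=1g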

Spark3

Common
Parameter Description Default value

Dynamic allocation (spark.dynamicAllocation.enabled)

Defines whether to use dynamic resource allocation that scales the number of executors registered with this application up and down based on the workload

false

spark-defaults.conf
Parameter Description Default value

spark.yarn.archive

The archive containing all the required Spark JARs for distribution to the YARN cache. If set, this configuration replaces spark.yarn.jars and the archive is used in all the application containers. The archive should contain JAR files in its root directory. The archive can also be hosted on HDFS to speed up file distribution

hdfs:///apps/spark/spark3-yarn-archive.tgz

spark.yarn.historyServer.address

Spark History server address

 — 

spark.master

The cluster manager to connect to

yarn

spark.dynamicAllocation.enabled

Defines whether to use dynamic resource allocation that scales the number of executors registered with this application up and down based on the workload

false

spark.shuffle.service.enabled

Enables the external shuffle service. This service preserves the shuffle files written by executors so that executors can be safely removed, or so that shuffle fetches can continue in the event of executor failure. The external shuffle service must be set up in order to enable it

false

spark.eventLog.enabled

Defines whether to log Spark events, useful for reconstructing the Web UI after the application has finished

true

spark.eventLog.dir

The base directory where Spark events are logged, if spark.eventLog.enabled=true. Within this base directory, Spark creates a sub-directory for each application, and logs the events specific to the application in this directory. You may want to set this to a unified location like an HDFS directory so history files can be read by the History Server

hdfs:///var/log/spark/apps

spark.dynamicAllocation.executorIdleTimeout

If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation

120s

spark.dynamicAllocation.cachedExecutorIdleTimeout

If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation

600s

spark.history.provider

The name of the class that implements the application history backend. Currently there is only one implementation provided with Spark that looks for application logs stored in the file system

org.apache.spark.deploy.history.FsHistoryProvider

spark.history.fs.cleaner.enabled

Specifies whether the History Server should periodically clean up event logs from storage

true

spark.history.store.path

A local directory where to cache application history data. If set, the History Server will store application data on disk instead of keeping it in memory. The data written to disk will be re-used in case of the History Server restart

/var/log/spark3/history

spark.serializer

The class used for serializing objects that will be sent over the network or need to be cached in the serialized form. By default, works with any Serializable Java object but it may be quite slow, so we recommend using org.apache.spark.serializer.KryoSerializer and configuring Kryo serialization when speed is necessary. Can be any subclass of org.apache.spark.Serializer

org.apache.spark.serializer.KryoSerializer

spark.driver.extraClassPath

Extra classpath entries to prepend to the classpath of the driver

/usr/lib/hive/lib/hive-shims-scheduler.jar:/usr/lib/hadoop-yarn/hadoop-yarn-server-resourcemanager.jar

spark.history.ui.port

The port number of the History Server web UI

18092

spark.ui.port

The port number of the Thrift Server web UI

4140

spark.history.fs.logDirectory

The log directory of the History Server

hdfs:///var/log/spark/apps

spark.sql.hive.metastore.jars

The location of the JARs that should be used to instantiate HiveMetastoreClient

path

spark.sql.hive.metastore.jars.path

A list of comma-separated paths to JARs used to instantiate HiveMetastoreClient

file:///usr/lib/hive/lib/*.jar

spark.sql.hive.metastore.version

The Hive Metastore version

3.1.2

spark.driver.extraLibraryPath

The path to extra native libraries for driver

/usr/lib/hadoop/lib/native/

spark.yarn.am.extraLibraryPath

The path to extra native libraries for Application Master

/usr/lib/hadoop/lib/native/

spark.executor.extraLibraryPath

The path to extra native libraries for Executor

/usr/lib/hadoop/lib/native/

spark.yarn.appMasterEnv.HIVE_CONF_DIR

A directory on the Application Master with Hive configs required for running Hive in the cluster mode

/etc/spark3/conf

spark.yarn.historyServer.allowTracking

Allows using the Spark History Server as the tracking UI even if the web UI is disabled for a job

True

spark.connect.grpc.binding.port

The port number to connect to Spark Connect via gRPC

15002

spark.history.kerberos.enabled

Indicates whether the History Server should use Kerberos to log in. This is required if the History Server is accessing HDFS files on a secure Hadoop cluster

false

spark.acls.enable

Defines whether Spark ACLs should be enabled. If enabled, checks to see if the user has access permissions to view or modify the job. Note this requires the user to be known, so if the user comes across as null no checks are done. Filters can be used within the UI to authenticate and set the user

false

spark.modify.acls

Defines who has access to modify a running Spark application

spark,hdfs

spark.modify.acls.groups

A comma-separated list of user groups that have modify access to the Spark application

spark,hdfs

spark.history.ui.acls.enable

Specifies whether ACLs should be checked to authorize users viewing the applications in the History Server. If enabled, access control checks are performed regardless of what the individual applications had set for spark.ui.acls.enable. If disabled, no access control checks are made for any application UIs available through the History Server

false

spark.history.ui.admin.acls

A comma-separated list of users that have view access to all the Spark applications in History Server

spark,hdfs,dr.who

spark.history.ui.admin.acls.groups

A comma-separated list of groups that have view access to all the Spark applications in History Server

spark,hdfs,dr.who

spark.ui.view.acls

A comma-separated list of users that have view access to the Spark application. By default, only the user that started the Spark job has view access. Using * as a value means that any user can have view access to this Spark job

spark,hdfs,dr.who

spark.ui.view.acls.groups

A comma-separated list of groups that have view access to the Spark web UI to view the Spark Job details. This can be used if you have a set of administrators or developers or users who can monitor the Spark job submitted. Using * in the list means any user in any group can view the Spark job details on the Spark web UI. The user groups are obtained from the instance of the groups mapping provider specified by spark.user.groups.mapping

spark,hdfs,dr.who

spark.ssl.keyPassword

The password to the private key in the keystore

 — 

spark.ssl.keyStore

Path to the keystore file. The path can be absolute or relative to the directory in which the process is started

 — 

spark.ssl.keyStoreType

The type of keystore used

JKS

spark.ssl.trustStorePassword

The password to the truststore used by Spark

 — 

spark.ssl.trustStoreType

The type of the truststore

JKS

spark.ssl.enabled

Defines whether to use SSL for Spark

 — 

spark.ssl.protocol

Defines the TLS protocol to use. The protocol must be supported by JVM

TLSv1.2

spark.ssl.ui.port

The port number used by the Spark web UI when SSL is enabled

4041

spark.ssl.historyServer.port

The port number used by the Spark History Server web UI when SSL is enabled

18092
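
As with Spark, these properties are written to spark-defaults.conf as whitespace-separated key-value pairs. An illustrative fragment that enables dynamic allocation together with the external shuffle service (example values only):

    spark.master                                 yarn
    spark.dynamicAllocation.enabled              true
    spark.shuffle.service.enabled                true
    spark.dynamicAllocation.executorIdleTimeout  120s
    spark.sql.hive.metastore.jars                path
    spark.sql.hive.metastore.jars.path           file:///usr/lib/hive/lib/*.jar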

livy.conf
Parameter Description Default value

livy.server.host

The host address to start the Livy server. By default, Livy will bind to all network interfaces

0.0.0.0

livy.server.port

The port to run the Livy server

8999

livy.spark.master

The Spark master to use for Livy sessions

yarn

livy.impersonation.enabled

Defines if Livy should impersonate users when creating a new session

true

livy.server.csrf-protection.enabled

Defines whether to enable the CSRF protection. If enabled, clients should add the X-Requested-By HTTP header for POST/DELETE/PUT/PATCH HTTP methods

true

livy.repl.enable-hive-context

Defines whether to enable HiveContext in the Livy interpreter. If set to true, hive-site.xml is detected automatically on the user request and added to the Livy server classpath

true

livy.server.recovery.mode

Sets the recovery mode for Livy

recovery

livy.server.recovery.state-store

Defines where Livy should store the state for recovery

filesystem

livy.server.recovery.state-store.url

For the filesystem state store, the path of the state store directory. Do not use a filesystem that does not support atomic rename like S3. For example: file:///tmp/livy or hdfs:///. For ZooKeeper, specify the address to the ZooKeeper servers. For example: host1:port1,host2:port2

/livy-recovery

livy.server.auth.type

Sets the Livy authentication type

 — 

livy.server.access_control.enabled

Defines whether to enable the access control for a Livy server. If set to true, then all the incoming requests will be checked if the requested user has permission

false

livy.server.access_control.users

Users allowed to access Livy. By default, any user is allowed to access Livy. To limit access, list all the permitted users separated by commas

livy,hdfs,spark

livy.superusers

A list of comma-separated users that have the permissions to change other user’s submitted sessions, for example, submitting statements, deleting the session, and so on

livy,hdfs,spark

livy.keystore

A path to the keystore file. The path can be absolute or relative to the directory in which the process is started

 — 

livy.keystore.password

The password to access the keystore

 — 

livy.key-password

The password to access the key in the keystore

 — 

livy.server.thrift.ssl.protocol.blacklist

The list of banned TLS protocols

SSLv2,SSLv3,TLSv1,TLSv1.1

thrift-server.conf
Parameter Description Default value

thrift.server.port

The port number used for communication with Spark3 Thrift Server

10116

Spark heap memory settings
Parameter Description Default value

Spark History Server Heap Memory

Sets the maximum Java heap size for Spark History Server

1G

Other
Parameter Description Default value

Custom spark-defaults.conf

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file spark-defaults.conf

 — 

Custom log4j2.properties

The contents of the log4j2.properties file used for logging the Spark3 activity

log4j2.properties

spark-env.sh

The contents of the spark-env.sh file used to initialize environment variables on worker nodes

spark-env.sh

Custom livy.conf

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file livy.conf

 — 

livy-env.sh

The contents of the livy-env.sh file used to initialize environment variables for the Livy server operation

livy-env.sh

spark-history-env.sh

The contents of the spark-history-env.sh file used to initialize environment variables for the Spark3 History Server operation

spark-history-env.sh

thriftserver-env.sh

The contents of the thriftserver-env.sh file used to initialize environment variables for the Spark3 Thrift Server operation

thriftserver-env.sh

SSM

Credentials Encryption
Parameter Description Default value

Credential provider path

The path to a keystore file used to encrypt credentials

jceks://file/etc/ssm/conf/ssm.jceks

Custom jceks

Set to true to use a custom JCEKS file. Set to false to use the auto-generated JCEKS keystore

false

Password file name

The name of the file that stores a password to access the keystore

ssm_credstore_pass
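
When a custom JCEKS file is used, credentials can be added to it with the standard Hadoop credential CLI. A hedged example (the alias name smart.metastore.password is hypothetical; use the alias expected by your SSM installation):

    hadoop credential create smart.metastore.password \
      -provider jceks://file/etc/ssm/conf/ssm.jceks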

smart-site.xml
Parameter Description Default value

smart.hadoop.conf.path

The path to the Hadoop configuration directory

/etc/hadoop/conf

smart.conf.dir

The path to the SSM configuration directory

/etc/ssm/conf

smart.server.rpc.address

The RPC address of the SSM Server

0.0.0.0:7042

smart.server.http.address

The HTTP address (web UI) of the SSM Server

0.0.0.0:7045

smart.agent.master.address

The active SSM server’s address

<hostname>

smart.agent.address

Defines the address of SSM Agent components on each host

0.0.0.0

smart.agent.port

The port number used by SSM agents to communicate with the SSM Server

7048

smart.agent.master.port

The port number used by the SSM Server to communicate with SSM agents

7051

smart.ignore.dirs

A list of comma-separated HDFS directories to ignore. SSM will ignore all files under the given HDFS directories

 — 

smart.cover.dirs

A list of comma-separated HDFS directories where SSM scans for files. By default, all HDFS files are covered

 — 

smart.work.dir

The HDFS directory used by SSM as a working directory to store temporary files. SSM will ignore HDFS inotify events for all files under the working directory. Only one directory can be set

/system/ssm

smart.client.concurrent.report.enabled

Used to enable/disable concurrent reports for Smart Client. If enabled, Smart Client concurrently attempts to connect to multiple configured Smart Servers to find the active Smart Server, which is an optimization. Only the active Smart Server will respond to establish the connection. If the report has been successfully delivered to the active Smart Server, connection attempts to other Smart Servers are canceled

 — 

smart.server.rpc.handler.count

The number of RPC handlers on the server

80

smart.namespace.fetcher.batch

The batch size of the namespace fetcher. SSM fetches namespaces from the NameNode during the startup. Large namespaces may lead to a long startup time. A larger batch size can improve the fetcher efficiency and reduce the startup time

500

smart.namespace.fetcher.producers.num

The number of producers in the namespace fetcher

3

smart.namespace.fetcher.consumers.num

The number of consumers in the namespace fetcher

6

smart.rule.executors

The maximum number of rules that can be executed in parallel

5

smart.cmdlet.executors

The maximum number of cmdlets that can be executed in parallel

10

smart.dispatch.cmdlets.extra.num

The number of extra cmdlets dispatched by Smart Server

10

smart.cmdlet.dispatchers

The maximum number of cmdlet dispatchers that work in parallel

3

smart.cmdlet.mover.max.concurrent.blocks.per.srv.inst

The maximum number of file mover cmdlets that can be executed in parallel per SSM service. The 0 value removes the limit

0

smart.action.move.throttle.mb

The throughput limit (in MB) for the SSM move operation

0

smart.action.copy.throttle.mb

The throughput limit (in MB) for the SSM copy operation

0

smart.action.ec.throttle.mb

The throughput limit (in MB) for the SSM EC operation

0

smart.action.local.execution.disabled

Defines whether the active Smart Server can also execute actions like an agent. If set to true, the active SSM Server will NOT be able to execute actions. This configuration has no impact on a standby Smart Server

false

smart.cmdlet.max.num.pending

The maximum number of pending cmdlets in an SSM Server

20000

smart.cmdlet.hist.max.num.records

The maximum number of historic cmdlet records kept in an SSM server. SSM deletes the oldest cmdlets when this threshold is exceeded

100000

smart.cmdlet.hist.max.record.lifetime

The maximum lifetime of historic cmdlet records kept in an SSM server. The SSM Server deletes cmdlet records after the specified interval. Valid time units are day, hour, min, sec. The minimum update granularity is 5sec

30day

smart.cmdlet.cache.batch

The maximum batch size of the cmdlet batch insert

600

smart.copy.scheduler.base.sync.batch

The maximum batch size of the Copy Scheduler base sync batch insert

500

smart.file.diff.max.num.records

The maximum number of file diff records with the useless state

10000

smart.status.report.period

The status report period for actions in milliseconds

10

smart.status.report.period.multiplier

The report period multiplied by this value defines the largest report interval

50

smart.status.report.ratio

If the finished actions ratio equals or exceeds this value, a status report will be triggered

0.2

smart.top.hot.files.num

The number of top hot files displayed in web UI

200

smart.cmdlet.dispatcher.log.disp.result

Defines whether to log dispatch results for each cmdlet dispatched

false

smart.cmdlet.dispatcher.log.disp.metrics.interval

The time interval in milliseconds to log statistic metrics of the cmdlet dispatcher. If no cmdlets were dispatched within this interval, no output is generated for this interval. The 0 value disables the logger

5000

smart.compression.codec

The default compression codec for SSM compression (Zlib, Lz4, Bzip2, snappy). You can also specify codecs as action arguments, which overrides this setting

Zlib

smart.compression.max.split

The maximum number of chunks split for compression

1000

smart.compact.batch.size

The maximum number of small files to be compacted by the compact action

200

smart.compact.container.file.threshold.mb

The maximum size of a container file in MB

1024

smart.access.count.day.tables.num

The maximum number of tables that can be created in the Metastore database to store the file access count per day

30

smart.access.count.hour.tables.num

The maximum number of tables that can be created in the Metastore database to store the file access count per hour

48

smart.access.count.minute.tables.num

The maximum number of tables that can be created in the Metastore database to store the file access count per minute

120

smart.access.count.second.tables.num

The maximum number of tables that can be created in the Metastore database to store the file access count per second

30

smart.access.event.fetch.interval.ms

The interval in milliseconds between access event fetches

1000

smart.cached.file.fetch.interval.ms

The interval in milliseconds between fetches of cached files from HDFS

5000

smart.namespace.fetch.interval.ms

The interval in milliseconds between namespace fetches from HDFS

1

smart.mover.scheduler.storage.report.fetch.interval.ms

The interval in milliseconds between fetches of storage reports from HDFS DataNodes in the mover scheduler

120000

smart.metastore.small-file.insert.batch.size

The maximum size of the Metastore insert batch with information about small files

200

smart.agent.master.ask.timeout.ms

The maximum time in milliseconds for a Smart Agent to wait for a response from the Smart Server during the submission action

5000

smart.ignore.path.templates

A list of comma-separated regex templates of HDFS paths to be completely ignored by SSM

 — 

smart.internal.path.templates

A list of comma-separated regex templates of internal files to be completely ignored by SSM

.*/\..*,.*/__.*,.*_COPYING_.*

smart.security.enable

Enables Kerberos authentication for SSM

false

smart.server.keytab.file

The path to the SSM Server’s keytab file

 — 

smart.server.kerberos.principal

The SSM Server’s Kerberos principal

 — 

smart.agent.keytab.file

The path to the SSM Agent’s keytab file

 — 

smart.agent.kerberos.principal

The SSM Agent’s Kerberos principal

 — 
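
The parameters above are stored in smart-site.xml using the usual Hadoop-style XML property syntax. An illustrative fragment with values taken from the defaults listed in this section:

    <configuration>
      <property>
        <name>smart.server.rpc.address</name>
        <value>0.0.0.0:7042</value>
      </property>
      <property>
        <name>smart.work.dir</name>
        <value>/system/ssm</value>
      </property>
    </configuration>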

Druid configuration
Parameter Description Default value

db_url

The URL to the Metastore database

jdbc:postgresql://{{ groups['adpg.adpg'][0] | d(omit) }}:5432/ssm

db_user

The user name to connect to the database

ssm

db_password

The user password to connect to the database

 — 

initialSize

The initial number of connections created when the pool is started

10

minIdle

The minimum number of established connections that should be kept in the pool at all times. The connection pool can shrink below this number if validation queries fail

4

maxActive

The maximum number of active connections that can be allocated from this pool at the same time

50

maxWait

The maximum time in milliseconds the pool will wait (when there are no available connections) for a connection to be returned before throwing an exception

60000

timeBetweenEvictionRunsMillis

The time in milliseconds to sleep between the runs of the idle connection validation/cleaner thread. This value should not be set to less than 1 second. It specifies how often to check for idle and abandoned connections, and how often to validate idle connections

90000

minEvictableIdleTimeMillis

The minimum amount of time an object may remain idle in the pool before it is eligible for eviction

300000

validationQuery

The SQL query used to validate connections from the pool before returning them to the caller

SELECT 1

testWhileIdle

Indicates whether connection objects are validated by the idle object evictor (if any)

true

testOnBorrow

Indicates whether objects are validated before being borrowed from the pool

false

testOnReturn

Indicates whether objects are validated before being returned to the pool

false

poolPreparedStatements

Enables the prepared statement pooling

true

maxPoolPreparedStatementPerConnectionSize

The maximum number of prepared statements that can be pooled per connection

30

removeAbandoned

A flag to remove abandoned connections if they exceed removeAbandonedTimeout

true

removeAbandonedTimeout

The timeout in seconds before an abandoned (in use) connection can be removed

180

logAbandoned

A flag to log stack traces for application code which abandoned a connection. Logging of abandoned connections adds extra overhead for every borrowed connection

true

filters

Sets the filters that are applied to the data source

stat

smart-env.sh
Parameter Description Default value

LD_LIBRARY_PATH

The path to extra native libraries for SSM

/usr/lib/hadoop/lib/native

HADOOP_HOME

The path to the Hadoop home directory

/usr/lib/hadoop
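
smart-env.sh is a plain Bash script sourced before SSM starts; the variables above are exported there. For example, with the default values from this table:

    export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native
    export HADOOP_HOME=/usr/lib/hadoop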

Other
Parameter Description Default value

Enable SmartFileSystem for Hadoop

When enabled, requests from different clients (Spark, HDFS, Hive, etc.) are taken into account when calculating AccessCount for files. Otherwise, the AccessCount value gets incremented only when a file is accessed from SSM

false

log4j.properties

The contents of the log4j.properties configuration file

 — 

zeppelin-site.xml

The contents of the zeppelin-site.xml configuration file. SSM uses a Zeppelin configuration for web UI

 — 

Sqoop

sqoop-site.xml
Parameter Description Default value

sqoop.metastore.client.autoconnect.url

The connection string to use when connecting to a job-management metastore. If not set, uses ~/.sqoop/

 — 

sqoop.metastore.server.location

The path to the shared metastore database files. If not set, uses ~/.sqoop/

/srv/sqoop/metastore.db

sqoop.metastore.server.port

The port that this metastore should listen on

16100
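
These parameters follow the standard sqoop-site.xml XML property format. An illustrative fragment with the default values listed above:

    <configuration>
      <property>
        <name>sqoop.metastore.server.location</name>
        <value>/srv/sqoop/metastore.db</value>
      </property>
      <property>
        <name>sqoop.metastore.server.port</name>
        <value>16100</value>
      </property>
    </configuration>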

sqoop-metastore-env.sh
Parameter Description Default value

HADOOP_OPTS

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Sqoop

-Xms800M -Xmx10G

Other
Parameter Description Default value

Custom sqoop-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file sqoop-site.xml

 — 

Custom sqoop-metastore-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file sqoop-metastore-env.sh

 — 

YARN

mapred-site.xml
Parameter Description Default value

mapreduce.application.classpath

The CLASSPATH for MapReduce applications. A comma-separated list of CLASSPATH entries. If mapreduce.application.framework is set, then this must specify the appropriate CLASSPATH for that archive, and the name of the archive must be present in the CLASSPATH. If mapreduce.app-submission.cross-platform is false, platform-specific environment variable expansion syntax would be used to construct the default CLASSPATH entries. If mapreduce.app-submission.cross-platform is true, platform-agnostic default CLASSPATH for MapReduce applications would be used:

{{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/*, {{HADOOP_MAPRED_HOME}}/share/hadoop/mapreduce/lib/*

The parameter expansion marker will be replaced by the NodeManager on container launch based on the underlying OS

/etc/hadoop/conf/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*

mapreduce.cluster.local.dir

The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk I/O. Directories that do not exist are ignored

/srv/hadoop-yarn/mr-local

mapreduce.framework.name

The runtime framework for executing MapReduce jobs. Can be one of local, classic, or yarn

yarn

mapreduce.jobhistory.address

The MapReduce JobHistory Server IPC address (<host>:<port>)

 — 

mapreduce.jobhistory.bind-host

Setting the value to 0.0.0.0 will cause the MapReduce daemons to listen on all addresses and interfaces of the hosts in the cluster

0.0.0.0

mapreduce.jobhistory.webapp.address

The MapReduce JobHistory Server web UI address (<host>:<port>)

 — 

mapreduce.map.env

Environment variables for the map task processes added by a user, specified as a comma-separated list. Example: VAR1=value1,VAR2=value2

HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

mapreduce.reduce.env

Environment variables for the reduce task processes added by a user, specified as a comma-separated list. Example: VAR1=value1,VAR2=value2

HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

yarn.app.mapreduce.am.env

Environment variables for the MapReduce App Master processes added by a user. Examples:

  • A=foo. This sets the environment variable A to foo.

  • B=$B:c. This inherits the tasktracker B environment variable.

HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

yarn.app.mapreduce.am.staging-dir

The staging directory used while submitting jobs

/user

mapreduce.jobhistory.keytab

The location of the Kerberos keytab file for the MapReduce JobHistory Server

/etc/security/keytabs/mapreduce-historyserver.service.keytab

mapreduce.jobhistory.principal

Kerberos principal name for the MapReduce JobHistory Server

mapreduce-historyserver/_HOST@REALM

mapreduce.jobhistory.http.policy

Configures the HTTP endpoint for JobHistoryServer web UI. The following values are supported:

  • HTTP_ONLY — provides service only via HTTP;

  • HTTPS_ONLY — provides service only via HTTPS.

HTTP_ONLY

mapreduce.jobhistory.webapp.https.address

The HTTPS address where MapReduce JobHistory Server WebApp is running

0.0.0.0:19890

mapreduce.shuffle.ssl.enabled

Defines whether to use SSL for the Shuffle HTTP endpoints

false
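
mapred-site.xml uses the same Hadoop XML property syntax. A short illustrative fragment (the host name example.host.adh is a placeholder; 10020 is the usual JobHistory IPC port):

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>example.host.adh:10020</value>
      </property>
    </configuration>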

ranger-yarn-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

The spool directory path

/srv/ranger/hdfs_plugin/audit_solr_spool

xasecure.audit.destination.solr.urls

Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr

 — 

xasecure.audit.destination.solr.zookeepers

Specifies the ZooKeeper connection string for the Solr destination

 — 

xasecure.audit.destination.solr.force.use.inmemory.jaas.config

Uses in-memory JAAS configuration file to connect to Solr

 — 

xasecure.audit.is.enabled

Enables Ranger audit

true

xasecure.audit.jaas.Client.loginModuleControlFlag

Specifies whether the success of the module is required, requisite, sufficient, or optional

 — 

xasecure.audit.jaas.Client.loginModuleName

The name of the authenticator class

 — 

xasecure.audit.jaas.Client.option.keyTab

The name of the keytab file to get the principal’s secret key

 — 

xasecure.audit.jaas.Client.option.principal

The name of the principal to be used

 — 

xasecure.audit.jaas.Client.option.serviceName

Represents a user or a service that wants to log in

 — 

xasecure.audit.jaas.Client.option.storeKey

Set this to true if you want the keytab or the principal’s key to be stored in the subject’s private credentials

false

xasecure.audit.jaas.Client.option.useKeyTab

Set this to true if you want the module to get the principal’s key from the keytab

false

ranger-yarn-security.xml
Parameter Description Default value

ranger.plugin.yarn.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.yarn.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.yarn.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/yarn/policycache

ranger.plugin.yarn.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.yarn.policy.rest.client.connection.timeoutMs

The YARN Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.yarn.policy.rest.client.read.timeoutMs

The YARN Plugin RangerRestClient read timeout (in milliseconds)

30000

ranger.add-yarn-authorization

Set to true to use only Ranger ACLs (that is, to ignore YARN ACLs)

false

ranger.plugin.yarn.policy.rest.ssl.config.file

The path to the RangerRestClient SSL config file for the YARN plugin

/etc/yarn/conf/ranger-yarn-policymgr-ssl.xml

yarn-site.xml
Parameter Description Default value

yarn.application.classpath

The CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries. When this value is empty, the following default CLASSPATH for YARN applications would be used.

  • For Linux:

    $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, $HADOOP_YARN_HOME/share/hadoop/yarn/*, $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
  • For Windows:

    %HADOOP_CONF_DIR%, %HADOOP_COMMON_HOME%/share/hadoop/common/*, %HADOOP_COMMON_HOME%/share/hadoop/common/lib/*, %HADOOP_HDFS_HOME%/share/hadoop/hdfs/*, %HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*, %HADOOP_YARN_HOME%/share/hadoop/yarn/*, %HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*

/etc/hadoop/conf/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/*

yarn.cluster.max-application-priority

Defines the maximum application priority in a cluster. Leaf queue-level priority: the administrator configures a default priority for each leaf queue. The queue default priority is used for any application submitted without a specified priority. $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml is the configuration file for queue-level priority

0

yarn.log.server.url

The URL of the log aggregation server

 — 

yarn.log-aggregation-enable

Whether to enable log aggregation. Log aggregation collects logs from each container and moves these logs onto a file system, for example HDFS, after the application processing completes. Users can configure the yarn.nodemanager.remote-app-log-dir and yarn.nodemanager.remote-app-log-dir-suffix properties to determine where these logs are moved. Users can access the logs via the Application Timeline Server

true

yarn.log-aggregation.retain-seconds

Defines how long to keep aggregation logs before deleting them. The value of -1 disables the deletion of aggregated logs. Be careful: setting this value too small will spam the NameNode

172800

yarn.nodemanager.local-dirs

The list of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this

/srv/hadoop-yarn/nm-local

yarn.node-labels.enabled

Enables node labels feature

true

yarn.node-labels.fs-store.root-dir

The URI for NodeLabelManager. The default value is /tmp/hadoop-yarn-${user}/node-labels/ in the local filesystem

hdfs:///system/yarn/node-labels

yarn.timeline-service.bind-host

The actual address the server will bind to. If this optional address is set, the RPC and Webapp servers will bind to this address and the port specified in yarn.timeline-service.address and yarn.timeline-service.webapp.address, respectively. This is most useful for making the service listen to all interfaces by setting it to 0.0.0.0

0.0.0.0

yarn.timeline-service.leveldb-timeline-store.path

The store file name for the LevelDB timeline store

/srv/hadoop-yarn/leveldb-timeline-store

yarn.nodemanager.address

The address of the container manager in the NodeManager

0.0.0.0:8041

yarn.nodemanager.aux-services

A comma-separated list of services, where service name should only contain a-zA-Z0-9_ and cannot start with numbers

mapreduce_shuffle,spark2_shuffle,spark_shuffle

yarn.nodemanager.aux-services.mapreduce_shuffle.class

The auxiliary service class to use

org.apache.hadoop.mapred.ShuffleHandler

yarn.nodemanager.aux-services.spark2_shuffle.class

The class name of YarnShuffleService — an external shuffle service for Spark 2 on YARN

org.apache.spark.network.yarn.YarnShuffleService

yarn.nodemanager.aux-services.spark2_shuffle.classpath

The path to YarnShuffleService — an external shuffle service for Spark 2 on YARN

/usr/lib/spark/yarn/lib/*

yarn.nodemanager.aux-services.spark_shuffle.class

The class name of YarnShuffleService — an external shuffle service for Spark 3 on YARN

org.apache.spark.network.yarn.YarnShuffleService

yarn.nodemanager.aux-services.spark_shuffle.classpath

The path to YarnShuffleService — an external shuffle service for Spark 3 on YARN

/usr/lib/spark3/yarn/lib/*

yarn.nodemanager.recovery.enabled

Enables the NodeManager to recover after starting

true

yarn.nodemanager.recovery.dir

The local filesystem directory, in which the NodeManager will store state, when recovery is enabled

/srv/hadoop-yarn/nm-recovery

yarn.nodemanager.remote-app-log-dir

Defines a directory for logs aggregation

/logs

yarn.nodemanager.resource-plugins

Enables additional discovery/isolation of resources on the NodeManager. By default, this parameter is empty. Acceptable values: yarn.io/gpu, yarn.io/fpga

 — 

yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables

When yarn.nodemanager.resource.gpu.allowed-gpu-devices=auto, the YARN NodeManager needs to run a GPU discovery binary (currently only nvidia-smi is supported) to get GPU-related information. When the value is empty (default), the YARN NodeManager tries to locate the discovery executable itself. An example of the config value is: /usr/local/bin/nvidia-smi

/usr/bin/nvidia-smi

yarn.nodemanager.resource.detect-hardware-capabilities

Enables auto-detection of node capabilities such as memory and CPU

true

yarn.nodemanager.vmem-check-enabled

Whether virtual memory limits will be enforced for containers

false

yarn.resource-types

The resource types to be used for scheduling. Use resource-types.xml to specify details about the individual resource types

 — 

yarn.resourcemanager.bind-host

The actual address the server will bind to. If this optional address is set, the RPC and Webapp servers will bind to this address and the port specified in yarn.resourcemanager.address and yarn.resourcemanager.webapp.address, respectively. This is most useful for making the Resource Manager listen to all interfaces by setting it to 0.0.0.0

0.0.0.0

yarn.resourcemanager.cluster-id

The name of the cluster. In the High Availability mode, this parameter is used to ensure that Resource Manager participates in leader election for this cluster and ensures that it does not affect other clusters

 — 

yarn.resource-types.memory-mb.increment-allocation

The FairScheduler grants memory in increments of this value. If you submit a task with a resource request that is not a multiple of memory-mb.increment-allocation, the request will be rounded up to the nearest increment

1024

yarn.resource-types.vcores.increment-allocation

The FairScheduler grants vcores in increments of this value. If you submit a task with a resource request that is not a multiple of vcores.increment-allocation, the request will be rounded up to the nearest increment

1

yarn.resourcemanager.ha.enabled

Enables Resource Manager High Availability. When enabled:

  • The Resource Manager starts in the Standby mode by default, and transitions to the Active mode when prompted to.

  • The nodes in the Resource Manager ensemble are listed in yarn.resourcemanager.ha.rm-ids.

  • The id of each Resource Manager either comes from yarn.resourcemanager.ha.id, if yarn.resourcemanager.ha.id is explicitly specified, or can be figured out by matching yarn.resourcemanager.address.{id} with local address.

  • The actual physical addresses come from the configs of the pattern {rpc-config}.{id}.

false

yarn.resourcemanager.ha.rm-ids

The list of Resource Manager nodes in the cluster when the High Availability is enabled. See description of yarn.resourcemanager.ha.enabled for full details on how this is used

 — 

yarn.resourcemanager.hostname

The host name of the Resource Manager

 — 

yarn.resourcemanager.leveldb-state-store.path

The local path where the Resource Manager state will be stored when using org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore as the value for yarn.resourcemanager.store.class

/srv/hadoop-yarn/leveldb-state-store

yarn.resourcemanager.monitor.capacity.queue-management.monitoring-interval

The time between invocations of this QueueManagementDynamicEditPolicy policy (in milliseconds)

1500

yarn.resourcemanager.reservation-system.enable

Enables the ReservationSystem in the ResourceManager

false

yarn.resourcemanager.reservation-system.planfollower.time-step

The frequency of the PlanFollower timer (in milliseconds). A large value is expected

1000

Resource scheduler

The type of a pluggable scheduler for Hadoop. Available values: CapacityScheduler and FairScheduler. CapacityScheduler allows multiple tenants to securely share a large cluster such that their applications are allocated resources in a timely manner under constraints of allocated capacities. FairScheduler allows YARN applications to share resources in large clusters fairly

CapacityScheduler

yarn.resourcemanager.scheduler.monitor.enable

Enables a set of periodic monitors (specified in yarn.resourcemanager.scheduler.monitor.policies) that affect the Scheduler

false

yarn.resourcemanager.scheduler.monitor.policies

The list of SchedulingEditPolicy classes that interact with the Scheduler. A particular module may be incompatible with the Scheduler, other policies, or a configuration of either

org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy

yarn.resourcemanager.monitor.capacity.preemption.observe_only

If set to true, run the policy but do not affect the cluster with preemption and kill events

false

yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval

The time between invocations of this ProportionalCapacityPreemptionPolicy policy (in milliseconds)

3000

yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill

The time between requesting a preemption from an application and killing the container (in milliseconds)

15000

yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round

The maximum percentage of resources, preempted in a single round. By controlling this value one can throttle the pace, at which containers are reclaimed from the cluster. After computing the total desired preemption, the policy scales it back within this limit

0.1

yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity

The maximum amount of resources above the target capacity ignored for preemption. This defines a deadzone around the target capacity that helps to prevent thrashing and oscillations around the computed target balance. High values would slow the time to capacity and (absent natural completions) might prevent convergence to guaranteed capacity

0.1

yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor

Given a computed preemption target, account for containers naturally expiring and preempt only this percentage of the delta. This determines the rate of geometric convergence into the deadzone (MAX_IGNORED_OVER_CAPACITY). For example, a termination factor of 0.5 will reclaim almost 95% of resources within 5 * #WAIT_TIME_BEFORE_KILL, even absent natural termination

0.2

yarn.resourcemanager.nodes.exclude-path

The path to the file with nodes to exclude

/etc/hadoop/conf/exclude-path.xml

yarn.resourcemanager.nodes.include-path

The path to the file with nodes to include

/etc/hadoop/conf/include-path

yarn.resourcemanager.recovery.enabled

Enables Resource Manager to recover state after starting. If set to true, then yarn.resourcemanager.store.class must be specified

true

yarn.resourcemanager.store.class

The class to use as the persistent store. If org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore is used, the store is implicitly fenced, meaning that only a single Resource Manager is able to use the store at any point in time. More details on this implicit fencing, along with setting up appropriate ACLs, are discussed under yarn.resourcemanager.zk-state-store.root-node.acl

 — 

yarn.resourcemanager.system-metrics-publisher.enabled

The setting that controls whether YARN system metrics are published on the Timeline Server or not by Resource Manager

true

yarn.scheduler.fair.user-as-default-queue

Defines whether to use the username associated with the allocation as the default queue name in the event that a queue name is not specified. If this is set to false or unset, all jobs have a shared default queue, named default. Defaults to true. If a queue placement policy is given in the allocations file, this property is ignored

true

yarn.scheduler.fair.preemption

Defines whether to use preemption

false

yarn.scheduler.fair.preemption.cluster-utilization-threshold

The utilization threshold after which the preemption kicks in. The utilization is computed as the maximum ratio of usage to capacity among all resources

0.8f

yarn.scheduler.fair.sizebasedweight

Defines whether to assign shares to individual apps based on their size, rather than providing an equal share to all apps regardless of size. When set to true, apps are weighted by the natural logarithm of one plus the app total requested memory, divided by the natural logarithm of 2

false

yarn.scheduler.fair.assignmultiple

Defines whether to allow multiple container assignments in one heartbeat

false

yarn.scheduler.fair.dynamic.max.assign

If assignmultiple is true, this parameter specifies whether to dynamically determine the amount of resources that can be assigned in one heartbeat. When turned on, about half of the non-allocated resources on the node are allocated to containers in a single heartbeat

true

yarn.scheduler.fair.max.assign

If assignmultiple is true, the maximum number of containers that can be assigned in one heartbeat. Defaults to -1, which sets no limit

-1

yarn.scheduler.fair.locality.threshold.node

For applications that request containers on particular nodes, this parameter defines the number of scheduling opportunities since the last container assignment to wait before accepting a placement on another node. Expressed as a floating number between 0 and 1, which, as a fraction of the cluster size, is the number of scheduling opportunities to pass up. The default value of -1.0 means not to pass up any scheduling opportunities

-1.0

yarn.scheduler.fair.locality.threshold.rack

For applications, that request containers on particular racks, the number of scheduling opportunities since the last container assignment to wait before accepting a placement on another rack. Expressed as a floating point between 0 and 1, which, as a fraction of the cluster size, is the number of scheduling opportunities to pass up. The default value of -1.0 means not to pass up any scheduling opportunities

-1.0

yarn.scheduler.fair.allow-undeclared-pools

If set to true, new queues can be created at application submission time, whether because they are specified as the application queue by the submitter or because they are placed there by the user-as-default-queue property. If set to false, any time an app would be placed in a queue that is not specified in the allocations file, it is placed in the default queue instead. Defaults to true. If a queue placement policy is given in the allocations file, this property is ignored

true

yarn.scheduler.fair.update-interval-ms

The time interval, at which to lock the scheduler and recalculate fair shares, recalculate demand, and check whether anything is due for preemption

500

yarn.scheduler.minimum-allocation-mb

The minimum allocation for every container request at the Resource Manager (in MB). Memory requests lower than this will throw an InvalidResourceRequestException

1024

yarn.scheduler.maximum-allocation-mb

The maximum allocation for every container request at the Resource Manager (in MB). Memory requests higher than this will throw an InvalidResourceRequestException

4096

yarn.scheduler.minimum-allocation-vcores

The minimum allocation for every container request at the Resource Manager, in terms of virtual CPU cores. Requests lower than this will throw an InvalidResourceRequestException

1

yarn.scheduler.maximum-allocation-vcores

The maximum allocation for every container request at the Resource Manager, in terms of virtual CPU cores. Requests higher than this will throw an InvalidResourceRequestException

2

yarn.timeline-service.enabled

On the server side, this parameter indicates whether the Timeline service is enabled. On the client side, it can be used to indicate whether the client wants to use the Timeline service. If this parameter is set on the client side along with security, then the YARN client tries to fetch the delegation tokens for the Timeline Server

true

yarn.timeline-service.hostname

The hostname of the Timeline service Web application

 — 

yarn.timeline-service.http-cross-origin.enabled

Enables cross origin support (CORS) for Timeline Server

true

yarn.webapp.ui2.enable

On the server side, this parameter indicates whether the new YARN UI v2 is enabled

true

yarn.resourcemanager.proxy-user-privileges.enabled

If set to true, ResourceManager will have proxy-user privileges. For example, in a secure cluster, YARN requires the user's HDFS delegation tokens to do localization and log aggregation on behalf of the user. If this is set to true, ResourceManager is able to request new HDFS delegation tokens on behalf of the user. This is needed by long-running services, because the HDFS tokens will eventually expire and YARN requires new valid tokens to do localization and log aggregation. Note that to enable this use case, the corresponding HDFS NameNode must have ResourceManager configured as a proxy user so that ResourceManager can itself ask for new tokens on behalf of the user when the tokens are past their maximum lifetime

false

yarn.resourcemanager.webapp.spnego-principal

The Kerberos principal to be used for SPNEGO filter for the Resource Manager web UI

HTTP/_HOST@REALM

yarn.resourcemanager.webapp.spnego-keytab-file

The Kerberos keytab file to be used for SPNEGO filter for the Resource Manager web UI

/etc/security/keytabs/HTTP.service.keytab

yarn.nodemanager.linux-container-executor.group

The UNIX group that the linux-container-executor should run as

yarn

yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled

A flag to enable override of the default Kerberos authentication filter with the RM authentication filter to allow authentication using delegation tokens (fallback to Kerberos if the tokens are missing). Only applicable when the http authentication type is kerberos

false

yarn.resourcemanager.principal

The Kerberos principal for the Resource Manager

yarn-resourcemanager/_HOST@REALM

yarn.resourcemanager.keytab

The keytab for the Resource Manager

/etc/security/keytabs/yarn-resourcemanager.service.keytab

yarn.resourcemanager.webapp.https.address

The HTTPS address of the Resource Manager web application. If only a host is provided as the value, the webapp will be served on a random port

${yarn.resourcemanager.hostname}:8090

yarn.nodemanager.principal

The Kerberos principal for the NodeManager

yarn-nodemanager/_HOST@REALM

yarn.nodemanager.keytab

Keytab for NodeManager

/etc/security/keytabs/yarn-nodemanager.service.keytab

yarn.nodemanager.webapp.spnego-principal

The Kerberos principal to be used for SPNEGO filter for the NodeManager web interface

HTTP/_HOST@REALM

yarn.nodemanager.webapp.spnego-keytab-file

The Kerberos keytab file to be used for SPNEGO filter for the NodeManager web interface

/etc/security/keytabs/HTTP.service.keytab

yarn.nodemanager.webapp.cross-origin.enabled

A flag to enable cross-origin (CORS) support in the NodeManager. This flag requires the CORS filter initializer to be added to the filter initializers list in core-site.xml

false

yarn.nodemanager.webapp.https.address

The HTTPS address of the NodeManager web application

0.0.0.0:8044

yarn.timeline-service.http-authentication.type

Defines the authentication used for the Timeline Server HTTP endpoint. Supported values are: simple, kerberos, #AUTHENTICATION_HANDLER_CLASSNAME#

simple

yarn.timeline-service.http-authentication.simple.anonymous.allowed

Indicates if anonymous requests are allowed by the Timeline Server when using simple authentication

true

yarn.timeline-service.http-authentication.kerberos.keytab

The Kerberos keytab to be used for the Timeline Server (Collector/Reader) HTTP endpoint

/etc/security/keytabs/HTTP.service.keytab

yarn.timeline-service.http-authentication.kerberos.principal

The Kerberos principal to be used for the Timeline Server (Collector/Reader) HTTP endpoint

HTTP/_HOST@REALM

yarn.timeline-service.principal

The Kerberos principal for the timeline reader. NodeManager principal would be used for timeline collector as it runs as an auxiliary service inside NodeManager

yarn/_HOST@REALM

yarn.timeline-service.keytab

The Kerberos keytab for the timeline reader. NodeManager keytab would be used for timeline collector as it runs as an auxiliary service inside NodeManager

/etc/security/keytabs/yarn.service.keytab

yarn.timeline-service.delegation.key.update-interval

The update interval for delegation keys

86400000

yarn.timeline-service.delegation.token.renew-interval

The time to renew delegation tokens

86400000

yarn.timeline-service.delegation.token.max-lifetime

The maximum token lifetime

86400000

yarn.timeline-service.client.best-effort

Defines whether a failure to obtain a delegation token should be considered an application failure (false), or whether the client should attempt to continue publishing information without it (true)

false

yarn.timeline-service.webapp.https.address

The HTTPS address of the Timeline service web application

${yarn.timeline-service.hostname}:8190

yarn.http.policy

This configures the HTTP endpoint for YARN daemons. The following values are supported:

  • HTTP_ONLY — provides service only via HTTP;

  • HTTPS_ONLY — provides service only via HTTPS.

HTTP_ONLY

yarn.nodemanager.container-executor.class

The name of the container-executor Java class

org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
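
yarn-site.xml follows the same XML property format. A hedged sketch of a Resource Manager High Availability fragment, using the {rpc-config}.{id} pattern mentioned in the yarn.resourcemanager.ha.enabled description (the host names and the cluster name are placeholders):

    <configuration>
      <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>adh-cluster</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>rm1.example.com</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>rm2.example.com</value>
      </property>
    </configuration>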

container-executor.cfg
CAUTION

In AstraLinux, regular user UIDs can start from 100. For YARN to work correctly on AstraLinux, set the min.user.id parameter value to 100.

Parameter Description Default value

banned.users

A comma-separated list of users who cannot run applications

bin

min.user.id

The minimum user ID allowed to run applications. Prevents other super-users

500
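
container-executor.cfg is a plain key=value file. An illustrative fragment that applies the AstraLinux recommendation from the caution above (example values only):

    yarn.nodemanager.linux-container-executor.group=yarn
    banned.users=bin
    min.user.id=100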

YARN heap memory settings
Parameter Description Default value

ResourceManager Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Resource Manager

-Xms1G -Xmx8G

NodeManager Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for NodeManager

 — 

Timelineserver Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Timeline server

-Xms700m -Xmx8G

History server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for History server

-Xms700m -Xmx8G

Lists of decommissioned hosts
Parameter Description Default value

DECOMMISSIONED

The list of hosts in the DECOMMISSIONED state

 — 

ranger-yarn-policymgr-ssl.xml
Parameter Description Default value

xasecure.policymgr.clientssl.keystore

The path to the keystore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.credential.file

The path to the keystore credentials file

/etc/yarn/conf/ranger-yarn.jceks

xasecure.policymgr.clientssl.truststore.credential.file

The path to the truststore credentials file

/etc/yarn/conf/ranger-yarn.jceks

xasecure.policymgr.clientssl.truststore

The path to the truststore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.password

The password to the keystore file

 — 

xasecure.policymgr.clientssl.truststore.password

The password to the truststore file

 — 
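
A sketch of how these parameters might appear in ranger-yarn-policymgr-ssl.xml; the keystore and truststore paths are placeholders, while the credential-file paths are the defaults above:

  <property>
    <name>xasecure.policymgr.clientssl.keystore</name>
    <value>/etc/yarn/conf/ranger-plugin-keystore.jks</value>
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.keystore.credential.file</name>
    <value>/etc/yarn/conf/ranger-yarn.jceks</value>
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.truststore</name>
    <value>/etc/yarn/conf/ranger-plugin-truststore.jks</value>
  </property>
  <property>
    <name>xasecure.policymgr.clientssl.truststore.credential.file</name>
    <value>/etc/yarn/conf/ranger-yarn.jceks</value>
  </property>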

Other
Parameter Description Default value

GPU on YARN

Defines whether to use GPU on YARN

false

capacity-scheduler.xml

The content of capacity-scheduler.xml, which is used by CapacityScheduler

fair-scheduler.xml

The content of fair-scheduler.xml, which is used by FairScheduler

Custom mapred-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file mapred-site.xml

 — 

Ranger plugin enabled

Defines whether the Ranger plugin is enabled

false

Custom yarn-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file yarn-site.xml

 — 

Custom ranger-yarn-audit.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-yarn-audit.xml

 — 

Custom ranger-yarn-security.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-yarn-security.xml

 — 

Custom ranger-yarn-policymgr-ssl.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-yarn-policymgr-ssl.xml

 — 
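
The Custom *.xml sections accept key/value pairs that are rendered as standard Hadoop properties. As a purely hypothetical example, adding the key yarn.nodemanager.vmem-check-enabled with the value false via Custom yarn-site.xml would result in roughly the following entry in yarn-site.xml:

  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>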

Zeppelin

User-managed interpreters
Parameter Description Default value

Allow user-managed interpreters

Allows using Zeppelin interpreters with the user-managed=true property. If selected, ADCM preserves custom user properties when restarting Zeppelin

True

Custom interpreter.json

Allows providing a custom JSON definition of the interpreters available in the Zeppelin web UI. Defining interpreters in this way overwrites all interpreter settings (both user and system)

interpreters.json

Custom interpreter.sh

Allows providing custom contents for the interpreter.sh script. This script is invoked at Zeppelin startup and prepares the environment for proper Zeppelin operation

interpreters.sh

zeppelin-site.xml
Parameter Description Default value

zeppelin.dep.localrepo

The local repository for the dependency loader

/srv/zeppelin/local-repo

zeppelin.server.port

The server port

8180

zeppelin.server.kerberos.principal

The principal name to load from the keytab

 — 

zeppelin.server.kerberos.keytab

The path to the keytab file

 — 

zeppelin.shell.auth.type

Sets the authentication type. Possible values are SIMPLE and KERBEROS

 — 

zeppelin.shell.principal

The principal name to load from the keytab

 — 

zeppelin.shell.keytab.location

The path to the keytab file

 — 

zeppelin.jdbc.auth.type

Sets the authentication type. Possible values are SIMPLE and KERBEROS

 — 

zeppelin.jdbc.keytab.location

The path to the keytab file

 — 

zeppelin.jdbc.principal

The principal name to load from the keytab

 — 

zeppelin.jdbc.auth.kerberos.proxy.enable

When the KERBEROS authentication type is used, this parameter enables/disables using a proxy with the login user to obtain the connection

true

spark.yarn.keytab

The full path to the file that contains the keytab for the principal. This keytab will be copied to the node running the YARN Application Master via the Secure Distributed Cache, for renewing the login tickets and the delegation tokens periodically

 — 

spark.yarn.principal

The principal used to log in to the KDC while running on secure HDFS

 — 

zeppelin.livy.keytab

The path to the keytab file

 — 

zeppelin.livy.principal

The principal name to load from the keytab

 — 

zeppelin.server.ssl.port

The port number for SSL communication

8180

zeppelin.ssl

Defines whether to use SSL

false

zeppelin.ssl.keystore.path

The path to the keystore used by Zeppelin

 — 

zeppelin.ssl.keystore.password

The password to access the keystore file

 — 

zeppelin.ssl.truststore.path

The path to the truststore used by Zeppelin

 — 

zeppelin.ssl.truststore.password

The password to access the truststore file

 — 
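
As a sketch, enabling SSL with the parameters above could produce zeppelin-site.xml entries similar to the following; the keystore path and password are placeholders, not defaults:

  <property>
    <name>zeppelin.ssl</name>
    <value>true</value>
  </property>
  <property>
    <name>zeppelin.ssl.keystore.path</name>
    <value>/etc/ssl/zeppelin/keystore.jks</value>
  </property>
  <property>
    <name>zeppelin.ssl.keystore.password</name>
    <value>change_me</value>
  </property>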

Zeppelin server heap memory settings
Parameter Description Default value

Zeppelin Server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Zeppelin Server

-Xms700m -Xmx1024m

Shiro Simple username/password auth
Parameter Description Default value

Users/password map

A map of type <username: password,role>. For example, <myUser1: password1,role1>

 — 
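
For illustration, the example map entry <myUser1: password1,role1> corresponds approximately to the following fragment of the [users] section in shiro.ini:

  [users]
  myUser1 = password1, role1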

Shiro LDAP auth
Parameter Description Default value

ldapRealm

Extends the Apache Shiro provider to allow for LDAP searches and to provide group membership to the authorization provider

org.apache.zeppelin.realm.LdapRealm

ldapRealm.contextFactory.authenticationMechanism

Specifies the authentication mechanism used by the LDAP service

simple

ldapRealm.contextFactory.url

The URL of the source LDAP. For example, ldap://ldap.example.com:389

 — 

ldapRealm.userDnTemplate

Optional. This value is used to construct the UserDN for the authentication bind. Specify the UserDN where the first attribute value is {0}, indicating the attribute that matches the user login token. For example, the UserDnTemplate for the Apache DS bundled with Knox is uid={0},ou=people,dc=hadoop,dc=apache,dc=org

 — 

ldapRealm.pagingSize

Allows setting the LDAP paging size

100

ldapRealm.authorizationEnabled

Enables authorization for Shiro ldapRealm

true

ldapRealm.contextFactory.systemAuthenticationMechanism

Defines the authentication mechanism used by the Shiro ldapRealm context factory. Possible values are simple and digest-md5

simple

ldapRealm.userLowerCase

Forces the username returned from LDAP to be lowercased

true

ldapRealm.memberAttributeValueTemplate

The attribute that identifies a user in the group. For example: cn={0},ou=people,dc=hadoop,dc=apache,dc=org

 — 

ldapRealm.searchBase

The starting DN in the LDAP DIT for the search. Only subtrees of the specified subtree are searched. For example: dc=hadoop,dc=apache,dc=org

 — 

ldapRealm.userSearchBase

Search base for user bind DN. Defaults to the value of ldapRealm.searchBase if no value is defined. If ldapRealm.userSearchAttributeName is defined, also define a value for either ldapRealm.searchBase or ldapRealm.userSearchBase

 — 

ldapRealm.groupSearchBase

Search base used to search for groups. Defaults to the value of ldapRealm.searchBase. Only set if ldapRealm.authorizationEnabled=true

 — 

ldapRealm.groupObjectClass

Set the value to the object class that identifies group entries in LDAP

groupofnames

ldapRealm.userSearchAttributeName

Specify the attribute that corresponds to the user login token. This attribute is used with the search results to compute the UserDN for the authentication bind

sAMAccountName

ldapRealm.memberAttribute

Set the value to the attribute that defines group membership. When the value is memberUrl, found groups are treated as dynamic groups

member

ldapRealm.userSearchScope

Allows defining the user search scope. Possible values are subtree, one, base

subtree

ldapRealm.groupSearchScope

Allows defining the group search scope. Possible values are subtree, one, base

subtree

ldapRealm.contextFactory.systemUsername

Set to the LDAP service account that Zeppelin uses for LDAP searches. If required, specify the full account UserDN. For example: uid=guest,ou=people,dc=hadoop,dc=apache,dc=org. This account requires read permission to the search base DN

 — 

ldapRealm.contextFactory.systemPassword

Sets the password for systemUsername. This password is added to the keystore using Hadoop credentials

 — 

ldapRealm.groupSearchEnableMatchingRuleInChain

Enables support for nested groups using the LDAP_MATCHING_RULE_IN_CHAIN operator

true

ldapRealm.rolesByGroup

Optional mapping from physical groups to logical application roles. For example: "LDN_USERS":"user_role", "NYK_USERS":"user_role", "HKG_USERS":"user_role", "GLOBAL_ADMIN":"admin_role"

 — 

ldapRealm.allowedRolesForAuthentication

Optional list of roles that are allowed to authenticate. If not specified, all groups are allowed to authenticate (login). This changes nothing for URL-specific permissions, which continue to work as specified in the [urls] section. For example: "admin_role,user_role"

 — 

ldapRealm.permissionsByRole

Optional. Sets permissions by role. For example: 'user_role = :ToDoItemsJdo::*, :ToDoItem::*; admin_role = *'

 — 

securityManager.realms

Specifies a list of Apache Shiro Realms

$ldapRealm
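
A minimal sketch of the resulting [main] section of shiro.ini, assembled from the illustrative values in the descriptions above (the LDAP URL and DNs are examples, not defaults):

  [main]
  ldapRealm = org.apache.zeppelin.realm.LdapRealm
  ldapRealm.contextFactory.url = ldap://ldap.example.com:389
  ldapRealm.contextFactory.authenticationMechanism = simple
  ldapRealm.userDnTemplate = uid={0},ou=people,dc=hadoop,dc=apache,dc=org
  ldapRealm.searchBase = dc=hadoop,dc=apache,dc=org
  ldapRealm.authorizationEnabled = true
  securityManager.realms = $ldapRealm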

Additional configuration Shiro.ini
Parameter Description Default value

Additional main section in shiro.ini

Allows adding key/value pairs to the main section of the shiro.ini file

 — 

Additional roles section in shiro.ini

Allows adding key/value pairs to the roles section of the shiro.ini file

 — 

Additional urls section in shiro.ini

Allows adding key/value pairs to the urls section of the shiro.ini file

 — 
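
As a hypothetical example, key/value pairs added through these parameters end up in the corresponding sections of shiro.ini; a common pattern looks like this:

  [roles]
  admin_role = *

  [urls]
  /api/version = anon
  /** = authc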

Other
Parameter Description Default value

Custom zeppelin-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file zeppelin-site.xml

 — 

Custom zeppelin-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file zeppelin-env.sh

Custom log4j.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file log4j.properties

ZooKeeper

Main
Parameter Description Default value

connect

The ZooKeeper connection string used by other services or clusters. It is generated automatically

 — 

dataDir

The location where ZooKeeper stores the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database

/var/lib/zookeeper
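
The connect string normally takes the form host1:port,host2:port,…, for example (hostnames are placeholders):

  zk-host1.example.com:2181,zk-host2.example.com:2181,zk-host3.example.com:2181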

zoo.cfg
Parameter Description Default value

clientPort

The port to listen on for client connections, that is, the port that clients attempt to connect to

2181

tickTime

The basic time unit used by ZooKeeper (in milliseconds). It is used for heartbeats. The minimum session timeout will be twice the tickTime

2000

initLimit

The timeout (in ticks) that ZooKeeper uses to limit how long the ZooKeeper servers in a quorum have to connect to the leader

5

syncLimit

Defines how far out of date a server can be from the leader (in ticks)

2

maxClientCnxns

Limits the number of concurrent connections that a single host, identified by IP address, may make to a single ZooKeeper server

0

autopurge.snapRetainCount

When enabled, the ZooKeeper auto-purge feature retains the autopurge.snapRetainCount most recent snapshots and the corresponding transaction logs in dataDir and dataLogDir, respectively, and deletes the rest. The minimum value is 3

3

autopurge.purgeInterval

The time interval at which the purge task is triggered (in hours). Set to a positive integer (1 or greater) to enable auto-purging

24

Add key,value

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file zoo.cfg

 — 
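
Putting the defaults above together, a basic zoo.cfg might look as follows (server.N entries listing the quorum members are not shown):

  tickTime=2000
  initLimit=5
  syncLimit=2
  dataDir=/var/lib/zookeeper
  clientPort=2181
  maxClientCnxns=0
  autopurge.snapRetainCount=3
  autopurge.purgeInterval=24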

zookeeper-env.sh
Parameter Description Default value

ZOO_LOG_DIR

The directory to store logs

/var/log/zookeeper

ZOOPIDFILE

The directory to store the ZooKeeper process ID

/var/run/zookeeper/zookeeper_server.pid

SERVER_JVMFLAGS

Used for setting JVM parameters related, for example, to garbage collection

-Xmx1024m

JAVA

A path to Java

$JAVA_HOME/bin/java

ZOO_LOG4J_PROP

Sets the log4j logging level and defines which log appenders to turn on. Enabling the CONSOLE appender directs logs to stdout. Enabling ROLLINGFILE creates the zookeeper.log file, which is then rotated and eventually expired

INFO, CONSOLE, ROLLINGFILE
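
These settings are exported as environment variables from zookeeper-env.sh; a sketch using the defaults above (the exact file layout may differ):

  export ZOO_LOG_DIR=/var/log/zookeeper
  export ZOOPIDFILE=/var/run/zookeeper/zookeeper_server.pid
  export SERVER_JVMFLAGS="-Xmx1024m"
  export JAVA=$JAVA_HOME/bin/java
  export ZOO_LOG4J_PROP="INFO, CONSOLE, ROLLINGFILE"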
