Configuration parameters

This topic describes the parameters that can be configured for ADH services via ADCM. For details on the configuration process, refer to the relevant articles: Online installation, Offline installation.

NOTE
  • Some of the parameters become visible in the ADCM UI after the Advanced flag has been set.

  • Parameters set in the Custom group overwrite the existing parameters, even those that are read-only.

Airflow

Airflow environment
Parameter Description Default value

airflow_dir

The Airflow home directory

/srv/airflow/home

db_dir

The location of Metastore DB

/srv/airflow/metastore

airflow.cfg
Parameter Description Default value

db_user

The user to connect to Metadata DB

airflow

db_password

The password to connect to Metadata DB

 — 

db_root_password

The root password to connect to Metadata DB

 — 

db_port

The port to connect to Metadata DB

3307

server_port

The port to run the web server

8080

flower_port

The port that Celery Flower runs on

5555

worker_port

When you start an Airflow Worker, Airflow starts a tiny web server subprocess to serve the Worker's local log files to the main Airflow web server, which then builds pages and sends them to users. This parameter defines the port on which the logs are served. The port must be free and accessible from the main web server so that it can connect to the Workers

8793

redis_port

The port for running Redis

6379

fernet_key

The secret key used to encrypt connection passwords saved in the database

 — 

security

Defines which security module to use. For example, kerberos

 — 

keytab

The path to the keytab file

 — 

reinit_frequency

Sets the ticket renewal frequency

3600

principal

The Kerberos principal

ssl_active

Defines if SSL is active for Airflow

false

web_server_ssl_cert

The path to SSL certificate

/etc/ssl/certs/host_cert.cert

web_server_ssl_key

The path to SSL certificate key

/etc/ssl/host_cert.key

Logging level

Specifies the logging level for Airflow activity

INFO

Logging level for Flask-appbuilder UI

Specifies the logging level for Flask-appbuilder UI

WARNING

cfg_properties_template

The Jinja template to initialize environment variables for Airflow
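
These ADCM parameters are rendered into the airflow.cfg file on the cluster hosts. For reference, the fragment below is a minimal sketch of how several of them typically map to stock Apache Airflow options; the section and key names follow upstream Airflow and may differ between versions, and the keytab path and principal are placeholders:

  [core]
  security = kerberos
  fernet_key = <generated secret>

  [kerberos]
  keytab = /etc/security/keytabs/airflow.service.keytab
  principal = airflow/_HOST@EXAMPLE.COM
  reinit_frequency = 3600

  [webserver]
  web_server_port = 8080
  web_server_ssl_cert = /etc/ssl/certs/host_cert.cert
  web_server_ssl_key = /etc/ssl/host_cert.key

  [celery]
  flower_port = 5555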

External database
Parameter Description Default value

Database type

The external database type. Possible values: PostgreSQL, MySQL/MariaDB

MySQL/MariaDB

Hostname

The external database host

 — 

Custom port

The external database port

 — 

Airflow database name

The external database name

airflow
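
Airflow connects to its metadata database through a SQLAlchemy connection URI, which is built from the settings above together with db_user, db_password, and db_port from airflow.cfg. A hypothetical URI for each supported database type is shown below; the host names are placeholders, and the exact configuration key written by ADCM is an assumption here:

  mysql+mysqldb://airflow:<db_password>@db-host.example.com:3307/airflow
  postgresql+psycopg2://airflow:<db_password>@db-host.example.com:5432/airflow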

Flink

flink-conf.yaml
Parameter Description Default value

jobmanager.rpc.port

The RPC port through which the JobManager is reachable. In the high availability mode, this value is ignored and the port number to connect to JobManager is generated by ZooKeeper

6123

sql-gateway.endpoint.rest.port

A port to connect to the SQL Gateway service

8083

taskmanager.network.bind-policy

The automatic address binding policy used by the TaskManager

name

parallelism.default

The system-wide default parallelism level for all execution environments

1

taskmanager.numberOfTaskSlots

The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline

1

taskmanager.heap.size

The heap size for the TaskManager JVM

1024m

jobmanager.heap.size

The heap size for the JobManager JVM

1024m

security.kerberos.login.use-ticket-cache

Indicates whether to read from the Kerberos ticket cache

false

security.kerberos.login.keytab

The absolute path to the Kerberos keytab file that stores user credentials

 — 

security.kerberos.login.principal

Flink Kerberos principal

 — 

security.kerberos.login.contexts

A comma-separated list of login contexts to provide the Kerberos credentials to

 — 

security.ssl.rest.enabled

Turns on SSL for external communication via REST endpoints

false

security.ssl.rest.keystore

The Java keystore file with SSL key and certificate to be used by Flink’s external REST endpoints

 — 

security.ssl.rest.truststore

The truststore file containing public CA certificates to verify the peer for Flink’s external REST endpoints

 — 

security.ssl.rest.keystore-password

The secret to decrypt the keystore file for Flink external REST endpoints

 — 

security.ssl.rest.truststore-password

The password to decrypt the truststore for Flink’s external REST endpoints

 — 

security.ssl.rest.key-password

The secret to decrypt the key in the keystore for Flink’s external REST endpoints

 — 

Logging level

Defines the logging level for Flink activity

INFO

high-availability

Defines the High Availability (HA) mode used for cluster execution

 — 

high-availability.zookeeper.quorum

The ZooKeeper quorum to use when running Flink in the HA mode with ZooKeeper

 — 

high-availability.storageDir

A file system path (URI) where Flink persists metadata in the HA mode

 — 

high-availability.zookeeper.path.root

The root path for the Flink ZNode in ZooKeeper

/flink

high-availability.cluster-id

The ID of the Flink cluster used to separate multiple Flink clusters from each other

 — 

sql-gateway.session.check-interval

The check interval to detect idle sessions. A value <= 0 disables the checks

1 min

sql-gateway.session.idle-timeout

The timeout to close a session if no successful connection was made during this interval. A value <= 0 never closes the sessions

10 min

sql-gateway.session.max-num

The maximum number of sessions to run simultaneously

1000000

sql-gateway.worker.keepalive-time

The time to keep an idle worker thread alive. When the number of worker threads exceeds sql-gateway.worker.threads.min, excess threads are terminated after this time interval

5 min

sql-gateway.worker.threads.max

The maximum number of worker threads on the SQL Gateway server

500

sql-gateway.worker.threads.min

The minimum number of worker threads. If the current number of worker threads is less than this value, the worker threads are not deleted automatically

500

zookeeper.sasl.disable

Defines whether to disable SASL authentication for the ZooKeeper client

false
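
As an illustration, a flink-conf.yaml fragment combining the HA and Kerberos parameters above might look like the following sketch; the ZooKeeper hosts, storage path, keytab path, principal, and cluster ID are placeholders:

  high-availability: zookeeper
  high-availability.zookeeper.quorum: zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
  high-availability.storageDir: hdfs:///flink/ha
  high-availability.zookeeper.path.root: /flink
  high-availability.cluster-id: /flink_cluster_1
  security.kerberos.login.use-ticket-cache: false
  security.kerberos.login.keytab: /etc/security/keytabs/flink.service.keytab
  security.kerberos.login.principal: flink@EXAMPLE.COM
  security.kerberos.login.contexts: Client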

Other
Parameter Description Default value

Custom flink-conf.yaml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file flink-conf.yaml

 — 

log4j.properties

The contents of the log4j.properties configuration file

log4j-cli.properties

The contents of the log4j-cli.properties configuration file

HBase

hbase-site.xml
Parameter Description Default value

hbase.balancer.period

The time period to run the Region balancer in Master

300000

hbase.client.pause

The general client pause value. Used mostly as the time to wait before retrying a failed get, region lookup, and so on. See hbase.client.retries.number for a description of how this pause works with retries

100

hbase.client.max.perregion.tasks

The maximum number of concurrent mutation tasks the Client maintains to a single Region. That is, if hbase.client.max.perregion.tasks writes are already in progress for a Region, new puts are not sent to that Region until some of the writes finish

1

hbase.client.max.perserver.tasks

The maximum number of concurrent mutation tasks a single HTable instance will send to a single Region Server

2

hbase.client.max.total.tasks

The maximum number of concurrent mutation tasks a single HTable instance will send to the cluster

100

hbase.client.retries.number

The maximum number of retries. It is used as the maximum for all retryable operations, such as getting a cell value, starting a row update, and so on. The retry interval is a rough function based on hbase.client.pause. See the constant RETRY_BACKOFF for how the backoff ramps up. Change this setting and hbase.client.pause to suit your workload

15

hbase.client.scanner.timeout.period

The Client scanner lease period in milliseconds

60000

hbase.cluster.distributed

The cluster mode. Possible values are: false — for standalone mode and pseudo-distributed setups with managed ZooKeeper; true — for fully-distributed mode with an unmanaged ZooKeeper Quorum. If false, startup runs all HBase and ZooKeeper daemons together in one JVM; if true, each daemon runs in its own JVM instance

true

hbase.hregion.majorcompaction

The time interval between Major compactions in milliseconds. Set to 0 to disable time-based automatic Major compactions. User-requested and size-based Major compactions will still run. This value is multiplied by hbase.hregion.majorcompaction.jitter to cause compaction to start at a somewhat-random time during a given time frame

604800000

hbase.hregion.max.filesize

The maximum file size. If the total size of a Region's HFiles grows to exceed this value, the Region is split in two. The check can work in two ways: split when any store size exceeds the threshold, or split when the overall Region size exceeds the threshold. The behavior is controlled by hbase.hregion.split.overallfiles

10737418240

hbase.hstore.blockingStoreFiles

If more than this number of StoreFiles exist in any Store (one StoreFile is written per flush of MemStore), updates are blocked for this Region until a compaction is completed or until hbase.hstore.blockingWaitTime is exceeded

16

hbase.hstore.blockingWaitTime

The time for which a Region blocks updates after reaching the StoreFile limit defined by hbase.hstore.blockingStoreFiles. After this time has elapsed, the Region stops blocking updates even if a compaction has not been completed

90000

hbase.hstore.compaction.max

The maximum number of StoreFiles that will be selected for a single Minor compaction, regardless of the number of eligible StoreFiles. Effectively, the value of hbase.hstore.compaction.max controls the time it takes for a single compaction to complete. Setting it larger means that more StoreFiles are included in a compaction. For most cases, the default value is appropriate

10

hbase.hstore.compaction.min

The minimum number of StoreFiles that must be eligible for compaction before compaction can run. The goal of tuning hbase.hstore.compaction.min is to avoid a situation with too many tiny StoreFiles to compact. Setting this value to 2 would cause a Minor compaction each time you have two StoreFiles in a Store, and this is probably not appropriate. If you set this value too high, all the other values will need to be adjusted accordingly. For most cases, the default value is appropriate. In the previous versions of HBase, the parameter hbase.hstore.compaction.min was called hbase.hstore.compactionThreshold

3

hbase.hstore.compaction.min.size

A StoreFile smaller than this size is always eligible for Minor compaction. StoreFiles of this size or larger are evaluated by hbase.hstore.compaction.ratio to determine if they are eligible. Because this limit represents the "automatic include" limit for all StoreFiles smaller than this value, it may need to be reduced in write-heavy environments where many files in the 1-2 MB range are being flushed, because every StoreFile will be targeted for compaction and the resulting StoreFiles may still be under the minimum size and require further compaction. If this parameter is lowered, the ratio check is triggered more quickly. This addressed some issues seen in earlier versions of HBase, but changing this parameter is no longer necessary in most situations

134217728

hbase.hstore.compaction.ratio

For Minor compaction, this ratio is used to determine whether a given StoreFile that is larger than hbase.hstore.compaction.min.size is eligible for compaction. Its effect is to limit compaction of large StoreFiles. The value of hbase.hstore.compaction.ratio is expressed as a floating-point decimal

1.2F

hbase.hstore.compaction.ratio.offpeak

The compaction ratio used during off-peak compactions if the off-peak hours are also configured. Expressed as a floating-point decimal. This allows for more aggressive (or less aggressive, if you set it lower than hbase.hstore.compaction.ratio) compaction during a given time period. The value is ignored if off-peak is disabled (default). This works the same as hbase.hstore.compaction.ratio

5.0F

hbase.hstore.compactionThreshold

If more than this number of StoreFiles exists in any Store (one StoreFile is written per flush of MemStore), a compaction is run to rewrite all StoreFiles into a single StoreFile. Larger values delay the compaction, but when compaction does occur, it takes longer to complete

3

hbase.hstore.flusher.count

The number of flush threads. With fewer threads, the MemStore flushes will be queued. With more threads, the flushes will be executed in parallel, increasing the load on HDFS, and potentially causing more compactions

2

hbase.hstore.time.to.purge.deletes

The amount of time to delay purging of delete markers with future timestamps. If unset or set to 0, all the delete markers, including those with future timestamps, are purged during the next Major compaction. Otherwise, a delete marker is kept until the Major compaction that occurs after the marker timestamp plus the value of this setting (in milliseconds)

0

hbase.master.ipc.address

The address to which the HMaster RPC service binds

0.0.0.0

hbase.normalizer.period

The period at which the Region normalizer runs on Master (in milliseconds)

300000

hbase.regionserver.compaction.enabled

Enables/disables compactions by setting true/false. You can further switch compactions dynamically with the compaction_switch shell command

true

hbase.regionserver.ipc.address

The address to which the Region Server RPC service binds

0.0.0.0

hbase.regionserver.regionSplitLimit

The limit on the number of Regions, after which no more Region splitting should take place. This is not a hard limit for the number of Regions; rather, it acts as a guideline for the Region Server to stop splitting after a certain limit

1000

hbase.rootdir

The directory shared by Region Servers and into which HBase persists. The URL should be fully-qualified to include the filesystem scheme. For example, to specify the HDFS directory /hbase where the HDFS instance NameNode is running at namenode.example.org on port 9000, set this value to: hdfs://namenode.example.org:9000/hbase

 — 

hbase.zookeeper.quorum

A comma-separated list of servers in the ZooKeeper ensemble, for example, host1.mydomain.com,host2.mydomain.com,host3.mydomain.com. By default, this is set to localhost for local and pseudo-distributed modes of operation. For a fully-distributed setup, this should be set to the full list of ZooKeeper ensemble servers. If HBASE_MANAGES_ZK is set in hbase-env.sh, this is the list of servers that HBase will start/stop ZooKeeper on as part of cluster start/stop. Client-side, the list of ensemble members is put together with the hbase.zookeeper.property.clientPort config and passed to the ZooKeeper constructor as the connection string parameter. See also the example hbase-site.xml fragment after this table

 — 

zookeeper.session.timeout

The ZooKeeper session timeout in milliseconds. It is used in two different ways. First, this value is processed by the ZooKeeper client that HBase uses to connect to the ensemble. It is also used by HBase when it starts a ZooKeeper server (in that case the timeout is passed as the maxSessionTimeout). See more details in the ZooKeeper documentation. For example, if an HBase Region Server connects to a ZooKeeper ensemble that is also managed by HBase, then the session timeout will be the one specified by this configuration. But a Region Server that connects to an ensemble managed with a different configuration will be subject to the maxSessionTimeout of that ensemble. So, even though HBase might propose using 90 seconds, the ensemble can have a lower maximum timeout, and it will take precedence. The current default maxSessionTimeout that ZooKeeper ships with is 40 seconds, which is lower than the HBase default

90000

zookeeper.znode.parent

The root znode for HBase in ZooKeeper. All of the HBase ZooKeeper files configured with a relative path will go under this node. By default, all of the HBase ZooKeeper file paths are configured with a relative path, so they will all go under this directory unless changed

/hbase

hbase.rest.port

The port used by HBase Rest Servers

60080

hbase.zookeeper.property.authProvider.1

Specifies the ZooKeeper authentication method

hbase.security.authentication

Set the value to true to run HBase RPC with strong authentication

false

hbase.security.authentication.ui

Enables Kerberos authentication to HBase web UI with SPNEGO

 — 

hbase.security.authentication.spnego.kerberos.principal

The Kerberos principal for SPNEGO authentication

 — 

hbase.security.authentication.spnego.kerberos.keytab

The path to the Kerberos keytab file with principals to be used for SPNEGO authentication

 — 

hbase.security.authorization

Set the value to true to run HBase RPC with strong authorization

false

hbase.master.kerberos.principal

The Kerberos principal used to run the HMaster process

 — 

hbase.master.keytab.file

Full path to the Kerberos keytab file to use for logging in the configured HMaster server principal

 — 

hbase.regionserver.kerberos.principal

The Kerberos principal name that should be used to run the HRegionServer process

 — 

hbase.regionserver.keytab.file

Full path to the Kerberos keytab file to use for logging in the configured HRegionServer server principal

 — 

hbase.rest.authentication.type

REST Gateway Kerberos authentication type

 — 

hbase.rest.authentication.kerberos.principal

REST Gateway Kerberos principal

 — 

hbase.rest.authentication.kerberos.keytab

REST Gateway Kerberos keytab

 — 

hbase.thrift.keytab.file

Thrift Kerberos keytab

 — 

hbase.rest.keytab.file

HBase REST gateway Kerberos keytab

 — 

hbase.rest.kerberos.principal

HBase REST gateway Kerberos principal

 — 

hbase.thrift.kerberos.principal

Thrift Kerberos principal

 — 

hbase.thrift.security.qop

Defines authentication, integrity, and confidentiality checking. Supported values:

  • auth-conf — authentication, integrity, and confidentiality checking;

  • auth-int — authentication and integrity checking;

  • auth — authentication checking only.

 — 

phoenix.queryserver.keytab.file

The path to the Kerberos keytab file

 — 

phoenix.queryserver.kerberos.principal

The Kerberos principal to use when authenticating. If phoenix.queryserver.kerberos.http.principal is not defined, the specified principal is also used both to authenticate SPNEGO connections and to connect to HBase

 — 

phoenix.queryserver.kerberos.keytab

The full path to the Kerberos keytab file to use for logging in the configured server principal

 — 

phoenix.queryserver.http.keytab.file

The keytab file to use for authenticating SPNEGO connections. This configuration must be specified if phoenix.queryserver.kerberos.http.principal is configured. phoenix.queryserver.keytab.file will be used if this property is undefined

 — 

phoenix.queryserver.http.kerberos.principal

The Kerberos principal to use when authenticating SPNEGO connections. phoenix.queryserver.kerberos.principal will be used if this property is undefined

phoenix.queryserver.kerberos.http.principal

Deprecated, use phoenix.queryserver.http.kerberos.principal instead

 — 

hbase.ssl.enabled

Defines whether SSL is enabled for web UIs

false

hadoop.ssl.enabled

Defines whether SSL is enabled for Hadoop RPC

false

ssl.server.keystore.location

The path to the keystore file

 — 

ssl.server.keystore.password

The password to the keystore

 — 

ssl.server.truststore.location

The path to the truststore to be used

 — 

ssl.server.truststore.password

The password to the truststore

 — 

ssl.server.keystore.keypassword

The password to the key in the keystore

 — 

hbase.rest.ssl.enabled

Defines whether SSL is enabled for HBase REST server

false

hbase.rest.ssl.keystore.store

The path to the keystore used by HBase REST server

 — 

hbase.rest.ssl.keystore.password

The password to the keystore

 — 

hbase.rest.ssl.keystore.keypassword

The password to the key in the keystore

 — 
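
For reference, a minimal hbase-site.xml fragment using the example values mentioned in the descriptions of hbase.rootdir and hbase.zookeeper.quorum above (host names are placeholders):

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>host1.mydomain.com,host2.mydomain.com,host3.mydomain.com</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>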

HBASE heap memory settings
Parameter Description Default value

HBASE Regionserver Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HBase Region server

-Xms700m -Xmx9G

HBASE Master Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HBase Master

-Xms700m -Xmx9G

Phoenix Queryserver Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Phoenix Query server

-Xms700m -Xmx8G

HBASE Thrift2 server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HBase Thrift2 server

-Xms700m -Xmx8G

HBASE Rest server Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HBase Rest server

-Xms200m -Xmx8G

ranger-hbase-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

The spool directory path

/srv/ranger/hdfs_plugin/audit_solr_spool

xasecure.audit.destination.solr.urls

Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr

 — 

xasecure.audit.destination.solr.zookeepers

Specifies the ZooKeeper connection string for the Solr destination

 — 

xasecure.audit.destination.solr.force.use.inmemory.jaas.config

Uses in-memory JAAS configuration file to connect to Solr

 — 

xasecure.audit.is.enabled

Enables Ranger audit

true

xasecure.audit.jaas.Client.loginModuleControlFlag

Specifies whether the success of the module is required, requisite, sufficient, or optional

 — 

xasecure.audit.jaas.Client.loginModuleName

The name of the authenticator class

 — 

xasecure.audit.jaas.Client.option.keyTab

The name of the keytab file to get the principal’s secret key

 — 

xasecure.audit.jaas.Client.option.principal

The name of the principal to be used

 — 

xasecure.audit.jaas.Client.option.serviceName

Represents a user or a service that wants to log in

 — 

xasecure.audit.jaas.Client.option.storeKey

Set this to true if you want the keytab or the principal’s key to be stored in the subject’s private credentials

false

xasecure.audit.jaas.Client.option.useKeyTab

Set this to true if you want the module to get the principal’s key from the keytab

false

ranger-hbase-security.xml
Parameter Description Default value

ranger.plugin.hbase.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.hbase.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.hbase.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/hbase/policycache

ranger.plugin.hbase.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.hbase.policy.rest.client.connection.timeoutMs

The HBase Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.hbase.policy.rest.client.read.timeoutMs

The HBase Plugin RangerRestClient read timeout (in milliseconds)

30000

ranger.plugin.hbase.policy.rest.ssl.config.file

The path to the RangerRestClient SSL config file for HBase plugin

/etc/hbase/conf/ranger-hbase-policymgr-ssl.xml
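
A minimal ranger-hbase-security.xml sketch showing how these parameters fit together; the Ranger Admin URL and the service name are hypothetical:

  <property>
    <name>ranger.plugin.hbase.policy.rest.url</name>
    <value>http://ranger-admin.example.com:6080</value>
  </property>
  <property>
    <name>ranger.plugin.hbase.service.name</name>
    <value>adh_hbase</value>
  </property>
  <property>
    <name>ranger.plugin.hbase.policy.cache.dir</name>
    <value>/srv/ranger/hbase/policycache</value>
  </property>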

ranger-hbase-policymgr-ssl.xml
Parameter Description Default value

xasecure.policymgr.clientssl.keystore

The path to the keystore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.credential.file

The path to the keystore credentials file

/etc/hbase/conf/ranger-hbase.jceks

xasecure.policymgr.clientssl.truststore.credential.file

The path to the truststore credentials file

/etc/hbase/conf/ranger-hbase.jceks

xasecure.policymgr.clientssl.truststore

The path to the truststore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.password

The password to the keystore file

 — 

xasecure.policymgr.clientssl.truststore.password

The password to the truststore file

 — 

Other
Parameter Description Default value

Custom hbase-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hbase-site.xml

 — 

Custom hbase-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hbase-env.sh

 — 

Ranger plugin enabled

Whether or not Ranger plugin is enabled

false

Custom ranger-hbase-audit.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hbase-audit.xml

 — 

Custom ranger-hbase-security.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hbase-security.xml

 — 

Custom ranger-hbase-policymgr-ssl.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hbase-policymgr-ssl.xml

 — 

Custom log4j.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file log4j.properties

Custom hadoop-metrics2-hbase.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hadoop-metrics2-hbase.properties

HDFS

core-site.xml
Parameter Description Default value

fs.defaultFS

The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI authority is used to determine the host, port, etc. for a filesystem

 — 

fs.trash.checkpoint.interval

The number of minutes between trash checkpoints. Should be smaller than or equal to fs.trash.interval. Every time the checkpointer runs, it creates a new checkpoint out of the current trash contents and removes checkpoints created more than fs.trash.interval minutes ago

60

fs.trash.interval

The number of minutes after which the checkpoint gets deleted. If set to 0, the trash feature is disabled

1440

hadoop.tmp.dir

The base for other temporary directories

/tmp/hadoop-${user.name}

hadoop.zk.address

A comma-separated list of <Host>:<Port> pairs. Each pair corresponds to a ZooKeeper server to be used by the Resource Manager for storing the Resource Manager state

 — 

io.file.buffer.size

The buffer size for sequence files. The size of this buffer should probably be a multiple of the hardware page size (4096 on Intel x86); it determines how much data is buffered during read and write operations

131072

net.topology.script.file.name

The name of the script that should be invoked to resolve DNS names to NetworkTopology names. For example, the script would take host.foo.bar as an argument and return /rack1 as the output

 — 

ha.zookeeper.quorum

A list of ZooKeeper Server addresses, separated by commas, that are to be used by the ZKFailoverController in automatic failover

 — 

ipc.client.fallback-to-simple-auth-allowed

When a client is configured to attempt a secure connection but attempts to connect to an insecure server, that server may instruct the client to switch to SASL SIMPLE (unsecure) authentication. This setting controls whether or not the client accepts this instruction from the server. When set to false (default), the client does not allow the fallback to SIMPLE authentication and aborts the connection

false

hadoop.security.authentication

Defines the authentication type. Possible values: simple — no authentication, kerberos — enables the authentication by Kerberos

simple

hadoop.security.authorization

Enables RPC service-level authorization

false

hadoop.rpc.protection

Specifies RPC protection. Possible values:

  • authentication — authentication only;

  • integrity — performs the integrity check in addition to authentication;

  • privacy — encrypts the data in addition to integrity.

authentication

hadoop.security.auth_to_local

The value is a string containing new line characters. See Kerberos documentation for more information about the format

 — 

hadoop.http.authentication.type

Defines authentication used for the HTTP web-consoles. The supported values are: simple, kerberos, [AUTHENTICATION_HANDLER-CLASSNAME]

simple

hadoop.http.authentication.kerberos.principal

Indicates the Kerberos principal to be used for the HTTP endpoint when using kerberos authentication. The principal short name must be HTTP per the Kerberos HTTP SPNEGO specification

HTTP/localhost@$LOCALHOST

hadoop.http.authentication.kerberos.keytab

The location of the keytab file with the credentials for the Kerberos principal used for the HTTP endpoint

/etc/security/keytabs/HTTP.service.keytab

ha.zookeeper.acl

ACLs for all znodes

 — 

hadoop.http.filter.initializers

Add to this property the org.apache.hadoop.security.AuthenticationFilterInitializer initializer class

 — 

hadoop.http.authentication.signature.secret.file

The signature secret file for signing the authentication tokens. If not set, a random secret is generated at startup. The same secret should be used for all nodes in the cluster: JobTracker, NameNode, DataNode, and TaskTracker. This file should be readable only by the Unix user running the daemons

/etc/security/http_secret

hadoop.http.authentication.cookie.domain

The domain to use for the HTTP cookie that stores the authentication token. For authentication to work properly across all nodes in the cluster, the domain must be set correctly. There is no default value; in that case, the HTTP cookie will not have a domain and will work only with the hostname issuing the HTTP cookie

 — 

hadoop.ssl.require.client.cert

Defines whether client certificates are required

false

hadoop.ssl.hostname.verifier

The host name verifier to provide for HttpsURLConnections. Valid values are: DEFAULT, STRICT, STRICT_IE6, DEFAULT_AND_LOCALHOST, and ALLOW_ALL

DEFAULT

hadoop.ssl.keystores.factory.class

The KeyStoresFactory implementation to use

org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory

hadoop.ssl.server.conf

A resource file from which the SSL server keystore information will be extracted. This file is looked up in the classpath, typically it should be located in Hadoop conf/ directory

ssl-server.xml

hadoop.ssl.client.conf

A resource file from which the SSL client keystore information will be extracted. This file is looked up in the classpath, typically it should be located in Hadoop conf/ directory

ssl-client.xml

User managed hadoop.security.auth_to_local

Disables automatic generation of hadoop.security.auth_to_local

false
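
To illustrate how several of the parameters above combine, the following core-site.xml sketch enables Kerberos authentication with RPC privacy; the nameservice alias and ZooKeeper hosts are placeholders:

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://adh</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.rpc.protection</name>
    <value>privacy</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>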

hdfs-site.xml
Parameter Description Default value

dfs.client.block.write.replace-datanode-on-failure.enable

If there is a DataNode/network failure in the write pipeline, DFSClient will try to remove the failed DataNode from the pipeline and then continue writing with the remaining DataNodes. As a result, the number of DataNodes in the pipeline is decreased. The feature is to add new DataNodes to the pipeline. This is a site-wide property to enable/disable the feature. When the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to set the policy to NEVER in the default configuration file or disable this feature. Otherwise, users may experience an unusually high rate of pipeline failures since it is impossible to find new DataNodes for replacement. See also dfs.client.block.write.replace-datanode-on-failure.policy

true

dfs.client.block.write.replace-datanode-on-failure.policy

This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. Possible values:

  • ALWAYS. Always adds a new DataNode, when an existing DataNode is removed.

  • NEVER. Never adds a new DataNode.

  • DEFAULT. Let r be the replication number. Let n be the number of existing DataNodes. Add a new DataNode only, if r is greater than or equal to 3 and either:

    1. floor(r/2) is greater than or equal to n;

    2. r is greater than n and the block is hflushed/appended.

DEFAULT

dfs.client.block.write.replace-datanode-on-failure.best-effort

This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. Best effort means that the client will try to replace a failed DataNode in the write pipeline (provided that the policy is satisfied), but continues the write operation even if the DataNode replacement fails. Suppose the DataNode replacement fails: if this property is false, an exception is thrown and the write fails; if true, the write is resumed with the remaining DataNodes. Note that setting this property to true allows writing to a pipeline with a smaller number of DataNodes and, as a result, increases the probability of data loss

false

dfs.client.block.write.replace-datanode-on-failure.min-replication

The minimum number of replications needed not to fail the write pipeline if new DataNodes cannot be found to replace failed DataNodes (for example, due to a network failure) in the write pipeline. If the number of the remaining DataNodes in the write pipeline is greater than or equal to this property value, writing continues to the remaining nodes; otherwise, an exception is thrown. If this is set to 0, an exception is thrown when a replacement cannot be found. See also dfs.client.block.write.replace-datanode-on-failure.policy

0

dfs.balancer.dispatcherThreads

The size of the thread pool for the HDFS balancer block mover — dispatchExecutor

200

dfs.balancer.movedWinWidth

The time window in milliseconds for the HDFS balancer to track blocks and their locations

5400000

dfs.balancer.moverThreads

The thread pool size for executing block moves — moverThreadAllocator

1000

dfs.balancer.max-size-to-move

The maximum number of bytes that can be moved by the balancer in a single thread

10737418240

dfs.balancer.getBlocks.min-block-size

The minimum block threshold size in bytes to ignore, when fetching a source block list

10485760

dfs.balancer.getBlocks.size

The total size in bytes of DataNode blocks to get, when fetching a source block list

2147483648

dfs.balancer.block-move.timeout

The maximum amount of time for a block move (in milliseconds). If set greater than 0, the Balancer stops waiting for a block move to complete after this time. In typical clusters, a 3-5 minute timeout is reasonable. If the timeout is reached for a large proportion of block moves, this value needs to be increased. It could also mean that too much work is dispatched and many nodes are constantly exceeding the bandwidth limit as a result; in that case, other Balancer parameters might need to be adjusted. The feature is disabled (0) by default

0

dfs.balancer.max-no-move-interval

If this specified amount of time has elapsed and no blocks have been moved out of a source DataNode, one more attempt will be made to move blocks out of this DataNode in the current Balancer iteration

60000

dfs.balancer.max-iteration-time

The maximum amount of time an iteration can be run by the Balancer. After this time the Balancer will stop the iteration, and re-evaluate the work needed to be done to balance the cluster. The default value is 20 minutes

1200000

dfs.blocksize

The default block size for new files (in bytes). You can use the following suffixes to define size units (case insensitive): k (kilo), m (mega), g (giga), t (tera), p (peta), e (exa). For example, 128k, 512m, 1g, etc. You can also specify the block size in bytes (such as 134217728 for 128 MB)

134217728

dfs.client.read.shortcircuit

Turns on short-circuit local reads

true

dfs.datanode.balance.max.concurrent.moves

The maximum number of threads for DataNode balancer pending moves. This value is reconfigurable via the dfsadmin -reconfig command

50

dfs.datanode.data.dir

Determines where on the local filesystem a DFS DataNode should store its blocks. If multiple directories are specified, data will be stored in all named directories, typically on different devices. The directories should be tagged with the corresponding storage types (SSD/DISK/ARCHIVE/RAM_DISK) for HDFS storage policies. The default storage type is DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if local filesystem permissions allow

/srv/hadoop-hdfs/data:DISK

dfs.disk.balancer.max.disk.throughputInMBperSec

The maximum disk bandwidth, used by the disk balancer during reads from a source disk. The unit is MB/sec

10

dfs.disk.balancer.block.tolerance.percent

Specifies when a copy step is considered close enough to its target (as a percentage). For example, if set to 10, getting within 10% of the target value is considered good enough. In other words, if a move operation is 20 GB in size and 18 GB (20 * (1 - 10%)) can be moved, the entire operation is considered successful

10

dfs.disk.balancer.max.disk.errors

During a block move from a source to a destination disk, there might be various errors. This parameter defines how many errors to tolerate before declaring that a move between two disks (or a step) has failed

5

dfs.disk.balancer.plan.valid.interval

The maximum amount of time a disk balancer plan (a set of configurations that define the data volume to be redistributed between two disks) remains valid. This setting supports multiple time unit suffixes as described in dfs.heartbeat.interval. If no suffix is specified, then milliseconds are assumed

1d

dfs.disk.balancer.plan.threshold.percent

Defines a data storage threshold in percents at which disks start participating in data redistribution or balancing activities

10

dfs.domain.socket.path

The path to a UNIX domain socket that will be used for communication between the DataNode and local HDFS clients. If the string _PORT is present in this path, it will be replaced by the TCP port of the DataNode. The parameter is optional

/var/lib/hadoop-hdfs/dn_socket

dfs.hosts

Names a file that contains a list of hosts allowed to connect to the NameNode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted

/etc/hadoop/conf/dfs.hosts

dfs.mover.movedWinWidth

The minimum time interval for a block to be moved to another location again (in milliseconds)

5400000

dfs.mover.moverThreads

Sets the balancer mover thread pool size

1000

dfs.mover.retry.max.attempts

The maximum number of retries before the mover considers the move as failed

10

dfs.mover.max-no-move-interval

If this specified amount of time has elapsed and no block has been moved out of a source DataNode, one more attempt will be made to move blocks out of this DataNode in the current mover iteration

60000

dfs.namenode.name.dir

Determines where on the local filesystem the DFS name node should store the name table (fsimage). If multiple directories are specified, then the name table is replicated in all of the directories, for redundancy

/srv/hadoop-hdfs/name

dfs.namenode.checkpoint.dir

Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If multiple directories are specified, then the image is replicated in all of the directories for redundancy

/srv/hadoop-hdfs/checkpoint

dfs.namenode.hosts.provider.classname

The class that provides access to host files. By default, org.apache.hadoop.hdfs.server.blockmanagement.HostFileManager is used, which loads files specified by dfs.hosts and dfs.hosts.exclude. If org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager is used, it loads the JSON file defined in dfs.hosts. Changing the class name requires a NameNode restart; dfsadmin -refreshNodes only refreshes the configuration files used by the class

org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager

dfs.namenode.rpc-bind-host

The actual address the RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.rpc-address. It can also be specified per NameNode or name service for HA/Federation. This is useful for making the NameNode listen on all interfaces by setting it to 0.0.0.0

0.0.0.0

dfs.permissions.superusergroup

The name of the group of super-users. The value should be a single group name

hadoop

dfs.replication

The default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time

3

dfs.journalnode.http-address

The HTTP address of the JournalNode web UI

0.0.0.0:8480

dfs.journalnode.https-address

The HTTPS address of the JournalNode web UI

0.0.0.0:8481

dfs.journalnode.rpc-address

The RPC address of the JournalNode

0.0.0.0:8485

dfs.datanode.http.address

The address of the DataNode HTTP server

0.0.0.0:9864

dfs.datanode.https.address

The address of the DataNode HTTPS server

0.0.0.0:9865

dfs.datanode.address

The address of the DataNode for data transfer

0.0.0.0:9866

dfs.datanode.ipc.address

The IPC address of the DataNode

0.0.0.0:9867

dfs.namenode.http-address

The address and the base port to access the dfs NameNode web UI

0.0.0.0:9870

dfs.namenode.https-address

The secure HTTPS address of the NameNode

0.0.0.0:9871

dfs.ha.automatic-failover.enabled

Defines whether automatic failover is enabled

true

dfs.ha.fencing.methods

A list of scripts or Java classes that will be used to fence the Active NameNode during a failover

shell(/bin/true)

dfs.journalnode.edits.dir

The directory where to store journal edit files

/srv/hadoop-hdfs/journalnode

dfs.namenode.shared.edits.dir

The directory on shared storage between the multiple NameNodes in an HA cluster. This directory is written by the active NameNode and read by the standby NameNode to keep the namespaces synchronized. This directory does not need to be listed in dfs.namenode.edits.dir. It should be left empty in a non-HA cluster

 — 

dfs.internal.nameservices

A unique nameservices identifier for a cluster or federation. For a single cluster, specify the name that will be used as an alias. For HDFS federation, specify all namespaces associated with this cluster, separated by commas. This option allows you to use an alias instead of an IP address or FQDN in some commands, for example: hdfs dfs -ls hdfs://<dfs.internal.nameservices>. The value must be alphanumeric without underscores

 — 

dfs.block.access.token.enable

If set to true, access tokens are used as capabilities for accessing DataNodes. If set to false, no access tokens are checked on accessing DataNodes

false

dfs.namenode.kerberos.principal

The NameNode service principal. This is typically set to nn/_HOST@REALM.TLD. Each NameNode will substitute _HOST with its own fully qualified hostname during the startup. The _HOST placeholder allows using the same configuration setting on both NameNodes in an HA setup

nn/_HOST@REALM

dfs.namenode.keytab.file

The keytab file used by each NameNode daemon to login as its service principal. The principal name is configured with dfs.namenode.kerberos.principal

/etc/security/keytabs/nn.service.keytab

dfs.namenode.kerberos.internal.spnego.principal

HTTP Kerberos principal name for the NameNode

HTTP/_HOST@REALM

dfs.web.authentication.kerberos.principal

Kerberos principal name for the WebHDFS

HTTP/_HOST@REALM

dfs.web.authentication.kerberos.keytab

Kerberos keytab file for WebHDFS

/etc/security/keytabs/HTTP.service.keytab

dfs.journalnode.kerberos.principal

The JournalNode service principal. This is typically set to jn/_HOST@REALM.TLD. Each JournalNode will substitute _HOST with its own fully qualified hostname at startup. The _HOST placeholder allows using the same configuration setting on all JournalNodes

jn/_HOST@REALM

dfs.journalnode.keytab.file

The keytab file used by each JournalNode daemon to login as its service principal. The principal name is configured with dfs.journalnode.kerberos.principal

/etc/security/keytabs/jn.service.keytab

dfs.journalnode.kerberos.internal.spnego.principal

The server principal used by the JournalNode HTTP Server for SPNEGO authentication when Kerberos security is enabled. This is typically set to HTTP/_HOST@REALM.TLD. The SPNEGO server principal begins with the prefix HTTP/ by convention. If the value is *, the web server will attempt to log in with every principal specified in the keytab file dfs.web.authentication.kerberos.keytab. For most deployments this can be set to ${dfs.web.authentication.kerberos.principal}, that is, to use the value of dfs.web.authentication.kerberos.principal

HTTP/_HOST@REALM

dfs.datanode.data.dir.perm

Permissions for the directories on the local filesystem where the DFS DataNode stores its blocks. The permissions can either be octal or symbolic

700

dfs.datanode.kerberos.principal

The DataNode service principal. This is typically set to dn/_HOST@REALM.TLD. Each DataNode will substitute _HOST with its own fully qualified host name at startup. The _HOST placeholder allows using the same configuration setting on all DataNodes

dn/_HOST@REALM.TLD

dfs.datanode.keytab.file

The keytab file used by each DataNode daemon to login as its service principal. The principal name is configured with dfs.datanode.kerberos.principal

/etc/security/keytabs/dn.service.keytab

dfs.http.policy

Defines if HTTPS (SSL) is supported on HDFS. This configures the HTTP endpoint for HDFS daemons. The following values are supported: HTTP_ONLY — the service is provided only via http; HTTPS_ONLY — the service is provided only via https; HTTP_AND_HTTPS — the service is provided both via http and https

HTTP_ONLY

dfs.data.transfer.protection

A comma-separated list of SASL protection values used for secured connections to the DataNode when reading or writing block data. The possible values are:

  • authentication — provides only authentication; no integrity or privacy;

  • integrity — authentication and integrity are enabled;

  • privacy — authentication, integrity and privacy are enabled.

If dfs.encrypt.data.transfer=true, then it supersedes the setting for dfs.data.transfer.protection and enforces that all connections must use a specialized encrypted SASL handshake. This property is ignored for connections to a DataNode listening on a privileged port. In this case, it is assumed that the use of a privileged port establishes sufficient trust

 — 

dfs.encrypt.data.transfer

Defines whether or not actual block data that is read/written from/to HDFS should be encrypted on the wire. This only needs to be set on the NameNodes and DataNodes, clients will deduce this automatically. It is possible to override this setting per connection by specifying custom logic via dfs.trustedchannel.resolver.class

false

dfs.encrypt.data.transfer.algorithm

This value may be set to either 3des or rc4. If nothing is set, then the configured JCE default on the system is used (usually 3DES). It is widely believed that 3DES is more secure, but RC4 is substantially faster. Note that if AES is supported by both the client and server, then this encryption algorithm will only be used to initially transfer keys for AES

3des

dfs.encrypt.data.transfer.cipher.suites

This value can be either undefined or AES/CTR/NoPadding. If defined, then dfs.encrypt.data.transfer uses the specified cipher suite for data encryption. If not defined, then only the algorithm specified in dfs.encrypt.data.transfer.algorithm is used

 — 

dfs.encrypt.data.transfer.cipher.key.bitlength

The key bit length negotiated by the DFS client and the DataNode for encryption. This value may be set to 128, 192, or 256

128

ignore.secure.ports.for.testing

Allows skipping HTTPS requirements in the SASL mode

false

dfs.client.https.need-auth

Whether SSL client certificate authentication is required

false
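
For reference, a minimal hdfs-site.xml sketch built from the defaults listed above; the values simply repeat the documented defaults and serve to illustrate the XML layout:

  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/srv/hadoop-hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/srv/hadoop-hdfs/data:DISK</value>
  </property>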

httpfs-site.xml
Parameter Description Default value

httpfs.http.administrators

The ACL for the admins. This configuration is used to control who can access the default servlets for HttpFS server. The value should be a comma-separated list of users and groups. The user list comes first and is separated by a space, followed by the group list, for example: user1,user2 group1,group2. Both users and groups are optional, so you can define only users, or groups, or both of them. Notice that in all these cases you should always use the leading space in the groups list. Using the asterisk grants access to all users and groups

*

hadoop.http.temp.dir

The HttpFS temp directory

${hadoop.tmp.dir}/httpfs

httpfs.ssl.enabled

Defines whether SSL is enabled. The default is false (disabled)

false

httpfs.hadoop.config.dir

The location of the Hadoop configuration directory

/etc/hadoop/conf

httpfs.hadoop.authentication.type

Defines the authentication mechanism used by httpfs for its HTTP clients. Valid values are simple and kerberos. If simple is used, clients must specify the username with the user.name query string parameter. If kerberos is used, HTTP clients must use HTTP SPNEGO or delegation tokens

simple

httpfs.hadoop.authentication.kerberos.keytab

The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by httpfs in the HTTP endpoint. httpfs.authentication.kerberos.keytab is deprecated. Instead, use hadoop.http.authentication.kerberos.keytab

/etc/security/keytabs/httpfs.service.keytab

httpfs.hadoop.authentication.kerberos.principal

The HTTP Kerberos principal used by HttpFS in the HTTP endpoint. The HTTP Kerberos principal MUST start with HTTP/ as per Kerberos HTTP SPNEGO specification. httpfs.authentication.kerberos.principal is deprecated. Instead, use hadoop.http.authentication.kerberos.principal

HTTP/${httpfs.hostname}@${kerberos.realm}
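
A short httpfs-site.xml sketch based on the formats described above; the user and group names are hypothetical:

  <property>
    <name>httpfs.http.administrators</name>
    <value>user1,user2 group1,group2</value>
  </property>
  <property>
    <name>httpfs.hadoop.authentication.type</name>
    <value>kerberos</value>
  </property>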

ranger-hdfs-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

The spool directory path

/srv/ranger/hdfs_plugin/audit_solr_spool

xasecure.audit.destination.solr.urls

Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr

 — 

xasecure.audit.destination.solr.zookeepers

Specifies the ZooKeeper connection string for the Solr destination

 — 

xasecure.audit.destination.solr.force.use.inmemory.jaas.config

Uses in-memory JAAS configuration file to connect to Solr

 — 

xasecure.audit.is.enabled

Enables Ranger audit

true

xasecure.audit.jaas.Client.loginModuleControlFlag

Specifies whether the success of the module is required, requisite, sufficient, or optional

 — 

xasecure.audit.jaas.Client.loginModuleName

The name of the authenticator class

 — 

xasecure.audit.jaas.Client.option.keyTab

The name of the keytab file to get the principal’s secret key

 — 

xasecure.audit.jaas.Client.option.principal

The name of the principal to be used

 — 

xasecure.audit.jaas.Client.option.serviceName

Represents a user or a service that wants to log in

 — 

xasecure.audit.jaas.Client.option.storeKey

Set this to true if you want the keytab or the principal’s key to be stored in the subject’s private credentials

false

xasecure.audit.jaas.Client.option.useKeyTab

Set this to true if you want the module to get the principal’s key from the keytab

false

ranger-hdfs-security.xml
Parameter Description Default value

ranger.plugin.hdfs.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.hdfs.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.hdfs.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/hdfs/policycache

ranger.plugin.hdfs.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.hdfs.policy.rest.client.connection.timeoutMs

The HDFS Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.hdfs.policy.rest.client.read.timeoutMs

The HDFS Plugin RangerRestClient read timeout (in milliseconds)

30000

ranger.plugin.hdfs.policy.rest.ssl.config.file

The path to the RangerRestClient SSL config file for the HDFS plugin

/etc/hadoop/conf/ranger-hdfs-policymgr-ssl.xml

httpfs-env.sh
Parameter Description Default value

HADOOP_CONF_DIR

Hadoop configuration directory

/etc/hadoop/conf

HADOOP_LOG_DIR

Location of the log directory

${HTTPFS_LOG}

HADOOP_PID_DIR

PID file directory location

${HTTPFS_TEMP}

HTTPFS_SSL_ENABLED

Defines if SSL is enabled for httpfs

false

HTTPFS_SSL_KEYSTORE_FILE

The path to the keystore file

admin

HTTPFS_SSL_KEYSTORE_PASS

The password to access the keystore

admin

Hadoop options
Parameter Description Default value

HDFS_NAMENODE_OPTS

NameNode Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for the NameNode

-Xms1G -Xmx8G

HDFS_DATANODE_OPTS

DataNode Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for the DataNode

-Xms700m -Xmx8G

HDFS_HTTPFS_OPTS

HttpFS Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for the httpfs server

-Xms700m -Xmx8G

HDFS_JOURNALNODE_OPTS

JournalNode Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for the JournalNode

-Xms700m -Xmx8G

HDFS_ZKFC_OPTS

ZKFC Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for ZKFC

-Xms500m -Xmx8G

ssl-server.xml
Parameter Description Default value

ssl.server.truststore.location

The truststore to be used by NameNodes and DataNodes

 — 

ssl.server.truststore.password

The password to the truststore

 — 

ssl.server.truststore.type

The truststore file format

jks

ssl.server.truststore.reload.interval

The truststore reload check interval (in milliseconds)

10000

ssl.server.keystore.location

The path to the keystore file used by NameNodes and DataNodes

 — 

ssl.server.keystore.password

The password to the keystore

 — 

ssl.server.keystore.keypassword

The password to the key in the keystore

 — 

ssl.server.keystore.type

The keystore file format

 — 
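
A minimal ssl-server.xml sketch with placeholder keystore and truststore paths; the passwords are omitted on purpose and should be set via the corresponding password parameters above:

  <property>
    <name>ssl.server.keystore.location</name>
    <value>/etc/ssl/keystore.jks</value>
  </property>
  <property>
    <name>ssl.server.keystore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.location</name>
    <value>/etc/ssl/truststore.jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.type</name>
    <value>jks</value>
  </property>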

ssl-client.xml
Parameter Description Default value

ssl.client.truststore.location

The truststore to be used by NameNodes and DataNodes

 — 

ssl.client.truststore.password

The password to the truststore

 — 

ssl.client.truststore.type

The truststore file format

jks

ssl.client.truststore.reload.interval

The truststore reload check interval (in milliseconds)

10000

ssl.client.keystore.location

The path to the keystore file used by NameNodes and DataNodes

 — 

ssl.client.keystore.password

The password to the keystore

 — 

ssl.client.keystore.keypassword

The password to the key in the keystore

 — 

ssl.client.keystore.type

The keystore file format

 — 

Lists of decommissioned and in maintenance hosts
Parameter Description Default value

DECOMMISSIONED

When an administrator decommissions a DataNode, the DataNode first transitions into the DECOMMISSION_INPROGRESS state. After all blocks belonging to that DataNode are fully replicated elsewhere based on each block's replication factor, the DataNode transitions to the DECOMMISSIONED state. After that, the administrator can shut down the node to perform long-term repair and maintenance that could take days or weeks. After the machine has been repaired, it can be recommissioned back to the cluster

 — 

IN_MAINTENANCE

Sometimes administrators only need to take DataNodes down for minutes or hours to perform short-term repair or maintenance. For such scenarios, the HDFS block replication overhead incurred by decommissioning might not be necessary, and a lightweight process is desirable; that is what the maintenance state is for. When an administrator puts a DataNode in the maintenance state, the DataNode first transitions to the ENTERING_MAINTENANCE state. As soon as all blocks belonging to that DataNode are minimally replicated elsewhere, the DataNode transitions to the IN_MAINTENANCE state. After the maintenance has completed, the administrator can take the DataNode out of the maintenance state. In addition, the maintenance state supports a timeout that allows administrators to configure the maximum duration for which a DataNode is allowed to stay in the maintenance state. After the timeout, the DataNode transitions out of the maintenance state automatically, without human intervention

 — 
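
When dfs.namenode.hosts.provider.classname is set to CombinedHostFileManager (the default listed in hdfs-site.xml above), these states are declared in the JSON hosts file referenced by dfs.hosts. A minimal sketch with placeholder host names; the exact set of supported fields depends on the Hadoop version:

  {"hostName": "dn1.example.com"}
  {"hostName": "dn2.example.com", "adminState": "DECOMMISSIONED"}
  {"hostName": "dn3.example.com", "adminState": "IN_MAINTENANCE"}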

Other
Parameter Description Default value

Custom core-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file core-site.xml

 — 

Custom hdfs-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hdfs-site.xml

 — 

Custom httpfs-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file httpfs-site.xml

 — 

Ranger plugin enabled

Whether or not Ranger plugin is enabled

 — 

Custom ranger-hdfs-audit.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hdfs-audit.xml

 — 

Custom ranger-hdfs-security.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hdfs-security.xml

 — 

Custom ranger-hdfs-policymgr-ssl.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hdfs-policymgr-ssl.xml

 — 

Custom httpfs-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file httpfs-env.sh

 — 

Custom ssl-server.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ssl-server.xml

 — 

Custom ssl-client.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ssl-client.xml

 — 

Topology script

The topology script used in HDFS

 — 

Topology data

An optional text file that maps host names to rack numbers for the topology script (see the example after this table). Stored as /etc/hadoop/conf/topology.data

 — 

Custom log4j.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file log4j.properties

Custom httpfs-log4j.properties

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file httpfs-log4j.properties
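
The topology data file mentioned above (Topology data) is a plain-text mapping that the topology script consults to resolve a host to its rack. A minimal sketch with placeholder addresses and rack paths:

192.168.1.11 /dc1/rack1
192.168.1.12 /dc1/rack1
192.168.1.21 /dc1/rack2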

Hive

hive-env.sh
Parameter Description Default value

HADOOP_CLASSPATH

A colon-delimited list of directories, files, or wildcard locations that include all necessary classes

/etc/tez/conf/:/usr/lib/tez/:/usr/lib/tez/lib/

HIVE_HOME

The Hive home directory

/usr/lib/hive

METASTORE_PORT

The Hive Metastore port

9083

Hive heap memory settings
Parameter Description Default value

HiveServer2 Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for HiveServer2

-Xms256m -Xmx256m

Hive Metastore Heap Memory

Sets initial (-Xms) and maximum (-Xmx) Java heap size for Hive Metastore

-Xms256m -Xmx256m

hive-site.xml
Parameter Description Default value

hive.cbo.enable

When set to true, enables the cost-based optimizer that uses the Calcite framework

true

hive.compute.query.using.stats

When set to true, Hive will answer a few queries like min, max, and count(1) purely using statistics stored in the Metastore. For basic statistics collection, set the configuration property hive.stats.autogather to true. For more advanced statistics collection, run the ANALYZE TABLE queries

false

hive.execution.engine

Selects the execution engine. Supported values are: mr (Map Reduce, default), tez (Tez execution, for Hadoop 2 only), or spark (Spark execution, for Hive 1.1.0 onward)

Tez

hive.log.explain.output

When enabled, logs the EXPLAIN EXTENDED output for the query at the log4j INFO level and in the HiveServer2 web UI (Drilldown → Query Plan). Starting with Hive 3.1.0, this configuration property only logs at the log4j INFO level. To log the EXPLAIN EXTENDED output in the web UI (Drilldown → Query Plan) in Hive 3.1.0 and later, use hive.server2.webui.explain.output

true

hive.metastore.event.db.notification.api.auth

Defines whether the Metastore should perform the authorization against database notification related APIs such as get_next_notification. If set to true, then only the superusers in proxy settings have the permission

false

hive.metastore.uris

The Metastore URI used to access metadata in a remote metastore setup. For a remote metastore, specify the Thrift metastore server URI: thrift://<hostname>:<port>, where <hostname> is the name or IP address of the Thrift metastore server and <port> is the port on which the Thrift server is listening

 — 

hive.metastore.warehouse.dir

The absolute HDFS path of the default database for the warehouse, which is local to the cluster

/apps/hive/warehouse

hive.server2.enable.doAs

Impersonate the connected user

false

hive.stats.fetch.column.stats

Annotation of the operator tree with statistics information requires column statistics. Column statistics are fetched from the Metastore. Fetching column statistics for each needed column can be expensive, when the number of columns is high. This flag can be used to disable fetching of column statistics from the Metastore

 — 

hive.tez.container.size

By default, Tez will spawn containers of the size of a mapper. This parameter can be used to overwrite the default value

 — 

hive.support.concurrency

Defines whether Hive should support concurrency or not. A ZooKeeper instance must be up and running for the default Hive Lock Manager to support read/write locks

false

hive.txn.manager

Set this to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive transactions. The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides no transactions

 — 

javax.jdo.option.ConnectionUserName

The metastore database user name

APP

javax.jdo.option.ConnectionPassword

The password for the metastore user name

 — 

javax.jdo.option.ConnectionURL

The JDBC connection URI used to access the data stored in the local Metastore setup. Use the following connection URI: jdbc:<datastore type>://<node name>:<port>/<database name> where:

  • <datastore type> is the type of the data store;

  • <node name> is the host name or IP address of the data store;

  • <port> is the port on which the data store listens for remote procedure calls (RPC);

  • <database name> is the name of the database.

For example, the following URI specifies a local metastore that uses MySQL as a data store: jdbc:mysql://hostname23:3306/metastore

jdbc:mysql://{{ groups['mysql.master'][0] | d(omit) }}:3306/hive

javax.jdo.option.ConnectionDriverName

The JDBC driver class name used to access Hive Metastore

com.mysql.jdbc.Driver

hive.server2.transport.mode

Sets the transport mode

tcp

hive.server2.thrift.http.port

The port number for Thrift Server2 to listen on

10001

hive.server2.thrift.http.path

The HTTP endpoint of the Thrift Server2 service

cliservice

hive.server2.authentication.kerberos.principal

Hive server Kerberos principal

hive/_HOST@EXAMPLE.COM

hive.server2.authentication.kerberos.keytab

The path to the Kerberos keytab file containing the Hive server service principal

/etc/security/keytabs/hive.service.keytab

hive.server2.authentication.spnego.principal

The SPNEGO Kerberos principal

HTTP/_HOST@EXAMPLE.COM

hive.server2.webui.spnego.principal

The SPNEGO Kerberos principal to access Web UI

 — 

hive.server2.webui.spnego.keytab

The SPNEGO Kerberos keytab file to access Web UI

 — 

hive.server2.webui.use.spnego

Defines whether to use Kerberos SPNEGO for Web UI access

false

hive.server2.authentication.spnego.keytab

The path to the keytab file containing the SPNEGO principal

/etc/security/keytabs/HTTP.service.keytab

hive.server2.authentication

Sets the authentication mode

NONE

hive.metastore.sasl.enabled

If true, the Metastore Thrift interface will be secured with SASL. Clients must authenticate with Kerberos

false

hive.metastore.kerberos.principal

The service principal for the metastore Thrift server. The _HOST token will be automatically replaced with the appropriate host name

hive/_HOST@EXAMPLE.COM

hive.metastore.kerberos.keytab.file

The path to the Kerberos keytab file containing the metastore Thrift server’s service principal

/etc/security/keytabs/hive.service.keytab

hive.server2.use.SSL

Defines whether to use SSL for HiveServer2

false

hive.server2.keystore.path

The keystore to be used by Hive

 — 

hive.server2.keystore.password

The password to the Hive keystore

 — 

hive.server2.truststore.path

The truststore to be used by Hive

 — 

hive.server2.webui.use.ssl

Defines whether to use SSL for the Hive web UI

false

hive.server2.webui.keystore.path

The path to the keystore file used to access the Hive web UI

 — 

hive.server2.webui.keystore.password

The password to the keystore file used to access the Hive web UI

 — 

hive.server2.support.dynamic.service.discovery

Defines whether to support dynamic service discovery via ZooKeeper

false

hive.zookeeper.quorum

A comma-separated list of ZooKeeper servers (<host>:<port>) running in the cluster

zookeeper:2181

hive.server2.zookeeper.namespace

Specifies the root namespace on ZooKeeper

hiveserver2
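
When hive.server2.support.dynamic.service.discovery is enabled, clients resolve HiveServer2 through ZooKeeper instead of a fixed host. A hedged example of a Beeline connection string built from hive.zookeeper.quorum and hive.server2.zookeeper.namespace (the ZooKeeper host names are placeholders):

beeline -u "jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"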

ranger-hive-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

The spool directory path

/srv/ranger/hdfs_plugin/audit_solr_spool

xasecure.audit.destination.solr.urls

Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr

 — 

xasecure.audit.destination.solr.zookeepers

Specifies the ZooKeeper connection string for the Solr destination

 — 

xasecure.audit.destination.solr.force.use.inmemory.jaas.config

Uses in-memory JAAS configuration file to connect to Solr

 — 

xasecure.audit.is.enabled

Enables Ranger audit

true

xasecure.audit.jaas.Client.loginModuleControlFlag

Specifies whether the success of the module is required, requisite, sufficient, or optional

 — 

xasecure.audit.jaas.Client.loginModuleName

The name of the authenticator class

 — 

xasecure.audit.jaas.Client.option.keyTab

The name of the keytab file to get the principal’s secret key

 — 

xasecure.audit.jaas.Client.option.principal

The name of the principal to be used

 — 

xasecure.audit.jaas.Client.option.serviceName

Represents a user or a service that wants to log in

 — 

xasecure.audit.jaas.Client.option.storeKey

Set this to true if you want the keytab or the principal’s key to be stored in the subject’s private credentials

false

xasecure.audit.jaas.Client.option.useKeyTab

Set this to true if you want the module to get the principal’s key from the keytab

false

ranger-hive-security.xml
Parameter Description Default value

ranger.plugin.hive.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.hive.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.hive.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/hive/policycache

ranger.plugin.hive.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.hive.policy.rest.client.connection.timeoutMs

The Hive Plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.hive.policy.rest.client.read.timeoutMs

The Hive Plugin RangerRestClient read timeout (in milliseconds)

30000

xasecure.hive.update.xapolicies.on.grant.revoke

Controls Hive Ranger policy update from SQL Grant/Revoke commands

true

ranger.plugin.hive.policy.rest.ssl.config.file

The path to the RangerRestClient SSL config file for the Hive plugin

/etc/hive/conf/ranger-hive-policymgr-ssl.xml

ranger-hive-policymgr-ssl.xml
Parameter Description Default value

xasecure.policymgr.clientssl.keystore

The path to the keystore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.credential.file

The path to the keystore credentials file

/etc/hive/conf/ranger-hive.jceks

xasecure.policymgr.clientssl.truststore.credential.file

The path to the truststore credentials file

/etc/hive/conf/ranger-hive.jceks

xasecure.policymgr.clientssl.truststore

The path to the truststore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.password

The password to the keystore file

 — 

xasecure.policymgr.clientssl.truststore.password

The password to the truststore file

 — 

tez-site.xml
Parameter Description Default value

tez.am.resource.memory.mb

The amount of memory, in MB, that YARN will allocate to the Tez Application Master. The size increases with the size of the DAG

 — 

tez.history.logging.service.class

Enables Tez to use the Timeline Server for History Logging

org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService

tez.lib.uris

HDFS paths containing the Tez JAR files

${fs.defaultFS}/apps/tez/tez-0.9.2.tar.gz

tez.task.resource.memory.mb

The amount of memory used by launched tasks in TEZ containers. Usually this value is set in the DAG

 — 

tez.tez-ui.history-url.base

The URL where the Tez UI is hosted

 — 

tez.use.cluster.hadoop-libs

Specifies whether Tez will use the cluster Hadoop libraries

true

nginx.conf
Parameter Description Default value

ssl_certificate

The path to the SSL certificate for NGINX

/etc/ssl/certs/host_cert.cert

ssl_certificate_key

The path to the SSL certificate key for NGINX

/etc/ssl/host_cert.key
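
In the rendered nginx.conf, these two values map to the standard NGINX TLS directives. A minimal sketch (the server block layout and listen port are assumptions):

server {
    listen              443 ssl;
    ssl_certificate     /etc/ssl/certs/host_cert.cert;
    ssl_certificate_key /etc/ssl/host_cert.key;
}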

Other
Parameter Description Default value

ACID Transactions

Defines whether to enable ACID transactions. See the example after this table

false

Database type

The type of the external database used for Hive Metastore

mysql

Custom hive-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hive-site.xml

 — 

Custom hive-env.sh

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hive-env.sh

 — 

Ranger plugin enabled

Whether or not Ranger plugin is enabled

false

Custom ranger-hive-audit.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hive-audit.xml

 — 

Custom ranger-hive-security.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hive-security.xml

 — 

Custom ranger-hive-policymgr-ssl.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hive-policymgr-ssl.xml

 — 

Custom tez-site.xml

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file tez-site.xml

 — 
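
For illustration, enabling the ACID Transactions parameter corresponds to the standard Hive transaction settings listed in hive-site.xml above (hive.support.concurrency and hive.txn.manager). A minimal sketch of the properties involved, which could equally be supplied through Custom hive-site.xml; this is not necessarily the exact set that ADCM applies:

<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>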

HUE

Parameter Description Default value

__main_info

Short info on HUE and a link to the HUE Server web interface displayed on the Info tab

 — 

installed_switch

Displays whether the HUE service is installed

 — 

compatible_os_families
Parameter Description Default value

compatible_os_families [0]

OS family compatible with HUE

Altlinux-8

compatible_os_families [1]

OS family compatible with HUE

Astra Linux-1

compatible_os_families [2]

OS family compatible with HUE

RedHat-7

components
Parameter Description Default value

hue_server

System environment variables for the HUE service name and log directory

{"logs": [{"type": "file", "path": "/var/log/hue/hue-server"}], "systemd": {"service_name": "hue-server"}}

actions
Parameter Description Default value

default

 — 

{"order": ["main", "hue_server"]}

start

 — 

{"order": ["hue_server"]}

stop

 — 

{"order": ["hue_server"]}

restart

 — 

{"order": ["hue_server"]}

statuschecker

 — 

{"order": ["hue_server"]}

check

 — 

{"order": ["main", "hue_server"]}

The HUE Server component
hue.ini syntax

The hue.ini configuration file displayed in ADCM uses a syntax that differs from the original one. In the original file, the nesting level is determined by enclosing section names in the corresponding number of square brackets. Example:

[notebook]
show_notebooks=true
[[interpreters]]
[[[mysql]]]
name = MySQL
interface=sqlalchemy
options='{"url": "mysql://root:secret@database:3306/hue"}'
[[[hive]]]
name=Hive
interface=hiveserver2

In ADCM, the nesting level is determined by separating the section names with periods. The structure from the above example looks as follows:

notebook.show_notebooks: true
notebook.interpreters.mysql.name: MySQL
notebook.interpreters.mysql.interface: sqlalchemy
notebook.interpreters.mysql.options: '{"url": "mysql://root:secret@database:3306/hue"}'
notebook.interpreters.hive.name: Hive
notebook.interpreters.hive.interface: hiveserver2
hue.ini
Parameter Description Default value

desktop.http_host

HUE Server listening IP address

0.0.0.0

desktop.http_port

HUE Server listening port

8000

desktop.use_cherrypy_server

Defines whether CherryPy (true) or Gunicorn (false) is used as the webserver

false

desktop.gunicorn_work_class

Gunicorn work class: gevent, eventlet, gthread, or sync

gthread

desktop.secret_key

Random string used for secure hashing in the session store

jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o

desktop.enable_xff_for_hive_impala

Defines whether the X-Forwarded-For header is used if Hive or Impala require it

false

desktop.enable_x_csrf_token_for_hive_impala

Defines whether the X-CSRF-Token header is used if Hive or Impala require it

false

desktop.app_blacklist

Comma-separated list of apps to not load at server startup

security,pig,sqoop,oozie,hbase,search

desktop.auth.backend

Comma-separated list of authentication backend combinations in order of priority

desktop.auth.backend.AllowFirstUserDjangoBackend

desktop.database.host

HUE Server database host name or IP address

{% raw -%}{{ groups['adpg.adpg'][0] | d(omit) }}{% endraw -%}

desktop.database.port

HUE Server database network port

5432

desktop.database.engine

Engine used by the HUE Server database

postgresql_psycopg2

desktop.database.user

Admin username for the HUE Server database

hue

desktop.database.name

HUE Server database name

hue

desktop.database.password

Password for the desktop.database.user username

 — 

Interpreter Impala
Parameter Description Default value

notebook.interpreters.impala.name

Impala interpreter name

impala

notebook.interpreters.impala.interface

Interface for the Impala interpreter

hiveserver2

impala.server_host

Host of the Impala Server (one of the Impala Daemon components)

 — 

impala.server_port

Port of the Impala Server

21050

impala.impersonation_enabled

Enables the impersonation mechanism during interaction with Impala

true

impala.impala_conf_dir

Path to the Impala configuration directory that contains the impalad_flags file

/etc/hue/conf

impala.ssl.cacerts

Path to the CA certificates

/etc/pki/tls/certs/ca-bundle.crt

impala.ssl.validate

Defines whether HUE should validate certificates received from the server

false

impala.ssl.enabled

Enables SSL communication for this server

false

impala.impala_principal

Kerberos principal name for Impala

 — 

Interpreter HDFS
Parameter Description Default value

hadoop.hdfs_clusters.default.webhdfs_url

WebHDFS or HttpFS endpoint link for accessing HDFS data

 — 

hadoop.hdfs_clusters.default.hadoop_conf_dir

Path to the directory of the Hadoop configuration files

/etc/hadoop/conf

hadoop.hdfs_clusters.default.security_enabled

Defines whether the Hadoop cluster is secured by Kerberos

false

hadoop.hdfs_clusters.default.ssl_cert_ca_verify

Defines whether to verify SSL certificates against the CA

false
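
The webhdfs_url parameter points at the WebHDFS or HttpFS REST endpoint. A hedged example in the ADCM dotted notation, assuming the default NameNode HTTP port 9870 (when going through HttpFS, use the HttpFS host and port 14000 instead); the host name is a placeholder:

hadoop.hdfs_clusters.default.webhdfs_url: http://namenode-1.example.com:9870/webhdfs/v1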

Interpreter Hive
Parameter Description Default value

notebook.interpreters.hive.name

Hive interpreter name

hive

notebook.interpreters.hive.interface

Interface for the Hive interpreter

hiveserver2

beeswax.hive_discovery_hs2

Defines whether to use service discovery for HiveServer2

true

beeswax.hive_conf_dir

Path to the Hive configuration directory containing the hive-site.xml file

/etc/hive/conf

beeswax.use_sasl

Defines whether to use the SASL framework to establish connection to host

true

beeswax.hive_discovery_hiveserver2_znode

Hostname of the znode of the HiveServer2 if Hive is using ZooKeeper service discovery mode

hive.server2.zookeeper.namespace

libzookeeper.ensemble

List of ZooKeeper ensemble members hosts and ports

host1:2181,host2:2181,host3:2181

libzookeeper.principal_name

Kerberos principal name for ZooKeeper

 — 
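
Put together in the ADCM dotted notation, a Hive interpreter that locates HiveServer2 through ZooKeeper discovery might look as follows (the ensemble hosts are placeholders, and the znode value mirrors hive.server2.zookeeper.namespace from the Hive service):

beeswax.hive_discovery_hs2: true
beeswax.hive_discovery_hiveserver2_znode: hiveserver2
beeswax.hive_conf_dir: /etc/hive/conf
beeswax.use_sasl: true
libzookeeper.ensemble: zk1:2181,zk2:2181,zk3:2181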

Interpreter YARN
Parameter Description Default value

hadoop.yarn_clusters.default.resourcemanager_host

Network address of the host where the Resource Manager is running

 — 

hadoop.yarn_clusters.default.resourcemanager_port

Port listened by the Resource Manager IPC

 — 

hadoop.yarn_clusters.default.submit_to

Defines whether the jobs are submitted to this cluster

true

hadoop.yarn_clusters.default.logical_name

Resource Manager logical name (required for High Availability mode)

 — 

hadoop.yarn_clusters.default.security_enabled

Defines whether the YARN cluster is secured by Kerberos

false

hadoop.yarn_clusters.default.ssl_cert_ca_verify

Defines whether to verify the SSL certificates from YARN Rest APIs against the CA when using the secure mode (HTTPS)

false

hadoop.yarn_clusters.default.resourcemanager_api_url

URL of the Resource Manager API

 — 

hadoop.yarn_clusters.default.proxy_api_url

URL of the first Resource Manager API

 — 

hadoop.yarn_clusters.default.spark_history_server_url

URL of the Spark History Server

 — 

hadoop.yarn_clusters.default.spark_history_server_security_enabled

Defines whether the Spark History Server is secured by Kerberos

false

hadoop.yarn_clusters.ha.resourcemanager_host

Network address of the host where the Resource Manager is running (High Availability mode)

 — 

hadoop.yarn_clusters.ha.resourcemanager_port

Port listened by the Resource Manager IPC (High Availability mode)

 — 

hadoop.yarn_clusters.ha.logical_name

Resource Manager logical name (required for High Availability mode)

 — 

hadoop.yarn_clusters.ha.security_enabled

Defines whether the YARN cluster is secured by Kerberos (High Availability mode)

false

hadoop.yarn_clusters.ha.submit_to

Defines whether the jobs are submitted to this cluster (High Availability mode)

true

hadoop.yarn_clusters.ha.ssl_cert_ca_verify

Defines whether to verify the SSL certificates from YARN Rest APIs against the CA when using the secure mode (HTTPS) (High Availability mode)

false

hadoop.yarn_clusters.ha.resourcemanager_api_url

URL of the Resource Manager API (High Availability mode)

 — 

hadoop.yarn_clusters.ha.history_server_api_url

URL of the History Server API

 — 

Interpreter Spark3
Parameter Description Default value

notebook.interpreters.sparksql.name

Spark3 interpreter name

Spark3 SQL

notebook.interpreters.hive.interface

Interface for the Hive interpreter

hiveserver2

spark.sql_server_host

Hostname of the SQL server

 — 

spark.sql_server_port

Port of the SQL server

 — 

spark.security_enabled

Defines whether the Spark3 cluster is secured by Kerberos

false

spark.ssl_cert_ca_verify

Defines whether to verify SSL certificates against the CA

false

spark.use_sasl

Defines whether to use the SASL framework to establish connection to host

true

spark.spark_impersonation_enabled

Enables the impersonation mechanism during interaction with Spark3

true

spark.spark_principal

Kerberos principal name for Spark3

 — 

Interpreter Kyuubi
Parameter Description Default value

notebook.dbproxy_extra_classpath

Classpath to be appended to the default DBProxy server classpath

/usr/share/java/kyuubi-hive-jdbc.jar

notebook.interpreters.kyuubi.name

Kyuubi interpreter name

Kyuubi[Spark3]

notebook.interpreters.kyuubi.options

Special parameters for connection to the Kyuubi server

 — 

notebook.interpreters.kyuubi.interface

Interface for the Kyuubi service

jdbc
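
The Kyuubi interpreter options are passed as a JSON string in the generic HUE JDBC connector form. A hedged sketch only; the URL, port, and driver class are assumptions and must be replaced with the actual Kyuubi endpoint:

notebook.interpreters.kyuubi.options: '{"url": "jdbc:hive2://kyuubi-host.example.com:10009/default", "driver": "org.apache.kyuubi.jdbc.KyuubiHiveDriver", "user": "", "password": ""}'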

hue.ini kerberos config
Parameter Description Default value

desktop.kerberos.hue_keytab

Path to HUE Kerberos keytab file

 — 

desktop.kerberos.hue_principal

Kerberos principal name for HUE

 — 

desktop.kerberos.kinit_path

Path to kinit utility

/usr/bin/kinit

desktop.kerberos.reinit_frequency

Time interval in seconds for HUE to renew its keytab

3600

desktop.kerberos.ccache_path

Path to cached Kerberos credentials

/tmp/hue_krb5_ccache

desktop.kerberos.krb5_renewlifetime_enabled

This must be set to false if the renew_lifetime parameter in krb5.conf file is set to 0m

false

hue.ini SSL config
Parameter Description Default value

desktop.ssl_certificate

Path to the SSL certificate file

/etc/ssl/certs/host_cert.cert

desktop.ssl_private_key

Path to the SSL RSA private key file

/etc/ssl/host_cert.key

desktop.ssl_password

SSL certificate password

 — 

desktop.ssl_no_renegotiation

Disables all renegotiation in TLSv1.2 and earlier

true

desktop.ssl_validate

Defines whether HUE should validate certificates received from the server

false

desktop.ssl_cacerts

Path to the CA certificates

/etc/pki/tls/certs/ca-bundle.crt

desktop.session.secure

Defines whether the cookie containing the user’s session ID and csrf cookie will use the secure flag

true

desktop.session.http_only

Defines whether the cookie containing the user’s session ID and csrf cookie will use the HTTP only flag

false

LDAP security
Parameter Description Default value

desktop.ldap.ldap_url

URL of the LDAP server

 — 

desktop.ldap.base_dn

The search base for finding users and groups

"DC=mycompany,DC=com"

desktop.ldap.nt_domain

The NT domain used for LDAP authentication

mycompany.com

desktop.ldap.ldap_cert

Certificate files in PEM format for the CA that HUE will trust for authentication over TLS

 — 

desktop.ldap.use_start_tls

Set this to true if you are not using Secure LDAP (LDAPS) but want to establish secure connections using TLS

true

desktop.ldap.bind_dn

Distinguished name of the user to bind as

"CN=ServiceAccount,DC=mycompany,DC=com"

desktop.ldap.bind_password

Password of the bind user

 — 

desktop.ldap.ldap_username_pattern

Pattern for username search. Specify the <username> placeholder for this parameter

"uid=<username>,ou=People,dc=mycompany,dc=com"

desktop.ldap.create_users_on_login

Defines whether to create users in HUE when they try to log in with their LDAP credentials

true

desktop.ldap.sync_groups_on_login

Defines whether to synchronize user groups when users log in

true

desktop.ldap.login_groups

A comma-separated list of LDAP groups containing users that are allowed to login

 — 

desktop.ldap.ignore_username_case

Defines whether to ignore the case of usernames when searching for existing users

true

desktop.ldap.force_username_lowercase

Defines whether to force usernames to lowercase when creating new users from LDAP

true

desktop.ldap.force_username_uppercase

Defines whether to force usernames to uppercase when creating new users from LDAP. This parameter cannot be combined with desktop.ldap.force_username_lowercase

false

desktop.ldap.search_bind_authentication

Enables search bind authentication

true

desktop.ldap.subgroups

Specifies the kind of subgrouping to use: nested or subordinate (deprecated)

nested

desktop.ldap.nested_members_search_depth

The number of levels to search for nested members

10

desktop.ldap.follow_referrals

Defines whether to follow referrals

false

desktop.ldap.users.user_filter

Base filter for users search

"objectclass=*"

desktop.ldap.users.user_name_attr

The username attribute in the LDAP schema

sAMAccountName

desktop.ldap.groups.group_filter

Base filter for groups search

"objectclass=*"

desktop.ldap.groups.group_name_attr

The group name attribute in the LDAP schema

cn

desktop.ldap.groups.group_member_attr

The attribute of the group object that identifies the group members

member
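
A hedged sketch of an Active Directory style setup built from the parameters above, in the ADCM dotted notation (the server URL, base DN, and bind account are placeholders; the bind password is set in its own parameter):

desktop.ldap.ldap_url: ldaps://ldap.example.com:636
desktop.ldap.base_dn: "DC=example,DC=com"
desktop.ldap.bind_dn: "CN=hue-svc,OU=Services,DC=example,DC=com"
desktop.ldap.search_bind_authentication: true
desktop.ldap.users.user_filter: "objectclass=user"
desktop.ldap.users.user_name_attr: sAMAccountName
desktop.ldap.groups.group_filter: "objectclass=group"
desktop.ldap.create_users_on_login: true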

Others
Parameter Description Default value

Enable custom ulimits

Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the table below

[Manager]
DefaultLimitCPU=
DefaultLimitFSIZE=
DefaultLimitDATA=
DefaultLimitSTACK=
DefaultLimitCORE=
DefaultLimitRSS=
DefaultLimitNOFILE=
DefaultLimitAS=
DefaultLimitNPROC=
DefaultLimitMEMLOCK=
DefaultLimitLOCKS=
DefaultLimitSIGPENDING=
DefaultLimitMSGQUEUE=
DefaultLimitNICE=
DefaultLimitRTPRIO=
DefaultLimitRTTIME=

Custom hue.ini

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hue.ini. The list of available parameters can be found in the HUE documentation

 — 

Ulimit settings
Parameter Description Corresponding option of the ulimit command in CentOS

DefaultLimitCPU

A limit in seconds on the amount of CPU time that a process can consume

cpu time ( -t)

DefaultLimitFSIZE

The maximum size of files that a process can create, in 512-byte blocks

file size ( -f)

DefaultLimitDATA

The maximum size of a process’s data segment, in kilobytes

data seg size ( -d)

DefaultLimitSTACK

The maximum stack size allocated to a process, in kilobytes

stack size ( -s)

DefaultLimitCORE

The maximum size of a core dump file allowed for a process, in 512-byte blocks

core file size ( -c)

DefaultLimitRSS

The maximum resident set size, in kilobytes

max memory size ( -m)

DefaultLimitNOFILE

The maximum number of open file descriptors allowed for the process

open files ( -n)

DefaultLimitAS

The maximum size of the process virtual memory (address space), in kilobytes

virtual memory ( -v)

DefaultLimitNPROC

The maximum number of processes

max user processes ( -u)

DefaultLimitMEMLOCK

The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used

max locked memory ( -l)

DefaultLimitLOCKS

The maximum number of files locked by a process

file locks ( -x)

DefaultLimitSIGPENDING

The maximum number of signals that are pending for delivery to the calling thread

pending signals ( -i)

DefaultLimitMSGQUEUE

The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages

POSIX message queues ( -q)

DefaultLimitNICE

The maximum NICE priority level that can be assigned to a process

scheduling priority ( -e)

DefaultLimitRTPRIO

The maximum real-time scheduling priority level

real-time priority ( -r)

DefaultLimitRTTIME

The maximum pipe buffer size, in 512-byte blocks

pipe size ( -p)

Impala

Parameter Description Default value

impala-env.sh

The contents of the impala-env.sh file that contains Impala environment settings

ranger-hive-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

The Solr spool directory location

/srv/ranger/impala_plugin/audit_solr_spool

xasecure.audit.is.enabled

Enables Ranger audit for Impala

true

ranger-hive-security.xml
Parameter Description Default value

ranger.plugin.hive.service.name

Name of the Ranger service containing policies for this Impala instance

 — 

ranger.plugin.hive.policy.cache.dir

Directory, where Ranger policies are cached after a successful retrieval from the source

/srv/ranger/impala/policycache

ranger.plugin.hive.policy.pollIntervalMs

How often to poll for changes in policies in milliseconds

30000

ranger.plugin.hive.policy.rest.client.connection.timeoutMs

Impala plugin connection timeout in milliseconds

120000

ranger.plugin.hive.policy.rest.client.read.timeoutMs

Impala plugin read timeout in milliseconds

30000

xasecure.hive.update.xapolicies.on.grant.revoke

Specifies whether the Impala plugin should update the Ranger policies on the updates to permissions done using GRANT/REVOKE

true

Enable LDAP
Parameter Description Default value

ldap_uri

URI of the LDAP server. Typically, the URI is prefixed with ldap:// or ldaps:// for SSL-based LDAP transport. The URI can optionally specify the port, for example: ldap://ldap_server.example.com:389

 — 

ldap_domain

Replaces the username with a string <username>@ldap_domain, where <username> is the name of the user trying to authenticate. Mutually exclusive with ldap_baseDN and ldap_bind_pattern

 — 

ldap_bind_dn

Distinguished name of the user to bind to for user/group searches. Required only if the user or group filters are being used and the LDAP server is not configured to allow anonymous searches

 — 

ldap_bind_password

Password of the user to bind to for user/group searches. Required only if the anonymous bind is not activated

 — 

ldap_user_search_basedn

The base DN for the LDAP subtree to search users

 — 

ldap_group_search_basedn

The base DN for the LDAP subtree to search groups

 — 

ldap_baseDN

Search base. Replaces the username with a DN of the form: uid=<userid>,ldap_baseDN, where <userid> is the username of the user trying to authenticate. Mutually exclusive with ldap_domain and ldap_bind_pattern

 — 

ldap_user_filter

A filter for both the simple and search bind mechanisms. For a simple bind, it is a comma-separated list of user names; if specified, users must be on this list for authentication to succeed. For a search bind, it is an LDAP filter used during the LDAP search; it can contain the {0} pattern, which is replaced with the user name

 — 

ldap_group_filter

Comma-separated list of groups. If specified, users must belong to one of these groups for authentication to succeed

 — 

ldap_allow_anonymous_binds

When true, LDAP authentication with a blank password (an anonymous bind) is allowed by Impala

false

ldap_search_bind_authentication

Allows switching between the search and simple bind user lookup methods when authenticating

true

ldap_ca_certificate

Specifies the location of the certificate in standard PEM format for SSL. Store this certificate on the local filesystem, in a location that only the impala user and other trusted users can read

 — 

ldap_passwords_in_clear_ok

Enables the web server to start with LDAP authentication even if SSL is not enabled. If set to true, the auth_creds_ok_in_clear parameter in the impalarc file is also set to true. This is a potentially insecure configuration

false

ldap_bind_pattern

A string in which the #UID instance is replaced with the user id. For example, if this parameter is set to user=#UID,OU=foo,CN=bar and the user henry tries to authenticate, the constructed bind name will be user=henry,OU=foo,CN=bar. Mutually exclusive with ldap_domain and ldap_baseDN

 — 

allow_custom_ldap_filters_with_kerberos_auth

Specifies whether to allow custom LDAP user and group filters even if Kerberos is enabled

true
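
These options end up as impalad startup flags. A hedged sketch of a search bind configuration (the server URL, DNs, and filter are placeholders; the exact flag file layout in ADH may differ):

--ldap_uri=ldaps://ldap.example.com:636
--ldap_bind_dn=cn=impala-svc,ou=services,dc=example,dc=com
--ldap_user_search_basedn=ou=people,dc=example,dc=com
--ldap_search_bind_authentication=true
--ldap_user_filter=(&(objectClass=person)(uid={0}))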

The Impala Daemon component
impalastore.conf
Parameter Description Default value

hostname

The hostname to use for the Impala daemon. If Kerberos is enabled, it is also used as a part of the Kerberos principal. If this option is not set, the system default is used

 — 

beeswax_port

The port on which Impala daemons serve Beeswax client requests

21000

fe_port

The frontend port of the Impala daemon

21000

be_port

Internal use only. Impala daemons use this port for Thrift-based communication with each other

22000

krpc_port

Internal use only. Impala daemons use this port for KRPC-based communication with each other

27000

hs2_port

The port on which Impala daemons serve HiveServer2 client requests

21050

hs2_http_port

The port used by client applications to transmit commands and receive results over HTTP via the HiveServer2 protocol

28000

enable_webserver

Enables or disables the Impala daemon web server. Its Web UI contains information about configuration settings, running and completed queries, and associated resource usage for them. It is primarily used for diagnosing query problems that can be traced to a particular node

True

webserver_require_spnego

Enables the Kerberos authentication for Hadoop HTTP web consoles for all roles of this service using the SPNEGO protocol. Use this option only if Kerberos is enabled for the HDFS service

False

webserver_port

The port where the Impala daemon web server is running

25000

catalog_service_host

The host where the Impala Catalog Service component is running

 — 

catalog_service_port

The port on which the Impala Catalog Service component listens

26000

state_store_host

The host where the Impala Statestore component is running

 — 

state_store_port

The port on which the Impala Statestore component is running

24000

state_store_subscriber_port

The port where StateStoreSubscriberService is running. StateStoreSubscriberService listens on this port for updates from the Statestore daemon

23030

scratch_dirs

The directory where Impala daemons write data to free up memory during large sort, join, aggregation, and other operations. The files are removed when the operation finishes. The amount of data written can potentially be large

/srv/impala/

log_dir

The directory where an Impala daemon places its log files

/var/log/impala/impalad/

log_filename

The prefix of the log file name. The full path is <log_dir>/<log_filename>

impalad

max_log_files

The number of log files that are kept for each severity level (INFO, WARNING, ERROR, and FATAL) before older log files are removed. The number should be greater than 1 so that at least the current log file remains open. If set to 0, all log files are retained and log rotation is disabled

10

audit_event_log_dir

The directory in which Impala daemon audit event log files are written if the Impala Audit Event Generation property is enabled

/var/log/impala/impalad/audit

minidump_path

The directory for storing Impala daemon Breakpad dumps

/var/log/impala-minidumps

lineage_event_log_dir

The directory in which the Impala daemon generates its lineage log files if the Impala Lineage Generation property is enabled

/var/log/impala/impalad/lineage

local_library_dir

The local directory into which an Impala daemon copies user-defined function (UDF) libraries from HDFS

/usr/lib/impala/udfs

max_lineage_log_file_size

The maximum size (in entries) of the Impala daemon lineage log file. When the size is exceeded, a new file is created

5000

max_audit_event_log_file_size

The maximum size (in queries) of the Impala Daemon audit event log file. When the size is exceeded, a new file is created

5000

fe_service_threads

The maximum number of concurrent client connections allowed. The parameter determines how many queries can run simultaneously. When more clients try to connect to Impala, the later arriving clients have to wait until previous clients disconnect. Setting the fe_service_threads value too high could negatively impact query latency

64

mem_limit

The memory limit (in bytes) for an Impala daemon enforced by the daemon itself. This limit does not include memory consumed by the daemon’s embedded JVM. The Impala daemon uses this amount of memory for query processing, cached data, network buffers, background operations, and so on. If the limit is exceeded, queries are killed until memory usage falls below the limit

1473249280

idle_query_timeout

The time in seconds after which an idle query (no processing work is done and no updates are received from the client) is cancelled. If set to 0, idle queries are never expired

0

idle_session_timeout

The time in seconds after which Impala closes an idle session and cancels all running queries. If set to 0, idle sessions never expire

0

max_result_cache_size

The maximum number of query results a client can request to be cached on a per-query basis to support restarting fetches. This option guards against unreasonably large result caches. Requests exceeding this maximum are rejected

100000

max_cached_file_handles

The maximum number of cached HDFS file handles. Caching HDFS file handles reduces the number of new file handles opened and thus reduces the load on a HDFS NameNode. Each cached file handle consumes a small amount of memory. If set to 0, the file handle caching is disabled

20000

unused_file_handle_timeout_sec

The maximum time in seconds during which an unused HDFS file handle remains in the HDFS file handle cache. When the underlying file for a cached file handle is deleted, the disk space may not be freed until the cached file handle is removed from the cache. This timeout allows the disk space occupied by deleted files to be freed in a predictable period of time. If set to 0, unused cached HDFS file handles are not removed

21600

statestore_subscriber_timeout_seconds

The timeout in seconds for Impala Daemon and Catalog Server connections to Statestore

30

default_query_options

A list of key/value pairs representing additional query options to pass to the Impala Daemon command line, separated by commas

default_file_format=parquet,default_transactional_type=none

load_auth_to_local_rules

If checked (True) and Kerberos is enabled for Impala, Impala uses the auth_to_local option from hadoop.security.auth_to_local rules of the HDFS configuration

True

catalog_topic_mode

The granularity of on-demand metadata fetches between the Impala Daemon coordinator and Impala Catalog Service. See Metadata management

minimal

use_local_catalog

Allows coordinators to cache metadata from Impala Catalog Service. If this is set to True, coordinators pull metadata as needed from catalogd and cache it locally. The cached metadata is automatically removed under memory pressure or after an expiration time. See Metadata management

True

abort_on_failed_audit_event

Specifies whether to shut down Impala if there is a problem with recording an audit event

False

max_minidumps

The maximum number of Breakpad dump files stored by the Impala daemon. A negative value or 0 is interpreted as an unlimited number

9

authorized_proxy_user_config

Specifies the set of authorized proxy users (the users who can impersonate other users during authorization), and users who they are allowed to impersonate. The example of syntax for the option is: authenticated_user1=delegated_user1,delegated_user2;authenticated_user2=*. See Configuring Impala delegation for clients. The list can contain short usernames or * to indicate all users

knox=*;zeppelin=*

queue_wait_timeout_ms

The maximum amount of time (in milliseconds) that a request waits to be admitted before timing out. Must be a positive integer

60000

disk_spill_encryption

Specifies whether to encrypt and verify the integrity of all data spilled to the disk as part of a query

False

abort_on_config_error

Specifies whether to abort Impala startup if there are incorrect configs or Impala is running on unsupported hardware

True

kerberos_reinit_interval

The number of minutes between reestablishing the ticket with the Kerberos server

60

principal

The service Kerberos principal

 — 

keytab_file

The service Kerberos keytab file

 — 

ssl_server_certificate

The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

ssl_private_key

The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The file must be in the PEM format

 — 

ssl_client_ca_certificate

The path to the certificate, in the PEM format, used to confirm the authenticity of SSL/TLS servers that the Impala daemons can connect to. Since the Impala daemons connect to each other, it should also include the CA certificate used to sign all the SSL/TLS certificates. SSL/TLS between Impala daemons cannot be enabled without this parameter

 — 

webserver_certificate_file

The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when the Impala daemon web server operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

webserver_private_key_file

The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when the Impala daemon web server operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

ssl_minimum_version

The minimum version of TLS

TLSv1.2

Others
Parameter Description Default value

log4j.properties

Apache Log4j utility settings

log.threshold=INFO
main.logger=FA
impala.root.logger=DEBUG,FA
log4j.rootLogger=DEBUG,FA
log.dir=/var/log/impala/impalad
max.log.file.size=200MB
log4j.appender.FA=org.apache.log4j.FileAppender
log4j.appender.FA.File=/var/log/impalad/impalad.INFO
log4j.appender.FA.layout=org.apache.log4j.PatternLayout
log4j.appender.FA.layout.ConversionPattern=%p%d{MMdd HH:mm:ss.SSS'000'} %t %c] %m%n
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

Enable custom ulimits

Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the table below

[Manager]
DefaultLimitCPU=
DefaultLimitFSIZE=
DefaultLimitDATA=
DefaultLimitSTACK=
DefaultLimitCORE=
DefaultLimitRSS=
DefaultLimitNOFILE=
DefaultLimitAS=
DefaultLimitNPROC=
DefaultLimitMEMLOCK=
DefaultLimitLOCKS=
DefaultLimitSIGPENDING=
DefaultLimitMSGQUEUE=
DefaultLimitNICE=
DefaultLimitRTPRIO=
DefaultLimitRTTIME=
Ulimit settings
Parameter Description Corresponding option of the ulimit command in CentOS

DefaultLimitCPU

A limit in seconds on the amount of CPU time that a process can consume

cpu time ( -t)

DefaultLimitFSIZE

The maximum size of files that a process can create, in 512-byte blocks

file size ( -f)

DefaultLimitDATA

The maximum size of a process’s data segment, in kilobytes

data seg size ( -d)

DefaultLimitSTACK

The maximum stack size allocated to a process, in kilobytes

stack size ( -s)

DefaultLimitCORE

The maximum size of a core dump file allowed for a process, in 512-byte blocks

core file size ( -c)

DefaultLimitRSS

The maximum resident set size, in kilobytes

max memory size ( -m)

DefaultLimitNOFILE

The maximum number of open file descriptors allowed for the process

open files ( -n)

DefaultLimitAS

The maximum size of the process virtual memory (address space), in kilobytes

virtual memory ( -v)

DefaultLimitNPROC

The maximum number of processes

max user processes ( -u)

DefaultLimitMEMLOCK

The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used

max locked memory ( -l)

DefaultLimitLOCKS

The maximum number of files locked by a process

file locks ( -x)

DefaultLimitSIGPENDING

The maximum number of signals that are pending for delivery to the calling thread

pending signals ( -i)

DefaultLimitMSGQUEUE

The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages

POSIX message queues ( -q)

DefaultLimitNICE

The maximum NICE priority level that can be assigned to a process

scheduling priority ( -e)

DefaultLimitRTPRIO

The maximum real-time scheduling priority level

real-time priority ( -r)

DefaultLimitRTTIME

The maximum pipe buffer size, in 512-byte blocks

pipe size ( -p)

The Impala Statestore component
statestore.conf
Parameter Description Default value

hostname

The hostname to use for the Statestore daemon. If Kerberos is enabled, it is also used as a part of the Kerberos principal. If this option is not set, the system default is used

 — 

state_store_host

The host where the Impala Statestore component is running

 — 

state_store_port

The port on which the Impala Statestore component is running

24000

catalog_service_host

The host where the Impala Catalog Service component is running

 — 

catalog_service_port

The port on which the Impala Catalog Service component listens

26000

enable_webserver

Enables or disables the Statestore daemon web server. Its Web UI contains information about memory usage, configuration settings, and ongoing health checks performed by Statestore

True

webserver_require_spnego

Enables the Kerberos authentication for Hadoop HTTP web consoles for all roles of this service using the SPNEGO protocol. Use this option only if Kerberos is enabled for the HDFS service

False

webserver_port

The port on which the Statestore web server is running

25010

log_dir

The directory where the Statestore daemon places its log files

/var/log/impala/statestored/

log_filename

The prefix of the log file name. The full path is <log_dir>/<log_filename>

statestored

max_log_files

The number of log files that are kept for each severity level (INFO, WARNING, ERROR, and FATAL) before older log files are removed. The number should be greater than 1 so that at least the current log file remains open. If set to 0, all log files are retained and log rotation is disabled

10

minidump_path

The directory for storing Statestore daemon Breakpad dumps

/var/log/impala-minidumps

max_minidumps

The maximum number of Breakpad dump files stored by Statestore daemon. A negative value or 0 is interpreted as an unlimited number

9

state_store_num_server_worker_threads

The number of worker threads for the thread manager of the Statestore Thrift server

4

state_store_pending_task_count_max

The maximum number of tasks allowed to be pending by the thread manager of the Statestore Thrift server. The 0 value allows an infinite number of pending tasks

0

kerberos_reinit_interval

The number of minutes between reestablishing the ticket with the Kerberos server

60

principal

The service Kerberos principal

 — 

keytab_file

The service Kerberos keytab file

 — 

ssl_server_certificate

The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

ssl_private_key

The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The file must be in the PEM format

 — 

ssl_client_ca_certificate

The path to the certificate, in the PEM format, used to confirm the authenticity of SSL/TLS servers that the Impala daemons can connect to. Since the Impala daemons connect to each other, it should also include the CA certificate used to sign all the SSL/TLS certificates. SSL/TLS between Impala daemons cannot be enabled without this parameter

 — 

webserver_certificate_file

The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when the Statestore web server operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

webserver_private_key_file

The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when the Statestore web server operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

ssl_minimum_version

The minimum version of TLS

TLSv1.2

Others
Parameter Description Default value

Custom statestore.conf

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file statestore.conf

 — 

Enable custom ulimits

Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the table below

[Manager]
DefaultLimitCPU=
DefaultLimitFSIZE=
DefaultLimitDATA=
DefaultLimitSTACK=
DefaultLimitCORE=
DefaultLimitRSS=
DefaultLimitNOFILE=
DefaultLimitAS=
DefaultLimitNPROC=
DefaultLimitMEMLOCK=
DefaultLimitLOCKS=
DefaultLimitSIGPENDING=
DefaultLimitMSGQUEUE=
DefaultLimitNICE=
DefaultLimitRTPRIO=
DefaultLimitRTTIME=
Ulimit settings
Parameter Description Corresponding option of the ulimit command in CentOS

DefaultLimitCPU

A limit in seconds on the amount of CPU time that a process can consume

cpu time ( -t)

DefaultLimitFSIZE

The maximum size of files that a process can create, in 512-byte blocks

file size ( -f)

DefaultLimitDATA

The maximum size of a process’s data segment, in kilobytes

data seg size ( -d)

DefaultLimitSTACK

The maximum stack size allocated to a process, in kilobytes

stack size ( -s)

DefaultLimitCORE

The maximum size of a core dump file allowed for a process, in 512-byte blocks

core file size ( -c)

DefaultLimitRSS

The maximum resident set size, in kilobytes

max memory size ( -m)

DefaultLimitNOFILE

The maximum number of open file descriptors allowed for the process

open files ( -n)

DefaultLimitAS

The maximum size of the process virtual memory (address space), in kilobytes

virtual memory ( -v)

DefaultLimitNPROC

The maximum number of processes

max user processes ( -u)

DefaultLimitMEMLOCK

The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used

max locked memory ( -l)

DefaultLimitLOCKS

The maximum number of files locked by a process

file locks ( -x)

DefaultLimitSIGPENDING

The maximum number of signals that are pending for delivery to the calling thread

pending signals ( -i)

DefaultLimitMSGQUEUE

The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages

POSIX message queues ( -q)

DefaultLimitNICE

The maximum NICE priority level that can be assigned to a process

scheduling priority ( -e)

DefaultLimitRTPRIO

The maximum real-time scheduling priority level

real-time priority ( -r)

DefaultLimitRTTIME

The maximum pipe buffer size, in 512-byte blocks

pipe size ( -p)

The Impala Catalog Service component
catalogstore.conf
Parameter Description Default value

hostname

The hostname to use for the Catalog Service daemon. If Kerberos is enabled, it is also used as a part of the Kerberos principal. If this option is not set, the system default is used

 — 

state_store_host

The host where the Impala Statestore component is running

 — 

state_store_port

The port on which the Impala Statestore component is running

24000

catalog_service_host

The host where the Impala Catalog Service component is running

 — 

catalog_service_port

The port on which the Impala Catalog Service component listens

26000

enable_webserver

Enables or disables the Catalog Service web server. Its Web UI includes information about the databases, tables, and other objects managed by Impala, in addition to the resource usage and configuration settings of the Catalog Service

True

webserver_require_spnego

Enables the Kerberos authentication for Hadoop HTTP web consoles for all roles of this service using the SPNEGO protocol. Use this option only if Kerberos is enabled for the HDFS service

False

webserver_port

The port on which the Catalog Service web server is running

25020

log_dir

The directory where the Catalog Service daemon places its log files

/var/log/impala/catalogd/

log_filename

The prefix of the log file name. The full path is <log_dir>/<log_filename>

catalogd

max_log_files

The number of log files that are kept for each severity level (INFO, WARNING, ERROR, and FATAL) before older log files are removed. The number should be greater than 1 so that at least the current log file remains open. If set to 0, all log files are retained and log rotation is disabled

10

minidump_path

The directory for storing the Catalog Service daemon Breakpad dumps

/var/log/impala-minidumps

max_minidumps

The maximum number of Breakpad dump files stored by Catalog Service. A negative value or 0 is interpreted as an unlimited number

9

hms_event_polling_interval_s

When this parameter is set to a positive integer, Catalog Service fetches new notifications from Hive Metastore at the specified interval in seconds. If hms_event_polling_interval_s is set to 0, the automatic metadata invalidation and updates are disabled. See Metadata management

2

load_auth_to_local_rules

If checked (True) and Kerberos is enabled for Impala, Impala uses the auth_to_local option from hadoop.security.auth_to_local rules of the HDFS configuration

True

load_catalog_in_background

If it is set to True, the metadata is loaded in the background, even if that metadata is not required for any query. If False, the metadata is loaded when it is referenced for the first time

False

catalog_topic_mode

The granularity of on-demand metadata fetches between the Impala Daemon coordinator and Impala Catalog Service. See Metadata management

minimal

statestore_subscriber_timeout_seconds

The timeout in seconds for Impala Daemon and Catalog Server connections to Statestore

30

state_store_subscriber_port

The port where StateStoreSubscriberService is running. StateStoreSubscriberService listens on this port for updates from the Statestore daemon

23020

kerberos_reinit_interval

The number of minutes between reestablishing the ticket with the Kerberos server

60

principal

The service Kerberos principal

 — 

keytab_file

The service Kerberos keytab file

 — 

ssl_server_certificate

The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

ssl_private_key

The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The file must be in the PEM format

 — 

ssl_client_ca_certificate

The path to the certificate, in the PEM format, used to confirm the authenticity of SSL/TLS servers that the Impala daemons can connect to. Since the Impala daemons connect to each other, it should also include the CA certificate used to sign all the SSL/TLS certificates. SSL/TLS between Impala daemons cannot be enabled without this parameter

 — 

webserver_certificate_file

The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when the Catalog Service web server operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

webserver_private_key_file

The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when the Catalog Service web server operates as a TLS/SSL server. The certificate file must be in the PEM format

 — 

ssl_minimum_version

The minimum version of TLS

TLSv1.2

Others
Parameter Description Default value

Custom catalogstore.conf

In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file catalogstore.conf

 — 

Enable custom ulimits

Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the table below

[Manager]
DefaultLimitCPU=
DefaultLimitFSIZE=
DefaultLimitDATA=
DefaultLimitSTACK=
DefaultLimitCORE=
DefaultLimitRSS=
DefaultLimitNOFILE=
DefaultLimitAS=
DefaultLimitNPROC=
DefaultLimitMEMLOCK=
DefaultLimitLOCKS=
DefaultLimitSIGPENDING=
DefaultLimitMSGQUEUE=
DefaultLimitNICE=
DefaultLimitRTPRIO=
DefaultLimitRTTIME=
Ulimit settings
Parameter Description Corresponding option of the ulimit command in CentOS

DefaultLimitCPU

A limit in seconds on the amount of CPU time that a process can consume

cpu time ( -t)

DefaultLimitFSIZE

The maximum size of files that a process can create, in 512-byte blocks

file size ( -f)

DefaultLimitDATA

The maximum size of a process’s data segment, in kilobytes

data seg size ( -d)

DefaultLimitSTACK

The maximum stack size allocated to a process, in kilobytes

stack size ( -s)

DefaultLimitCORE