Configuration parameters
This topic describes the parameters that can be configured for ADH services via ADCM. For details on the configuration process, refer to the relevant articles: Online installation and Offline installation.
NOTE
Some of the parameters become visible in the ADCM UI only after the Show advanced flag is set.
Airflow
Parameter | Description | Default value
---|---|---
airflow_dir | The Airflow home directory | /srv/airflow/home
db_dir | The location of the Metastore DB | /srv/airflow/metastore
Parameter | Description | Default value
---|---|---
db_user | The user to connect to the Metadata DB | airflow
db_password | The password to connect to the Metadata DB | —
db_root_password | The root password to connect to the Metadata DB | —
db_port | The port to connect to the Metadata DB | 3307
server_port | The port to run the web server | 8080
flower_port | The port that Celery Flower runs on | 5555
worker_port | When you start an Airflow Worker, Airflow starts a tiny web server subprocess to serve the Worker's local log files to the main Airflow web server, which then builds pages and sends them to users. This parameter defines the port on which the logs are served. The port must be free and accessible from the main web server to connect to the Workers | 8793
redis_port | The port for running Redis | 6379
fernet_key | The secret key used to encrypt connection passwords stored in the database | —
security | Defines which security module to use. For example, | —
keytab | The path to the keytab file | —
reinit_frequency | Sets the ticket renewal frequency | 3600
principal | The Kerberos principal |
ssl_active | Defines if SSL is active for Airflow | false
web_server_ssl_cert | The path to the SSL certificate | /etc/ssl/certs/host_cert.cert
web_server_ssl_key | The path to the SSL certificate key | /etc/ssl/host_cert.key
Logging level | Specifies the logging level for Airflow activity | INFO
Logging level for Flask-appbuilder UI | Specifies the logging level for the Flask-appbuilder UI | WARNING
cfg_properties_template | The Jinja template to initialize environment variables for Airflow |
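The fernet_key must be a 32-byte, URL-safe base64-encoded value. A minimal sketch of producing a key in that format with the Python standard library (the `cryptography` package's `Fernet.generate_key()` returns the same shape):

```python
import base64
import os

def generate_fernet_key() -> str:
    """Generate a 32-byte, URL-safe base64-encoded key in the format Fernet expects."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")

key = generate_fernet_key()
print(key)
```

The resulting string can be pasted into the fernet_key field; rotating it later requires re-encrypting existing connection passwords.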
Parameter | Description | Default value
---|---|---
Database type | The external database type. Possible values: | MySQL/MariaDB
Hostname | The external database host | —
Custom port | The external database port | —
Airflow database name | The external database name | airflow
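The external database parameters above are combined into a single SQLAlchemy connection URI. A sketch of how such a URI is assembled; the `mysql+pymysql` driver prefix is an assumption for the MySQL/MariaDB database type, and the host name is a placeholder:

```python
# Assemble a SQLAlchemy-style connection URI from the external DB parameters.
def build_sql_alchemy_conn(db_user: str, db_password: str, hostname: str,
                           db_port: int, db_name: str) -> str:
    return f"mysql+pymysql://{db_user}:{db_password}@{hostname}:{db_port}/{db_name}"

uri = build_sql_alchemy_conn("airflow", "secret", "db.example.org", 3307, "airflow")
print(uri)
```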
Flink
Parameter | Description | Default value
---|---|---
jobmanager.rpc.port | The RPC port through which the JobManager is reachable. In the high availability mode, this value is ignored, and the port number to connect to the JobManager is generated by ZooKeeper | 6123
sql-gateway.endpoint.rest.port | The port to connect to the SQL Gateway service | 8083
taskmanager.network.bind-policy | The automatic address binding policy used by the TaskManager | name
parallelism.default | The system-wide default parallelism level for all execution environments | 1
taskmanager.numberOfTaskSlots | The number of task slots that each TaskManager offers. Each slot runs one parallel pipeline | 1
taskmanager.heap.size | The heap size for the TaskManager JVM | 1024m
jobmanager.heap.size | The heap size for the JobManager JVM | 1024m
security.kerberos.login.use-ticket-cache | Indicates whether to read from the Kerberos ticket cache | false
security.kerberos.login.keytab | The absolute path to the Kerberos keytab file that stores user credentials | —
security.kerberos.login.principal | The Flink Kerberos principal | —
security.kerberos.login.contexts | A comma-separated list of login contexts to provide the Kerberos credentials to | —
security.ssl.rest.enabled | Turns on SSL for external communication via REST endpoints | false
security.ssl.rest.keystore | The Java keystore file with the SSL key and certificate to be used by Flink's external REST endpoints | —
security.ssl.rest.truststore | The truststore file containing public CA certificates to verify the peer for Flink's external REST endpoints | —
security.ssl.rest.keystore-password | The secret to decrypt the keystore file for Flink's external REST endpoints | —
security.ssl.rest.truststore-password | The password to decrypt the truststore for Flink's external REST endpoints | —
security.ssl.rest.key-password | The secret to decrypt the key in the keystore for Flink's external REST endpoints | —
Logging level | Defines the logging level for Flink activity | INFO
high-availability | Defines the High Availability (HA) mode used for cluster execution | —
high-availability.zookeeper.quorum | The ZooKeeper quorum to use when running Flink in the HA mode with ZooKeeper | —
high-availability.storageDir | A file system path (URI) where Flink persists metadata in the HA mode | —
high-availability.zookeeper.path.root | The root path for the Flink ZNode in ZooKeeper | /flink
high-availability.cluster-id | The ID of the Flink cluster, used to separate multiple Flink clusters from each other | —
sql-gateway.session.check-interval | The check interval to detect idle sessions. A value <= | 1 min
sql-gateway.session.idle-timeout | The timeout to close a session if no successful connection was made during this interval. A value <= | 10 min
sql-gateway.session.max-num | The maximum number of sessions to run simultaneously | 1000000
sql-gateway.worker.keepalive-time | The time to keep an idle worker thread alive. When the worker thread count exceeds | 5 min
sql-gateway.worker.threads.max | The maximum number of worker threads on the SQL Gateway server | 500
sql-gateway.worker.threads.min | The minimum number of worker threads. If the current number of worker threads is less than this value, the worker threads are not deleted automatically | 500
zookeeper.sasl.disable | Defines whether SASL authentication in ZooKeeper is disabled | false
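As a rough capacity check for the parallelism settings above: a job can only be fully scheduled if the cluster offers at least as many task slots as the requested parallelism, where the slot total is the number of TaskManagers multiplied by taskmanager.numberOfTaskSlots. A minimal sketch of this arithmetic:

```python
def can_schedule(num_taskmanagers: int, slots_per_tm: int, parallelism: int) -> bool:
    """True if the cluster offers enough task slots for the given parallelism."""
    return num_taskmanagers * slots_per_tm >= parallelism

# With the defaults above (taskmanager.numberOfTaskSlots = 1, parallelism.default = 1),
# a single TaskManager is enough; 2 TaskManagers with 4 slots each cannot host
# a job with parallelism 12.
print(can_schedule(1, 1, 1))
print(can_schedule(2, 4, 12))
```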
Parameter | Description | Default value
---|---|---
Custom flink-conf.yaml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the flink-conf.yaml configuration file | —
log4j.properties | The contents of the log4j.properties configuration file |
log4j-cli.properties | The contents of the log4j-cli.properties configuration file |
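For example, a Custom flink-conf.yaml fragment that enables ZooKeeper-based HA using the parameters listed above might look as follows (host names, the storage path, and the cluster ID are placeholders):

```yaml
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1.example.org:2181,zk2.example.org:2181,zk3.example.org:2181
high-availability.storageDir: hdfs:///flink/ha
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: /default
```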
HBase
Parameter | Description | Default value
---|---|---
hbase.balancer.period | The time period for running the Region balancer in Master | 300000
hbase.client.pause | The general client pause value. Used mostly as the value to wait before retrying a failed get, region lookup, etc. See | 100
hbase.client.max.perregion.tasks | The maximum number of concurrent mutation tasks the client will maintain to a single Region. That is, if there is already | 1
hbase.client.max.perserver.tasks | The maximum number of concurrent mutation tasks a single HTable instance will send to a single Region Server | 2
hbase.client.max.total.tasks | The maximum number of concurrent mutation tasks a single HTable instance will send to the cluster | 100
hbase.client.retries.number | The maximum number of retries. Used as the maximum for all retryable operations, such as getting a cell value, starting a row update, etc. The retry interval is a rough function based on | 15
hbase.client.scanner.timeout.period | The client scanner lease period in milliseconds | 60000
hbase.cluster.distributed | The cluster mode. Possible values are: | true
hbase.hregion.majorcompaction | The time interval between major compactions in milliseconds. Set to | 604800000
hbase.hregion.max.filesize | The maximum file size. If the total size of a Region's HFiles grows to exceed this value, the Region is split in two. This option can work in two ways: either a split occurs when any store size exceeds the threshold, or when the overall Region size exceeds the threshold. The behavior can be configured by | 10737418240
hbase.hstore.blockingStoreFiles | If more than this number of StoreFiles exists in any Store (one StoreFile is written per flush of MemStore), updates are blocked for this Region until a compaction is completed, or until | 16
hbase.hstore.blockingWaitTime | The time for which a Region blocks updates after reaching the StoreFile limit defined by | 90000
hbase.hstore.compaction.max | The maximum number of StoreFiles selected for a single minor compaction, regardless of the number of eligible StoreFiles. Effectively, the value of | 10
hbase.hstore.compaction.min | The minimum number of StoreFiles that must be eligible for compaction before compaction can run. The goal of tuning | 3
hbase.hstore.compaction.min.size | A StoreFile smaller than this size is always eligible for minor compaction. StoreFiles of this size or larger are evaluated by | 134217728
hbase.hstore.compaction.ratio | For minor compaction, this ratio is used to determine whether a given StoreFile that is larger than | 1.2F
hbase.hstore.compaction.ratio.offpeak | The compaction ratio used during off-peak compactions if the off-peak hours are also configured. Expressed as a floating-point decimal. This allows for more aggressive (or less aggressive, if you set it lower than | 5.0F
hbase.hstore.compactionThreshold | If more than this number of StoreFiles exists in any Store (one StoreFile is written per flush of MemStore), a compaction is run to rewrite all StoreFiles into a single StoreFile. Larger values delay the compaction, but when compaction does occur, it takes longer to complete | 3
hbase.hstore.flusher.count | The number of flush threads. With fewer threads, MemStore flushes are queued. With more threads, flushes are executed in parallel, increasing the load on HDFS and potentially causing more compactions | 2
hbase.hstore.time.to.purge.deletes | The amount of time to delay purging of delete markers with future timestamps. If unset or set to | 0
hbase.master.ipc.address | The HMaster RPC bind address | 0.0.0.0
hbase.normalizer.period | The period at which the Region normalizer runs on Master (in milliseconds) | 300000
hbase.regionserver.compaction.enabled | Enables/disables compactions by setting | true
hbase.regionserver.ipc.address | The Region Server RPC bind address | 0.0.0.0
hbase.regionserver.regionSplitLimit | The limit for the number of Regions, after which no more Region splitting should take place. This is not a hard limit on the number of Regions, but acts as a guideline for the Region Server to stop splitting after a certain limit | 1000
hbase.rootdir | The directory shared by Region Servers, into which HBase persists its data. The URL should be fully qualified and include the filesystem scheme. For example, to specify the HDFS directory /hbase where the HDFS instance NameNode is running at namenode.example.org on port 9000, set this value to: | —
hbase.zookeeper.quorum | A comma-separated list of servers in the ZooKeeper ensemble. For example, | —
zookeeper.session.timeout | The ZooKeeper session timeout in milliseconds. It is used in two different ways. First, this value is used by the ZooKeeper client that HBase uses to connect to the ensemble. It is also used by HBase when it starts a ZooKeeper server (in that case the timeout is passed as the | 90000
zookeeper.znode.parent | The root znode for HBase in ZooKeeper. All HBase ZooKeeper files configured with a relative path go under this node. By default, all HBase ZooKeeper file paths are configured with a relative path, so they all go under this directory unless changed | /hbase
hbase.rest.port | The port used by HBase REST servers | 60080
hbase.zookeeper.property.authProvider.1 | Specifies the ZooKeeper authentication method |
hbase.security.authentication | Set the value to | false
hbase.security.authentication.ui | Enables Kerberos authentication to the HBase web UI with SPNEGO | —
hbase.security.authentication.spnego.kerberos.principal | The Kerberos principal for SPNEGO authentication | —
hbase.security.authentication.spnego.kerberos.keytab | The path to the Kerberos keytab file with principals to be used for SPNEGO authentication | —
hbase.security.authorization | Set the value to | false
hbase.master.kerberos.principal | The Kerberos principal used to run the HMaster process | —
hbase.master.keytab.file | The full path to the Kerberos keytab file to use for logging in the configured HMaster server principal | —
hbase.regionserver.kerberos.principal | The Kerberos principal name that should be used to run the HRegionServer process | —
hbase.regionserver.keytab.file | The full path to the Kerberos keytab file to use for logging in the configured HRegionServer server principal | —
hbase.rest.authentication.type | The REST Gateway Kerberos authentication type | —
hbase.rest.authentication.kerberos.principal | The REST Gateway Kerberos principal | —
hbase.rest.authentication.kerberos.keytab | The REST Gateway Kerberos keytab | —
hbase.thrift.keytab.file | The Thrift Kerberos keytab | —
hbase.rest.keytab.file | The HBase REST gateway Kerberos keytab | —
hbase.rest.kerberos.principal | The HBase REST gateway Kerberos principal | —
hbase.thrift.kerberos.principal | The Thrift Kerberos principal | —
hbase.thrift.security.qop | Defines authentication, integrity, and confidentiality checking. Supported values: | —
phoenix.queryserver.keytab.file | The path to the Kerberos keytab file | —
phoenix.queryserver.kerberos.principal | The Kerberos principal to use when authenticating. If | —
phoenix.queryserver.kerberos.keytab | The full path to the Kerberos keytab file to use for logging in the configured HMaster server principal | —
phoenix.queryserver.http.keytab.file | The keytab file to use for authenticating SPNEGO connections. This configuration must be specified if | —
phoenix.queryserver.http.kerberos.principal | The Kerberos principal to use when authenticating SPNEGO connections. |
phoenix.queryserver.kerberos.http.principal | Deprecated, use | —
hbase.ssl.enabled | Defines whether SSL is enabled for web UIs | false
hadoop.ssl.enabled | Defines whether SSL is enabled for Hadoop RPC | false
ssl.server.keystore.location | The path to the keystore file | —
ssl.server.keystore.password | The password to the keystore | —
ssl.server.truststore.location | The path to the truststore to be used | —
ssl.server.truststore.password | The password to the truststore | —
ssl.server.keystore.keypassword | The password to the key in the keystore | —
hbase.rest.ssl.enabled | Defines whether SSL is enabled for the HBase REST server | false
hbase.rest.ssl.keystore.store | The path to the keystore used by the HBase REST server | —
hbase.rest.ssl.keystore.password | The password to the keystore | —
hbase.rest.ssl.keystore.keypassword | The password to the key in the keystore | —
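The interplay of hbase.hstore.compaction.min.size and hbase.hstore.compaction.ratio can be illustrated with a simplified selection check. This is a sketch of the idea only, not HBase's actual compaction policy: a StoreFile below the minimum size is always eligible for minor compaction, while a larger file is eligible only if it is no bigger than the sum of the smaller candidate files multiplied by the ratio.

```python
def eligible_for_minor_compaction(file_size: int, smaller_files_total: int,
                                  min_size: int = 134_217_728,
                                  ratio: float = 1.2) -> bool:
    # Files below hbase.hstore.compaction.min.size are always eligible.
    if file_size < min_size:
        return True
    # Larger files are eligible only if not too big relative to their peers.
    return file_size <= smaller_files_total * ratio

# A 64 MiB file is always eligible; a 1 GiB file next to only 300 MiB of
# smaller files is skipped with the default ratio of 1.2.
print(eligible_for_minor_compaction(64 * 1024**2, 0))
print(eligible_for_minor_compaction(1024**3, 300 * 1024**2))
```

Raising the ratio (as hbase.hstore.compaction.ratio.offpeak does during off-peak hours) makes larger files pass this check, so compactions become more aggressive.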
Parameter | Description | Default value
---|---|---
HBASE Regionserver Heap Memory | Sets the initial (-Xms) and maximum (-Xmx) Java heap size for the HBase Region Server | -Xms700m -Xmx9G
HBASE Master Heap Memory | Sets the initial (-Xms) and maximum (-Xmx) Java heap size for the HBase Master | -Xms700m -Xmx9G
Phoenix Queryserver Heap Memory | Sets the initial (-Xms) and maximum (-Xmx) Java heap size for the Phoenix Query Server | -Xms700m -Xmx8G
HBASE Thrift2 server Heap Memory | Sets the initial (-Xms) and maximum (-Xmx) Java heap size for the HBase Thrift2 server | -Xms700m -Xmx8G
HBASE Rest server Heap Memory | Sets the initial (-Xms) and maximum (-Xmx) Java heap size for the HBase REST server | -Xms200m -Xmx8G
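The heap settings above use the standard JVM size suffixes. A small helper to convert such values to bytes, assuming the usual case-insensitive binary suffixes k, m, and g:

```python
def jvm_size_to_bytes(value: str) -> int:
    """Convert a JVM heap size string like '700m' or '9G' to bytes."""
    units = {"k": 1024, "m": 1024**2, "g": 1024**3}
    value = value.strip().lower()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)

print(jvm_size_to_bytes("700m"))  # 734003200
print(jvm_size_to_bytes("9G"))    # 9663676416
```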
Parameter | Description | Default value
---|---|---
xasecure.audit.destination.solr.batch.filespool.dir | The spool directory path | /srv/ranger/hdfs_plugin/audit_solr_spool
xasecure.audit.destination.solr.urls | Leave this property value empty or set it to | —
xasecure.audit.destination.solr.zookeepers | Specifies the ZooKeeper connection string for the Solr destination | —
xasecure.audit.destination.solr.force.use.inmemory.jaas.config | Uses the in-memory JAAS configuration file to connect to Solr | —
xasecure.audit.is.enabled | Enables Ranger audit | true
xasecure.audit.jaas.Client.loginModuleControlFlag | Specifies whether the success of the module is | —
xasecure.audit.jaas.Client.loginModuleName | The name of the authenticator class | —
xasecure.audit.jaas.Client.option.keyTab | The name of the keytab file to get the principal's secret key | —
xasecure.audit.jaas.Client.option.principal | The name of the principal to be used | —
xasecure.audit.jaas.Client.option.serviceName | Represents a user or a service that wants to log in | —
xasecure.audit.jaas.Client.option.storeKey | Set this to | false
xasecure.audit.jaas.Client.option.useKeyTab | Set this to | false
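Taken together, the xasecure.audit.jaas.Client.* options correspond to a JAAS login context. A sketch of the resulting context, assuming Kerberos login via Krb5LoginModule; the keytab path, principal, and service name are placeholders:

```
Client {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   storeKey=true
   keyTab="/etc/security/keytabs/hbase.service.keytab"
   principal="hbase/host.example.org@EXAMPLE.COM"
   serviceName="solr";
};
```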
Parameter | Description | Default value
---|---|---
ranger.plugin.hbase.policy.rest.url | The URL to Ranger Admin | —
ranger.plugin.hbase.service.name | The name of the Ranger service containing policies for this instance | —
ranger.plugin.hbase.policy.cache.dir | The directory where Ranger policies are cached after successful retrieval from the source | /srv/ranger/hbase/policycache
ranger.plugin.hbase.policy.pollIntervalMs | Defines how often to poll for changes in policies | 30000
ranger.plugin.hbase.policy.rest.client.connection.timeoutMs | The HBase plugin RangerRestClient connection timeout (in milliseconds) | 120000
ranger.plugin.hbase.policy.rest.client.read.timeoutMs | The HBase plugin RangerRestClient read timeout (in milliseconds) | 30000
ranger.plugin.hbase.policy.rest.ssl.config.file | The path to the RangerRestClient SSL config file for the HBase plugin | /etc/hbase/conf/ranger-hbase-policymgr-ssl.xml
Parameter | Description | Default value
---|---|---
xasecure.policymgr.clientssl.keystore | The path to the keystore file used by Ranger | —
xasecure.policymgr.clientssl.keystore.credential.file | The path to the keystore credentials file | /etc/hbase/conf/ranger-hbase.jceks
xasecure.policymgr.clientssl.truststore.credential.file | The path to the truststore credentials file | /etc/hbase/conf/ranger-hbase.jceks
xasecure.policymgr.clientssl.truststore | The path to the truststore file used by Ranger | —
xasecure.policymgr.clientssl.keystore.password | The password to the keystore file | —
xasecure.policymgr.clientssl.truststore.password | The password to the truststore file | —
Parameter | Description | Default value
---|---|---
Custom hbase-site.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the hbase-site.xml configuration file | —
Custom hbase-env.sh | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the hbase-env.sh configuration file | —
Ranger plugin enabled | Whether or not the Ranger plugin is enabled | false
Custom ranger-hbase-audit.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ranger-hbase-audit.xml configuration file | —
Custom ranger-hbase-security.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ranger-hbase-security.xml configuration file | —
Custom ranger-hbase-policymgr-ssl.xml | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the ranger-hbase-policymgr-ssl.xml configuration file | —
Custom log4j.properties | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the log4j.properties configuration file |
Custom hadoop-metrics2-hbase.properties | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the hadoop-metrics2-hbase.properties configuration file |
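Entries for a Custom hbase-site.xml section follow the standard Hadoop property format. For example, to supply a parameter not exposed in the ADCM UI (the property name and value here are illustrative only):

```xml
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
</property>
```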
HDFS
Parameter | Description | Default value
---|---|---
fs.defaultFS | The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI scheme determines the config property ( | —
fs.trash.checkpoint.interval | The number of minutes between trash checkpoints. Should be smaller than or equal to | 60
fs.trash.interval | The number of minutes after which a checkpoint gets deleted. If set to | 1440
hadoop.tmp.dir | The base for other temporary directories | /tmp/hadoop-${user.name}
hadoop.zk.address | A comma-separated list of <Host>:<Port> pairs. Each corresponds to a ZooKeeper server to be used by the Resource Manager for storing its state | —
io.file.buffer.size | The buffer size for sequence files. The size of this buffer should probably be a multiple of the hardware page size (4096 on Intel x86); it determines how much data is buffered during read and write operations | 131072
net.topology.script.file.name | The name of the script that should be invoked to resolve DNS names to NetworkTopology names. For example, the script could take host.foo.bar as an argument and return /rack1 as the output | —
ha.zookeeper.quorum | A comma-separated list of ZooKeeper server addresses to be used by the ZKFailoverController in automatic failover | —
ipc.client.fallback-to-simple-auth-allowed | When a client is configured to attempt a secure connection but connects to an insecure server, that server may instruct the client to switch to SASL SIMPLE (unsecure) authentication. This setting controls whether the client accepts this instruction from the server. When set to | false
hadoop.security.authentication | Defines the authentication type. Possible values: | simple
hadoop.security.authorization | Enables RPC service-level authorization | false
hadoop.rpc.protection | Specifies the RPC protection. Possible values: | authentication
hadoop.security.auth_to_local | The value is a string containing new line characters. See the Kerberos documentation for more information about the format | —
hadoop.http.authentication.type | Defines the authentication used for the HTTP web consoles. The supported values are: | simple
hadoop.http.authentication.kerberos.principal | Indicates the Kerberos principal to be used for the HTTP endpoint when using the | HTTP/localhost@$LOCALHOST
hadoop.http.authentication.kerberos.keytab | The location of the keytab file with the credentials for the Kerberos principal used for the HTTP endpoint | /etc/security/keytabs/HTTP.service.keytab
ha.zookeeper.acl | ACLs for all znodes | —
hadoop.http.filter.initializers | Add to this property the | —
hadoop.http.authentication.signature.secret.file | The signature secret file for signing the authentication tokens. If not set, a random secret is generated at startup. The same secret should be used for all nodes in the cluster: JobTracker, NameNode, DataNode, and TaskTracker. This file should be readable only by the Unix user running the daemons | /etc/security/http_secret
hadoop.http.authentication.cookie.domain | The domain to use for the HTTP cookie that stores the authentication token. For authentication to work properly across all nodes in the cluster, the domain must be set correctly. There is no default value; in that case the HTTP cookie has no domain and works only with the hostname that issued it | —
hadoop.ssl.require.client.cert | Defines whether client certificates are required | false
hadoop.ssl.hostname.verifier | The hostname verifier to provide for HttpsURLConnections. Valid values are: | DEFAULT
hadoop.ssl.keystores.factory.class | The KeyStoresFactory implementation to use | org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory
hadoop.ssl.server.conf | A resource file from which the SSL server keystore information is extracted. This file is looked up in the classpath; typically it should be located in the Hadoop conf/ directory | ssl-server.xml
hadoop.ssl.client.conf | A resource file from which the SSL client keystore information is extracted. This file is looked up in the classpath; typically it should be located in the Hadoop conf/ directory | ssl-client.xml
User managed hadoop.security.auth_to_local | Disables automatic generation of | false
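The script referenced by net.topology.script.file.name receives one or more host names or IP addresses as arguments and must print one rack path per argument. A minimal sketch in Python; the host-to-rack mapping here is a made-up example, and a real script would typically read it from a data file:

```python
import sys

# Hypothetical static host-to-rack mapping for illustration.
RACKS = {
    "host1.example.org": "/rack1",
    "host2.example.org": "/rack2",
}
DEFAULT_RACK = "/default-rack"

def resolve(hosts):
    """Return one rack path per host, in argument order, as Hadoop expects on stdout."""
    return [RACKS.get(h, DEFAULT_RACK) for h in hosts]

if __name__ == "__main__":
    print("\n".join(resolve(sys.argv[1:])))
```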
Parameter | Description | Default value |
---|---|---|
dfs.client.block.write.replace-datanode-on-failure.enable |
If there is a DataNode/network failure in the write pipeline, DFSClient will try to remove the failed DataNode from the pipeline and then continue writing with the remaining DataNodes.
As a result, the number of DataNodes in the pipeline is decreased.
The feature is to add new DataNodes to the pipeline.
This is a site-wide property to enable/disable the feature.
When the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to set the policy to |
true |
dfs.client.block.write.replace-datanode-on-failure.policy |
This property is used only if the value of
|
DEFAULT |
dfs.client.block.write.replace-datanode-on-failure.best-effort |
This property is used only if the value of |
false |
dfs.client.block.write.replace-datanode-on-failure.min-replication |
The minimum number of replications needed not to fail the write pipeline if new DataNodes can not be found to replace failed DataNodes (could be due to network failure) in the write pipeline.
If the number of the remaining DataNodes in the write pipeline is greater than or equal to this property value, continue writing to the remaining nodes.
Otherwise throw exception.
If this is set to |
0 |
dfs.balancer.dispatcherThreads |
The size of the thread pool for the HDFS balancer block mover — dispatchExecutor |
200 |
dfs.balancer.movedWinWidth |
The time window in milliseconds for the HDFS balancer tracking blocks and its locations |
5400000 |
dfs.balancer.moverThreads |
The thread pool size for executing block moves — moverThreadAllocator |
1000 |
dfs.balancer.max-size-to-move |
The maximum number of bytes that can be moved by the balancer in a single thread |
10737418240 |
dfs.balancer.getBlocks.min-block-size |
The minimum block threshold size in bytes to ignore, when fetching a source block list |
10485760 |
dfs.balancer.getBlocks.size |
The total size in bytes of DataNode blocks to get, when fetching a source block list |
2147483648 |
dfs.balancer.block-move.timeout |
The maximum amount of time for a block to move (in milliseconds).
If set greater than |
0 |
dfs.balancer.max-no-move-interval |
If this specified amount of time has elapsed and no blocks have been moved out of a source DataNode, one more attempt will be made to move blocks out of this DataNode in the current Balancer iteration |
60000 |
dfs.balancer.max-iteration-time |
The maximum amount of time an iteration can be run by the Balancer.
After this time the Balancer will stop the iteration, and re-evaluate the work needed to be done to balance the cluster.
The default value is |
1200000 |
dfs.blocksize |
The default block size for new files (in bytes).
You can use the following suffixes to define size units (case insensitive): |
134217728 |
dfs.client.read.shortcircuit |
Turns on short-circuit local reads |
true |
dfs.datanode.balance.max.concurrent.moves |
The maximum number of threads for DataNode balancer pending moves.
This value is reconfigurable via the |
50 |
dfs.datanode.data.dir |
Determines, where on the local filesystem a DFS data node should store its blocks.
If multiple directories are specified, then data will be stored in all named directories, typically on different devices.
The directories should be tagged with corresponding storage types ( |
/srv/hadoop-hdfs/data:DISK |
dfs.disk.balancer.max.disk.throughputInMBperSec |
The maximum disk bandwidth, used by the disk balancer during reads from a source disk. The unit is MB/sec |
10 |
dfs.disk.balancer.block.tolerance.percent |
The parameter specifies when a good enough value is reached for any copy step (in percents).
For example, if set to to |
10 |
dfs.disk.balancer.max.disk.errors |
During a block move from a source to destination disk, there might be various errors. This parameter defines how many errors to tolerate before declaring a move between 2 disks (or a step) has failed |
5 |
dfs.disk.balancer.plan.valid.interval |
The maximum amount of time a disk balancer plan (a set of configurations that define the data volume to be redistributed between two disks) remains valid.
This setting supports multiple time unit suffixes as described in |
1d |
dfs.disk.balancer.plan.threshold.percent |
Defines a data storage threshold in percents at which disks start participating in data redistribution or balancing activities |
10 |
dfs.domain.socket.path |
The path to a UNIX domain socket that will be used for communication between the DataNode and local HDFS clients.
If the string |
/var/lib/hadoop-hdfs/dn_socket |
dfs.hosts |
Names a file that contains a list of hosts allowed to connect to the NameNode. The full pathname of the file must be specified. If the value is empty, all hosts are permitted |
/etc/hadoop/conf/dfs.hosts |
dfs.mover.movedWinWidth |
The minimum time interval for a block to be moved to another location again (in milliseconds) |
5400000 |
dfs.mover.moverThreads |
Sets the balancer mover thread pool size |
1000 |
dfs.mover.retry.max.attempts |
The maximum number of retries before the mover considers the move as failed |
10 |
dfs.mover.max-no-move-interval |
If this specified amount of time has elapsed and no block has been moved out of a source DataNode, one more attempt will be made to move blocks out of this DataNode in the current mover iteration |
60000 |
dfs.namenode.name.dir |
Determines where on the local filesystem the DFS name node should store the name table (fsimage). If multiple directories are specified, then the name table is replicated in all of the directories, for redundancy |
/srv/hadoop-hdfs/name |
dfs.namenode.checkpoint.dir |
Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If multiple directories are specified, then the image is replicated in all of the directories for redundancy |
/srv/hadoop-hdfs/checkpoint |
dfs.namenode.hosts.provider.classname |
The class that provides access for host files.
|
org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager |
dfs.namenode.rpc-bind-host |
The actual address, the RPC Server will bind to.
If this optional address is set, it overrides only the hostname portion of |
0.0.0.0 |
dfs.permissions.superusergroup |
The name of the group of super-users. The value should be a single group name |
hadoop |
dfs.replication |
The default block replication. The actual number of replications can be specified, when the file is created. The default is used, if replication is not specified in create time |
3 |
dfs.journalnode.http-address |
The HTTP address of the JournalNode web UI |
0.0.0.0:8480 |
dfs.journalnode.https-address |
The HTTPS address of the JournalNode web UI |
0.0.0.0:8481 |
dfs.journalnode.rpc-address |
The RPC address of the JournalNode web UI |
0.0.0.0:8485 |
dfs.datanode.http.address |
The address of the DataNode HTTP server |
0.0.0.0:9864 |
dfs.datanode.https.address |
The address of the DataNode HTTPS server |
0.0.0.0:9865 |
dfs.datanode.address |
The address of the DataNode for data transfer |
0.0.0.0:9866 |
dfs.datanode.ipc.address |
The IPC address of the DataNode |
0.0.0.0:9867 |
dfs.namenode.http-address |
The address and the base port to access the dfs NameNode web UI |
0.0.0.0:9870 |
dfs.namenode.https-address |
The secure HTTPS address of the NameNode |
0.0.0.0:9871 |
dfs.ha.automatic-failover.enabled |
Defines whether automatic failover is enabled |
true |
dfs.ha.fencing.methods |
A list of scripts or Java classes that will be used to fence the Active NameNode during a failover |
shell(/bin/true) |
dfs.journalnode.edits.dir |
The directory where to store journal edit files |
/srv/hadoop-hdfs/journalnode |
dfs.namenode.shared.edits.dir |
The directory on shared storage between the multiple NameNodes in an HA cluster.
This directory is written by the active NameNode and read by the standby NameNode to keep the namespaces synchronized.
This directory does not need to be listed in dfs.namenode.edits.dir |
--- |
dfs.internal.nameservices |
A unique nameservice identifier for a cluster or federation. For a single cluster, specify the name that will be used as an alias. For HDFS federation, specify all nameservices associated with this cluster, separated by commas. This option allows you to use an alias instead of an IP address or FQDN in some commands, for example: |
— |
dfs.block.access.token.enable |
If set to true, access tokens are used as capabilities for accessing DataNodes. If set to false, no access tokens are checked on accessing DataNodes |
false |
dfs.namenode.kerberos.principal |
The NameNode service principal.
This is typically set to nn/_HOST@REALM. Each NameNode substitutes _HOST with its own fully qualified hostname at startup |
nn/_HOST@REALM |
dfs.namenode.keytab.file |
The keytab file used by each NameNode daemon to log in as its service principal.
The principal name is configured with dfs.namenode.kerberos.principal |
/etc/security/keytabs/nn.service.keytab |
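As an illustration, the two NameNode Kerberos parameters above map to an hdfs-site.xml fragment like the following. This is a sketch: the REALM placeholder and the keytab path must match your actual Kerberos setup.

```xml
<!-- Hypothetical hdfs-site.xml fragment; REALM and the keytab path
     are placeholders that must match the local Kerberos deployment. -->
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>nn/_HOST@REALM</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/security/keytabs/nn.service.keytab</value>
</property>
```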
dfs.namenode.kerberos.internal.spnego.principal |
HTTP Kerberos principal name for the NameNode |
HTTP/_HOST@REALM |
dfs.web.authentication.kerberos.principal |
Kerberos principal name for the WebHDFS |
HTTP/_HOST@REALM |
dfs.web.authentication.kerberos.keytab |
Kerberos keytab file for WebHDFS |
/etc/security/keytabs/HTTP.service.keytab |
dfs.journalnode.kerberos.principal |
The JournalNode service principal.
This is typically set to jn/_HOST@REALM |
jn/_HOST@REALM |
dfs.journalnode.keytab.file |
The keytab file used by each JournalNode daemon to log in as its service principal.
The principal name is configured with dfs.journalnode.kerberos.principal |
/etc/security/keytabs/jn.service.keytab |
dfs.journalnode.kerberos.internal.spnego.principal |
The server principal used by the JournalNode HTTP server for SPNEGO authentication when Kerberos security is enabled.
This is typically set to HTTP/_HOST@REALM |
HTTP/_HOST@REALM |
dfs.datanode.data.dir.perm |
Permissions for the directories on the local filesystem where the DFS DataNode stores its blocks. The permissions can either be octal or symbolic |
700 |
dfs.datanode.kerberos.principal |
The DataNode service principal.
This is typically set to dn/_HOST@REALM.TLD |
dn/_HOST@REALM.TLD |
dfs.datanode.keytab.file |
The keytab file used by each DataNode daemon to log in as its service principal.
The principal name is configured with dfs.datanode.kerberos.principal |
/etc/security/keytabs/dn.service.keytab |
dfs.http.policy |
Defines if HTTPS (SSL) is supported on HDFS.
This configures the HTTP endpoint for HDFS daemons.
The following values are supported: HTTP_ONLY, HTTPS_ONLY, and HTTP_AND_HTTPS |
HTTP_ONLY |
dfs.data.transfer.protection |
A comma-separated list of SASL protection values used for secured connections to the DataNode when reading or writing block data. The possible values are: authentication, integrity, and privacy.
If dfs.encrypt.data.transfer is set to true, it supersedes this setting and enforces that all connections use a specialized encrypted SASL handshake |
— |
dfs.encrypt.data.transfer |
Defines whether or not actual block data that is read/written from/to HDFS should be encrypted on the wire.
This only needs to be set on the NameNodes and DataNodes; clients deduce it automatically.
It is possible to override this setting per connection by specifying custom logic via dfs.trustedchannel.resolver.class |
false |
dfs.encrypt.data.transfer.algorithm |
This value may be set to either 3des or rc4 |
3des |
dfs.encrypt.data.transfer.cipher.suites |
This value can be either undefined or AES/CTR/NoPadding |
— |
dfs.encrypt.data.transfer.cipher.key.bitlength |
The key bit length negotiated between the DFS client and the DataNode for encryption.
This value may be set to 128, 192, or 256 |
128 |
ignore.secure.ports.for.testing |
Allows skipping HTTPS requirements in the SASL mode |
false |
dfs.client.https.need-auth |
Whether SSL client certificate authentication is required |
false |
Parameter | Description | Default value |
---|---|---|
httpfs.http.administrators |
The ACL for the admins.
This configuration is used to control who can access the default servlets for HttpFS server.
The value should be a comma-separated list of users and groups.
The user list comes first and is separated by a space, followed by the group list, for example: user1,user2 group1,group2 |
* |
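To make the ACL format above concrete, here is a hypothetical parser for it. The format (users, a space, then groups; * for everyone) comes from the parameter description; the function name and return shape are our own illustration.

```python
def parse_http_admin_acl(acl):
    """Parse an httpfs.http.administrators-style ACL string.

    Format: comma-separated user list, then a space, then a
    comma-separated group list. "*" grants access to all users.
    """
    acl = acl.strip()
    if acl == "*":
        return {"all": True, "users": [], "groups": []}
    # Users come before the first space, groups after it.
    users_part, _, groups_part = acl.partition(" ")
    users = [u for u in users_part.split(",") if u]
    groups = [g for g in groups_part.split(",") if g]
    return {"all": False, "users": users, "groups": groups}
```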
hadoop.http.temp.dir |
The HttpFS temp directory |
${hadoop.tmp.dir}/httpfs |
httpfs.ssl.enabled |
Defines whether SSL is enabled |
false |
httpfs.hadoop.config.dir |
The location of the Hadoop configuration directory |
/etc/hadoop/conf |
httpfs.hadoop.authentication.type |
Defines the authentication mechanism used by HttpFS for its HTTP clients.
Valid values are simple and kerberos |
simple |
httpfs.hadoop.authentication.kerberos.keytab |
The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by httpfs in the HTTP endpoint.
|
/etc/security/keytabs/httpfs.service.keytab |
httpfs.hadoop.authentication.kerberos.principal |
The HTTP Kerberos principal used by HttpFS in the HTTP endpoint.
The HTTP Kerberos principal MUST start with HTTP/ per the Kerberos HTTP SPNEGO specification |
HTTP/${httpfs.hostname}@${kerberos.realm} |
Parameter | Description | Default value |
---|---|---|
xasecure.audit.destination.solr.batch.filespool.dir |
The spool directory path |
/srv/ranger/hdfs_plugin/audit_solr_spool |
xasecure.audit.destination.solr.urls |
Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr |
— |
xasecure.audit.destination.solr.zookeepers |
Specifies the ZooKeeper connection string for the Solr destination |
— |
xasecure.audit.destination.solr.force.use.inmemory.jaas.config |
Uses in-memory JAAS configuration file to connect to Solr |
— |
xasecure.audit.is.enabled |
Enables Ranger audit |
true |
xasecure.audit.jaas.Client.loginModuleControlFlag |
Specifies whether the success of the JAAS login module is required, requisite, sufficient, or optional |
— |
xasecure.audit.jaas.Client.loginModuleName |
The name of the authenticator class |
— |
xasecure.audit.jaas.Client.option.keyTab |
The name of the keytab file to get the principal’s secret key |
— |
xasecure.audit.jaas.Client.option.principal |
The name of the principal to be used |
— |
xasecure.audit.jaas.Client.option.serviceName |
Represents a user or a service that wants to log in |
— |
xasecure.audit.jaas.Client.option.storeKey |
Set this to true to store the keytab or the principal’s key in the subject’s private credentials |
false |
xasecure.audit.jaas.Client.option.useKeyTab |
Set this to true to make the module get the principal’s key from the keytab |
false |
Parameter | Description | Default value |
---|---|---|
ranger.plugin.hdfs.policy.rest.url |
The URL to Ranger Admin |
— |
ranger.plugin.hdfs.service.name |
The name of the Ranger service containing policies for this instance |
— |
ranger.plugin.hdfs.policy.cache.dir |
The directory where Ranger policies are cached after successful retrieval from the source |
/srv/ranger/hdfs/policycache |
ranger.plugin.hdfs.policy.pollIntervalMs |
Defines how often to poll for changes in policies |
30000 |
ranger.plugin.hdfs.policy.rest.client.connection.timeoutMs |
The HDFS Plugin RangerRestClient connection timeout (in milliseconds) |
120000 |
ranger.plugin.hdfs.policy.rest.client.read.timeoutMs |
The HDFS Plugin RangerRestClient read timeout (in milliseconds) |
30000 |
ranger.plugin.hdfs.policy.rest.ssl.config.file |
The path to the RangerRestClient SSL config file for the HDFS plugin |
/etc/hadoop/conf/ranger-hdfs-policymgr-ssl.xml |
Parameter | Description | Default value |
---|---|---|
HADOOP_CONF_DIR |
Hadoop configuration directory |
/etc/hadoop/conf |
HADOOP_LOG_DIR |
Location of the log directory |
${HTTPFS_LOG} |
HADOOP_PID_DIR |
PID file directory location |
${HTTPFS_TEMP} |
HTTPFS_SSL_ENABLED |
Defines if SSL is enabled for httpfs |
false |
HTTPFS_SSL_KEYSTORE_FILE |
The path to the keystore file |
admin |
HTTPFS_SSL_KEYSTORE_PASS |
The password to access the keystore |
admin |
Parameter | Description | Default value |
---|---|---|
HDFS_NAMENODE_OPTS |
NameNode Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for the NameNode |
-Xms1G -Xmx8G |
HDFS_DATANODE_OPTS |
DataNode Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for the DataNode |
-Xms700m -Xmx8G |
HDFS_HTTPFS_OPTS |
HttpFS Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for the httpfs server |
-Xms700m -Xmx8G |
HDFS_JOURNALNODE_OPTS |
JournalNode Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for the JournalNode |
-Xms700m -Xmx8G |
HDFS_ZKFC_OPTS |
ZKFC Heap Memory. Sets initial (-Xms) and maximum (-Xmx) Java heap memory size and environment options for ZKFC |
-Xms500m -Xmx8G |
Parameter | Description | Default value |
---|---|---|
ssl.server.truststore.location |
The truststore to be used by NameNodes and DataNodes |
— |
ssl.server.truststore.password |
The password to the truststore |
— |
ssl.server.truststore.type |
The truststore file format |
jks |
ssl.server.truststore.reload.interval |
The truststore reload check interval (in milliseconds) |
10000 |
ssl.server.keystore.location |
The path to the keystore file used by NameNodes and DataNodes |
— |
ssl.server.keystore.password |
The password to the keystore |
— |
ssl.server.keystore.keypassword |
The password to the key in the keystore |
— |
ssl.server.keystore.type |
The keystore file format |
— |
Parameter | Description | Default value |
---|---|---|
ssl.client.truststore.location |
The truststore to be used by NameNodes and DataNodes |
— |
ssl.client.truststore.password |
The password to the truststore |
— |
ssl.client.truststore.type |
The truststore file format |
jks |
ssl.client.truststore.reload.interval |
The truststore reload check interval (in milliseconds) |
10000 |
ssl.client.keystore.location |
The path to the keystore file used by NameNodes and DataNodes |
— |
ssl.client.keystore.password |
The password to the keystore |
— |
ssl.client.keystore.keypassword |
The password to the key in the keystore |
— |
ssl.client.keystore.type |
The keystore file format |
— |
Parameter | Description | Default value |
---|---|---|
DECOMMISSIONED |
When an administrator decommissions a DataNode, the DataNode will first be transitioned into the DECOMMISSION_IN_PROGRESS state. After all blocks belonging to that DataNode are fully replicated elsewhere, the DataNode transitions to the DECOMMISSIONED state |
— |
IN_MAINTENANCE |
Sometimes administrators only need to take DataNodes down for minutes or hours to perform short-term repair or maintenance.
For such scenarios, the HDFS block replication overhead incurred by decommissioning might not be necessary, and a lightweight process is desirable.
That is what the maintenance state is used for.
When an administrator puts a DataNode in the maintenance state, the DataNode will first be transitioned to the ENTERING_MAINTENANCE state. After all blocks belonging to that DataNode are minimally replicated, the DataNode transitions to the IN_MAINTENANCE state |
— |
Parameter | Description | Default value |
---|---|---|
Custom core-site.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file core-site.xml |
— |
Custom hdfs-site.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hdfs-site.xml |
— |
Custom httpfs-site.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file httpfs-site.xml |
— |
Ranger plugin enabled |
Whether or not Ranger plugin is enabled |
— |
Custom ranger-hdfs-audit.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hdfs-audit.xml |
— |
Custom ranger-hdfs-security.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hdfs-security.xml |
— |
Custom ranger-hdfs-policymgr-ssl.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hdfs-policymgr-ssl.xml |
— |
Custom httpfs-env.sh |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file httpfs-env.sh |
— |
Custom ssl-server.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ssl-server.xml |
— |
Custom ssl-client.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ssl-client.xml |
— |
Topology script |
The topology script used in HDFS |
— |
Topology data |
An optional text file that maps host names to rack numbers for the topology script. It is stored at /etc/hadoop/conf/topology.data |
— |
Custom log4j.properties |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file log4j.properties |
|
Custom httpfs-log4j.properties |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file httpfs-log4j.properties |
Hive
Parameter | Description | Default value |
---|---|---|
HADOOP_CLASSPATH |
A colon-delimited list of directories, files, or wildcard locations that include all necessary classes |
/etc/tez/conf/:/usr/lib/tez/:/usr/lib/tez/lib/ |
HIVE_HOME |
The Hive home directory |
/usr/lib/hive |
METASTORE_PORT |
The Hive Metastore port |
9083 |
Parameter | Description | Default value |
---|---|---|
HiveServer2 Heap Memory |
Sets initial (-Xms) and maximum (-Xmx) Java heap size for HiveServer2 |
-Xms256m -Xmx256m |
Hive Metastore Heap Memory |
Sets initial (-Xms) and maximum (-Xmx) Java heap size for Hive Metastore |
-Xms256m -Xmx256m |
Parameter | Description | Default value |
---|---|---|
hive.cbo.enable |
When set to true, enables Hive’s cost-based optimizer, which uses the Apache Calcite framework |
true |
hive.compute.query.using.stats |
When set to true, Hive answers some queries, such as count(1), using statistics stored in the Metastore instead of running the query |
false |
hive.execution.engine |
Selects the execution engine.
Supported values are: mr, tez, and spark |
Tez |
hive.log.explain.output |
When enabled, logs the EXPLAIN EXTENDED output of a query at the INFO log4j level |
true |
hive.metastore.event.db.notification.api.auth |
Defines whether the Metastore should perform authorization against database notification related APIs, such as get_next_notification |
false |
hive.metastore.uris |
The Metastore URI used to access metadata in a remote metastore setup. For a remote metastore, specify the Thrift metastore server URI: thrift://<hostname>:<port>, where <hostname> is the name or IP address of the Thrift metastore server and <port> is the port on which the Thrift server listens |
— |
hive.metastore.warehouse.dir |
The absolute HDFS file path of the default database for the warehouse that is local to the cluster |
/apps/hive/warehouse |
hive.server2.enable.doAs |
Impersonate the connected user |
false |
hive.stats.fetch.column.stats |
Annotation of the operator tree with statistics information requires column statistics. Column statistics are fetched from the Metastore. Fetching column statistics for each needed column can be expensive when the number of columns is high. This flag can be used to disable fetching of column statistics from the Metastore |
— |
hive.tez.container.size |
By default, Tez will spawn containers of the size of a mapper. This parameter can be used to overwrite the default value |
— |
hive.support.concurrency |
Defines whether Hive should support concurrency or not. A ZooKeeper instance must be up and running for the default Hive Lock Manager to support read/write locks |
false |
hive.txn.manager |
Set this to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of enabling Hive transactions |
— |
javax.jdo.option.ConnectionUserName |
The metastore database user name |
APP |
javax.jdo.option.ConnectionPassword |
The password for the metastore user name |
— |
javax.jdo.option.ConnectionURL |
The JDBC connection URI used to access the data stored in the local Metastore setup. Use the following connection URI: jdbc:<datastore type>://<node name>:<port>/<database name> where:
For example, the following URI specifies a local metastore that uses MySQL as a data store: |
jdbc:mysql://{{ groups['mysql.master'][0] | d(omit) }}:3306/hive |
javax.jdo.option.ConnectionDriverName |
The JDBC driver class name used to access Hive Metastore |
com.mysql.jdbc.Driver |
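Put together, the Metastore connection parameters above correspond to a hive-site.xml fragment like the following. This is a sketch: the host, port, and database name are placeholders, not values from this document.

```xml
<!-- Hypothetical hive-site.xml fragment for a MySQL-backed Metastore;
     metastore-db.example.com, 3306, and hive are placeholder values. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-db.example.com:3306/hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
```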
hive.server2.transport.mode |
Sets the transport mode |
tcp |
hive.server2.thrift.http.port |
The port number for Thrift Server2 to listen on |
10001 |
hive.server2.thrift.http.path |
The HTTP endpoint of the Thrift Server2 service |
cliservice |
hive.server2.authentication.kerberos.principal |
Hive server Kerberos principal |
hive/_HOST@EXAMPLE.COM |
hive.server2.authentication.kerberos.keytab |
The path to the Kerberos keytab file containing the Hive server service principal |
/etc/security/keytabs/hive.service.keytab |
hive.server2.authentication.spnego.principal |
The SPNEGO Kerberos principal |
HTTP/_HOST@EXAMPLE.COM |
hive.server2.webui.spnego.principal |
The SPNEGO Kerberos principal to access Web UI |
— |
hive.server2.webui.spnego.keytab |
The SPNEGO Kerberos keytab file to access Web UI |
— |
hive.server2.webui.use.spnego |
Defines whether to use Kerberos SPNEGO for Web UI access |
false |
hive.server2.authentication.spnego.keytab |
The path to SPNEGO principal |
/etc/security/keytabs/HTTP.service.keytab |
hive.server2.authentication |
Sets the authentication mode |
NONE |
hive.metastore.sasl.enabled |
If true, the Metastore Thrift interface is secured with SASL, and clients must authenticate with Kerberos |
false |
hive.metastore.kerberos.principal |
The service principal for the metastore Thrift server.
The _HOST string is automatically replaced with the correct hostname |
hive/_HOST@EXAMPLE.COM |
hive.metastore.kerberos.keytab.file |
The path to the Kerberos keytab file containing the metastore Thrift server’s service principal |
/etc/security/keytabs/hive.service.keytab |
hive.server2.use.SSL |
Defines whether to use SSL for HiveServer2 |
false |
hive.server2.keystore.path |
The keystore to be used by Hive |
— |
hive.server2.keystore.password |
The password to the Hive keystore |
— |
hive.server2.truststore.path |
The truststore to be used by Hive |
— |
hive.server2.webui.use.ssl |
Defines whether to use SSL for the Hive web UI |
false |
hive.server2.webui.keystore.path |
The path to the keystore file used to access the Hive web UI |
— |
hive.server2.webui.keystore.password |
The password to the keystore file used to access the Hive web UI |
— |
hive.server2.support.dynamic.service.discovery |
Defines whether to support dynamic service discovery via ZooKeeper |
false |
hive.zookeeper.quorum |
A comma-separated list of ZooKeeper servers (<host>:<port>) running in the cluster |
zookeeper:2181 |
hive.server2.zookeeper.namespace |
Specifies the root namespace on ZooKeeper |
hiveserver2 |
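When hive.server2.support.dynamic.service.discovery is enabled, clients combine hive.zookeeper.quorum and hive.server2.zookeeper.namespace into a discovery-mode JDBC URL. A minimal sketch of that composition (the helper name is ours; the URL shape is the standard HiveServer2 ZooKeeper discovery form):

```python
def hs2_discovery_url(zk_quorum, namespace="hiveserver2"):
    """Build a HiveServer2 JDBC URL that resolves the active server
    via ZooKeeper instead of naming a host directly.

    zk_quorum: value of hive.zookeeper.quorum, e.g. "zk1:2181,zk2:2181"
    namespace: value of hive.server2.zookeeper.namespace
    """
    return ("jdbc:hive2://{quorum}/;serviceDiscoveryMode=zooKeeper;"
            "zooKeeperNamespace={ns}").format(quorum=zk_quorum, ns=namespace)
```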
Parameter | Description | Default value |
---|---|---|
xasecure.audit.destination.solr.batch.filespool.dir |
The spool directory path |
/srv/ranger/hdfs_plugin/audit_solr_spool |
xasecure.audit.destination.solr.urls |
Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr |
— |
xasecure.audit.destination.solr.zookeepers |
Specifies the ZooKeeper connection string for the Solr destination |
— |
xasecure.audit.destination.solr.force.use.inmemory.jaas.config |
Uses in-memory JAAS configuration file to connect to Solr |
— |
xasecure.audit.is.enabled |
Enables Ranger audit |
true |
xasecure.audit.jaas.Client.loginModuleControlFlag |
Specifies whether the success of the JAAS login module is required, requisite, sufficient, or optional |
— |
xasecure.audit.jaas.Client.loginModuleName |
The name of the authenticator class |
— |
xasecure.audit.jaas.Client.option.keyTab |
The name of the keytab file to get the principal’s secret key |
— |
xasecure.audit.jaas.Client.option.principal |
The name of the principal to be used |
— |
xasecure.audit.jaas.Client.option.serviceName |
Represents a user or a service that wants to log in |
— |
xasecure.audit.jaas.Client.option.storeKey |
Set this to true to store the keytab or the principal’s key in the subject’s private credentials |
false |
xasecure.audit.jaas.Client.option.useKeyTab |
Set this to true to make the module get the principal’s key from the keytab |
false |
Parameter | Description | Default value |
---|---|---|
ranger.plugin.hive.policy.rest.url |
The URL to Ranger Admin |
— |
ranger.plugin.hive.service.name |
The name of the Ranger service containing policies for this instance |
— |
ranger.plugin.hive.policy.cache.dir |
The directory where Ranger policies are cached after successful retrieval from the source |
/srv/ranger/hive/policycache |
ranger.plugin.hive.policy.pollIntervalMs |
Defines how often to poll for changes in policies |
30000 |
ranger.plugin.hive.policy.rest.client.connection.timeoutMs |
The Hive Plugin RangerRestClient connection timeout (in milliseconds) |
120000 |
ranger.plugin.hive.policy.rest.client.read.timeoutMs |
The Hive Plugin RangerRestClient read timeout (in milliseconds) |
30000 |
xasecure.hive.update.xapolicies.on.grant.revoke |
Controls Hive Ranger policy update from SQL Grant/Revoke commands |
true |
ranger.plugin.hive.policy.rest.ssl.config.file |
The path to the RangerRestClient SSL config file for the Hive plugin |
/etc/hive/conf/ranger-hive-policymgr-ssl.xml |
Parameter | Description | Default value |
---|---|---|
xasecure.policymgr.clientssl.keystore |
The path to the keystore file used by Ranger |
— |
xasecure.policymgr.clientssl.keystore.credential.file |
The path to the keystore credentials file |
/etc/hive/conf/ranger-hive.jceks |
xasecure.policymgr.clientssl.truststore.credential.file |
The path to the truststore credentials file |
/etc/hive/conf/ranger-hive.jceks |
xasecure.policymgr.clientssl.truststore |
The path to the truststore file used by Ranger |
— |
xasecure.policymgr.clientssl.keystore.password |
The password to the keystore file |
— |
xasecure.policymgr.clientssl.truststore.password |
The password to the truststore file |
— |
Parameter | Description | Default value |
---|---|---|
tez.am.resource.memory.mb |
The amount of memory in MB that YARN will allocate to the Tez Application Master. The size increases with the size of the DAG |
— |
tez.history.logging.service.class |
Enables Tez to use the Timeline Server for History Logging |
org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService |
tez.lib.uris |
HDFS paths containing the Tez JAR files |
${fs.defaultFS}/apps/tez/tez-0.9.2.tar.gz |
tez.task.resource.memory.mb |
The amount of memory used by launched tasks in TEZ containers. Usually this value is set in the DAG |
— |
tez.tez-ui.history-url.base |
The URL where the Tez UI is hosted |
— |
tez.use.cluster.hadoop-libs |
Specifies whether Tez will use the cluster Hadoop libraries |
true |
Parameter | Description | Default value |
---|---|---|
ssl_certificate |
The path to the SSL certificate for NGINX |
/etc/ssl/certs/host_cert.cert |
ssl_certificate_key |
The path to the SSL certificate key for NGINX |
/etc/ssl/host_cert.key |
Parameter | Description | Default value |
---|---|---|
ACID Transactions |
Defines whether to enable ACID transactions |
false |
Database type |
The type of the external database used for Hive Metastore |
mysql |
Custom hive-site.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hive-site.xml |
— |
Custom hive-env.sh |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file hive-env.sh |
— |
Ranger plugin enabled |
Whether or not Ranger plugin is enabled |
false |
Custom ranger-hive-audit.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hive-audit.xml |
— |
Custom ranger-hive-security.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hive-security.xml |
— |
Custom ranger-hive-policymgr-ssl.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-hive-policymgr-ssl.xml |
— |
Custom tez-site.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file tez-site.xml |
— |
Impala
Parameter | Description | Default value |
---|---|---|
impala-env.sh |
The contents of the impala-env.sh file that contains Impala environment settings |
Parameter | Description | Default value |
---|---|---|
hostname |
The hostname to use for the Impala daemon. If Kerberos is enabled, it is also used as a part of the Kerberos principal. If this option is not set, the system default is used |
— |
beeswax_port |
The port on which Impala daemons serve Beeswax client requests |
21000 |
fe_port |
The frontend port of the Impala daemon |
21000 |
be_port |
Internal use only. Impala daemons use this port for Thrift-based communication with each other |
22000 |
krpc_port |
Internal use only. Impala daemons use this port for KRPC-based communication with each other |
27000 |
hs2_port |
The port on which Impala daemons serve HiveServer2 client requests |
21050 |
hs2_http_port |
The port used by client applications to transmit commands and receive results over HTTP via the HiveServer2 protocol |
28000 |
enable_webserver |
Enables or disables the Impala daemon web server. Its Web UI contains information about configuration settings, running and completed queries, and associated resource usage for them. It is primarily used for diagnosing query problems that can be traced to a particular node |
True |
webserver_require_spnego |
Enables the Kerberos authentication for Hadoop HTTP web consoles for all roles of this service using the SPNEGO protocol. Use this option only if Kerberos is enabled for the HDFS service |
False |
webserver_port |
The port where the Impala daemon web server is running |
25000 |
catalog_service_host |
The host where the Impala Catalog Service component is running |
— |
catalog_service_port |
The port on which the Impala Catalog Service component listens |
26000 |
state_store_host |
The host where the Impala Statestore component is running |
— |
state_store_port |
The port on which the Impala Statestore component is running |
24000 |
state_store_subscriber_port |
The port where StateStoreSubscriberService is running. StateStoreSubscriberService listens on this port for updates from the Statestore daemon |
23030 |
scratch_dirs |
The directory where Impala daemons write data to free up memory during large sort, join, aggregation, and other operations. The files are removed when the operation finishes. This can potentially be a large amount of data |
/srv/impala/ |
log_dir |
The directory where an Impala daemon places its log files |
/var/log/impala/impalad/ |
log_filename |
The prefix of the log filename; the full path is <log_dir>/<log_filename>.[severity level] |
impalad |
max_log_files |
The number of log files that are kept for each severity level (INFO, WARNING, ERROR, and FATAL) |
10 |
audit_event_log_dir |
The directory in which Impala daemon audit event log files are written when audit event logging is enabled |
/var/log/impala/impalad/audit |
minidump_path |
The directory for storing Impala daemon Breakpad dumps |
/var/log/impala-minidumps |
lineage_event_log_dir |
The directory in which the Impala daemon generates its lineage log files when lineage logging is enabled |
/var/log/impala/impalad/lineage |
local_library_dir |
The local directory into which an Impala daemon copies user-defined function (UDF) libraries from HDFS |
/usr/lib/impala/udfs |
max_lineage_log_file_size |
The maximum size (in entries) of the Impala daemon lineage log file. When the size is exceeded, a new file is created |
5000 |
max_audit_event_log_file_size |
The maximum size (in queries) of the Impala Daemon audit event log file. When the size is exceeded, a new file is created |
5000 |
fe_service_threads |
The maximum number of concurrent client connections allowed. The parameter determines how many queries can run simultaneously. When more clients try to connect to Impala, the later arriving clients have to wait until previous clients disconnect. Setting the |
64 |
mem_limit |
The memory limit (in bytes) for an Impala daemon, enforced by the daemon itself. This limit does not include memory consumed by the daemon’s embedded JVM. The Impala daemon uses this amount of memory for query processing, cached data, network buffers, background operations, and so on. If the limit is exceeded, queries are killed until memory usage falls below the limit |
1473249280 |
idle_query_timeout |
The time in seconds after which an idle query (no processing work is done and no updates are received from the client) is cancelled. If set to 0, idle queries are never cancelled |
0 |
idle_session_timeout |
The time in seconds after which Impala closes an idle session and cancels all running queries. If set to 0, idle sessions never expire |
0 |
max_result_cache_size |
The maximum number of query results a client can request to be cached on a per-query basis to support restarting fetches. This option guards against unreasonably large result caches. Requests exceeding this maximum are rejected |
100000 |
max_cached_file_handles |
The maximum number of cached HDFS file handles. Caching HDFS file handles reduces the number of new file handles opened and thus reduces the load on the HDFS NameNode. Each cached file handle consumes a small amount of memory. If set to 0, file handle caching is disabled |
20000 |
unused_file_handle_timeout_sec |
The maximum time in seconds during which an unused HDFS file handle remains in the HDFS file handle cache. When the underlying file for a cached file handle is deleted, the disk space may not be freed until the cached file handle is removed from the cache. This timeout allows the disk space occupied by deleted files to be freed in a predictable period of time. If set to |
21600 |
statestore_subscriber_timeout_seconds |
The timeout in seconds for Impala Daemon and Catalog Server connections to Statestore |
30 |
default_query_options |
A list of key/value pairs representing additional query options to pass to the Impala Daemon command line, separated by commas |
default_file_format=parquet,default_transactional_type=none |
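The default_query_options value is a flat comma-separated list of key=value pairs, as in the default shown above. A hypothetical helper that splits such a string into a dictionary, for illustration only:

```python
def parse_query_options(opts):
    """Split an Impala default_query_options string
    ("key1=value1,key2=value2,...") into a dict."""
    result = {}
    for pair in opts.split(","):
        if not pair:
            continue  # tolerate trailing commas
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result
```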
load_auth_to_local_rules |
If checked (True) and Kerberos is enabled for Impala, Impala uses the hadoop.security.auth_to_local rules from core-site.xml to map Kerberos principals to short names |
True |
catalog_topic_mode |
The granularity of on-demand metadata fetches between the Impala Daemon coordinator and Impala Catalog Service. See Metadata management |
minimal |
use_local_catalog |
Allows coordinators to cache metadata from Impala Catalog Service. If this is set to |
True |
abort_on_failed_audit_event |
Specifies whether to shut down Impala if there is a problem with recording an audit event |
False |
max_minidumps |
The maximum number of Breakpad dump files stored by the Impala daemon. A negative value or |
9 |
authorized_proxy_user_config |
Specifies the set of authorized proxy users (the users who can impersonate other users during authorization), and users who they are allowed to impersonate. The example of syntax for the option is: |
knox=*;zeppelin=* |
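The authorized_proxy_user_config syntax separates proxy users with semicolons and their allowed delegated users with commas; * allows impersonating anyone. An illustrative parser for that syntax (names are ours, the format is from the parameter description):

```python
def parse_proxy_users(config):
    """Parse an authorized_proxy_user_config string such as
    "proxy1=user1,user2;proxy2=*" into {proxy: list_of_users or "*"}."""
    result = {}
    for entry in config.split(";"):
        if not entry:
            continue  # tolerate trailing semicolons
        proxy, _, users = entry.partition("=")
        if users.strip() == "*":
            result[proxy.strip()] = "*"  # proxy may impersonate anyone
        else:
            result[proxy.strip()] = [u.strip() for u in users.split(",")]
    return result
```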
queue_wait_timeout_ms |
The maximum amount of time (in milliseconds) that a request waits to be admitted before timing out. Must be a positive integer |
60000 |
disk_spill_encryption |
Specifies whether to encrypt and verify the integrity of all data spilled to the disk as part of a query |
False |
abort_on_config_error |
Specifies whether to abort Impala startup if there are incorrect configs or Impala is running on unsupported hardware |
True |
kerberos_reinit_interval |
The number of minutes between reestablishing the ticket with the Kerberos server |
60 |
principal |
The service Kerberos principal |
— |
keytab_file |
The service Kerberos keytab file |
— |
ssl_server_certificate |
The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The certificate file must be in the PEM format |
— |
ssl_private_key |
The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The file must be in the PEM format |
— |
ssl_client_ca_certificate |
The path to the certificate, in the PEM format, used to confirm the authenticity of SSL/TLS servers that the Impala daemons can connect to. Since the Impala daemons connect to each other, it should also include the CA certificate used to sign all the SSL/TLS certificates. SSL/TLS between Impala daemons cannot be enabled without this parameter |
— |
webserver_certificate_file |
The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when the Impala daemon web server operates as a TLS/SSL server. The certificate file must be in the PEM format |
— |
webserver_private_key_file |
The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when the Impala daemon web server operates as a TLS/SSL server. The certificate file must be in the PEM format |
— |
ssl_minimum_version |
The minimum version of TLS |
TLSv1.2 |
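The effect of `ssl_minimum_version` can be mimicked on the client side with Python's standard `ssl` module. This is only a sketch of how a client context would refuse protocols older than TLS 1.2; it does not reflect Impala internals:

```python
import ssl

# Build a client context that, like ssl_minimum_version=TLSv1.2,
# refuses any protocol version older than TLS 1.2.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

print(ctx.minimum_version.name)  # TLSv1_2
```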
Parameter | Description | Default value |
---|---|---|
log4j.properties |
Apache Log4j utility settings |
log.threshold=INFO main.logger=FA impala.root.logger=DEBUG,FA log4j.rootLogger=DEBUG,FA log.dir=/var/log/impala/impalad max.log.file.size=200MB log4j.appender.FA=org.apache.log4j.FileAppender log4j.appender.FA.File=/var/log/impalad/impalad.INFO log4j.appender.FA.layout=org.apache.log4j.PatternLayout log4j.appender.FA.layout.ConversionPattern=%p%d{MMdd HH:mm:ss.SSS'000'} %t %c] %m%n log4j.appender.console=org.apache.log4j.ConsoleAppender log4j.appender.console.target=System.err log4j.appender.console.layout=org.apache.log4j.PatternLayout log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n |
Enable custom ulimits |
Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the table below |
[Manager] DefaultLimitCPU= DefaultLimitFSIZE= DefaultLimitDATA= DefaultLimitSTACK= DefaultLimitCORE= DefaultLimitRSS= DefaultLimitNOFILE= DefaultLimitAS= DefaultLimitNPROC= DefaultLimitMEMLOCK= DefaultLimitLOCKS= DefaultLimitSIGPENDING= DefaultLimitMSGQUEUE= DefaultLimitNICE= DefaultLimitRTPRIO= DefaultLimitRTTIME= |
Parameter | Description | Corresponding option of the ulimit command in CentOS |
---|---|---|
DefaultLimitCPU |
A limit in seconds on the amount of CPU time that a process can consume |
cpu time ( -t) |
DefaultLimitFSIZE |
The maximum size of files that a process can create, in 512-byte blocks |
file size ( -f) |
DefaultLimitDATA |
The maximum size of a process’s data segment, in kilobytes |
data seg size ( -d) |
DefaultLimitSTACK |
The maximum stack size allocated to a process, in kilobytes |
stack size ( -s) |
DefaultLimitCORE |
The maximum size of a core dump file allowed for a process, in 512-byte blocks |
core file size ( -c) |
DefaultLimitRSS |
The maximum resident set size, in kilobytes |
max memory size ( -m) |
DefaultLimitNOFILE |
The maximum number of open file descriptors allowed for the process |
open files ( -n) |
DefaultLimitAS |
The maximum size of the process virtual memory (address space), in kilobytes |
virtual memory ( -v) |
DefaultLimitNPROC |
The maximum number of processes |
max user processes ( -u) |
DefaultLimitMEMLOCK |
The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used |
max locked memory ( -l) |
DefaultLimitLOCKS |
The maximum number of files locked by a process |
file locks ( -x) |
DefaultLimitSIGPENDING |
The maximum number of signals that are pending for delivery to the calling thread |
pending signals ( -i) |
DefaultLimitMSGQUEUE |
The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages |
POSIX message queues ( -q) |
DefaultLimitNICE |
The maximum NICE priority level that can be assigned to a process |
scheduling priority ( -e) |
DefaultLimitRTPRIO |
The maximum real-time scheduling priority level |
real-time priority ( -r) |
DefaultLimitRTTIME |
The maximum amount of CPU time (in microseconds) that a process scheduled under a real-time policy can consume without making a blocking system call |
— |
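Several rows of the table above map directly to constants in Python's standard `resource` module, which can be used to inspect the limits a running process actually received (each call returns a soft/hard pair; `RLIM_INFINITY` means no limit):

```python
import resource

# Each systemd DefaultLimit* setting corresponds to an RLIMIT_* constant.
mapping = {
    "DefaultLimitNOFILE": resource.RLIMIT_NOFILE,  # open files (-n)
    "DefaultLimitNPROC": resource.RLIMIT_NPROC,    # max user processes (-u)
    "DefaultLimitCORE": resource.RLIMIT_CORE,      # core file size (-c)
    "DefaultLimitCPU": resource.RLIMIT_CPU,        # cpu time (-t)
}
for name, rlimit in mapping.items():
    soft, hard = resource.getrlimit(rlimit)
    print(f"{name}: soft={soft}, hard={hard}")
```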
Parameter | Description | Default value |
---|---|---|
hostname |
The hostname to use for the Statestore daemon. If Kerberos is enabled, it is also used as a part of the Kerberos principal. If this option is not set, the system default is used |
— |
state_store_host |
The host where the Impala Statestore component is running |
— |
state_store_port |
The port on which the Impala Statestore component is running |
24000 |
catalog_service_host |
The host where the Impala Catalog Service component is running |
— |
catalog_service_port |
The port on which the Impala Catalog Service component listens |
26000 |
enable_webserver |
Enables or disables the Statestore daemon web server. Its Web UI contains information about memory usage, configuration settings, and ongoing health checks performed by Statestore |
True |
webserver_require_spnego |
Enables the Kerberos authentication for Hadoop HTTP web consoles for all roles of this service using the SPNEGO protocol. Use this option only if Kerberos is enabled for the HDFS service |
False |
webserver_port |
The port on which the Statestore web server is running |
25010 |
log_dir |
The directory where the Statestore daemon places its log files |
/var/log/impala/statestored/ |
log_filename |
The prefix of the log filename — the full path is |
statestored |
max_log_files |
The number of log files that are kept for each severity level (INFO, WARNING, ERROR, and FATAL) |
10 |
minidump_path |
The directory for storing Statestore daemon Breakpad dumps |
/var/log/impala-minidumps |
max_minidumps |
The maximum number of Breakpad dump files stored by Statestore daemon. A negative value or 0 disables the removal of old dump files |
9 |
state_store_num_server_worker_threads |
The number of worker threads for the thread manager of the Statestore Thrift server |
4 |
state_store_pending_task_count_max |
The maximum number of tasks allowed to be pending by the thread manager of the Statestore Thrift server. The value 0 allows an unlimited number of pending tasks |
0 |
kerberos_reinit_interval |
The number of minutes between reestablishing the ticket with the Kerberos server |
60 |
principal |
The service Kerberos principal |
— |
keytab_file |
The service Kerberos keytab file |
— |
ssl_server_certificate |
The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The certificate file must be in the PEM format |
— |
ssl_private_key |
The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The file must be in the PEM format |
— |
ssl_client_ca_certificate |
The path to the certificate, in the PEM format, used to confirm the authenticity of SSL/TLS servers that the Impala daemons can connect to. Since the Impala daemons connect to each other, it should also include the CA certificate used to sign all the SSL/TLS certificates. SSL/TLS between Impala daemons cannot be enabled without this parameter |
— |
webserver_certificate_file |
The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when the Statestore web server operates as a TLS/SSL server. The certificate file must be in the PEM format |
— |
webserver_private_key_file |
The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when the Statestore web server operates as a TLS/SSL server. The certificate file must be in the PEM format |
— |
ssl_minimum_version |
The minimum version of TLS |
TLSv1.2 |
Parameter | Description | Default value |
---|---|---|
Custom statestore.conf |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file statestore.conf |
— |
Enable custom ulimits |
Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the table below |
[Manager] DefaultLimitCPU= DefaultLimitFSIZE= DefaultLimitDATA= DefaultLimitSTACK= DefaultLimitCORE= DefaultLimitRSS= DefaultLimitNOFILE= DefaultLimitAS= DefaultLimitNPROC= DefaultLimitMEMLOCK= DefaultLimitLOCKS= DefaultLimitSIGPENDING= DefaultLimitMSGQUEUE= DefaultLimitNICE= DefaultLimitRTPRIO= DefaultLimitRTTIME= |
Parameter | Description | Corresponding option of the ulimit command in CentOS |
---|---|---|
DefaultLimitCPU |
A limit in seconds on the amount of CPU time that a process can consume |
cpu time ( -t) |
DefaultLimitFSIZE |
The maximum size of files that a process can create, in 512-byte blocks |
file size ( -f) |
DefaultLimitDATA |
The maximum size of a process’s data segment, in kilobytes |
data seg size ( -d) |
DefaultLimitSTACK |
The maximum stack size allocated to a process, in kilobytes |
stack size ( -s) |
DefaultLimitCORE |
The maximum size of a core dump file allowed for a process, in 512-byte blocks |
core file size ( -c) |
DefaultLimitRSS |
The maximum resident set size, in kilobytes |
max memory size ( -m) |
DefaultLimitNOFILE |
The maximum number of open file descriptors allowed for the process |
open files ( -n) |
DefaultLimitAS |
The maximum size of the process virtual memory (address space), in kilobytes |
virtual memory ( -v) |
DefaultLimitNPROC |
The maximum number of processes |
max user processes ( -u) |
DefaultLimitMEMLOCK |
The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used |
max locked memory ( -l) |
DefaultLimitLOCKS |
The maximum number of files locked by a process |
file locks ( -x) |
DefaultLimitSIGPENDING |
The maximum number of signals that are pending for delivery to the calling thread |
pending signals ( -i) |
DefaultLimitMSGQUEUE |
The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages |
POSIX message queues ( -q) |
DefaultLimitNICE |
The maximum NICE priority level that can be assigned to a process |
scheduling priority ( -e) |
DefaultLimitRTPRIO |
The maximum real-time scheduling priority level |
real-time priority ( -r) |
DefaultLimitRTTIME |
The maximum amount of CPU time (in microseconds) that a process scheduled under a real-time policy can consume without making a blocking system call |
— |
Parameter | Description | Default value |
---|---|---|
hostname |
The hostname to use for the Catalog Service daemon. If Kerberos is enabled, it is also used as a part of the Kerberos principal. If this option is not set, the system default is used |
— |
state_store_host |
The host where the Impala Statestore component is running |
— |
state_store_port |
The port on which the Impala Statestore component is running |
24000 |
catalog_service_host |
The host where the Impala Catalog Service component is running |
— |
catalog_service_port |
The port on which the Impala Catalog Service component listens |
26000 |
enable_webserver |
Enables or disables the Catalog Service web server. Its Web UI includes information about the databases, tables, and other objects managed by Impala, in addition to the resource usage and configuration settings of the Catalog Service |
True |
webserver_require_spnego |
Enables the Kerberos authentication for Hadoop HTTP web consoles for all roles of this service using the SPNEGO protocol. Use this option only if Kerberos is enabled for the HDFS service |
False |
webserver_port |
The port on which the Catalog Service web server is running |
25020 |
log_dir |
The directory where the Catalog Service daemon places its log files |
/var/log/impala/catalogd/ |
log_filename |
The prefix of the log filename — the full path is |
catalogd |
max_log_files |
The number of log files that are kept for each severity level (INFO, WARNING, ERROR, and FATAL) |
10 |
minidump_path |
The directory for storing the Catalog Service daemon Breakpad dumps |
/var/log/impala-minidumps |
max_minidumps |
The maximum number of Breakpad dump files stored by Catalog Service. A negative value or 0 disables the removal of old dump files |
9 |
hms_event_polling_interval_s |
When this parameter is set to a positive integer, Catalog Service fetches new notifications from Hive Metastore at the specified interval in seconds. If set to 0, the event polling is disabled |
2 |
load_auth_to_local_rules |
If checked (True) and Kerberos is enabled for Impala, Impala uses the auth-to-local rules from the HDFS configuration (the hadoop.security.auth_to_local property) to map Kerberos principals to short names |
True |
load_catalog_in_background |
If set to true, the catalog metadata is loaded and cached in the background even before it is requested; if false, metadata is loaded lazily, on first access |
False |
catalog_topic_mode |
The granularity of on-demand metadata fetches between the Impala Daemon coordinator and Impala Catalog Service. See Metadata management |
minimal |
statestore_subscriber_timeout_seconds |
The timeout in seconds for Impala Daemon and Catalog Server connections to Statestore |
30 |
state_store_subscriber_port |
The port where StateStoreSubscriberService is running. StateStoreSubscriberService listens on this port for updates from the Statestore daemon |
23020 |
kerberos_reinit_interval |
The number of minutes between reestablishing the ticket with the Kerberos server |
60 |
principal |
The service Kerberos principal |
— |
keytab_file |
The service Kerberos keytab file |
— |
ssl_server_certificate |
The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The certificate file must be in the PEM format |
— |
ssl_private_key |
The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when Impala operates as a TLS/SSL server. The file must be in the PEM format |
— |
ssl_client_ca_certificate |
The path to the certificate, in the PEM format, used to confirm the authenticity of SSL/TLS servers that the Impala daemons can connect to. Since the Impala daemons connect to each other, it should also include the CA certificate used to sign all the SSL/TLS certificates. SSL/TLS between Impala daemons cannot be enabled without this parameter |
— |
webserver_certificate_file |
The path to the TLS/SSL file with the server certificate key used for TLS/SSL. It is used when the Catalog Service web server operates as a TLS/SSL server. The certificate file must be in the PEM format |
— |
webserver_private_key_file |
The path to the TLS/SSL file with the private key used for TLS/SSL. It is used when the Catalog Service web server operates as a TLS/SSL server. The certificate file must be in the PEM format |
— |
ssl_minimum_version |
The minimum version of TLS |
TLSv1.2 |
Parameter | Description | Default value |
---|---|---|
Custom catalogstore.conf |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file catalogstore.conf |
— |
Enable custom ulimits |
Switch on the corresponding toggle button to specify resource limits (ulimits) for the current process. If you do not set these values, the default system settings are used. Ulimit settings are described in the table below |
[Manager] DefaultLimitCPU= DefaultLimitFSIZE= DefaultLimitDATA= DefaultLimitSTACK= DefaultLimitCORE= DefaultLimitRSS= DefaultLimitNOFILE= DefaultLimitAS= DefaultLimitNPROC= DefaultLimitMEMLOCK= DefaultLimitLOCKS= DefaultLimitSIGPENDING= DefaultLimitMSGQUEUE= DefaultLimitNICE= DefaultLimitRTPRIO= DefaultLimitRTTIME= |
Parameter | Description | Corresponding option of the ulimit command in CentOS |
---|---|---|
DefaultLimitCPU |
A limit in seconds on the amount of CPU time that a process can consume |
cpu time ( -t) |
DefaultLimitFSIZE |
The maximum size of files that a process can create, in 512-byte blocks |
file size ( -f) |
DefaultLimitDATA |
The maximum size of a process’s data segment, in kilobytes |
data seg size ( -d) |
DefaultLimitSTACK |
The maximum stack size allocated to a process, in kilobytes |
stack size ( -s) |
DefaultLimitCORE |
The maximum size of a core dump file allowed for a process, in 512-byte blocks |
core file size ( -c) |
DefaultLimitRSS |
The maximum resident set size, in kilobytes |
max memory size ( -m) |
DefaultLimitNOFILE |
The maximum number of open file descriptors allowed for the process |
open files ( -n) |
DefaultLimitAS |
The maximum size of the process virtual memory (address space), in kilobytes |
virtual memory ( -v) |
DefaultLimitNPROC |
The maximum number of processes |
max user processes ( -u) |
DefaultLimitMEMLOCK |
The maximum memory size that can be locked for the process, in kilobytes. Memory locking ensures the memory is always in RAM and a swap file is not used |
max locked memory ( -l) |
DefaultLimitLOCKS |
The maximum number of files locked by a process |
file locks ( -x) |
DefaultLimitSIGPENDING |
The maximum number of signals that are pending for delivery to the calling thread |
pending signals ( -i) |
DefaultLimitMSGQUEUE |
The maximum number of bytes in POSIX message queues. POSIX message queues allow processes to exchange data in the form of messages |
POSIX message queues ( -q) |
DefaultLimitNICE |
The maximum NICE priority level that can be assigned to a process |
scheduling priority ( -e) |
DefaultLimitRTPRIO |
The maximum real-time scheduling priority level |
real-time priority ( -r) |
DefaultLimitRTTIME |
The maximum amount of CPU time (in microseconds) that a process scheduled under a real-time policy can consume without making a blocking system call |
— |
Kyuubi
Parameter | Description | Default value |
---|---|---|
kyuubi.frontend.rest.bind.port |
Port on which the REST frontend service runs |
10099 |
kyuubi.frontend.thrift.binary.bind.port |
Port on which the Thrift frontend service runs via a binary protocol |
10099 |
kyuubi.frontend.thrift.http.bind.port |
Port on which the Thrift frontend service runs via HTTP |
10010 |
kyuubi.frontend.thrift.http.path |
The path component of the URL endpoint when the Thrift frontend service runs over HTTP |
cliservice |
kyuubi.engine.share.level |
An engine share level. Possible values: CONNECTION, USER, GROUP, SERVER |
USER |
kyuubi.engine.type |
An engine type supported by Kyuubi. Possible values: SPARK_SQL, FLINK_SQL, TRINO, HIVE_SQL, JDBC |
SPARK_SQL |
kyuubi.operation.language |
Programming language used to interpret inputs. Possible values: SQL, SCALA, PYTHON |
SQL |
kyuubi.frontend.protocols |
A comma-separated list of supported frontend protocols. Possible values: THRIFT_BINARY, THRIFT_HTTP, REST, MYSQL, TRINO |
THRIFT_BINARY |
kyuubi.frontend.thrift.binary.ssl.disallowed.protocols |
Forbidden SSL versions for Thrift binary frontend |
SSLv2,SSLv3,TLSv1.1 |
kyuubi.frontend.thrift.http.ssl.protocol.blacklist |
Forbidden SSL versions for Thrift HTTP frontend |
SSLv2,SSLv3,TLSv1.1 |
kyuubi.ha.addresses |
External Kyuubi instance addresses |
<hostname_1>:2181, …, <hostname_N>:2181 |
kyuubi.ha.namespace |
The root directory for the service to deploy its instance URI |
kyuubi |
kyuubi.metadata.store.jdbc.database.type |
A database type for the server metadata store. Possible values: |
POSTGRESQL |
kyuubi.metadata.store.jdbc.url |
A JDBC URL for the server metadata store |
jdbc:postgresql://{{ groups['adpg.adpg'][0] | d(omit) }}:5432/kyuubi |
kyuubi.metadata.store.jdbc.driver |
A JDBC driver classname for the server metadata store |
org.postgresql.Driver |
kyuubi.metadata.store.jdbc.user |
A username for the server metadata store |
kyuubi |
kyuubi.metadata.store.jdbc.password |
A password for the server metadata store |
— |
kyuubi.frontend.thrift.binary.ssl.enabled |
Indicates whether to use the SSL encryption in the Thrift binary mode |
false |
kyuubi.frontend.thrift.http.use.SSL |
Indicates whether to use the SSL encryption in the Thrift HTTP mode |
false |
kyuubi.frontend.ssl.keystore.type |
Type of the SSL certificate keystore |
— |
kyuubi.frontend.ssl.keystore.path |
Path to the SSL certificate keystore |
— |
kyuubi.frontend.ssl.keystore.password |
Password for the SSL certificate keystore |
— |
kyuubi.frontend.thrift.http.ssl.keystore.path |
Path to the SSL certificate keystore |
— |
kyuubi.frontend.thrift.http.ssl.keystore.password |
Password for the SSL certificate keystore |
— |
kyuubi.authentication |
Authentication type. Possible values: NONE, NOSASL, LDAP, KERBEROS, JDBC, CUSTOM |
NONE |
kyuubi.ha.zookeeper.acl.enabled |
Indicates whether the ZooKeeper ensemble is kerberized |
false |
kyuubi.ha.zookeeper.auth.type |
ZooKeeper authentication type. Possible values: NONE, KERBEROS, DIGEST |
NONE |
kyuubi.ha.zookeeper.auth.principal |
Kerberos principal name used for ZooKeeper authentication |
— |
kyuubi.ha.zookeeper.auth.keytab |
Path to Kyuubi Server’s keytab used for ZooKeeper authentication |
— |
kyuubi.kinit.principal |
Name of the Kerberos principal |
— |
kyuubi.kinit.keytab |
Path to Kyuubi Server’s keytab |
— |
kyuubi.spnego.principal |
Name of the SPNego service principal. Set only if using SPNego in authentication |
— |
kyuubi.spnego.keytab |
Path to the SPNego service keytab. Set only if using SPNego in authentication |
— |
kyuubi.engine.hive.java.options |
Extra Java options for the Hive query engine |
— |
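As a sketch of how `kyuubi.frontend.thrift.http.bind.port` and `kyuubi.frontend.thrift.http.path` combine, a Thrift-over-HTTP client would target a URL of the following shape. The helper function and the host name are illustrative, not part of Kyuubi:

```python
def thrift_http_url(host, port=10010, path="cliservice", use_ssl=False):
    """Compose the Thrift-over-HTTP endpoint URL from the
    kyuubi.frontend.thrift.http.* settings shown above."""
    scheme = "https" if use_ssl else "http"
    return f"{scheme}://{host}:{port}/{path}"

print(thrift_http_url("kyuubi-host.example.com"))
# http://kyuubi-host.example.com:10010/cliservice
```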
Parameter | Description | Default value |
---|---|---|
KYUUBI_HOME |
Kyuubi home directory |
/usr/lib/kyuubi |
KYUUBI_CONF_DIR |
Directory that stores Kyuubi configurations |
/etc/kyuubi/conf |
KYUUBI_LOG_DIR |
Kyuubi server log directory |
/var/log/kyuubi |
KYUUBI_PID_DIR |
Directory that stores the Kyuubi instance .pid-file |
/var/run/kyuubi |
KYUUBI_ADDITIONAL_CLASSPATH |
Path to a directory with additional SSM libraries |
/usr/lib/ssm/lib/smart* |
HADOOP_HOME |
Hadoop home directory |
/usr/lib/hadoop |
HADOOP_LIB_DIR |
Directory that stores Hadoop libraries |
${HADOOP_HOME}/lib |
KYUUBI_JAVA_OPTS |
Java parameters for Kyuubi |
-Djava.library.path=${HADOOP_LIB_DIR}/native/ -Djava.io.tmpdir={{ cluster.config.java_tmpdir | d('/tmp') }} |
HADOOP_CLASSPATH |
A common |
$HADOOP_CLASSPATH:/usr/lib/ssm/lib/smart* |
HADOOP_CONF_DIR |
Directory that stores Hadoop configurations |
/etc/hadoop/conf |
SPARK_HOME |
Spark home directory |
/usr/lib/spark3 |
SPARK_CONF_DIR |
Directory that stores Spark configurations |
/etc/spark3/conf |
FLINK_HOME |
Flink home directory |
/usr/lib/flink |
FLINK_CONF_DIR |
Directory that stores Flink configurations |
/etc/flink/conf |
FLINK_HADOOP_CLASSPATH |
Additional Hadoop .jar files required to use the Kyuubi Flink engine |
$(hadoop classpath):/usr/lib/ssm/lib/smart* |
HIVE_HOME |
Hive home directory |
/usr/lib/hive |
HIVE_CONF_DIR |
Directory that stores Hive configurations |
/etc/hive/conf |
HIVE_HADOOP_CLASSPATH |
Additional Hadoop .jar files required to use the Kyuubi Hive engine |
$(hadoop classpath):/etc/tez/conf/:/usr/lib/tez/*:/usr/lib/tez/lib/*:/usr/lib/ssm/lib/smart* |
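Values such as `${HADOOP_HOME}/lib` in the table above are expanded by the shell when the environment file (kyuubi-env.sh) is sourced. The same expansion can be sketched in Python with `os.path.expandvars`, using the default paths from the table:

```python
import os

# Simulate what sourcing the environment file does with ${HADOOP_HOME}/lib.
os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
hadoop_lib_dir = os.path.expandvars("${HADOOP_HOME}/lib")
print(hadoop_lib_dir)  # /usr/lib/hadoop/lib
```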
Solr
Parameter | Description | Default value |
---|---|---|
SOLR_HOME |
The location for index data and configs |
/srv/solr/server |
SOLR_AUTH_TYPE |
Specifies the authentication type for Solr |
— |
SOLR_AUTHENTICATION_OPTS |
Solr authentication options |
— |
GC_TUNE |
JVM parameters for Solr |
-XX:-UseLargePages |
SOLR_SSL_KEY_STORE |
The path to the Solr keystore file (.jks) |
— |
SOLR_SSL_KEY_STORE_PASSWORD |
The password to the Solr keystore file |
— |
SOLR_SSL_TRUST_STORE |
The path to the Solr truststore file (.jks) |
— |
SOLR_SSL_TRUST_STORE_PASSWORD |
The password to the Solr truststore file |
— |
SOLR_SSL_NEED_CLIENT_AUTH |
Defines if client authentication is enabled |
false |
SOLR_SSL_WANT_CLIENT_AUTH |
Allows clients to authenticate, but does not require it |
false |
SOLR_SSL_CLIENT_HOSTNAME_VERIFICATION |
Defines whether to enable hostname verification |
false |
SOLR_HOST |
Specifies the host name of the Solr server |
— |
Parameter | Description | Default value |
---|---|---|
ZK_HOST |
Comma-separated locations of all servers in the ensemble and the ports on which they communicate.
You can put the ZooKeeper chroot at the end of the ZK_HOST connection string |
— |
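A ZK_HOST value with a chroot suffix (for example, `host1:2181,host2:2181/solr` — the host names and chroot here are illustrative) can be split into the server list and the chroot like this:

```python
def parse_zk_host(zk_host):
    """Split a ZK_HOST string such as 'h1:2181,h2:2181/solr'
    into the list of server addresses and the optional chroot."""
    hosts_part, slash, chroot = zk_host.partition("/")
    servers = [h.strip() for h in hosts_part.split(",") if h.strip()]
    return servers, ("/" + chroot) if slash else None

print(parse_zk_host("host1:2181,host2:2181/solr"))
# (['host1:2181', 'host2:2181'], '/solr')
```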
Parameter | Description | Default value |
---|---|---|
Solr Server Heap Memory |
Sets initial (-Xms) and maximum (-Xmx) Java heap size for Solr Server |
-Xms512m -Xmx512m |
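The heap setting packs the initial and maximum sizes into one string of JVM flags. A small, hypothetical parser shows how a value like `-Xms512m -Xmx512m` breaks down into bytes:

```python
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}

def parse_heap_flags(flags):
    """Extract -Xms/-Xmx sizes in bytes from a string like '-Xms512m -Xmx512m'."""
    sizes = {}
    for token in flags.split():
        if token.startswith("-Xms") or token.startswith("-Xmx"):
            value = token[4:]
            unit = value[-1].lower()
            sizes[token[:4]] = int(value[:-1]) * _UNITS[unit]
    return sizes

print(parse_heap_flags("-Xms512m -Xmx512m"))
# {'-Xms': 536870912, '-Xmx': 536870912}
```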
Parameter | Description | Default value |
---|---|---|
xasecure.audit.solr.solr_url |
A path to a Solr collection to store audit logs |
— |
xasecure.audit.solr.async.max.queue.size |
The maximum size of internal queue used for storing audit logs |
1 |
xasecure.audit.solr.async.max.flush.interval.ms |
The maximum time interval between flushes to disk (in milliseconds) |
100 |
Parameter | Description | Default value |
---|---|---|
ranger.plugin.solr.policy.rest.url |
The URL to Ranger Admin |
— |
ranger.plugin.solr.service.name |
The name of the Ranger service containing policies for this instance |
— |
ranger.plugin.solr.policy.cache.dir |
The directory where Ranger policies are cached after successful retrieval from the source |
/srv/ranger/yarn/policycache |
ranger.plugin.solr.policy.pollIntervalMs |
Defines how often (in milliseconds) to poll for changes in policies |
30000 |
ranger.plugin.solr.policy.rest.client.connection.timeoutMs |
The Solr Plugin RangerRestClient connection timeout (in milliseconds) |
120000 |
ranger.plugin.solr.policy.rest.client.read.timeoutMs |
The Solr Plugin RangerRestClient read timeout (in milliseconds) |
30000 |
Parameter | Description | Default value |
---|---|---|
xasecure.policymgr.clientssl.keystore |
The path to the keystore file used by Ranger |
— |
xasecure.policymgr.clientssl.keystore.credential.file |
The path to the keystore credentials file |
/etc/solr/conf/ranger-solr.jceks |
xasecure.policymgr.clientssl.truststore.credential.file |
The path to the truststore credentials file |
/etc/solr/conf/ranger-solr.jceks |
xasecure.policymgr.clientssl.truststore |
The path to the truststore file used by Ranger |
— |
xasecure.policymgr.clientssl.keystore.password |
The password to the keystore file |
— |
xasecure.policymgr.clientssl.truststore.password |
The password to the truststore file |
— |
Parameter | Description | Default value |
---|---|---|
solr.xml |
The content of solr.xml |
|
Custom solr-env.sh |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file solr-env.sh |
— |
Ranger plugin enabled |
Enables the Ranger plugin |
false |
Spark
Parameter | Description | Default value |
---|---|---|
Dynamic allocation (spark.dynamicAllocation.enabled) |
Defines whether to use dynamic resource allocation that scales the number of executors, registered with this application, up and down, based on the workload |
false |
Parameter | Description | Default value |
---|---|---|
spark.yarn.archive |
The archive containing needed Spark JARs for distribution to the YARN cache.
If set, this configuration replaces spark.yarn.jars |
hdfs:///apps/spark/spark-yarn-archive.tgz |
spark.master |
The cluster manager to connect to |
yarn |
spark.yarn.historyServer.address |
Spark History server address |
— |
spark.dynamicAllocation.enabled |
Defines whether to use dynamic resource allocation that scales the number of executors, registered with this application, up and down, based on the workload |
false |
spark.shuffle.service.enabled |
Enables the external shuffle service. This service preserves the shuffle files written by executors so that executors can be safely removed, or so that shuffle fetches can continue in the event of executor failure. The external shuffle service must be set up in order to enable it |
false |
spark.eventLog.enabled |
Defines whether to log Spark events, useful for reconstructing the Web UI after the application has finished |
true |
spark.eventLog.dir |
The base directory where Spark events are logged, if spark.eventLog.enabled is true |
hdfs:///var/log/spark/apps |
spark.serializer |
The class to use for serializing objects that will be sent over the network or need to be cached in serialized form.
The default of Java serialization works with any java.io.Serializable object, but is quite slow |
org.apache.spark.serializer.KryoSerializer |
spark.dynamicAllocation.executorIdleTimeout |
If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation |
120s |
spark.dynamicAllocation.cachedExecutorIdleTimeout |
If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation |
600s |
spark.history.provider |
The name of the class that implements the application history backend. Currently there is only one implementation provided with Spark that looks for application logs stored in the file system |
org.apache.spark.deploy.history.FsHistoryProvider |
spark.history.fs.cleaner.enabled |
Specifies whether the History Server should periodically clean up event logs from storage |
true |
spark.history.store.path |
A local directory where to cache application history data. If set, the History Server will store application data on disk instead of keeping it in memory. The data written to disk will be re-used in case of the History Server restart |
/var/log/spark/history |
spark.driver.extraClassPath |
Extra classpath entries to prepend to the classpath of the driver |
/usr/lib/hive/lib/hive-shims-scheduler.jar:/usr/lib/hadoop-yarn/hadoop-yarn-server-resourcemanager.jar |
spark.history.ui.port |
The port number of the History Server web UI |
18082 |
spark.history.fs.logDirectory |
The log directory of the History Server |
hdfs:///var/log/spark/apps |
spark.sql.hive.metastore.jars |
The location of the JARs that should be used to instantiate HiveMetastoreClient |
/usr/lib/hive/lib/* |
spark.sql.hive.metastore.version |
The Hive Metastore version |
3.0.0 |
spark.driver.extraLibraryPath |
The path to extra native libraries for the driver |
/usr/lib/hadoop/lib/native/ |
spark.yarn.am.extraLibraryPath |
The path to extra native libraries for the Application Master |
/usr/lib/hadoop/lib/native/ |
spark.executor.extraLibraryPath |
The path to extra native libraries for Executor |
/usr/lib/hadoop/lib/native/ |
spark.yarn.appMasterEnv.HIVE_CONF_DIR |
A directory on the Application Master with Hive configs required for running Hive in the cluster mode |
/etc/spark/conf |
spark.yarn.historyServer.allowTracking |
Allows using the Spark History Server for tracking the UI even if the web UI is disabled for a job |
True |
spark.ssl.enabled |
Defines whether to use SSL for Spark |
false |
spark.ssl.protocol |
TLS protocol to be used. The protocol must be supported by JVM |
TLSv1.2 |
spark.ssl.ui.port |
The port where the SSL service will listen on |
4040 |
spark.ssl.historyServer.port |
The port to access History Server web UI |
18082 |
spark.ssl.keyPassword |
The password to the private key in the key store |
— |
spark.ssl.keyStore |
The path to the keystore file |
— |
spark.ssl.keyStoreType |
The type of the keystore |
JKS |
spark.ssl.trustStorePassword |
The password to the truststore used by Spark |
— |
spark.ssl.trustStore |
The path to the truststore file |
— |
spark.ssl.trustStoreType |
The type of the truststore |
JKS |
spark.history.kerberos.enabled |
Indicates whether the History Server should use Kerberos to login. This is required if the History Server is accessing HDFS files on a secure Hadoop cluster |
false |
spark.acls.enable |
Enables Spark ACL |
false |
spark.modify.acls |
Defines who has access to modify a running Spark application |
spark,hdfs |
spark.modify.acls.groups |
A comma-separated list of user groups that have modify access to the Spark application |
spark,hdfs |
spark.history.ui.acls.enable |
Specifies whether ACLs should be checked to authorize users viewing the applications in the History Server.
If enabled, access control checks are performed regardless of what the individual applications had set for spark.ui.acls.enable |
false |
spark.history.ui.admin.acls |
A comma-separated list of users that have view access to all the Spark applications in History Server |
spark,hdfs,dr.who |
spark.history.ui.admin.acls.groups |
A comma-separated list of groups that have view access to all the Spark applications in History Server |
spark,hdfs,dr.who |
spark.ui.view.acls |
A comma-separated list of users that have view access to the Spark application.
By default, only the user that started the Spark job has view access.
Using * in the list means any user can have view access to this Spark job |
spark,hdfs,dr.who |
spark.ui.view.acls.groups |
A comma-separated list of groups that have view access to the Spark web UI to view the Spark Job details.
This can be used if you have a set of administrators or developers or users who can monitor the Spark job submitted.
Using * in the list means any user in any group has view access |
spark,hdfs,dr.who |
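The view-ACL checks described above boil down to a membership test over the user list plus the group list. A simplified, hypothetical sketch of that logic (this is not Spark's actual implementation):

```python
def has_view_access(user, user_groups, view_acls, view_acls_groups):
    """Return True if `user` is listed in the view ACLs, belongs to an
    allowed group, or a wildcard '*' grants access to everyone."""
    users = {u.strip() for u in view_acls.split(",") if u.strip()}
    groups = {g.strip() for g in view_acls_groups.split(",") if g.strip()}
    if "*" in users or "*" in groups:
        return True
    return user in users or bool(groups & set(user_groups))

print(has_view_access("dr.who", [], "spark,hdfs,dr.who", ""))    # True
print(has_view_access("alice", ["analysts"], "spark,hdfs", ""))  # False
```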
Parameter | Description | Default value |
---|---|---|
Spark History Server Heap Memory |
Sets initial (-Xms) and maximum (-Xmx) Java heap size for Spark History Server |
1G |
Spark Thrift Server Heap Memory |
Sets initial (-Xms) and maximum (-Xmx) Java heap size for Spark Thrift Server |
1G |
Livy Server Heap Memory |
Sets initial (-Xms) and maximum (-Xmx) Java heap size for Livy Server |
-Xms300m -Xmx4G |
Parameter | Description | Default value |
---|---|---|
livy.server.host |
The host address to start the Livy server. By default, Livy will bind to all network interfaces |
0.0.0.0 |
livy.server.port |
The port to run the Livy server |
8998 |
livy.spark.master |
The Spark master to use for Livy sessions |
yarn-cluster |
livy.impersonation.enabled |
Defines if Livy should impersonate users when creating a new session |
false |
livy.server.csrf-protection.enabled |
Defines whether to enable the CSRF protection.
If enabled, clients should add the |
true |
livy.repl.enable-hive-context |
Defines whether to enable HiveContext in the Livy interpreter.
If set to |
true |
livy.server.recovery.mode |
Sets the recovery mode for Livy |
recovery |
livy.server.recovery.state-store |
Defines where Livy should store the state for recovery |
filesystem |
livy.server.recovery.state-store.url |
For the |
/livy-recovery |
livy.server.auth.type |
Sets the Livy authentication type |
— |
livy.server.access_control.enabled |
Defines whether to enable the access control for a Livy server.
If set to |
false |
livy.server.access_control.users |
Users allowed to access Livy. By default, any user is allowed to access Livy. To limit access, list all the permitted users separated by commas |
livy,hdfs,spark |
livy.superusers |
A list of comma-separated users that have the permissions to change other users' submitted sessions, for example, submitting statements, deleting a session, and so on |
livy,hdfs,spark |
livy.keystore |
A path to the keystore file. The path can be absolute or relative to the directory in which the process is started |
— |
livy.keystore.password |
The password to access the keystore |
— |
livy.key-password |
The password to access the key in the keystore |
— |
livy.server.thrift.ssl.protocol.blacklist |
The list of banned TLS protocols |
SSLv2,SSLv3,TLSv1,TLSv1.1 |
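Taken together, the recovery-related parameters above can be combined into a minimal livy.conf fragment. This is a sketch based on the defaults listed in the table, not a shipped configuration; adjust the paths and ports for your cluster:

```properties
# Sketch of a livy.conf fragment based on the defaults above;
# the state-store path is an example and must exist in the file system
livy.server.port = 8998
livy.spark.master = yarn-cluster
livy.server.recovery.mode = recovery
livy.server.recovery.state-store = filesystem
livy.server.recovery.state-store.url = /livy-recovery
livy.server.csrf-protection.enabled = true
```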
Parameter | Description | Default value |
---|---|---|
Custom spark-defaults.conf |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file spark-defaults.conf |
— |
spark-env.sh |
Enter the contents for the spark-env.sh file that is used to initialize environment variables on worker nodes |
|
Custom livy.conf |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file livy.conf |
— |
livy-env.sh |
Enter the contents for the livy-env.sh file that is used to prepare the environment for Livy startup |
|
thriftserver-env.sh |
Enter the contents for the thriftserver-env.sh file that is used to prepare the environment for Thrift server startup |
|
spark-history-env.sh |
Enter the contents for the spark-history-env.sh file that is used to prepare the environment for History Server startup |
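For illustration, a minimal spark-env.sh body entered in ADCM might look as follows. The variable values and paths below are examples for a sketch, not values shipped with ADH:

```shell
# Hypothetical spark-env.sh contents; all paths are examples
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_LOG_DIR=/var/log/spark
export SPARK_DAEMON_JAVA_OPTS="-XX:+UseG1GC"
```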
Spark3
Parameter | Description | Default value |
---|---|---|
Dynamic allocation (spark.dynamicAllocation.enabled) |
Defines whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload |
false |
Parameter | Description | Default value |
---|---|---|
spark.yarn.archive |
The archive containing all the required Spark JARs for distribution to the YARN cache.
If set, this configuration replaces |
hdfs:///apps/spark/spark3-yarn-archive.tgz |
spark.yarn.historyServer.address |
Spark History server address |
— |
spark.master |
The cluster manager to connect to |
yarn |
spark.dynamicAllocation.enabled |
Defines whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload |
false |
spark.shuffle.service.enabled |
Enables the external shuffle service. This service preserves the shuffle files written by executors so that executors can be safely removed, or so that shuffle fetches can continue in the event of executor failure. The external shuffle service must be set up in order to enable it |
false |
spark.eventLog.enabled |
Defines whether to log Spark events, useful for reconstructing the Web UI after the application has finished |
true |
spark.eventLog.dir |
The base directory where Spark events are logged, if |
hdfs:///var/log/spark/apps |
spark.dynamicAllocation.executorIdleTimeout |
If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation |
120s |
spark.dynamicAllocation.cachedExecutorIdleTimeout |
If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation |
600s |
spark.history.provider |
The name of the class that implements the application history backend. Currently there is only one implementation provided with Spark that looks for application logs stored in the file system |
org.apache.spark.deploy.history.FsHistoryProvider |
spark.history.fs.cleaner.enabled |
Specifies whether the History Server should periodically clean up event logs from storage |
true |
spark.history.store.path |
A local directory where to cache application history data. If set, the History Server will store application data on disk instead of keeping it in memory. The data written to disk will be re-used in case of the History Server restart |
/var/log/spark3/history |
spark.serializer |
The class used for serializing objects that will be sent over the network or need to be cached in the serialized form.
By default, works with any Serializable Java object but it may be quite slow, so we recommend using |
org.apache.spark.serializer.KryoSerializer |
spark.driver.extraClassPath |
Extra classpath entries to prepend to the classpath of the driver |
/usr/lib/hive/lib/hive-shims-scheduler.jar:/usr/lib/hadoop-yarn/hadoop-yarn-server-resourcemanager.jar |
spark.history.ui.port |
The port number of the History Server web UI |
18092 |
spark.ui.port |
The port number of the Thrift Server web UI |
4140 |
spark.history.fs.logDirectory |
The log directory of the History Server |
hdfs:///var/log/spark/apps |
spark.sql.hive.metastore.jars |
The location of the JARs that should be used to instantiate HiveMetastoreClient |
path |
spark.sql.hive.metastore.jars.path |
A list of comma-separated paths to JARs used to instantiate HiveMetastoreClient |
file:///usr/lib/hive/lib/*.jar |
spark.sql.hive.metastore.version |
The Hive Metastore version |
3.1.2 |
spark.driver.extraLibraryPath |
The path to extra native libraries for driver |
/usr/lib/hadoop/lib/native/ |
spark.yarn.am.extraLibraryPath |
The path to extra native libraries for Application Master |
/usr/lib/hadoop/lib/native/ |
spark.executor.extraLibraryPath |
The path to extra native libraries for Executor |
/usr/lib/hadoop/lib/native/ |
spark.yarn.appMasterEnv.HIVE_CONF_DIR |
A directory on the Application Master with Hive configs required for running Hive in the cluster mode |
/etc/spark3/conf |
spark.yarn.historyServer.allowTracking |
Allows using the Spark History Server for tracking the UI even if the web UI is disabled for a job |
true |
spark.connect.grpc.binding.port |
The port number to connect to Spark Connect via gRPC |
15002 |
spark.history.kerberos.enabled |
Indicates whether the History Server should use Kerberos to login. This is required if the History Server is accessing HDFS files on a secure Hadoop cluster |
false |
spark.acls.enable |
Defines whether Spark ACLs should be enabled.
If enabled, Spark checks whether the user has access permissions to view or modify the job.
Note this requires the user to be known, so if the user comes across as |
false |
spark.modify.acls |
Defines who has access to modify a running Spark application |
spark,hdfs |
spark.modify.acls.groups |
A comma-separated list of user groups that have modify access to the Spark application |
spark,hdfs |
spark.history.ui.acls.enable |
Specifies whether ACLs should be checked to authorize users viewing the applications in the History Server.
If enabled, access control checks are performed regardless of what the individual applications had set for |
false |
spark.history.ui.admin.acls |
A comma-separated list of users that have view access to all the Spark applications in History Server |
spark,hdfs,dr.who |
spark.history.ui.admin.acls.groups |
A comma-separated list of groups that have view access to all the Spark applications in History Server |
spark,hdfs,dr.who |
spark.ui.view.acls |
A comma-separated list of users that have view access to the Spark application.
By default, only the user that started the Spark job has view access.
Using |
spark,hdfs,dr.who |
spark.ui.view.acls.groups |
A comma-separated list of groups that have view access to the Spark web UI and the Spark job details.
This can be used if you have a set of administrators, developers, or users who need to monitor submitted Spark jobs.
Using |
spark,hdfs,dr.who |
spark.ssl.keyPassword |
The password to the private key in the keystore |
— |
spark.ssl.keyStore |
Path to the keystore file. The path can be absolute or relative to the directory in which the process is started |
— |
spark.ssl.keyStoreType |
The type of keystore used |
JKS |
spark.ssl.trustStorePassword |
The password to access the truststore |
— |
spark.ssl.trustStoreType |
The type of the truststore |
JKS |
spark.ssl.enabled |
Defines whether to use SSL for Spark |
— |
spark.ssl.protocol |
Defines the TLS protocol to use. The protocol must be supported by JVM |
TLSv1.2 |
spark.ssl.ui.port |
The port number used by Spark web UI in case of active SSL |
4041 |
spark.ssl.historyServer.port |
The port number used by Spark History Server web UI in case of active SSL |
18092 |
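As an example of how several of the parameters above interact, the following spark-defaults.conf fragment enables dynamic allocation together with the external shuffle service it requires, using the timeouts and log directory listed in the table. This is a sketch; tune the values for your workload:

```properties
# Dynamic allocation requires the external shuffle service to be set up
spark.master                                      yarn
spark.dynamicAllocation.enabled                   true
spark.shuffle.service.enabled                     true
spark.dynamicAllocation.executorIdleTimeout       120s
spark.dynamicAllocation.cachedExecutorIdleTimeout 600s
spark.eventLog.enabled                            true
spark.eventLog.dir                                hdfs:///var/log/spark/apps
```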
Parameter | Description | Default value |
---|---|---|
livy.server.host |
The host address to start the Livy server. By default, Livy will bind to all network interfaces |
0.0.0.0 |
livy.server.port |
The port to run the Livy server |
8999 |
livy.spark.master |
The Spark master to use for Livy sessions |
yarn |
livy.impersonation.enabled |
Defines if Livy should impersonate users when creating a new session |
true |
livy.server.csrf-protection.enabled |
Defines whether to enable the CSRF protection.
If enabled, clients should add the |
true |
livy.repl.enable-hive-context |
Defines whether to enable HiveContext in the Livy interpreter.
If set to |
true |
livy.server.recovery.mode |
Sets the recovery mode for Livy |
recovery |
livy.server.recovery.state-store |
Defines where Livy should store the state for recovery |
filesystem |
livy.server.recovery.state-store.url |
For the |
/livy-recovery |
livy.server.auth.type |
Sets the Livy authentication type |
— |
livy.server.access_control.enabled |
Defines whether to enable the access control for a Livy server.
If set to |
false |
livy.server.access_control.users |
Users allowed to access Livy. By default, any user is allowed to access Livy. To limit access, list all the permitted users separated by commas |
livy,hdfs,spark |
livy.superusers |
A list of comma-separated users that have the permissions to change other users' submitted sessions, for example, submitting statements, deleting a session, and so on |
livy,hdfs,spark |
livy.keystore |
A path to the keystore file. The path can be absolute or relative to the directory in which the process is started |
— |
livy.keystore.password |
The password to access the keystore |
— |
livy.key-password |
The password to access the key in the keystore |
— |
livy.server.thrift.ssl.protocol.blacklist |
The list of banned TLS protocols |
SSLv2,SSLv3,TLSv1,TLSv1.1 |
Parameter | Description | Default value |
---|---|---|
thrift.server.port |
The port number used for communication with Spark3 Thrift Server |
10116 |
Parameter | Description | Default value |
---|---|---|
Spark History Server Heap Memory |
Sets the maximum Java heap size for Spark History Server |
1G |
Parameter | Description | Default value |
---|---|---|
Custom spark-defaults.conf |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file spark-defaults.conf |
— |
Custom log4j2.properties |
The contents of the log4j2.properties file used for logging the Spark3 activity |
|
spark-env.sh |
The contents of the spark-env.sh file used to initialize environment variables on worker nodes |
|
Custom livy.conf |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file livy.conf |
— |
livy-env.sh |
The contents of the livy-env.sh file used to initialize environment variables for the Livy server operation |
|
spark-history-env.sh |
The contents of the spark-history-env.sh file used to initialize environment variables for the Spark3 History Server operation |
|
thriftserver-env.sh |
The contents of the thriftserver-env.sh file used to initialize environment variables for the Spark3 Thrift Server operation |
SSM
Parameter | Description | Default value |
---|---|---|
Credential provider path |
The path to a keystore file used to encrypt credentials |
jceks://file/etc/ssm/conf/ssm.jceks |
Custom jceks |
Set to |
false |
Password file name |
The name of the file that stores a password to access the keystore |
ssm_credstore_pass |
Parameter | Description | Default value |
---|---|---|
smart.hadoop.conf.path |
The path to the Hadoop configuration directory |
/etc/hadoop/conf |
smart.conf.dir |
The path to the SSM configuration directory |
/etc/ssm/conf |
smart.server.rpc.address |
The RPC address of the SSM Server |
0.0.0.0:7042 |
smart.server.http.address |
The HTTP address (web UI) of the SSM Server |
0.0.0.0:7045 |
smart.agent.master.address |
The active SSM server’s address |
<hostname> |
smart.agent.address |
Defines the address of SSM Agent components on each host |
0.0.0.0 |
smart.agent.port |
The port number used by SSM agents to communicate with the SSM Server |
7048 |
smart.agent.master.port |
The port number used by the SSM Server to communicate with SSM agents |
7051 |
smart.ignore.dirs |
A list of comma-separated HDFS directories to ignore. SSM will ignore all files under the given HDFS directories |
— |
smart.cover.dirs |
A list of comma-separated HDFS directories where SSM scans for files. By default, all HDFS files are covered |
— |
smart.work.dir |
The HDFS directory used by SSM as a working directory to store temporary files.
SSM will ignore HDFS |
/system/ssm |
smart.client.concurrent.report.enabled |
Used to enable/disable concurrent reports for Smart Client. If enabled, Smart Client concurrently attempts to connect to multiple configured Smart Servers to find the active Smart Server, which is an optimization. Only the active Smart Server will respond to establish the connection. If the report has been successfully delivered to the active Smart Server, connection attempts to other Smart Servers are canceled |
— |
smart.server.rpc.handler.count |
The number of RPC handlers on the server |
80 |
smart.namespace.fetcher.batch |
The batch size of the namespace fetcher. SSM fetches namespaces from the NameNode during the startup. Large namespaces may lead to a long startup time. A larger batch size can improve the fetcher efficiency and reduce the startup time |
500 |
smart.namespace.fetcher.producers.num |
The number of producers in the namespace fetcher |
3 |
smart.namespace.fetcher.consumers.num |
The number of consumers in the namespace fetcher |
6 |
smart.rule.executors |
The maximum number of rules that can be executed in parallel |
5 |
smart.cmdlet.executors |
The maximum number of cmdlets that can be executed in parallel |
10 |
smart.dispatch.cmdlets.extra.num |
The number of extra cmdlets dispatched by Smart Server |
10 |
smart.cmdlet.dispatchers |
The maximum number of cmdlet dispatchers that work in parallel |
3 |
smart.cmdlet.mover.max.concurrent.blocks.per.srv.inst |
The maximum number of file mover cmdlets that can be executed in parallel per SSM service.
The |
0 |
smart.action.move.throttle.mb |
The throughput limit (in MB) for the SSM move operation |
0 |
smart.action.copy.throttle.mb |
The throughput limit (in MB) for the SSM copy operation |
0 |
smart.action.ec.throttle.mb |
The throughput limit (in MB) for the SSM EC operation |
0 |
smart.action.local.execution.disabled |
Defines whether the active Smart Server can also execute actions like an agent.
If set to |
false |
smart.cmdlet.max.num.pending |
The maximum number of pending cmdlets in an SSM Server |
20000 |
smart.cmdlet.hist.max.num.records |
The maximum number of historic cmdlet records kept in an SSM server. SSM deletes the oldest cmdlets when this threshold is exceeded |
100000 |
smart.cmdlet.hist.max.record.lifetime |
The maximum lifetime of historic cmdlet records kept in an SSM server.
The SSM Server deletes cmdlet records after the specified interval.
Valid time units are |
30day |
smart.cmdlet.cache.batch |
The maximum batch size of the cmdlet batch insert |
600 |
smart.copy.scheduler.base.sync.batch |
The maximum batch size of the Copy Scheduler base sync batch insert |
500 |
smart.file.diff.max.num.records |
The maximum number of file diff records with a useless state |
10000 |
smart.status.report.period |
The status report period for actions in milliseconds |
10 |
smart.status.report.period.multiplier |
The report period multiplied by this value defines the largest report interval |
50 |
smart.status.report.ratio |
If the finished actions ratio equals or exceeds this value, a status report will be triggered |
0.2 |
smart.top.hot.files.num |
The number of top hot files displayed in web UI |
200 |
smart.cmdlet.dispatcher.log.disp.result |
Defines whether to log dispatch results for each cmdlet dispatched |
false |
smart.cmdlet.dispatcher.log.disp.metrics.interval |
The time interval in milliseconds to log statistic metrics of the cmdlet dispatcher.
If no cmdlets were dispatched within this interval, no output is generated for this interval.
The |
5000 |
smart.compression.codec |
The default compression codec for SSM compression (Zlib, Lz4, Bzip2, snappy). You can also specify codecs as action arguments, which overrides this setting |
Zlib |
smart.compression.max.split |
The maximum number of chunks split for compression |
1000 |
smart.compact.batch.size |
The maximum number of small files to be compacted by the compact action |
200 |
smart.compact.container.file.threshold.mb |
The maximum size of a container file in MB |
1024 |
smart.access.count.day.tables.num |
The maximum number of tables that can be created in the Metastore database to store the file access count per day |
30 |
smart.access.count.hour.tables.num |
The maximum number of tables that can be created in the Metastore database to store the file access count per hour |
48 |
smart.access.count.minute.tables.num |
The maximum number of tables that can be created in the Metastore database to store the file access count per minute |
120 |
smart.access.count.second.tables.num |
The maximum number of tables that can be created in the Metastore database to store the file access count per second |
30 |
smart.access.event.fetch.interval.ms |
The interval in milliseconds between access event fetches |
1000 |
smart.cached.file.fetch.interval.ms |
The interval in milliseconds between fetches of cached files from HDFS |
5000 |
smart.namespace.fetch.interval.ms |
The interval in milliseconds between namespace fetches from HDFS |
1 |
smart.mover.scheduler.storage.report.fetch.interval.ms |
The interval in milliseconds between fetches of storage reports from HDFS DataNodes in the mover scheduler |
120000 |
smart.metastore.small-file.insert.batch.size |
The maximum size of the Metastore insert batch with information about small files |
200 |
smart.agent.master.ask.timeout.ms |
The maximum time in milliseconds for a Smart Agent to wait for a response from the Smart Server during the submission action |
5000 |
smart.ignore.path.templates |
A list of comma-separated regex templates of HDFS paths to be completely ignored by SSM |
— |
smart.internal.path.templates |
A list of comma-separated regex templates of internal files to be completely ignored by SSM |
.*/\..*,.*/__.*,.*_COPYING_.* |
smart.security.enable |
Enables Kerberos authentication for SSM |
false |
smart.server.keytab.file |
The path to the SSM Server’s keytab file |
— |
smart.server.kerberos.principal |
The SSM Server’s Kerberos principal |
— |
smart.agent.keytab.file |
The path to the SSM Agent’s keytab file |
— |
smart.agent.kerberos.principal |
The SSM Agent’s Kerberos principal |
— |
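Outside of ADCM, these parameters live in smart-site.xml in the standard Hadoop configuration format. The fragment below is a sketch showing how the RPC address and ignore directories from the table could be set; the /tmp path is an example, not a default:

```xml
<!-- Sketch of a smart-site.xml fragment; the /tmp path is an example -->
<configuration>
  <property>
    <name>smart.server.rpc.address</name>
    <value>0.0.0.0:7042</value>
  </property>
  <property>
    <name>smart.ignore.dirs</name>
    <value>/tmp</value>
  </property>
</configuration>
```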
Parameter | Description | Default value |
---|---|---|
db_url |
The URL to the Metastore database |
jdbc:postgresql://{{ groups['adpg.adpg'][0] | d(omit) }}:5432/ssm |
db_user |
The user name to connect to the database |
ssm |
db_password |
The user password to connect to the database |
— |
initialSize |
The initial number of connections created when the pool is started |
10 |
minIdle |
The minimum number of established connections that should be kept in the pool at all times. The connection pool can shrink below this number if validation queries fail |
4 |
maxActive |
The maximum number of active connections that can be allocated from this pool at the same time |
50 |
maxWait |
The maximum time in milliseconds the pool will wait (when there are no available connections) for a connection to be returned before throwing an exception |
60000 |
timeBetweenEvictionRunsMillis |
The time in milliseconds to sleep between the runs of the idle connection validation/cleaner thread. This value should not be set less than 1 second. It specifies how often to check for idle and abandoned connections, and how often to validate idle connections |
90000 |
minEvictableIdleTimeMillis |
The minimum amount of time an object may remain idle in the pool before it is eligible for eviction |
300000 |
validationQuery |
The SQL query used to validate connections from the pool before returning them to the caller |
SELECT 1 |
testWhileIdle |
Indicates whether connection objects are validated by the idle object evictor (if any) |
true |
testOnBorrow |
Indicates whether objects are validated before being borrowed from the pool |
false |
testOnReturn |
Indicates whether objects are validated before being returned to the pool |
false |
poolPreparedStatements |
Enables the prepared statement pooling |
true |
maxPoolPreparedStatementPerConnectionSize |
The maximum number of prepared statements that can be pooled per connection |
30 |
removeAbandoned |
A flag to remove abandoned connections if they exceed |
true |
removeAbandonedTimeout |
The timeout in seconds before an abandoned (in use) connection can be removed |
180 |
logAbandoned |
A flag to log stack traces for application code that abandoned a connection. Logging of abandoned connections adds extra overhead for every borrowed connection |
true |
filters |
Sets the filters that are applied to the data source |
stat |
Parameter | Description | Default value |
---|---|---|
LD_LIBRARY_PATH |
The path to extra native libraries for SSM |
/usr/lib/hadoop/lib/native |
HADOOP_HOME |
The path to the Hadoop home directory |
/usr/lib/hadoop |
Parameter | Description | Default value |
---|---|---|
Enable SmartFileSystem for Hadoop |
When enabled, requests from different clients (Spark, HDFS, Hive, etc.) are taken into account when calculating |
false |
log4j.properties |
The contents of the log4j.properties configuration file |
— |
zeppelin-site.xml |
The contents of the zeppelin-site.xml configuration file. SSM uses a Zeppelin configuration for web UI |
— |
Sqoop
Parameter | Description | Default value |
---|---|---|
sqoop.metastore.client.autoconnect.url |
The connection string to use when connecting to a job-management metastore. If not set, uses ~/.sqoop/ |
— |
sqoop.metastore.server.location |
The path to the shared metastore database files. If not set, uses ~/.sqoop/ |
/srv/sqoop/metastore.db |
sqoop.metastore.server.port |
The port that this metastore should listen on |
16100 |
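Clients reach a shared Sqoop metastore through an HSQLDB JDBC URL built from the host and port above. The autoconnect parameter could be set in sqoop-site.xml as in this sketch, where metastore-host is a placeholder for the host running the metastore:

```xml
<!-- Sketch of a sqoop-site.xml fragment; metastore-host is a placeholder -->
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:hsqldb:hsql://metastore-host:16100/sqoop</value>
</property>
```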
Parameter | Description | Default value |
---|---|---|
HADOOP_OPTS |
Sets initial (-Xms) and maximum (-Xmx) Java heap size for Sqoop |
-Xms800M -Xmx10G |
Parameter | Description | Default value |
---|---|---|
Custom sqoop-site.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file sqoop-site.xml |
— |
Custom sqoop-metastore-env.sh |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file sqoop-metastore-env.sh |
— |
YARN
Parameter | Description | Default value |
---|---|---|
mapreduce.application.classpath |
The CLASSPATH for MapReduce applications.
A comma-separated list of CLASSPATH entries.
The parameter expansion marker will be replaced by NodeManager on container launch based on the underlying OS |
/etc/hadoop/conf/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/* |
mapreduce.cluster.local.dir |
The local directory where MapReduce stores intermediate data files. May be a comma-separated list of directories on different devices in order to spread disk I/O. Directories that do not exist are ignored |
/srv/hadoop-yarn/mr-local |
mapreduce.framework.name |
The runtime framework for executing MapReduce jobs.
Can be one of |
yarn |
mapreduce.jobhistory.address |
MapReduce JobHistory Server IPC (<host>:<port>) |
— |
mapreduce.jobhistory.bind-host |
Setting the value to |
0.0.0.0 |
mapreduce.jobhistory.webapp.address |
MapReduce JobHistory Server Web UI (<host>:<port>) |
— |
mapreduce.map.env |
Environment variables for the map task processes added by a user, specified as a comma-separated list.
Example: |
HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce |
mapreduce.reduce.env |
Environment variables for the reduce task processes added by a user, specified as a comma-separated list.
Example: |
HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce |
yarn.app.mapreduce.am.env |
Environment variables for the MapReduce App Master processes added by a user. Examples:
|
HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce |
yarn.app.mapreduce.am.staging-dir |
The staging directory used while submitting jobs |
/user |
mapreduce.jobhistory.keytab |
The location of the Kerberos keytab file for the MapReduce JobHistory Server |
/etc/security/keytabs/mapreduce-historyserver.service.keytab |
mapreduce.jobhistory.principal |
Kerberos principal name for the MapReduce JobHistory Server |
mapreduce-historyserver/_HOST@REALM |
mapreduce.jobhistory.http.policy |
Configures the HTTP endpoint for JobHistoryServer web UI. The following values are supported:
|
HTTP_ONLY |
mapreduce.jobhistory.webapp.https.address |
The HTTPS address where MapReduce JobHistory Server WebApp is running |
0.0.0.0:19890 |
mapreduce.shuffle.ssl.enabled |
Defines whether to use SSL for the Shuffle HTTP endpoints |
false |
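To serve the JobHistory Server web UI over TLS, the policy and HTTPS address parameters above work together. A sketch of the corresponding mapred-site.xml values, assuming a TLS-enabled cluster:

```xml
<!-- Sketch: switch the JobHistory Server web UI to HTTPS -->
<property>
  <name>mapreduce.jobhistory.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.https.address</name>
  <value>0.0.0.0:19890</value>
</property>
```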
Parameter | Description | Default value |
---|---|---|
xasecure.audit.destination.solr.batch.filespool.dir |
The spool directory path |
/srv/ranger/hdfs_plugin/audit_solr_spool |
xasecure.audit.destination.solr.urls |
Leave this property value empty or set it to |
— |
xasecure.audit.destination.solr.zookeepers |
Specifies the ZooKeeper connection string for the Solr destination |
— |
xasecure.audit.destination.solr.force.use.inmemory.jaas.config |
Uses in-memory JAAS configuration file to connect to Solr |
— |
xasecure.audit.is.enabled |
Enables Ranger audit |
true |
xasecure.audit.jaas.Client.loginModuleControlFlag |
Specifies whether the success of the module is |
— |
xasecure.audit.jaas.Client.loginModuleName |
The name of the authenticator class |
— |
xasecure.audit.jaas.Client.option.keyTab |
The name of the keytab file to get the principal’s secret key |
— |
xasecure.audit.jaas.Client.option.principal |
The name of the principal to be used |
— |
xasecure.audit.jaas.Client.option.serviceName |
Represents a user or a service that wants to log in |
— |
xasecure.audit.jaas.Client.option.storeKey |
Set this to |
false |
xasecure.audit.jaas.Client.option.useKeyTab |
Set this to |
false |
Parameter | Description | Default value |
---|---|---|
ranger.plugin.yarn.policy.rest.url |
The URL to Ranger Admin |
— |
ranger.plugin.yarn.service.name |
The name of the Ranger service containing policies for this instance |
— |
ranger.plugin.yarn.policy.cache.dir |
The directory where Ranger policies are cached after successful retrieval from the source |
/srv/ranger/yarn/policycache |
ranger.plugin.yarn.policy.pollIntervalMs |
Defines how often to poll for changes in policies |
30000 |
ranger.plugin.yarn.policy.rest.client.connection.timeoutMs |
The YARN Plugin RangerRestClient connection timeout (in milliseconds) |
120000 |
ranger.plugin.yarn.policy.rest.client.read.timeoutMs |
The YARN Plugin RangerRestClient read timeout (in milliseconds) |
30000 |
ranger.add-yarn-authorization |
Set |
false |
ranger.plugin.yarn.policy.rest.ssl.config.file |
The path to the RangerRestClient SSL config file for the YARN plugin |
/etc/yarn/conf/ranger-yarn-policymgr-ssl.xml |
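When the YARN plugin is enabled, these values are typically rendered into the Ranger plugin configuration for YARN. The sketch below uses placeholder values for the Ranger Admin URL and service name; only the cache directory comes from the table above:

```xml
<!-- Sketch of Ranger YARN plugin values; the URL and service name are placeholders -->
<property>
  <name>ranger.plugin.yarn.policy.rest.url</name>
  <value>http://ranger-admin-host:6080</value>
</property>
<property>
  <name>ranger.plugin.yarn.service.name</name>
  <value>adh_yarn</value>
</property>
<property>
  <name>ranger.plugin.yarn.policy.cache.dir</name>
  <value>/srv/ranger/yarn/policycache</value>
</property>
```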
Parameter | Description | Default value |
---|---|---|
yarn.application.classpath |
The CLASSPATH for YARN applications. A comma-separated list of CLASSPATH entries. When this value is empty, the following default CLASSPATH for YARN applications would be used.
|
/etc/hadoop/conf/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-mapreduce/lib/* |
yarn.cluster.max-application-priority |
Defines the maximum application priority in a cluster. Leaf queue-level priority: the administrator can set a default priority for each leaf queue. The queue default priority is used for any application submitted without a specified priority. $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml is the configuration file for queue-level priority |
0 |
yarn.log.server.url |
The URL of the log aggregation server |
— |
yarn.log-aggregation-enable |
Whether to enable log aggregation.
Log aggregation collects logs from each container and moves these logs onto a file system, for example HDFS, after the application processing completes.
Users can configure the |
true |
yarn.log-aggregation.retain-seconds |
Defines how long to keep aggregation logs before deleting them.
The value of |
172800 |
yarn.nodemanager.local-dirs |
The list of directories in which to store localized files. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work directories, called container_${contid}, will be subdirectories of this |
/srv/hadoop-yarn/nm-local |
yarn.node-labels.enabled |
Enables the node labels feature |
true |
yarn.node-labels.fs-store.root-dir |
The URI for NodeLabelManager.
The default value is |
hdfs:///system/yarn/node-labels |
yarn.timeline-service.bind-host |
The actual address the server will bind to.
If this optional address is set, the RPC and Webapp servers will bind to this address and the port, specified in |
0.0.0.0 |
yarn.timeline-service.leveldb-timeline-store.path |
The store file name for the leveldb Timeline store |
/srv/hadoop-yarn/leveldb-timeline-store |
yarn.nodemanager.address |
The address of the container manager in the NodeManager |
0.0.0.0:8041 |
yarn.nodemanager.aux-services |
A comma-separated list of services, where service name should only contain |
mapreduce_shuffle,spark2_shuffle,spark_shuffle |
yarn.nodemanager.aux-services.mapreduce_shuffle.class |
The auxiliary service class to use |
org.apache.hadoop.mapred.ShuffleHandler |
yarn.nodemanager.aux-services.spark2_shuffle.class |
The class name of YarnShuffleService — an external shuffle service for Spark 2 on YARN |
org.apache.spark.network.yarn.YarnShuffleService |
yarn.nodemanager.aux-services.spark2_shuffle.classpath |
The path to YarnShuffleService — an external shuffle service for Spark 2 on YARN |
/usr/lib/spark/yarn/lib/* |
yarn.nodemanager.aux-services.spark_shuffle.class |
The class name of YarnShuffleService — an external shuffle service for Spark 3 on YARN |
org.apache.spark.network.yarn.YarnShuffleService |
yarn.nodemanager.aux-services.spark_shuffle.classpath |
The path to YarnShuffleService — an external shuffle service for Spark 3 on YARN |
/usr/lib/spark3/yarn/lib/* |
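The aux-services parameters above work together; a minimal yarn-site.xml sketch combining the defaults from this table (paths assume the default Spark installation locations listed above):

```xml
<!-- yarn-site.xml fragment: external shuffle services for MapReduce and Spark -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark2_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
  <value>/usr/lib/spark3/yarn/lib/*</value>
</property>
```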
yarn.nodemanager.recovery.enabled |
Enables the NodeManager to recover after starting |
true |
yarn.nodemanager.recovery.dir |
The local filesystem directory in which the NodeManager stores state when recovery is enabled |
/srv/hadoop-yarn/nm-recovery |
yarn.nodemanager.remote-app-log-dir |
Defines a directory for log aggregation |
/logs |
yarn.nodemanager.resource-plugins |
Enables additional discovery/isolation of resources on the NodeManager.
By default, this parameter is empty.
Acceptable values: |
— |
yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables |
When |
/usr/bin/nvidia-smi |
yarn.nodemanager.resource.detect-hardware-capabilities |
Enables auto-detection of node capabilities such as memory and CPU |
true |
yarn.nodemanager.vmem-check-enabled |
Whether virtual memory limits will be enforced for containers |
false |
yarn.resource-types |
The resource types to be used for scheduling. Use resource-types.xml to specify details about the individual resource types |
— |
yarn.resourcemanager.bind-host |
The actual address the server will bind to.
If this optional address is set, the RPC and Webapp servers will bind to this address and the port, specified in |
0.0.0.0 |
yarn.resourcemanager.cluster-id |
The name of the cluster. In the High Availability mode, this parameter ensures that the Resource Manager participates in leader election for this cluster only and does not affect other clusters |
— |
yarn.resource-types.memory-mb.increment-allocation |
The FairScheduler grants memory in increments of this value.
If you submit a task with a resource request which is not a multiple of |
1024 |
yarn.resource-types.vcores.increment-allocation |
The FairScheduler grants vcores in increments of this value.
If you submit a task with a resource request that is not a multiple of |
1 |
yarn.resourcemanager.ha.enabled |
Enables Resource Manager High Availability. When enabled:
|
false |
yarn.resourcemanager.ha.rm-ids |
The list of Resource Manager nodes in the cluster when the High Availability is enabled.
See description of |
— |
yarn.resourcemanager.hostname |
The host name of the Resource Manager |
— |
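The High Availability parameters above work together; a minimal yarn-site.xml sketch (the cluster ID and hostnames below are hypothetical placeholders, not values from this document):

```xml
<!-- yarn-site.xml fragment: Resource Manager HA with two nodes -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- hypothetical cluster name -->
  <name>yarn.resourcemanager.cluster-id</name>
  <value>adh-cluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <!-- hypothetical hostnames -->
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.example.com</value>
</property>
```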
yarn.resourcemanager.leveldb-state-store.path |
The local path where the Resource Manager state is stored when using |
/srv/hadoop-yarn/leveldb-state-store |
yarn.resourcemanager.monitor.capacity.queue-management.monitoring-interval |
The time between invocations of this QueueManagementDynamicEditPolicy policy (in milliseconds) |
1500 |
yarn.resourcemanager.reservation-system.enable |
Enables the ReservationSystem in the ResourceManager |
false |
yarn.resourcemanager.reservation-system.planfollower.time-step |
The frequency of the PlanFollower timer (in milliseconds). A large value is expected |
1000 |
Resource scheduler |
The type of a pluggable scheduler for Hadoop.
Available values: |
CapacityScheduler |
yarn.resourcemanager.scheduler.monitor.enable |
Enables a set of periodic monitors (specified in |
false |
yarn.resourcemanager.scheduler.monitor.policies |
The list of SchedulingEditPolicy classes that interact with the Scheduler. A particular module may be incompatible with the Scheduler, other policies, or a configuration of either |
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy |
yarn.resourcemanager.monitor.capacity.preemption.observe_only |
If set to |
false |
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval |
The time between invocations of this ProportionalCapacityPreemptionPolicy policy (in milliseconds) |
3000 |
yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill |
The time between requesting a preemption from an application and killing the container (in milliseconds) |
15000 |
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round |
The maximum percentage of resources preempted in a single round. By controlling this value, one can throttle the pace at which containers are reclaimed from the cluster. After computing the total desired preemption, the policy scales it back within this limit |
0.1 |
yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity |
The maximum amount of resources above the target capacity ignored for preemption. This defines a dead zone around the target capacity that helps to prevent thrashing and oscillations around the computed target balance. High values slow the time to capacity and (absent natural completions) may prevent convergence to guaranteed capacity |
0.1 |
yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor |
Given a computed preemption target, account for containers naturally expiring and preempt only this percentage of the delta.
This determines the rate of geometric convergence into the deadzone ( |
0.2 |
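Taken together, the preemption monitor settings above could be enabled in yarn-site.xml as follows (a sketch using the default values listed in this table):

```xml
<!-- yarn-site.xml fragment: enable the capacity preemption monitor -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>
<property>
  <!-- run the policy every 3 seconds -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval</name>
  <value>3000</value>
</property>
<property>
  <!-- reclaim at most 10% of resources per round -->
  <name>yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round</name>
  <value>0.1</value>
</property>
```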
yarn.resourcemanager.nodes.exclude-path |
The path to the file with nodes to exclude |
/etc/hadoop/conf/exclude-path.xml |
yarn.resourcemanager.nodes.include-path |
The path to the file with nodes to include |
/etc/hadoop/conf/include-path |
yarn.resourcemanager.recovery.enabled |
Enables Resource Manager to recover state after starting.
If set to |
true |
yarn.resourcemanager.store.class |
The class to use as the persistent store.
If |
— |
yarn.resourcemanager.system-metrics-publisher.enabled |
The setting that controls whether YARN system metrics are published on the Timeline Server or not by Resource Manager |
true |
yarn.scheduler.fair.user-as-default-queue |
Defines whether to use the username associated with the allocation as the default queue name when a queue name is not specified.
If this is set to |
true |
yarn.scheduler.fair.preemption |
Defines whether to use preemption |
false |
yarn.scheduler.fair.preemption.cluster-utilization-threshold |
The utilization threshold after which the preemption kicks in. The utilization is computed as the maximum ratio of usage to capacity among all resources |
0.8f |
yarn.scheduler.fair.sizebasedweight |
Defines whether to assign shares to individual apps based on their size, rather than providing an equal share to all apps regardless of size.
When set to |
false |
yarn.scheduler.fair.assignmultiple |
Defines whether to allow multiple container assignments in one heartbeat |
false |
yarn.scheduler.fair.dynamic.max.assign |
If |
true |
yarn.scheduler.fair.max.assign |
If |
-1 |
yarn.scheduler.fair.locality.threshold.node |
For applications that request containers on particular nodes, this parameter defines the number of scheduling opportunities since the last container assignment to wait before accepting a placement on another node.
Expressed as a floating-point number between |
-1.0 |
yarn.scheduler.fair.locality.threshold.rack |
For applications that request containers on particular racks, this parameter defines the number of scheduling opportunities since the last container assignment to wait before accepting a placement on another rack.
Expressed as a floating-point number between |
-1.0 |
yarn.scheduler.fair.allow-undeclared-pools |
If set to |
true |
yarn.scheduler.fair.update-interval-ms |
The time interval at which to lock the scheduler and recalculate fair shares, recalculate demand, and check whether anything is due for preemption |
500 |
yarn.scheduler.minimum-allocation-mb |
The minimum allocation for every container request at the Resource Manager (in MB).
Memory requests lower than this value will throw |
1024 |
yarn.scheduler.maximum-allocation-mb |
The maximum allocation for every container request at the Resource Manager (in MB).
Memory requests higher than this value will throw |
4096 |
yarn.scheduler.minimum-allocation-vcores |
The minimum allocation for every container request at the Resource Manager, in terms of virtual CPU cores.
Requests lower than this value will throw |
1 |
yarn.scheduler.maximum-allocation-vcores |
The maximum allocation for every container request at the Resource Manager, in terms of virtual CPU cores.
Requests higher than this value will throw |
2 |
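As an illustration, the four allocation bounds above fit together in yarn-site.xml like this (values are the defaults from this table); each container request is constrained to the [minimum, maximum] range per resource:

```xml
<!-- yarn-site.xml fragment: container allocation bounds -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>2</value>
</property>
```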
yarn.timeline-service.enabled |
On the server side, this parameter indicates whether the Timeline service is enabled. On the client side, it indicates whether the client wants to use the Timeline service. If this parameter is set on the client side along with security, the YARN client tries to fetch the delegation tokens for the Timeline Server |
true |
yarn.timeline-service.hostname |
The hostname of the Timeline service Web application |
— |
yarn.timeline-service.http-cross-origin.enabled |
Enables cross origin support (CORS) for Timeline Server |
true |
yarn.webapp.ui2.enable |
On the server side, indicates whether the new YARN UI v2 is enabled |
true |
yarn.resourcemanager.proxy-user-privileges.enabled |
If set to |
false |
yarn.resourcemanager.webapp.spnego-principal |
The Kerberos principal to be used for SPNEGO filter for the Resource Manager web UI |
HTTP/_HOST@REALM |
yarn.resourcemanager.webapp.spnego-keytab-file |
The Kerberos keytab file to be used for SPNEGO filter for the Resource Manager web UI |
/etc/security/keytabs/HTTP.service.keytab |
yarn.nodemanager.linux-container-executor.group |
The UNIX group that the linux-container-executor should run as |
yarn |
yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled |
A flag to enable override of the default Kerberos authentication filter with the RM authentication filter to allow authentication using delegation tokens (fallback to Kerberos if the tokens are missing).
Only applicable when the HTTP authentication type is |
false |
yarn.resourcemanager.principal |
The Kerberos principal for the Resource Manager |
yarn-resourcemanager/_HOST@REALM |
yarn.resourcemanager.keytab |
The keytab for the Resource Manager |
/etc/security/keytabs/yarn-resourcemanager.service.keytab |
yarn.resourcemanager.webapp.https.address |
The https address of the Resource Manager web application. If only a host is provided as the value, the webapp will be served on a random port |
${yarn.resourcemanager.hostname}:8090 |
yarn.nodemanager.principal |
The Kerberos principal for the NodeManager |
yarn-nodemanager/_HOST@REALM |
yarn.nodemanager.keytab |
Keytab for NodeManager |
/etc/security/keytabs/yarn-nodemanager.service.keytab |
yarn.nodemanager.webapp.spnego-principal |
The Kerberos principal to be used for SPNEGO filter for the NodeManager web interface |
HTTP/_HOST@REALM |
yarn.nodemanager.webapp.spnego-keytab-file |
The Kerberos keytab file to be used for SPNEGO filter for the NodeManager web interface |
/etc/security/keytabs/HTTP.service.keytab |
yarn.nodemanager.webapp.cross-origin.enabled |
A flag to enable cross-origin (CORS) support in the NodeManager. This flag requires the CORS filter initializer to be added to the filter initializers list in core-site.xml |
false |
yarn.nodemanager.webapp.https.address |
The HTTPS address of the NodeManager web application |
0.0.0.0:8044 |
yarn.timeline-service.http-authentication.type |
Defines the authentication used for the Timeline Server HTTP endpoint.
Supported values are: |
simple |
yarn.timeline-service.http-authentication.simple.anonymous.allowed |
Indicates if anonymous requests are allowed by the Timeline Server when using |
true |
yarn.timeline-service.http-authentication.kerberos.keytab |
The Kerberos keytab to be used for the Timeline Server (Collector/Reader) HTTP endpoint |
/etc/security/keytabs/HTTP.service.keytab |
yarn.timeline-service.http-authentication.kerberos.principal |
The Kerberos principal to be used for the Timeline Server (Collector/Reader) HTTP endpoint |
HTTP/_HOST@REALM |
yarn.timeline-service.principal |
The Kerberos principal for the timeline reader. NodeManager principal would be used for timeline collector as it runs as an auxiliary service inside NodeManager |
yarn/_HOST@REALM |
yarn.timeline-service.keytab |
The Kerberos keytab for the timeline reader. NodeManager keytab would be used for timeline collector as it runs as an auxiliary service inside NodeManager |
/etc/security/keytabs/yarn.service.keytab |
yarn.timeline-service.delegation.key.update-interval |
The update interval for delegation keys |
86400000 |
yarn.timeline-service.delegation.token.renew-interval |
The time to renew delegation tokens |
86400000 |
yarn.timeline-service.delegation.token.max-lifetime |
The maximum token lifetime |
86400000 |
yarn.timeline-service.client.best-effort |
Defines whether a failure to obtain a delegation token should be considered as an application failure ( |
false |
yarn.timeline-service.webapp.https.address |
The HTTPS address of the Timeline service web application |
${yarn.timeline-service.hostname}:8190 |
yarn.http.policy |
This configures the HTTP endpoint for YARN daemons. The following values are supported:
|
HTTP_ONLY |
yarn.nodemanager.container-executor.class |
The name of the container-executor Java class |
org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor |
CAUTION
In AstraLinux, regular user UIDs can start from 100. For YARN to work correctly on AstraLinux, set the |
Parameter | Description | Default value |
---|---|---|
banned.users |
A comma-separated list of users who cannot run applications |
bin |
min.user.id |
Prevents other super-users |
500 |
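The two parameters above are set in the container-executor.cfg file; an illustrative fragment (the values are examples, not prescriptions):

```ini
# container-executor.cfg (illustrative values)
yarn.nodemanager.linux-container-executor.group=yarn
banned.users=bin
# On AstraLinux, where regular user UIDs can start from 100,
# this threshold may need to be lowered accordingly
min.user.id=500
```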
Parameter | Description | Default value |
---|---|---|
ResourceManager Heap Memory |
Sets initial (-Xms) and maximum (-Xmx) Java heap size for Resource Manager |
-Xms1G -Xmx8G |
NodeManager Heap Memory |
Sets initial (-Xms) and maximum (-Xmx) Java heap size for NodeManager |
— |
Timelineserver Heap Memory |
Sets initial (-Xms) and maximum (-Xmx) Java heap size for Timeline server |
-Xms700m -Xmx8G |
History server Heap Memory |
Sets initial (-Xms) and maximum (-Xmx) Java heap size for History server |
-Xms700m -Xmx8G |
Parameter | Description | Default value |
---|---|---|
DECOMMISSIONED |
The list of hosts in the |
— |
Parameter | Description | Default value |
---|---|---|
xasecure.policymgr.clientssl.keystore |
The path to the keystore file used by Ranger |
— |
xasecure.policymgr.clientssl.keystore.credential.file |
The path to the keystore credentials file |
/etc/yarn/conf/ranger-yarn.jceks |
xasecure.policymgr.clientssl.truststore.credential.file |
The path to the truststore credentials file |
/etc/yarn/conf/ranger-yarn.jceks |
xasecure.policymgr.clientssl.truststore |
The path to the truststore file used by Ranger |
— |
xasecure.policymgr.clientssl.keystore.password |
The password to the keystore file |
— |
xasecure.policymgr.clientssl.truststore.password |
The password to the truststore file |
— |
Parameter | Description | Default value |
---|---|---|
GPU on YARN |
Defines whether to use GPU on YARN |
false |
capacity-scheduler.xml |
The content of capacity-scheduler.xml, which is used by CapacityScheduler |
|
fair-scheduler.xml |
The content of fair-scheduler.xml, which is used by FairScheduler |
|
Custom mapred-site.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file mapred-site.xml |
— |
Ranger plugin enabled |
Whether or not Ranger plugin is enabled |
false |
Custom yarn-site.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file yarn-site.xml |
— |
Custom ranger-yarn-audit.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-yarn-audit.xml |
— |
Custom ranger-yarn-security.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-yarn-security.xml |
— |
Custom ranger-yarn-policymgr-ssl.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file ranger-yarn-policymgr-ssl.xml |
— |
Zeppelin
Parameter | Description | Default value |
---|---|---|
Allow user-managed interpreters |
Allows to use Zeppelin interpreters with the |
True |
Custom interpreter.json |
Allows to provide a custom JSON definition of interpreters to be available in the Zeppelin web UI. Defining interpreters in this way overwrites all (both user and system) interpreters settings |
|
Custom interpreter.sh |
Allows to provide custom contents of the interpreter.sh script. This script is invoked on the Zeppelin startup and is used to prepare the environment for proper Zeppelin operation |
Parameter | Description | Default value |
---|---|---|
zeppelin.dep.localrepo |
The local repository for the dependency loader |
/srv/zeppelin/local-repo |
zeppelin.server.port |
The server port |
8180 |
zeppelin.server.kerberos.principal |
The principal name to load from the keytab |
— |
zeppelin.server.kerberos.keytab |
The path to the keytab file |
— |
zeppelin.shell.auth.type |
Sets the authentication type.
Possible values are |
— |
zeppelin.shell.principal |
The principal name to load from the keytab |
— |
zeppelin.shell.keytab.location |
The path to the keytab file |
— |
zeppelin.jdbc.auth.type |
Sets the authentication type.
Possible values are |
— |
zeppelin.jdbc.keytab.location |
The path to the keytab file |
— |
zeppelin.jdbc.principal |
The principal name to load from the keytab |
— |
zeppelin.jdbc.auth.kerberos.proxy.enable |
When the |
true |
spark.yarn.keytab |
The full path to the file that contains the keytab for the principal. This keytab will be copied to the node running the YARN Application Master via the Secure Distributed Cache, for renewing the login tickets and the delegation tokens periodically |
— |
spark.yarn.principal |
The principal used to log in to KDC while running on secure HDFS |
— |
zeppelin.livy.keytab |
The path to the keytab file |
— |
zeppelin.livy.principal |
The principal name to load from the keytab |
— |
zeppelin.server.ssl.port |
The port number for SSL communication |
8180 |
zeppelin.ssl |
Defines whether to use SSL |
false |
zeppelin.ssl.keystore.path |
The path to the keystore used by Zeppelin |
— |
zeppelin.ssl.keystore.password |
The password to access the keystore file |
— |
zeppelin.ssl.truststore.path |
The path to the truststore used by Zeppelin |
— |
zeppelin.ssl.truststore.password |
The password to access the truststore file |
— |
Parameter | Description | Default value |
---|---|---|
Zeppelin Server Heap Memory |
Sets initial (-Xms) and maximum (-Xmx) Java heap size for Zeppelin Server |
-Xms700m -Xmx1024m |
Parameter | Description | Default value |
---|---|---|
Users/password map |
A map of type <username: password,role>.
For example, |
— |
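A hypothetical [users] section of shiro.ini matching the <username: password,role> map format described above (usernames, passwords, and roles are placeholders):

```ini
[users]
# username = password, role1, role2
admin = admin_password, admin
analyst1 = analyst1_password, analyst
```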
Parameter | Description | Default value |
---|---|---|
ldapRealm |
Extends the Apache Shiro provider to allow for LDAP searches and to provide group membership to the authorization provider |
org.apache.zeppelin.realm.LdapRealm |
ldapRealm.contextFactory.authenticationMechanism |
Specifies the authentication mechanism used by the LDAP service |
simple |
ldapRealm.contextFactory.url |
The URL of the source LDAP. For example, ldap://ldap.example.com:389 |
— |
ldapRealm.userDnTemplate |
Optional.
Zeppelin uses this value to construct the UserDN for the authentication bind.
Specify the UserDN where the first attribute is |
— |
ldapRealm.pagingSize |
Allows to set the LDAP paging size |
100 |
ldapRealm.authorizationEnabled |
Enables authorization for Shiro ldapRealm |
true |
ldapRealm.contextFactory.systemAuthenticationMechanism |
Defines the authentication mechanism to use for Shiro ldapRealm context factory.
Possible values are |
simple |
ldapRealm.userLowerCase |
Forces username returned from LDAP to be lower-cased |
true |
ldapRealm.memberAttributeValueTemplate |
The attribute that identifies a user in the group.
For example: |
— |
ldapRealm.searchBase |
The starting DN in the LDAP DIT for the search.
Only the subtree under the specified DN is searched.
For example: |
— |
ldapRealm.userSearchBase |
Search base for user bind DN.
Defaults to the value of |
— |
ldapRealm.groupSearchBase |
Search base used to search for groups.
Defaults to the value of |
— |
ldapRealm.groupObjectClass |
Set the value to the objectClass that identifies group entries in LDAP |
groupofnames |
ldapRealm.userSearchAttributeName |
Specify the attribute that corresponds to the user login token. This attribute is used with the search results to compute the UserDN for the authentication bind |
sAMAccountName |
ldapRealm.memberAttribute |
Set the value to the attribute that defines group membership.
When the value is |
member |
ldapRealm.userSearchScope |
Allows to define searchScopes.
Possible values are |
subtree |
ldapRealm.groupSearchScope |
Allows to define groupSearchScope.
Possible values are |
subtree |
ldapRealm.contextFactory.systemUsername |
Set to the LDAP service account that Zeppelin uses for LDAP searches.
If required, specify the full account UserDN.
For example: |
— |
ldapRealm.contextFactory.systemPassword |
Sets the password for systemUsername.
This password will be added to the keystore using |
— |
ldapRealm.groupSearchEnableMatchingRuleInChain |
Enables support for nested groups using the LDAP_MATCHING_RULE_IN_CHAIN operator |
true |
ldapRealm.rolesByGroup |
Optional mapping from physical groups to logical application roles.
For example: |
— |
ldapRealm.allowedRolesForAuthentication |
Optional list of roles that are allowed to authenticate. If not specified, all groups are allowed to authenticate (login). This changes nothing for URL-specific permissions, which continue to work as specified in the [urls] section. For example: "admin_role,user_role" |
— |
ldapRealm.permissionsByRole |
Optional.
Sets permissions by role.
For example: |
— |
securityManager.realms |
Specifies a list of Apache Shiro Realms |
$ldapRealm |
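Combining the ldapRealm.* parameters above, a minimal [main] section of shiro.ini might look as follows (the LDAP URL and DNs are hypothetical placeholders):

```ini
[main]
ldapRealm = org.apache.zeppelin.realm.LdapRealm
# hypothetical LDAP server and directory layout
ldapRealm.contextFactory.url = ldap://ldap.example.com:389
ldapRealm.contextFactory.authenticationMechanism = simple
ldapRealm.userDnTemplate = uid={0},ou=people,dc=example,dc=com
ldapRealm.searchBase = dc=example,dc=com
ldapRealm.groupSearchBase = ou=groups,dc=example,dc=com
ldapRealm.authorizationEnabled = true
securityManager.realms = $ldapRealm
```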
Parameter | Description | Default value |
---|---|---|
Additional main section in shiro.ini |
Allows to add additional key/value pairs to the |
— |
Additional roles section in shiro.ini |
Allows to add additional key/value pairs to the |
— |
Additional urls section in shiro.ini |
Allows to add additional key/value pairs to the |
— |
Parameter | Description | Default value |
---|---|---|
Custom zeppelin-site.xml |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file zeppelin-site.xml |
— |
Custom zeppelin-env.sh |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file zeppelin-env.sh |
|
Custom log4j.properties |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file log4j.properties |
ZooKeeper
Parameter | Description | Default value |
---|---|---|
connect |
The ZooKeeper connection string used by other services or clusters. It is generated automatically |
— |
dataDir |
The location where ZooKeeper stores the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database |
/var/lib/zookeeper |
Parameter | Description | Default value |
---|---|---|
clientPort |
The port to listen on for client connections, that is, the port that clients attempt to connect to |
2181 |
tickTime |
The basic time unit used by ZooKeeper (in milliseconds).
It is used for heartbeats.
The minimum session timeout will be twice the |
2000 |
initLimit |
The timeout (in ticks) that limits the time ZooKeeper servers in quorum have to connect to the leader |
5 |
syncLimit |
Defines how far out of sync (in ticks) a server can be from the leader |
2 |
maxClientCnxns |
This property limits the number of active connections from a single host, identified by IP address, to a single ZooKeeper server |
0 |
autopurge.snapRetainCount |
When enabled, the ZooKeeper auto-purge feature retains the |
3 |
autopurge.purgeInterval |
The time interval after which the purge task is triggered (in hours).
Set to a positive integer ( |
24 |
Add key,value |
In this section you can define values for custom parameters that are not displayed in ADCM UI, but are allowed in the configuration file zoo.cfg |
— |
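Put together, the parameters above form a zoo.cfg like the following (a sketch using the default values from this table):

```ini
# zoo.cfg sketch assembled from the defaults above
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
```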
Parameter | Description | Default value |
---|---|---|
ZOO_LOG_DIR |
The directory to store logs |
/var/log/zookeeper |
ZOOPIDFILE |
The directory to store the ZooKeeper process ID |
/var/run/zookeeper/zookeeper_server.pid |
SERVER_JVMFLAGS |
Used for setting JVM parameters related, for example, to garbage collection |
-Xmx1024m |
JAVA |
The path to the Java executable |
$JAVA_HOME/bin/java |
ZOO_LOG4J_PROP |
Sets the log4j logging level and defines which log appenders to turn on.
Enabling the log appender |
INFO, CONSOLE, ROLLINGFILE |
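The environment variables above are typically exported from zookeeper-env.sh; a sketch using the defaults listed in this table:

```shell
# zookeeper-env.sh sketch assembled from the defaults above
export ZOO_LOG_DIR=/var/log/zookeeper
export ZOOPIDFILE=/var/run/zookeeper/zookeeper_server.pid
export SERVER_JVMFLAGS=-Xmx1024m
export JAVA=$JAVA_HOME/bin/java
export ZOO_LOG4J_PROP="INFO, CONSOLE, ROLLINGFILE"
```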