Spark3 configuration parameters

To configure the service, use the following configuration parameters in ADCM.

NOTE
  • Some parameters become visible in the ADCM UI only after the Advanced flag is set.

  • Parameters set in the Custom group overwrite the existing parameters, even read-only ones.

Common
Parameter Description Default value

Dynamic allocation (spark.dynamicAllocation.enabled)

Defines whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload

false

Credential Encryption
Parameter Description Default value

Encryption enable

Enables or disables the credential encryption feature. When enabled, Spark3 stores configuration passwords and credentials required for interacting with other services in encrypted form

false

Credential provider path

Path to a keystore file with secrets

jceks://hdfs/apps/spark/security/spark.jceks

Custom jceks

Set to true to use a custom JCEKS file. Set to false to use the default auto-generated JCEKS file

false
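When Custom jceks is set to true, a keystore must already exist at the credential provider path. A sketch of creating one with the standard Hadoop credential CLI, run on a cluster host (the alias name is a placeholder for illustration):

```shell
# Store a secret under an alias in the JCEKS keystore referenced by
# "Credential provider path". The CLI prompts for the secret value.
hadoop credential create my.password.alias \
  -provider jceks://hdfs/apps/spark/security/spark.jceks
```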

spark3_iceberg_extensions
Parameter Description Default value

version

Version of the spark-iceberg extension package

1.5.2_arenadata1

spark-defaults.conf
Parameter Description Default value

spark.yarn.archive

Archive containing all the required Spark JARs for distribution to the YARN cache. If set, this configuration replaces spark.yarn.jars and the archive is used in all the application containers. The archive should contain JAR files in its root directory. The archive can also be hosted on HDFS to speed up file distribution

hdfs:///apps/spark/spark3-yarn-archive.tgz
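Each row in this table corresponds to one line of spark-defaults.conf, which stores whitespace-separated key/value pairs with `#` for comments. A minimal parsing sketch (the helper is hypothetical, shown only to illustrate the file format; it is not part of Spark or ADCM):

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf-style text into a dict.

    Each non-comment line holds a key and a value separated by whitespace.
    """
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # split on the first whitespace run
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf

sample = """
# comment
spark.master             yarn
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs:///var/log/spark/apps
"""
print(parse_spark_defaults(sample)["spark.master"])  # prints "yarn"
```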

spark.yarn.historyServer.address

Spark History server address

 — 

spark.master

Cluster manager to connect to

yarn

spark.dynamicAllocation.enabled

Defines whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload

false

spark.shuffle.service.enabled

Enables the external shuffle service. This service preserves the shuffle files written by executors so that executors can be safely removed, or so that shuffle fetches can continue if an executor fails. The external shuffle service must be set up before this option is enabled

false
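Dynamic allocation relies on the external shuffle service to preserve shuffle files when executors are removed, so in practice the two options are enabled together. A spark-defaults.conf sketch combining them with the idle timeouts described below (values match the defaults in this table and are not tuning recommendations):

```properties
spark.dynamicAllocation.enabled                      true
spark.shuffle.service.enabled                        true
spark.dynamicAllocation.executorIdleTimeout          120s
spark.dynamicAllocation.cachedExecutorIdleTimeout    600s
```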

spark.eventLog.enabled

Defines whether to log Spark events, useful for reconstructing the Web UI after the application has finished

true

spark.eventLog.dir

Base directory where Spark events are logged, if spark.eventLog.enabled=true. Within this base directory, Spark creates a sub-directory for each application, and logs the events specific to the application in this directory. You may want to set this to a unified location like an HDFS directory so history files can be read by the History Server

hdfs:///var/log/spark/apps

spark.dynamicAllocation.executorIdleTimeout

If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation

120s

spark.dynamicAllocation.cachedExecutorIdleTimeout

If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed. For more details, see Spark documentation

600s

spark.history.provider

Name of the class that implements the application history backend. Currently there is only one implementation provided with Spark that looks for application logs stored in the file system

org.apache.spark.deploy.history.FsHistoryProvider

spark.history.fs.cleaner.enabled

Specifies whether the History Server should periodically clean up event logs from storage

true

spark.history.store.path

A local directory where application history data is cached. If set, the History Server stores application data on disk instead of keeping it in memory. The data written to disk is reused if the History Server restarts

/var/log/spark3/history
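For the History Server to find application logs, the directory applications write events to (spark.eventLog.dir) and the directory the History Server reads from (spark.history.fs.logDirectory) should point to the same location. A sketch based on the defaults in this table:

```properties
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///var/log/spark/apps
spark.history.fs.logDirectory    hdfs:///var/log/spark/apps
spark.history.store.path         /var/log/spark3/history
```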

spark.serializer

Class used for serializing objects that will be sent over the network or need to be cached in serialized form. Java serialization works with any Serializable Java object but may be quite slow, so it is recommended to use org.apache.spark.serializer.KryoSerializer and configure Kryo serialization when speed is necessary. Can be any subclass of org.apache.spark.Serializer

org.apache.spark.serializer.KryoSerializer

spark.driver.extraClassPath

Extra classpath entries to be added to the classpath of the driver

  • /usr/lib/hive/lib/hive-shims-scheduler.jar

  • /usr/lib/hadoop-yarn/hadoop-yarn-server-resourcemanager.jar

  • /usr/lib/spark3/jars/adb-spark-connector-assembly-release-1.0.5-spark-3.5.2_arenadata1.jar

  • /usr/lib/spark3/jars/adqm-spark-connector-assembly-release-1.0.0-spark-3.5.2_arenadata1.jar

spark.executor.extraClassPath

Extra classpath entries to add to the classpath of the executors

  • /usr/lib/spark3/jars/adb-spark-connector-assembly-release-1.0.5-spark-3.5.2_arenadata1.jar

  • /usr/lib/spark3/jars/adqm-spark-connector-assembly-release-1.0.0-spark-3.5.2_arenadata1.jar

spark.history.ui.port

Port number of the History Server web UI

18092

spark.ui.port

Port number of the Spark web UI

4140

spark.history.fs.logDirectory

Log directory of the History Server

hdfs:///var/log/spark/apps

spark.sql.extensions

A comma-separated list of Iceberg SQL extensions classes

org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

spark.sql.catalog.spark_catalog

The Iceberg catalog implementation class

org.apache.iceberg.spark.SparkSessionCatalog
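Together, these two settings wire Iceberg into Spark SQL: the first registers the Iceberg SQL extensions, the second replaces the built-in session catalog with an Iceberg-aware one. The resulting spark-defaults.conf lines (defaults from this table):

```properties
spark.sql.extensions               org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.spark_catalog    org.apache.iceberg.spark.SparkSessionCatalog
```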

spark.sql.hive.metastore.jars

The location of the JARs that should be used to instantiate HiveMetastoreClient

path

spark.sql.hive.metastore.jars.path

A list of comma-separated paths to JARs used to instantiate HiveMetastoreClient

file:///usr/lib/hive/lib/*.jar

spark.driver.extraLibraryPath

Path to extra native libraries for the driver

/usr/lib/hadoop/lib/native/

spark.yarn.am.extraLibraryPath

Path to extra native libraries for the Application Master

/usr/lib/hadoop/lib/native/

spark.executor.extraLibraryPath

Path to extra native libraries for executors

/usr/lib/hadoop/lib/native/

spark.yarn.appMasterEnv.HIVE_CONF_DIR

A directory on the Application Master with Hive configs required for running Hive in the cluster mode

/etc/spark3/conf

spark.yarn.historyServer.allowTracking

Allows using the Spark History Server as the tracking UI even if the web UI is disabled for a job

true

spark.connect.grpc.binding.port

The port number to connect to Spark Connect via gRPC

15002

spark.artifactory.dir.path

Path to an artifact directory used by Spark Connect

tmp

spark.sql.security.confblacklist

Lists the parameters that applications are not allowed to override, for example, for information security reasons

spark.sql.extensions

spark.history.kerberos.enabled

Indicates whether the History Server should use Kerberos to log in. This is required if the History Server is accessing HDFS files on a secure Hadoop cluster

false

spark.acls.enable

Defines whether Spark ACLs should be enabled. If enabled, checks if the user has access permissions to view or modify jobs. Note: this requires the user to be known. If the user is null, no checks will be made. Filters can be used within the UI to authenticate and set the user

false

spark.modify.acls

Defines who has access to modify a running Spark application

spark,hdfs

spark.modify.acls.groups

A comma-separated list of user groups that have modify access to the Spark application

spark,hdfs

spark.history.ui.acls.enable

Specifies whether ACLs should be checked to authorize users viewing the applications in the History Server. If enabled, access control checks are performed regardless of what the individual applications had set for spark.ui.acls.enable. If disabled, no access control checks are made for any application UIs available through the History Server

false

spark.history.ui.admin.acls

A comma-separated list of users that have view access to all the Spark applications in History Server

spark,hdfs,dr.who

spark.history.ui.admin.acls.groups

A comma-separated list of groups that have view access to all the Spark applications in History Server

spark,hdfs,dr.who

spark.ui.view.acls

A comma-separated list of users that have view access to the Spark application. By default, only the user that started the Spark job has view access. Using * as a value means that any user can have view access to this Spark job

spark,hdfs,dr.who

spark.ui.view.acls.groups

A comma-separated list of groups that have view access to the Spark web UI to view the Spark Job details. This can be used if you have a set of administrators or developers or users who can monitor the Spark job submitted. Using * in the list means any user in any group can view the Spark job details on the Spark web UI. The user groups are obtained from the instance of the groups mapping provider specified by spark.user.groups.mapping

spark,hdfs,dr.who
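Taken together, the ACL settings above might look like this in spark-defaults.conf when access checks are turned on (the user and group lists are the defaults from this table, not recommendations):

```properties
spark.acls.enable          true
spark.ui.view.acls         spark,hdfs,dr.who
spark.ui.view.acls.groups  spark,hdfs,dr.who
spark.modify.acls          spark,hdfs
spark.modify.acls.groups   spark,hdfs
```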

spark.ssl.keyPassword

The password to the private key in the keystore

 — 

spark.ssl.keyStore

Path to the keystore file. The path can be absolute or relative to the directory in which the process is started

 — 

spark.ssl.keyStoreType

The type of keystore used

JKS

spark.ssl.trustStorePassword

The password to the truststore

 — 

spark.ssl.trustStoreType

The type of the truststore

JKS

spark.ssl.enabled

Defines whether to use SSL for Spark

 — 

spark.ssl.protocol

Defines the TLS protocol to use. The protocol must be supported by the JVM

TLSv1.2

spark.ssl.ui.port

The port number used by the Spark web UI when SSL is enabled

4141

spark.ssl.historyServer.port

The port number used by the Spark History Server web UI when SSL is enabled

18092
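A spark-defaults.conf sketch combining the SSL settings above (the keystore path and password are placeholders; with credential encryption enabled, passwords would come from the JCEKS keystore instead of plain text):

```properties
spark.ssl.enabled               true
spark.ssl.protocol              TLSv1.2
spark.ssl.keyStore              /path/to/keystore.jks
spark.ssl.keyStoreType          JKS
spark.ssl.keyPassword           <key-password>
spark.ssl.trustStoreType        JKS
spark.ssl.ui.port               4141
spark.ssl.historyServer.port    18092
```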

Custom log4j.properties
Parameter Description Default value

Spark3 spark-log4j2.properties

Stores the Log4j configuration used for logging Spark3’s activity

spark-log4j2.properties

Livy livy-log4j.properties

Stores the Log4j configuration used for logging Livy’s activity

livy-log4j.properties

livy.conf
Parameter Description Default value

livy.server.host

Host address to start the Livy server. By default, Livy will bind to all network interfaces

0.0.0.0

livy.server.port

Port to run the Livy server

8999

livy.spark.master

Spark master to use for Livy sessions

yarn

livy.impersonation.enabled

Defines whether Livy should impersonate users when creating a new session

true

livy.server.csrf-protection.enabled

Defines whether to enable the CSRF protection. If enabled, clients should add the X-Requested-By HTTP header for POST/DELETE/PUT/PATCH HTTP methods

true
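With CSRF protection enabled, POST/DELETE/PUT/PATCH requests that lack the X-Requested-By header are rejected by the Livy server. A sketch of creating a session through the Livy REST API with curl (the host name and header value are placeholders):

```shell
# Create a PySpark session; X-Requested-By is mandatory when
# livy.server.csrf-protection.enabled=true
curl -X POST \
  -H "Content-Type: application/json" \
  -H "X-Requested-By: admin" \
  -d '{"kind": "pyspark"}' \
  http://livy-host.example.com:8999/sessions
```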

livy.repl.enable-hive-context

Defines whether to enable HiveContext in the Livy interpreter. If set to true, hive-site.xml is detected and added to the Livy server classpath automatically on user request

true

livy.server.recovery.mode

Sets the recovery mode for Livy

recovery

livy.server.recovery.state-store

Defines where Livy should store the state for recovery

filesystem

livy.server.recovery.state-store.url

For the filesystem state store, the path of the state store directory. Do not use a filesystem that does not support atomic rename like S3. For example: file:///tmp/livy or hdfs:///. For ZooKeeper, specify the address to the ZooKeeper servers. For example: host1:port1,host2:port2

/livy-recovery
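The three recovery settings work together so that sessions survive a Livy server restart. A livy.conf sketch using a state store on HDFS (the path is illustrative):

```properties
livy.server.recovery.mode             recovery
livy.server.recovery.state-store      filesystem
livy.server.recovery.state-store.url  hdfs:///livy-recovery
```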

livy.server.auth.type

Sets the Livy authentication type

 — 

livy.server.access_control.enabled

Defines whether to enable access control for a Livy server. If set to true, all incoming requests are checked to verify that the requesting user has permission

false

livy.server.access_control.users

Users allowed to access Livy. By default, any user is allowed to access Livy. To limit access, list all permitted users separated by commas

livy,hdfs,spark

livy.superusers

A list of comma-separated users that have permission to change other users’ submitted sessions, for example, submitting statements, deleting the session, and so on

livy,hdfs,spark

livy.keystore

A path to the keystore file. The path can be absolute or relative to the directory in which the process is started

 — 

livy.keystore.password

Password to access the keystore

 — 

livy.key-password

Password to access the key in the keystore

 — 

livy.server.thrift.ssl.protocol.blacklist

List of banned TLS protocols

SSLv2,SSLv3,TLSv1,TLSv1.1

Spark heap memory settings
Parameter Description Default value

Spark History Server Heap Memory

Sets the maximum Java heap size for Spark History Server

1G

Spark3 Connect Heap Memory

Sets the maximum Java heap size for a Spark Connect server

1G

ranger-spark-audit.xml
Parameter Description Default value

xasecure.audit.destination.solr.batch.filespool.dir

Spool directory path

/srv/ranger/hdfs_plugin/audit_solr_spool

xasecure.audit.destination.solr.urls

A URL of the Solr server to store audit events. Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr

 — 

xasecure.audit.destination.solr.zookeepers

Specifies the ZooKeeper connection string for the Solr destination

 — 

xasecure.audit.destination.solr.force.use.inmemory.jaas.config

Defines whether to use an in-memory JAAS configuration file to connect to Solr

 — 

xasecure.audit.is.enabled

Enables Ranger audit

true

xasecure.audit.jaas.Client.loginModuleControlFlag

Specifies whether the success of the module is required, requisite, sufficient, or optional

 — 

xasecure.audit.jaas.Client.loginModuleName

Name of the authenticator class

 — 

xasecure.audit.jaas.Client.option.keyTab

Name of the keytab file to get the principal’s secret key

 — 

xasecure.audit.jaas.Client.option.principal

Name of the principal to be used

 — 

xasecure.audit.jaas.Client.option.serviceName

Name of a user or a service that wants to log in

 — 

xasecure.audit.jaas.Client.option.storeKey

Set this to true if you want the keytab or the principal’s key to be stored in the subject’s private credentials

false

xasecure.audit.jaas.Client.option.useKeyTab

Set this to true if you want the module to get the principal’s key from the keytab

false

ranger-spark-security.xml
Parameter Description Default value

ranger.plugin.spark.policy.rest.url

The URL to Ranger Admin

 — 

ranger.plugin.spark.service.name

The name of the Ranger service containing policies for this instance

 — 

ranger.plugin.spark.policy.cache.dir

The directory where Ranger policies are cached after successful retrieval from the source

/srv/ranger/spark/policycache

ranger.plugin.hive.policy.cache.dir

The directory where Ranger policies for Hive are cached after successful retrieval from the source

 — 

ranger.plugin.spark.policy.pollIntervalMs

Defines how often to poll for changes in policies

30000

ranger.plugin.spark.policy.rest.client.connection.timeoutMs

The Spark plugin RangerRestClient connection timeout (in milliseconds)

120000

ranger.plugin.spark.policy.rest.client.read.timeoutMs

The Spark plugin RangerRestClient read timeout (in milliseconds)

30000

ranger.add-yarn-authorization

Set to true to use only Ranger ACLs (that is, to ignore YARN ACLs)

false

ranger.plugin.spark.enable.implicit.userstore.enricher

Enables UserStoreEnricher for fetching user and group attributes when using macros or scripts in row filters (Ranger 2.3+)

true

ranger.plugin.spark.policy.rest.ssl.config.file

Path to the RangerRestClient SSL configuration file for the Spark plugin

/etc/spark3/conf/ranger-spark-policymgr-ssl.xml
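ranger-spark-security.xml is a standard Hadoop-style XML configuration file. A sketch of the two properties that have no defaults, which must be set for the plugin to reach Ranger Admin (the URL and service name are placeholders):

```xml
<configuration>
  <!-- Address of the Ranger Admin server (placeholder host) -->
  <property>
    <name>ranger.plugin.spark.policy.rest.url</name>
    <value>http://ranger-admin.example.com:6080</value>
  </property>
  <!-- Name of the Ranger service holding policies for this instance -->
  <property>
    <name>ranger.plugin.spark.service.name</name>
    <value>spark_service</value>
  </property>
</configuration>
```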

ranger-spark-policymgr-ssl.xml
Parameter Description Default value

xasecure.policymgr.clientssl.keystore

Path to the keystore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.credential.file

Path to the keystore credentials file

/etc/spark/conf/ranger-spark.jceks

xasecure.policymgr.clientssl.truststore.credential.file

Path to the truststore credentials file

/etc/spark/conf/ranger-spark.jceks

xasecure.policymgr.clientssl.truststore

Path to the truststore file used by Ranger

 — 

xasecure.policymgr.clientssl.keystore.password

Password to the keystore file

 — 

xasecure.policymgr.clientssl.truststore.password

Password to the truststore file

 — 

Other
Parameter Description Default value

Custom spark-defaults.conf

In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the spark-defaults.conf configuration file

 — 

spark-env.sh

The contents of the spark-env.sh file used to initialize environment variables on worker nodes

spark-env.sh

Custom livy.conf

In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the livy.conf configuration file

 — 

livy-env.sh

The contents of the livy-env.sh file used to initialize environment variables for the Livy server operation

livy-env.sh

spark-history-env.sh

The contents of the spark-history-env.sh file used to initialize environment variables for the Spark History Server

spark-history-env.sh

Ranger plugin enabled

Enables or disables the Ranger plugin

false

Spark3 Client component
Parameter Description Default value

adb_spark3_connector

Version of the adb-spark3-connector package to be installed

1.0.5_3.5.x

adqm_spark3_connector

Version of the adqm-spark3-connector package to be installed

1.0.0_3.5.x

adh_pyspark

Version of the adh-pyspark package to be installed

3.10.4
