Spark3 configuration parameters
To configure the service, use the following configuration parameters in ADCM.
| Parameter | Description | Default value |
|---|---|---|
| Dynamic allocation (spark.dynamicAllocation.enabled) | Defines whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload | false |

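For reference, the same setting can also be supplied per application at submit time. Below is a minimal PySpark sketch, assuming a pyspark installation that matches the cluster; the executor bounds (spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors) are illustrative additions, not defaults from this table.

```python
# A minimal sketch: enabling dynamic allocation for a single application.
# The min/max executor bounds are illustrative values, not ADH defaults.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("dynamic-allocation-demo")
    .config("spark.dynamicAllocation.enabled", "true")
    # Dynamic allocation relies on the external shuffle service;
    # see spark.shuffle.service.enabled in the spark-defaults.conf table below
    .config("spark.shuffle.service.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.dynamicAllocation.maxExecutors", "10")
    .getOrCreate()
)
```
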
| Parameter | Description | Default value |
|---|---|---|
| Encryption enable | Enables or disables the credential encryption feature. When enabled, Spark3 stores configuration passwords and credentials required for interacting with other services in encrypted form | false |
| Credential provider path | Path to a keystore file with secrets | jceks://hdfs/apps/spark/security/spark.jceks |
| Custom jceks | Set to true to use a custom keystore file (defined by Credential provider path) instead of the automatically generated one | false |

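As an illustration of how such a keystore can be consumed, the sketch below points an application's Hadoop configuration at the credential provider path from the table. Spark forwards any spark.hadoop.* option into the Hadoop Configuration, and hadoop.security.credential.provider.path is the standard Hadoop key for locating a JCEKS store; whether a particular ADH service resolves its secrets exactly this way is an assumption here, so treat this as a sketch rather than a recipe.

```python
# A hedged sketch, assuming the keystore from "Credential provider path" exists.
# Spark forwards any "spark.hadoop.*" option into the Hadoop Configuration,
# and hadoop.security.credential.provider.path is the standard Hadoop key
# for locating a JCEKS credential store.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("credential-provider-demo")
    .config(
        "spark.hadoop.hadoop.security.credential.provider.path",
        "jceks://hdfs/apps/spark/security/spark.jceks",  # default from the table
    )
    .getOrCreate()
)
```
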
| Parameter | Description | Default value |
|---|---|---|
| version | Version of the spark-iceberg extension package | 1.5.2_arenadata1 |

| Parameter | Description | Default value |
|---|---|---|
| spark.yarn.archive | An archive containing all the required Spark JARs for distribution to the YARN cache. If set, this configuration replaces spark.yarn.jars, and the archive is used in all the application's containers | hdfs:///apps/spark/spark3-yarn-archive.tgz |
| spark.yarn.historyServer.address | Spark History Server address | — |
| spark.master | Cluster manager to connect to | yarn |
| spark.dynamicAllocation.enabled | Defines whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload | false |
| spark.shuffle.service.enabled | Enables the external shuffle service. This service preserves the shuffle files written by executors so that executors can be safely removed, or so that shuffle fetches can continue in the event of executor failure. The external shuffle service must be set up in order to enable it | false |
| spark.eventLog.enabled | Defines whether to log Spark events, useful for reconstructing the web UI after the application has finished | true |
| spark.eventLog.dir | Base directory where Spark events are logged if spark.eventLog.enabled is true | hdfs:///var/log/spark/apps |
| spark.dynamicAllocation.executorIdleTimeout | If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed. For more details, see the Spark documentation | 120s |
| spark.dynamicAllocation.cachedExecutorIdleTimeout | If dynamic allocation is enabled and an executor that has cached data blocks has been idle for more than this duration, the executor will be removed. For more details, see the Spark documentation | 600s |
| spark.history.provider | Name of the class that implements the application history backend. Currently, there is only one implementation provided with Spark, which looks for application logs stored in the file system | org.apache.spark.deploy.history.FsHistoryProvider |
| spark.history.fs.cleaner.enabled | Specifies whether the History Server should periodically clean up event logs from storage | true |
| spark.history.store.path | A local directory where application history data is cached. If set, the History Server stores application data on disk instead of keeping it in memory. The data written to disk is reused after a History Server restart | /var/log/spark3/history |
| spark.serializer | Class used for serializing objects that will be sent over the network or need to be cached in serialized form. Java serialization works with any java.io.Serializable object but is quite slow, so org.apache.spark.serializer.KryoSerializer is recommended when speed is necessary | org.apache.spark.serializer.KryoSerializer |
| spark.driver.extraClassPath | Extra classpath entries to be added to the classpath of the driver | — |
| spark.executor.extraClassPath | Extra classpath entries to be added to the classpath of the executors | — |
| spark.history.ui.port | Port number of the History Server web UI | 18092 |
| spark.ui.port | Port number of the Spark web UI | 4140 |
| spark.history.fs.logDirectory | Log directory of the History Server | hdfs:///var/log/spark/apps |
| spark.sql.extensions | A comma-separated list of Iceberg SQL extensions classes | org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions |
| spark.sql.catalog.spark_catalog | The Iceberg catalog implementation class | org.apache.iceberg.spark.SparkSessionCatalog |
| spark.sql.hive.metastore.jars | The location of the JARs that should be used to instantiate HiveMetastoreClient | path |
| spark.sql.hive.metastore.jars.path | A list of comma-separated paths to JARs used to instantiate HiveMetastoreClient | file:///usr/lib/hive/lib/*.jar |
| spark.driver.extraLibraryPath | Path to extra native libraries for the driver | /usr/lib/hadoop/lib/native/ |
| spark.yarn.am.extraLibraryPath | Path to extra native libraries for the Application Master | /usr/lib/hadoop/lib/native/ |
| spark.executor.extraLibraryPath | Path to extra native libraries for executors | /usr/lib/hadoop/lib/native/ |
| spark.yarn.appMasterEnv.HIVE_CONF_DIR | A directory on the Application Master with Hive configs required for running Hive in cluster mode | /etc/spark3/conf |
| spark.yarn.historyServer.allowTracking | Allows using the History Server URL as the tracking URL for applications whose web UI is disabled | true |
| spark.connect.grpc.binding.port | The port number to connect to Spark Connect via gRPC | 15002 |
| spark.artifactory.dir.path | Path to an artifact directory used by Spark Connect | tmp |
| spark.sql.security.confblacklist | Prevents applications from overriding the listed parameters, for example, for information security reasons | spark.sql.extensions |
| spark.history.kerberos.enabled | Indicates whether the History Server should use Kerberos to log in. This is required if the History Server is accessing HDFS files on a secure Hadoop cluster | false |
| spark.acls.enable | Defines whether Spark ACLs should be enabled. If enabled, checks whether the user has access permissions to view or modify jobs. Note: this requires the user to be known. If the user is null, no checks are performed | false |
| spark.modify.acls | Defines who has access to modify a running Spark application | spark,hdfs |
| spark.modify.acls.groups | A comma-separated list of user groups that have modify access to the Spark application | spark,hdfs |
| spark.history.ui.acls.enable | Specifies whether ACLs should be checked to authorize users viewing the applications in the History Server. If enabled, access control checks are performed regardless of what the individual applications had set for spark.ui.acls.enable | false |
| spark.history.ui.admin.acls | A comma-separated list of users that have view access to all the Spark applications in the History Server | spark,hdfs,dr.who |
| spark.history.ui.admin.acls.groups | A comma-separated list of groups that have view access to all the Spark applications in the History Server | spark,hdfs,dr.who |
| spark.ui.view.acls | A comma-separated list of users that have view access to the Spark application. By default, only the user that started the Spark job has view access. Using * in the list means any user can have view access to this Spark job | spark,hdfs,dr.who |
| spark.ui.view.acls.groups | A comma-separated list of groups that have view access to the Spark web UI to view the Spark job details. This can be used if you have a set of administrators, developers, or users who should monitor the Spark jobs submitted. Using * in the list means any user in any group can have view access to this Spark job | spark,hdfs,dr.who |
| spark.ssl.keyPassword | The password to the private key in the keystore | — |
| spark.ssl.keyStore | Path to the keystore file. The path can be absolute or relative to the directory in which the process is started | — |
| spark.ssl.keyStoreType | The type of keystore used | JKS |
| spark.ssl.trustStorePassword | The password to the truststore | — |
| spark.ssl.trustStoreType | The type of the truststore | JKS |
| spark.ssl.enabled | Defines whether to use SSL for Spark | — |
| spark.ssl.protocol | Defines the TLS protocol to use. The protocol must be supported by the JVM | TLSv1.2 |
| spark.ssl.ui.port | The port number used by the Spark web UI when SSL is active | 4141 |
| spark.ssl.historyServer.port | The port number used by the Spark History Server web UI when SSL is active | 18092 |

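To tie several of the rows above together, here is a minimal PySpark sketch of an Iceberg-enabled session using the spark.sql.extensions and spark.sql.catalog.spark_catalog defaults from the table. It assumes the spark-iceberg package is installed and that the session catalog is backed by a Hive Metastore; the spark.sql.catalog.spark_catalog.type setting and the default.demo table name are illustrative assumptions.

```python
# A minimal sketch using the Iceberg-related defaults from the table above.
# Assumes the spark-iceberg extension package is installed and that the
# session catalog is backed by a Hive Metastore; the table name is hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-demo")
    .config(
        "spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    )
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.iceberg.spark.SparkSessionCatalog",
    )
    # Assumption: the wrapped session catalog uses the Hive Metastore
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    .getOrCreate()
)

# DDL and DML on Iceberg tables go through the catalog configured above
spark.sql("CREATE TABLE IF NOT EXISTS default.demo (id BIGINT) USING iceberg")
spark.sql("INSERT INTO default.demo VALUES (1)")
spark.sql("SELECT * FROM default.demo").show()
```
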
| Parameter | Description | Default value |
|---|---|---|
| Spark3 spark-log4j2.properties | Stores the Log4j configuration used for logging Spark3's activity | — |
| Livy livy-log4j.properties | Stores the Log4j configuration used for logging Livy's activity | — |

| Parameter | Description | Default value |
|---|---|---|
| livy.server.host | Host address to start the Livy server. By default, Livy will bind to all network interfaces | 0.0.0.0 |
| livy.server.port | Port to run the Livy server | 8999 |
| livy.spark.master | Spark master to use for Livy sessions | yarn |
| livy.impersonation.enabled | Defines if Livy should impersonate users when creating a new session | true |
| livy.server.csrf-protection.enabled | Defines whether to enable the CSRF protection. If enabled, clients should add the X-Requested-By header to POST/PUT/DELETE/PATCH HTTP requests | true |
| livy.repl.enable-hive-context | Defines whether to enable HiveContext in the Livy interpreter. If set to true, hive-site.xml is detected on the user machine and loaded automatically | true |
| livy.server.recovery.mode | Sets the recovery mode for Livy | recovery |
| livy.server.recovery.state-store | Defines where Livy should store the state for recovery | filesystem |
| livy.server.recovery.state-store.url | For the filesystem state store, the path of the state store directory | /livy-recovery |
| livy.server.auth.type | Sets the Livy authentication type | — |
| livy.server.access_control.enabled | Defines whether to enable the access control for a Livy server. If set to true, all incoming requests are checked to verify whether the requesting user has access permissions | false |
| livy.server.access_control.users | Users allowed to access Livy. By default, any user is allowed to access Livy. To limit access, list all the permitted users, separated by commas | livy,hdfs,spark |
| livy.superusers | A comma-separated list of users that have the permissions to change other users' submitted sessions, for example, submit statements, delete the session, and so on | livy,hdfs,spark |
| livy.keystore | A path to the keystore file. The path can be absolute or relative to the directory in which the process is started | — |
| livy.keystore.password | Password to access the keystore | — |
| livy.key-password | Password to access the key in the keystore | — |
| livy.server.thrift.ssl.protocol.blacklist | A list of banned SSL/TLS protocol versions | SSLv2,SSLv3,TLSv1,TLSv1.1 |

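To show how two of the defaults above surface to clients: with livy.server.port left at 8999 and CSRF protection enabled, POST requests must carry the X-Requested-By header. Below is a hedged sketch using the requests library; the livy-host name is a placeholder.

```python
# A hedged sketch of Livy's REST API under the defaults above:
# livy.server.port = 8999 and CSRF protection enabled, which requires the
# X-Requested-By header on POST requests. "livy-host" is a placeholder.
import time

import requests

LIVY_URL = "http://livy-host:8999"
HEADERS = {"X-Requested-By": "admin"}  # required when CSRF protection is on

# Create an interactive PySpark session
resp = requests.post(LIVY_URL + "/sessions", json={"kind": "pyspark"}, headers=HEADERS)
resp.raise_for_status()
session_id = resp.json()["id"]

# Poll until the session leaves the "starting" state
state = "starting"
while state == "starting":
    time.sleep(5)
    state = requests.get(f"{LIVY_URL}/sessions/{session_id}", headers=HEADERS).json()["state"]
print(f"session {session_id} is {state}")
```
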
| Parameter | Description | Default value |
|---|---|---|
| Spark History Server Heap Memory | Sets the maximum Java heap size for Spark History Server | 1G |
| Spark3 Connect Heap Memory | Sets the maximum Java heap size for a Spark Connect server | 1G |

| Parameter | Description | Default value |
|---|---|---|
| xasecure.audit.destination.solr.batch.filespool.dir | Spool directory path | /srv/ranger/hdfs_plugin/audit_solr_spool |
| xasecure.audit.destination.solr.urls | A URL of the Solr server to store audit events. Leave this property value empty or set it to NONE when using ZooKeeper to connect to Solr | — |
| xasecure.audit.destination.solr.zookeepers | Specifies the ZooKeeper connection string for the Solr destination | — |
| xasecure.audit.destination.solr.force.use.inmemory.jaas.config | Defines whether to use an in-memory JAAS configuration file to connect to Solr | — |
| xasecure.audit.is.enabled | Enables Ranger audit | true |
| xasecure.audit.jaas.Client.loginModuleControlFlag | Specifies whether the success of the module is required, requisite, sufficient, or optional | — |
| xasecure.audit.jaas.Client.loginModuleName | Name of the authenticator class | — |
| xasecure.audit.jaas.Client.option.keyTab | Name of the keytab file to get the principal's secret key | — |
| xasecure.audit.jaas.Client.option.principal | Name of the principal to be used | — |
| xasecure.audit.jaas.Client.option.serviceName | Name of a user or a service that wants to log in | — |
| xasecure.audit.jaas.Client.option.storeKey | Set this to true to store the keytab or the principal's key in the Subject's private credentials | false |
| xasecure.audit.jaas.Client.option.useKeyTab | Set this to true to get the principal's key from the keytab | false |

| Parameter | Description | Default value |
|---|---|---|
| ranger.plugin.spark.policy.rest.url | The URL to Ranger Admin | — |
| ranger.plugin.spark.service.name | The name of the Ranger service containing policies for this instance | — |
| ranger.plugin.spark.policy.cache.dir | The directory where Ranger policies are cached after successful retrieval from the source | /srv/ranger/spark/policycache |
| ranger.plugin.hive.policy.cache.dir | The directory where Ranger policies for Hive are cached after successful retrieval from the source | — |
| ranger.plugin.spark.policy.pollIntervalMs | Defines how often to poll for changes in policies | 30000 |
| ranger.plugin.spark.policy.rest.client.connection.timeoutMs | The Spark plugin RangerRestClient connection timeout (in milliseconds) | 120000 |
| ranger.plugin.spark.policy.rest.client.read.timeoutMs | The Spark plugin RangerRestClient read timeout (in milliseconds) | 30000 |
| ranger.add-yarn-authorization | Defines whether YARN authorization should be added to Ranger authorization | false |
| ranger.plugin.spark.enable.implicit.userstore.enricher | Enables UserStoreEnricher for fetching user and group attributes when using macros or scripts in row filters (Ranger 2.3+) | true |
| ranger.plugin.spark.policy.rest.ssl.config.file | Path to the RangerRestClient SSL configuration file for the Spark plugin | /etc/spark3/conf/ranger-spark-policymgr-ssl.xml |

| Parameter | Description | Default value |
|---|---|---|
| xasecure.policymgr.clientssl.keystore | Path to the keystore file used by Ranger | — |
| xasecure.policymgr.clientssl.keystore.credential.file | Path to the keystore credentials file | /etc/spark/conf/ranger-spark.jceks |
| xasecure.policymgr.clientssl.truststore.credential.file | Path to the truststore credentials file | /etc/spark/conf/ranger-spark.jceks |
| xasecure.policymgr.clientssl.truststore | Path to the truststore file used by Ranger | — |
| xasecure.policymgr.clientssl.keystore.password | Password to the keystore file | — |
| xasecure.policymgr.clientssl.truststore.password | Password to the truststore file | — |

| Parameter | Description | Default value |
|---|---|---|
| Custom spark-defaults.conf | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the spark-defaults.conf configuration file | — |
| spark-env.sh | The contents of the spark-env.sh file used to initialize environment variables on worker nodes | — |
| Custom livy.conf | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the livy.conf configuration file | — |
| livy-env.sh | The contents of the livy-env.sh file used to initialize environment variables for the Livy server operation | — |
| spark-history-env.sh | The contents of the spark-history-env.sh file used to initialize environment variables for the Spark History Server | — |
| Ranger plugin enabled | Enables or disables the Ranger plugin | false |

| Parameter | Description | Default value |
|---|---|---|
| adb_spark3_connector | Version of the adb-spark3-connector package to be installed | 1.0.5_3.5.x |
| adqm_spark3_connector | Version of the adqm-spark3-connector package to be installed | 1.0.0_3.5.x |
| adh_pyspark | Version of the adh-pyspark package to be installed | 3.10.4 |