Spark4 configuration parameters
To configure the service, use the following configuration parameters in ADCM.
| Parameter | Description | Default value |
|---|---|---|
| Dynamic allocation (spark.dynamicAllocation.enabled) | Defines whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload | false |
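
Dynamic allocation can also be turned on for an individual job at session startup. Below is a minimal PySpark sketch, assuming the external shuffle service is available on the cluster; the application name is illustrative:

```python
from pyspark.sql import SparkSession

# A minimal sketch of enabling dynamic allocation for a single application.
# Cluster-wide defaults are normally managed via ADCM.
spark = (
    SparkSession.builder
    .appName("dynamic-allocation-demo")  # illustrative name
    .config("spark.dynamicAllocation.enabled", "true")
    # Dynamic allocation needs a way to preserve shuffle files when
    # executors are removed, e.g. the external shuffle service
    # (see spark.shuffle.service.enabled below)
    .config("spark.shuffle.service.enabled", "true")
    # Remove executors that stay idle longer than this timeout
    .config("spark.dynamicAllocation.executorIdleTimeout", "120s")
    .getOrCreate()
)
```
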
| Parameter | Description | Default value |
|---|---|---|
| Encryption enable | Enables or disables the credential encryption feature. When enabled, Spark4 stores configuration passwords and credentials required for interacting with other services in encrypted form | false |
| Credential provider path | Path to a keystore file with secrets | jceks://hdfs/apps/spark/security/spark4.jceks |
| Custom jceks | Set to true to use a custom JCEKS keystore file | false |
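
Applications can resolve secrets from the same keystore through Hadoop's credential provider mechanism, passing the Hadoop option via Spark's spark.hadoop. configuration prefix. A minimal sketch, assuming the keystore at the default path above already contains the required aliases; the application name is illustrative:

```python
from pyspark.sql import SparkSession

# A sketch: point Hadoop at the JCEKS keystore referenced above.
# Options with the "spark.hadoop." prefix are passed through to the
# Hadoop configuration of the application.
spark = (
    SparkSession.builder
    .appName("credential-provider-demo")  # illustrative name
    .config(
        "spark.hadoop.hadoop.security.credential.provider.path",
        "jceks://hdfs/apps/spark/security/spark4.jceks",
    )
    .getOrCreate()
)
```
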
| Parameter | Description | Default value |
|---|---|---|
| spark.yarn.archive | Archive containing all the required Spark JARs for distribution to the YARN cache. If set, this configuration replaces spark.yarn.jars, and the archive is used in all the application's containers | hdfs:///apps/spark/spark4-yarn-archive.tgz |
| spark.yarn.appMasterEnv.JAVA_HOME | Value of the JAVA_HOME environment variable for the YARN Application Master | /usr/lib/jvm/java-arenadata-openjdk-17 |
| spark.executorEnv.JAVA_HOME | Value of the JAVA_HOME environment variable for executors | /usr/lib/jvm/java-arenadata-openjdk-17 |
| spark.yarn.historyServer.address | Spark History Server address | — |
| spark.master | The cluster manager to connect to | yarn |
| spark.dynamicAllocation.enabled | Defines whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload | false |
| spark.shuffle.service.enabled | Enables the external shuffle service. This service preserves the shuffle files written by executors so that executors can be safely removed, or so that shuffle fetches can continue if an executor fails. The external shuffle service must be set up before this option is enabled | false |
| spark.eventLog.enabled | Defines whether to log Spark events. This is useful for reconstructing the web UI after the application has finished | true |
| spark.eventLog.dir | Base directory where Spark events are logged if spark.eventLog.enabled is set to true | hdfs:///var/log/spark4/apps |
| spark.dynamicAllocation.executorIdleTimeout | If dynamic allocation is enabled and an executor has been idle for longer than this duration, the executor is removed. For more details, see the Spark documentation | 120s |
| spark.dynamicAllocation.cachedExecutorIdleTimeout | If dynamic allocation is enabled and an executor with cached data blocks has been idle for longer than this duration, the executor is removed. For more details, see the Spark documentation | 600s |
| spark.history.provider | Name of the class that implements the application history backend. Currently, Spark provides only one implementation, which looks for application logs stored in the file system | org.apache.spark.deploy.history.FsHistoryProvider |
| spark.history.fs.cleaner.enabled | Specifies whether the History Server should periodically clean up event logs from storage | true |
| spark.history.store.path | A local directory in which to cache application history data. If set, the History Server stores application data on disk instead of keeping it in memory. The data written to disk is reused after a History Server restart | /var/log/spark4/history |
| spark.serializer | Class used for serializing objects that will be sent over the network or need to be cached in serialized form. The default Java serialization works with any serializable Java object but may be quite slow, so using org.apache.spark.serializer.KryoSerializer is recommended | org.apache.spark.serializer.KryoSerializer |
| spark.driver.extraClassPath | Extra classpath entries to be added to the classpath of the driver | — |
| spark.executor.extraClassPath | Extra classpath entries to be added to the classpath of executors | — |
| spark.history.ui.port | Port number of the History Server web UI | 18094 |
| spark.ui.port | Port number of the Spark web UI | 4150 |
| spark.history.fs.logDirectory | Log directory of the History Server | hdfs:///var/log/spark4/apps |
| spark.sql.extensions | A comma-separated list of Iceberg SQL extensions classes | org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions |
| spark.sql.catalog.spark_catalog | Iceberg catalog implementation class | org.apache.iceberg.spark.SparkSessionCatalog |
| spark.sql.hive.metastore.jars | Location of the JARs that should be used to instantiate HiveMetastoreClient | path |
| spark.sql.hive.metastore.jars.path | A list of comma-separated paths to JARs used to instantiate HiveMetastoreClient | file:///usr/lib/hive/lib/*.jar |
| spark.driver.extraLibraryPath | Path to extra native libraries for the driver | /usr/lib/hadoop/lib/native/ |
| spark.yarn.am.extraLibraryPath | Path to extra native libraries for the Application Master | /usr/lib/hadoop/lib/native/ |
| spark.executor.extraLibraryPath | Path to extra native libraries for executors | /usr/lib/hadoop/lib/native/ |
| spark.yarn.appMasterEnv.HIVE_CONF_DIR | A directory on the Application Master with the Hive configuration files required for running Hive in cluster mode | /etc/spark4/conf |
| spark.yarn.historyServer.allowTracking | Allows using the Spark History Server for tracking the UI even if the web UI is disabled for a job | true |
| spark.connect.grpc.binding.port | Port number used to connect to Spark Connect via gRPC | 15012 |
| spark.artifactory.dir.path | Path to an artifact directory used by Spark Connect | tmp |
| spark.sql.security.confblacklist | A list of parameters that applications are not allowed to override, e.g. for information security reasons | spark.sql.extensions |
| spark.history.kerberos.enabled | Indicates whether the History Server should use Kerberos to log in. This is required if the History Server accesses HDFS files on a secure Hadoop cluster | false |
| spark.acls.enable | Defines whether Spark ACLs should be enabled. If enabled, Spark checks whether the user has access permissions to view or modify a job. Note that this requires the user to be known; if the user is null, no checks are performed | false |
| spark.modify.acls | Defines who has access to modify a running Spark application | spark,hdfs |
| spark.modify.acls.groups | A comma-separated list of user groups that have modify access to the Spark application | spark,hdfs |
| spark.history.ui.acls.enable | Specifies whether ACLs should be checked to authorize users viewing the applications in the History Server. If enabled, access control checks are performed regardless of what the individual applications set for spark.ui.view.acls and spark.ui.view.acls.groups | false |
| spark.history.ui.admin.acls | A comma-separated list of users that have view access to all the Spark applications in the History Server | spark,hdfs,dr.who |
| spark.history.ui.admin.acls.groups | A comma-separated list of groups that have view access to all the Spark applications in the History Server | spark,hdfs,dr.who |
| spark.ui.view.acls | A comma-separated list of users that have view access to the Spark application. By default, only the user that started the Spark job has view access. Using * grants view access to any user | spark,hdfs,dr.who |
| spark.ui.view.acls.groups | A comma-separated list of groups whose members have view access to the Spark web UI and the Spark job details. This can be used if you have a set of administrators, developers, or users who should be able to monitor submitted Spark jobs. Using * grants view access to users in any group | spark,hdfs,dr.who |
| spark.ssl.keyPassword | Password to the private key in the keystore | — |
| spark.ssl.keyStore | Path to the keystore file. The path can be absolute or relative to the directory in which the process is started | — |
| spark.ssl.keyStoreType | Type of the keystore used | JKS |
| spark.ssl.trustStorePassword | Password to the truststore | — |
| spark.ssl.trustStoreType | Type of the truststore | JKS |
| spark.ssl.enabled | Defines whether to use SSL for Spark | — |
| spark.ssl.protocol | Defines the TLS protocol to use. The protocol must be supported by the JVM | TLSv1.2 |
| spark.ssl.ui.port | Port number used by the Spark web UI when SSL is enabled | 4151 |
| spark.ssl.historyServer.port | Port number used by the Spark History Server web UI when SSL is enabled | 18094 |
| spark.executorEnv.PYTHONPATH | Value of the PYTHONPATH environment variable for executors | ./pyspark.zip:./py4j.zip |
| spark.yarn.appMasterEnv.PYTHONPATH | Value of the PYTHONPATH environment variable for the Application Master | ./pyspark.zip:./py4j.zip |
| spark.yarn.dist.archives | Comma-separated list of archives to be extracted into the working directory of each executor | hdfs:///apps/spark4/pyspark.zip#pyspark.zip,hdfs:///apps/spark4/py4j.zip#py4j.zip |
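
Most of the parameters above can also be overridden per application when a session is created; anything not overridden falls back to spark-defaults.conf. A minimal PySpark sketch (the application name is illustrative):

```python
from pyspark.sql import SparkSession

# Override selected spark-defaults.conf values for a single application;
# anything not set here falls back to the cluster-wide defaults.
spark = (
    SparkSession.builder
    .appName("defaults-override-demo")  # illustrative name
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.eventLog.enabled", "true")
    .getOrCreate()
)

# Read a value inherited from spark-defaults.conf, assuming it is set
# there (as in the table above); raises an error if the key is unset
print(spark.conf.get("spark.eventLog.dir"))
```

Parameters listed in spark.sql.security.confblacklist (by default, spark.sql.extensions) cannot be overridden this way. Spark Connect clients connect to the server's gRPC port instead, for example `SparkSession.builder.remote("sc://<host>:15012").getOrCreate()` in PySpark, where `<host>` is the Spark Connect server host.
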
| Parameter | Description | Default value |
|---|---|---|
| Spark4 spark-log4j2.properties | Stores the Log4j configuration used for logging Spark4's activity | — |

| Parameter | Description | Default value |
|---|---|---|
| Spark History Server Heap Memory | Sets the maximum Java heap size for the Spark History Server | 1G |
| Spark4 Connect Heap Memory | Sets the maximum Java heap size for the Spark Connect server | 1G |

| Parameter | Description | Default value |
|---|---|---|
| ad-runtime-utils | Java configuration to be used by the service | — |
| Custom spark-defaults.conf | In this section, you can define values for custom parameters that are not displayed in the ADCM UI but are allowed in the spark-defaults.conf configuration file | — |
| spark-env.sh | Contents of the spark-env.sh file used to initialize environment variables on worker nodes | — |
| spark-history-env.sh | Contents of the spark-history-env.sh file used to initialize environment variables for the Spark History Server | — |
| Ranger plugin enabled | Enables or disables the Ranger plugin | false |

| Parameter | Description | Default value |
|---|---|---|
| adb_spark4_connector | Version of the adb-spark4-connector package to be installed | 1.2.0_4.0.x |
| adqm_spark4_connector | Version of the adqm-spark4-connector package to be installed | 1.1.0_4.0.x |
| adh_pyspark | Version of the adh-pyspark package to be installed | 3.10.4 |