ADB Spark 3 Connector options

Read/write parameters and the parameters required to connect to the ADB database are set as key/value pairs.

The connector supports the following options.

| Key | Description | Type | Required | Default value |
|---|---|---|---|---|
| `spark.adb.url` | Database connection string | Read/write | Yes | — |
| `spark.adb.dbschema` | Name of the database schema to which the table belongs | Read/write | Yes | `public` |
| `spark.adb.dbtable` | Name of the database table | Read/write | Yes | — |
| `spark.adb.driver` | Fully qualified class name of the JDBC driver when a custom driver is used | Read/write | No | `org.postgresql.Driver` |
| `spark.adb.user` | Name of the ADB user (role) | Read/write | Yes | — |
| `spark.adb.password` | Password of the ADB user | Read/write | No | — |
| `spark.adb.server.usehostname` | Whether to use the Spark 3 executor host name as the gpfdist server address | Read/write | No | `false` |
| `spark.adb.server.env.name` | Name of the environment variable whose value determines the host name or IP address of the Spark 3 executor node on which the gpfdist server process runs | Read/write | No | — |
| `spark.adb.server.port` | Port number or port range of the gpfdist server process on the Spark 3 executor node | Read/write | No | — |
| `spark.adb.pool.maxsize` | Maximum number of connections in the connection pool | Read/write | No | `4` |
| `spark.adb.pool.timeoutms` | Time in milliseconds after which an inactive connection is considered idle | Read/write | No | `10000` |
| `spark.adb.pool.minidle` | Minimum number of idle connections maintained in the connection pool | Read/write | No | `0` |
| `spark.adb.debugmode` | Enables logging of events to the `adb_spark_debug_query_log` ADB table | Read/write | No | `false` |
| `spark.adb.partition.column` | Name of the table column used for partitioning in Spark 3. The column must have an integer or date/time data type | Read | No | — |
| `spark.adb.partition.count` | Number of partitions in Spark 3. Can be specified either on its own or simultaneously with `spark.adb.partition.column` or `spark.adb.partition.hash` | Read | No | — |
| `spark.adb.partition.hash` | Expression used as the partitioning key when reading data in Spark 3. Specified simultaneously with `spark.adb.partition.count`. The expression must return a value of an integer data type | Read | No | — |
| `spark.adb.batch.enable` | Enables batch mode: when reading data from ADB to Spark 3, the ColumnarBatch structure is used, which is optimized for scan-type operations. This feature is experimental | Read | No | `false` |
| `spark.adb.batch.memorymode` | Type of data structure used to organize the ColumnarBatch. Acceptable values: `OFF_HEAP`, `ON_HEAP`. Specified simultaneously with `spark.adb.batch.enable` | Read | No | — |
| `spark.adb.table.truncate` | Used when writing in the Overwrite mode. If `true`, performs the truncate table operation; otherwise, drops the table | Write | No | `false` |
| `spark.adb.create.table.with` | Used when writing in the Overwrite and errorIfExists modes. Specifies table storage parameters via the WITH clause when creating a table | Write | No | — |
| `spark.adb.create.table.distributedby` | Used when writing in the Overwrite and errorIfExists modes. Defines the distribution key applied via the DISTRIBUTED BY clause when creating the target table | Write | No | `RANDOMLY` |
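The options above are passed through the standard Spark DataFrame API. The snippet below is a minimal sketch, not taken from the official documentation: it assumes the connector JAR is on the classpath and registers the `adb` data source format, and the host name, database, credentials, and table names are placeholders.

```python
from pyspark.sql import SparkSession

# Sketch only: assumes the ADB Spark 3 connector is on the classpath and
# exposes the "adb" data source format; connection details are placeholders.
spark = SparkSession.builder.appName("adb-example").getOrCreate()

# Read: split the scan into 8 Spark partitions by the integer column "id"
# (spark.adb.partition.column + spark.adb.partition.count).
df = (spark.read.format("adb")
      .option("spark.adb.url", "jdbc:postgresql://adb-master:5432/mydb")
      .option("spark.adb.dbschema", "public")
      .option("spark.adb.dbtable", "sales")
      .option("spark.adb.user", "gpadmin")
      .option("spark.adb.password", "***")
      .option("spark.adb.partition.column", "id")
      .option("spark.adb.partition.count", "8")
      .load())

# Write: overwrite the target table, truncating it instead of dropping it,
# and distribute the recreated table by the "id" column.
(df.write.format("adb")
   .option("spark.adb.url", "jdbc:postgresql://adb-master:5432/mydb")
   .option("spark.adb.dbschema", "public")
   .option("spark.adb.dbtable", "sales_copy")
   .option("spark.adb.user", "gpadmin")
   .option("spark.adb.password", "***")
   .option("spark.adb.table.truncate", "true")
   .option("spark.adb.create.table.distributedby", "id")
   .mode("overwrite")
   .save())
```

Because `spark.adb.table.truncate` is set to `true`, repeated overwrites keep the target table's definition (including its distribution key) instead of dropping and recreating it each time.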
