Spark connector options

Connection parameters for the ADB database, as well as read and write parameters, are set as key/value pairs.

The connector supports the following options:

Key | Description | Type | Required (default)
----|-------------|------|-------------------
spark.adb.url | Database connection string | Read/Write | Yes
spark.adb.dbschema | The name of the database schema to which the table belongs | Read/Write | Yes (public)
spark.adb.dbtable | The name of the database table | Read/Write | Yes
spark.adb.driver | The fully qualified class name of the JDBC driver, when using a custom driver | Read/Write | No (org.postgresql.Driver)
spark.adb.user | User/ADB role | Read/Write | Yes
spark.adb.password | The user's password in ADB | Read/Write | No
spark.adb.server.usehostname | Whether to use the Spark executor node hostname as the gpfdist server address | Read/Write | No (false)
spark.adb.server.env.name | The name of the environment variable whose value determines the hostname or IP address of the Spark executor node on which the gpfdist server process runs | Read/Write | No
spark.adb.server.port | A port number or port range for the gpfdist server process on the Spark executor node | Read/Write | No
spark.adb.pool.maxsize | The maximum number of connections in the connection pool | Read/Write | No (4)
spark.adb.pool.timeoutms | The time in milliseconds after which an inactive connection is considered idle | Read/Write | No (10000)
spark.adb.pool.minidle | The minimum number of idle connections maintained in the connection pool | Read/Write | No (0)
spark.adb.debugmode | Whether to enable event logging to the adb_spark_debug_query_log ADB table | Read/Write | No (false)
spark.adb.partition.column | The name of the table column used for partitioning in Spark. The column must have an integer or date/time data type | Read | No
spark.adb.partition.count | The number of partitions in Spark. Can be specified on its own or in conjunction with spark.adb.partition.column or spark.adb.partition.hash | Read | No
spark.adb.partition.hash | An expression used as a partitioning key when reading data in Spark. Specified in conjunction with spark.adb.partition.count. The expression must return an integer data type | Read | No
spark.adb.batch.enable | Whether to enable batch mode: when reading data from ADB to Spark, the ColumnarBatch structure is used, which is optimized for scan-type operations. Experimental | Read | No (false)
spark.adb.batch.memorymode | The type of data structure that backs the ColumnarBatch: a JVM in-memory array or off-heap memory. Acceptable values: ON_HEAP, OFF_HEAP. Specified in conjunction with spark.adb.batch.enable | Read | No
spark.adb.table.truncate | Used when writing in Overwrite mode. If true, the target table is truncated; otherwise, it is dropped | Write | No (false)
spark.adb.create.table.with | Used when writing in Overwrite and ErrorIfExists modes. Storage parameters for the created table (the WITH clause) | Write | No
spark.adb.create.table.distributedby | Used when writing in Overwrite and ErrorIfExists modes. The distribution key of the created target table (the DISTRIBUTED BY clause) | Write | No (RANDOMLY)
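A minimal read sketch in Scala showing how these options are passed as key/value pairs. The adb format name follows the connector's data source registration; the host, database, credentials, and table name below are placeholders, not values taken from this page.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("adb-connector-read")
  .getOrCreate()

// Options are plain key/value pairs; see the table above for the full list
val authors = spark.read
  .format("adb")                                                    // data source name of the connector
  .option("spark.adb.url", "jdbc:postgresql://adb-master:5432/adb") // placeholder connection string
  .option("spark.adb.dbschema", "public")                           // schema the table belongs to
  .option("spark.adb.dbtable", "author")                            // placeholder table name
  .option("spark.adb.user", "gpadmin")                              // placeholder user/ADB role
  .option("spark.adb.password", "changeme")                         // placeholder password
  .load()

authors.show()
```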
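To parallelize a read, spark.adb.partition.column and spark.adb.partition.count can be combined, as the table describes. A sketch under the same placeholder assumptions as above; the integer column id is hypothetical:

```scala
// Partitioned read: the scan is split into 8 Spark partitions
// by ranges of the integer column "id" (placeholder name)
val partitioned = spark.read
  .format("adb")
  .option("spark.adb.url", "jdbc:postgresql://adb-master:5432/adb")
  .option("spark.adb.dbschema", "public")
  .option("spark.adb.dbtable", "author")
  .option("spark.adb.user", "gpadmin")
  .option("spark.adb.password", "changeme")
  .option("spark.adb.partition.column", "id") // integer or date/time column
  .option("spark.adb.partition.count", "8")   // number of Spark partitions
  .load()

println(partitioned.rdd.getNumPartitions) // expected to be 8
```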
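A write sketch for Overwrite mode: with spark.adb.table.truncate set to true, the target table is truncated rather than dropped, and spark.adb.create.table.distributedby sets the distribution key if the table has to be created. The target table name and distribution column are placeholders:

```scala
import org.apache.spark.sql.SaveMode

authors.write
  .format("adb")
  .option("spark.adb.url", "jdbc:postgresql://adb-master:5432/adb")
  .option("spark.adb.dbschema", "public")
  .option("spark.adb.dbtable", "author_copy")           // placeholder target table
  .option("spark.adb.user", "gpadmin")
  .option("spark.adb.password", "changeme")
  .option("spark.adb.table.truncate", "true")           // truncate instead of drop on Overwrite
  .option("spark.adb.create.table.distributedby", "id") // DISTRIBUTED BY (id) if the table is created
  .mode(SaveMode.Overwrite)
  .save()
```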
