NiFi ADB Connector overview
Overview
NiFi ADB Connector provides high-speed parallel writing of data from NiFi to Arenadata DB (ADB).
Starting with ADS 3.9.0.1.b1, the NiFi user interface includes the Arenadata-developed components for building NiFi ADB Connector: the PutGreenplumRecord processor and the StandartGpfdistService controller service.
NiFi ADB Connector architecture
PutGreenplumRecord processors collect data and pass it to the StandartGpfdistService controller service. The controller service connects to ADB through the GreenplumDBCPConnectionPool database connection pool service and transfers the data over the built-in gpfdist protocol.
Greenplum Parallel File Server (gpfdist) is a Greenplum utility for reading data from and writing data to files located on remote servers. It is installed on all hosts of the ADB cluster and loads data in parallel, distributing it among segments either evenly or according to the specified distribution key.
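To make the two distribution modes concrete, here is a conceptual Python sketch that assigns rows to segments either round-robin (a stand-in for "evenly") or by hashing a distribution key column. It is only an illustration: the function names are hypothetical, and `zlib.crc32` stands in for Greenplum's own distribution hash, which works differently.

```python
import zlib
from collections import defaultdict


def distribute_evenly(rows, num_segments):
    """Assign rows to segments round-robin (stand-in for 'even' distribution)."""
    segments = defaultdict(list)
    for i, row in enumerate(rows):
        segments[i % num_segments].append(row)
    return dict(segments)


def distribute_by_key(rows, key, num_segments):
    """Assign rows to segments by hashing the value of a distribution key column.

    zlib.crc32 is used only because it is deterministic across runs;
    it is NOT the hash function Greenplum uses.
    """
    segments = defaultdict(list)
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        segments[digest % num_segments].append(row)
    return dict(segments)


if __name__ == "__main__":
    rows = [{"id": i, "city": c} for i, c in enumerate(["Moscow", "Kazan", "Moscow", "Perm"])]
    print(distribute_evenly(rows, 2))
    print(distribute_by_key(rows, "city", 3))
```

Rows that share a key value always land on the same segment; round-robin assignment balances row counts but gives no such guarantee.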
PutGreenplumRecord
PutGreenplumRecord is a NiFi processor that passes input records through a gpfdist receiver created by StandartGpfdistService.
The processor parameters are listed below.
Parameter | Required | Description | Default value |
---|---|---|---|
Gpfdist Service | true | Reference to a running StandartGpfdistService controller service | — |
Record Reader | true | Reference to a record reader controller service (CSVReader, AvroReader, or another), chosen according to the input data | — |
Schema Name | false | Name of the schema into which the data is loaded | null |
Table Name | true | Name of the table into which the data is loaded | — |
Table Columns | true | Columns of the table into which the data is loaded | — |
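The following Python sketch illustrates, under stated assumptions, how the Schema Name, Table Name, and Table Columns parameters might interact: it builds the target table name and serializes incoming records as CSV text in Table Columns order. The helper names, the CSV format, and the treatment of missing fields as NULL (an unquoted empty field, the default NULL representation in CSV-format loads) are assumptions; the processor's actual wire format is not described here.

```python
import csv
import io


def qualified_table_name(table, schema=None):
    """Return schema.table, or the bare table name when Schema Name is not set.

    A bare name is resolved by ADB through the session search_path.
    Identifiers are not quoted in this sketch.
    """
    if not table:
        raise ValueError("Table Name is required")
    return f"{schema}.{table}" if schema else table


def records_to_csv(records, columns):
    """Serialize records (dicts) to CSV text, one line per record.

    Fields are written in Table Columns order; a missing or None field
    becomes an unquoted empty value, which CSV-format loads read as NULL.
    Empty strings are written the same way, so this simple sketch cannot
    distinguish '' from NULL.
    """
    if not columns:
        raise ValueError("Table Columns is required")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        writer.writerow(["" if record.get(col) is None else record[col] for col in columns])
    return buffer.getvalue()


if __name__ == "__main__":
    print(qualified_table_name("orders", "sales"))  # sales.orders
    print(records_to_csv([{"id": 1, "name": "a,b"}, {"id": 2}], ["id", "name"]), end="")
```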
StandartGpfdistService
StandartGpfdistService is a controller service that writes data to ADB segments by means of a readable external table that uses gpfdist protocol version 1.
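As a sketch of the readable external table mechanism, the Python helpers below generate the kind of SQL that loads rows from a gpfdist endpoint: a `CREATE READABLE EXTERNAL TABLE` whose `LOCATION` points at `gpfdist://host:port/path`, followed by an `INSERT ... SELECT` into the target table. The connector's actual statements are not documented here. The function names, host, port, path, table names, and `CSV` format are hypothetical, and the port would presumably correspond to the service's Listening Port.

```python
def build_external_table_ddl(ext_table, columns, host, port, path, fmt="CSV"):
    """Return DDL for a readable external table served by a gpfdist endpoint.

    columns is a list of (name, sql_type) pairs.
    """
    column_defs = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    location = f"gpfdist://{host}:{port}/{path.lstrip('/')}"
    return (
        f"CREATE READABLE EXTERNAL TABLE {ext_table} (\n    {column_defs}\n)\n"
        f"LOCATION ('{location}')\n"
        f"FORMAT '{fmt}';"
    )


def build_load_statement(target_table, ext_table, column_names):
    """Return the INSERT ... SELECT that makes segments pull rows through gpfdist."""
    column_list = ", ".join(column_names)
    return f"INSERT INTO {target_table} ({column_list}) SELECT {column_list} FROM {ext_table};"


if __name__ == "__main__":
    cols = [("id", "integer"), ("amount", "numeric(12, 2)")]
    print(build_external_table_ddl("ext_orders", cols, "nifi-host", 8081, "orders"))
    print(build_load_statement("public.orders", "ext_orders", [name for name, _ in cols]))
```

When the `INSERT ... SELECT` runs, each segment scans the external table and reads its share of rows from the gpfdist location in parallel, which is what makes this path faster than row-by-row inserts through the master.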
The parameters of the controller service are given below.
Parameter | Required | Description | Default value |
---|---|---|---|
Listening Port | true | Port on which the service listens for incoming gpfdist requests | — |
Database Connection Pooling Service | true | Reference to the configured DBCPConnectionPool controller service | — |
Minimum Gpffist Server Threads | false | Minimum number of threads used to run the gpfdist server | 1 |
Maximum Gpffist Server Threads | false | Maximum number of threads used to run the gpfdist server | 4 |
The Maximum Gpffist Server Threads Idle Timeout | false | Maximum idle time of gpfdist server threads, in milliseconds | 60000 |
Write buffer size in bytes | false | Size of the write byte buffer used for serialized writes | 1 MB |
Maximum Record Processor Threads | false | Maximum number of threads used to process records | 8 |
Maximum Gpfdist Request Processor Threads | false | Maximum number of threads used to process a gpfdist request | 8 |
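The Python dataclass below is a hypothetical mirror of the StandartGpfdistService parameters with the defaults from the table. The class and field names are illustrative, the 1 MB default is assumed to mean 1 MiB (1,048,576 bytes), and the checks in `validate()` are illustrative sanity rules rather than documented service behavior.

```python
from dataclasses import dataclass


@dataclass
class GpfdistServiceConfig:
    """Hypothetical mirror of the StandartGpfdistService parameters."""

    listening_port: int                         # Listening Port (required)
    connection_pool_service: str                # Database Connection Pooling Service (required)
    min_server_threads: int = 1
    max_server_threads: int = 4
    server_threads_idle_timeout_ms: int = 60_000
    write_buffer_size_bytes: int = 1024 * 1024  # assumption: "1 MB" taken as 1 MiB
    max_record_processor_threads: int = 8
    max_request_processor_threads: int = 8

    def validate(self):
        """Apply illustrative sanity checks and return self for chaining."""
        if not 1 <= self.listening_port <= 65535:
            raise ValueError("Listening Port must be in the range 1-65535")
        if not self.connection_pool_service:
            raise ValueError("Database Connection Pooling Service is required")
        if not 1 <= self.min_server_threads <= self.max_server_threads:
            raise ValueError("server thread bounds must satisfy 1 <= min <= max")
        for name in ("server_threads_idle_timeout_ms", "write_buffer_size_bytes",
                     "max_record_processor_threads", "max_request_processor_threads"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        return self
```

For example, `GpfdistServiceConfig(listening_port=8081, connection_pool_service="adb-pool").validate()` yields a configuration with every optional parameter at its documented default.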
GreenplumDBCPConnectionPool
GreenplumDBCPConnectionPool is the ADB connection service; it provides a pool of database connections.
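To illustrate what a connection pool provides, here is a minimal, generic Python sketch: a fixed number of connections is opened once and then lent out and returned, so callers wait for a free connection instead of opening new database sessions. This is not the GreenplumDBCPConnectionPool implementation; the `ConnectionPool` class and the `factory` callable are hypothetical.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Minimal fixed-size pool: connections are created up front and reused."""

    def __init__(self, factory, size):
        if size < 1:
            raise ValueError("size must be at least 1")
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # open every connection once, at startup

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while all connections are busy; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return the connection to the pool for reuse
```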
Supported data types
The mapping of ADB data types to NiFi record field types is given below.
ADB data type | NiFi record field data type | Comment |
---|---|---|
BIT | BOOLEAN | — |
BOOLEAN | BOOLEAN | — |
SMALLINT | SHORT, INT | The value must fit in 2 bytes (-32768 to 32767) |
INTEGER | INT | — |
BIGINT | BIGINT, LONG | — |
REAL | FLOAT | — |
DOUBLE PRECISION | DOUBLE | — |
NUMERIC(p, s) | DECIMAL | — |
CHAR(n) | STRING | — |
VARCHAR(n) | STRING | — |
ENUM | STRING | — |
BYTEA | BYTE[], STRING | For the STRING type, a value should be written in hexadecimal format, for example: |
DATE | DATE | — |
TIME(n) | TIME | — |
TIMESTAMP(n) | TIMESTAMP | — |
TIMESTAMPTZ(n) | TIMESTAMP | — |
MONEY | DECIMAL, STRING, DOUBLE, FLOAT | — |
UUID | STRING | — |
JSONB | STRING | — |
HSTORE | STRING, MAP(STRING, STRING) | For the STRING type, a value should be in the format |
ARRAY | STRING, STRING[] | For the STRING type, a value should be in the format |
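The Python sketch below shows how values for SMALLINT, BYTEA, HSTORE, and ARRAY columns might be prepared before they reach the connector. It assumes the connector accepts PostgreSQL's standard text input formats (hex bytea with a backslash-x prefix, hstore `"key"=>"value"` pairs, and `{...}` array literals); that is an assumption, not documented connector behavior, and all helper names are hypothetical.

```python
SMALLINT_MIN, SMALLINT_MAX = -32768, 32767  # signed 2-byte range


def to_smallint(value):
    """Check that a SHORT/INT value fits in an ADB SMALLINT column."""
    if not SMALLINT_MIN <= value <= SMALLINT_MAX:
        raise ValueError(f"{value} does not fit in SMALLINT")
    return value


def bytea_to_hex_string(data):
    """Render bytes in hex bytea input format: backslash, 'x', two hex digits per byte."""
    return "\\x" + data.hex()


def _quote(text):
    # Double-quote a value, escaping backslashes and double quotes with a backslash.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def map_to_hstore_string(mapping):
    """Render a MAP(STRING, STRING) as hstore text; None values become unquoted NULL."""
    parts = []
    for key, value in mapping.items():
        rendered = "NULL" if value is None else _quote(value)
        parts.append(f"{_quote(key)}=>{rendered}")
    return ", ".join(parts)


def list_to_array_string(items):
    """Render a STRING[] as an array literal; None elements become unquoted NULL."""
    return "{" + ",".join("NULL" if item is None else _quote(item) for item in items) + "}"


if __name__ == "__main__":
    print(to_smallint(32767))
    print(bytea_to_hex_string(b"\xde\xad\xbe\xef"))
    print(map_to_hstore_string({"color": "red", "size": None}))
    print(list_to_array_string(["a", "b c", None]))
```

Quoting every key, value, and element keeps the helpers simple: quoted text may safely contain commas, spaces, braces, or `=>`, which would otherwise need context-dependent handling.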