NiFi ADB Connector overview

Overview

NiFi ADB Connector provides high-speed parallel writing of data from NiFi to Arenadata DB (ADB).

Starting with ADS 3.9.0.1.b1, the NiFi user interface includes the Arenadata components used to build NiFi ADB Connector: the PutGreenplumRecord processor and the StandartGpfdistService controller service.

NiFi ADB Connector architecture

PutGreenplumRecord processors collect data and pass it to the StandartGpfdistService controller service, which, using the built-in gpfdist protocol, connects to ADB via the GreenplumDBCPConnectionPool database connection pool service.

Greenplum Parallel File Server (gpfdist) is a Greenplum utility for reading data from and writing data to files located on remote servers. It is installed on all hosts of the ADB cluster and loads data in parallel, distributing it between segments either evenly or according to the specified data distribution key.

Figure: NiFi ADB Connector architecture

PutGreenplumRecord

PutGreenplumRecord is a NiFi processor that processes input records through a gpfdist receiver created by StandartGpfdistService.

The processor parameters are listed below.

Parameter | Required | Description | Default value
Gpfdist Service | true | Reference to a running StandartGpfdistService controller service | -
Record Reader | true | Reference to one of the record reader controller services (CSVReader, AvroReader, or another), depending on the input data source | -
Schema Name | false | Name of the schema where the data will be loaded | null
Table Name | true | Name of the table where the data will be loaded | -
Table Columns | true | Columns of the table where the data will be loaded | -
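The required flags above can be checked before a flow is started. The following is a minimal sketch, not part of the connector: the `missing_required` helper, the example values, and the comma-separated form of Table Columns are assumptions made for illustration.

```python
# Required flags and defaults copied from the PutGreenplumRecord parameter table.
PUT_GREENPLUM_RECORD_PARAMS = {
    "Gpfdist Service": {"required": True, "default": None},
    "Record Reader": {"required": True, "default": None},
    "Schema Name": {"required": False, "default": None},
    "Table Name": {"required": True, "default": None},
    "Table Columns": {"required": True, "default": None},
}


def missing_required(config):
    """Return the required parameter names that are absent or empty in config."""
    return [
        name
        for name, spec in PUT_GREENPLUM_RECORD_PARAMS.items()
        if spec["required"] and not config.get(name)
    ]


# Hypothetical values, for illustration only.
example_config = {
    "Gpfdist Service": "StandartGpfdistService",
    "Record Reader": "CSVReader",
    "Table Name": "events",
    "Table Columns": "id, name, created_at",  # assumed comma-separated form
}

print(missing_required(example_config))
```

Schema Name is omitted in the example because it is optional and defaults to null.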

StandartGpfdistService

StandartGpfdistService is a controller service that writes data to ADB segments through a readable external table that uses gpfdist protocol version 1.
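To make the readable external table mechanism concrete, the sketch below builds the kind of statements a gpfdist-based load typically uses in Greenplum: an external table whose LOCATION points at a gpfdist endpoint, followed by an INSERT ... SELECT into the target table. The exact statements issued by the service are not documented here, so the function name, the table names, the host, the port, the path, and the CSV format are all assumptions.

```python
def external_table_sql(target_table, ext_table, host, port, path="data"):
    """Build the two statements of a gpfdist load (illustrative only).

    The first statement defines a readable external table served over gpfdist,
    the second copies its rows into the target table.
    """
    create = (
        f"CREATE READABLE EXTERNAL TABLE {ext_table} (LIKE {target_table}) "
        f"LOCATION ('gpfdist://{host}:{port}/{path}') "
        "FORMAT 'CSV'"
    )
    insert = f"INSERT INTO {target_table} SELECT * FROM {ext_table}"
    return create, insert


# Hypothetical names: "nifi-host" stands for the NiFi node and 8080 for the Listening Port.
create_stmt, insert_stmt = external_table_sql(
    "public.events", "public.events_ext", "nifi-host", 8080
)
print(create_stmt)
print(insert_stmt)
```

When ADB runs the INSERT, each segment requests its share of the data from the gpfdist endpoint, which is what makes the load parallel.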

The parameters of the controller service are given below.

Parameter | Required | Description | Default value
Listening Port | true | Port to listen for incoming gpfdist requests | -
Database Connection Pooling Service | true | Reference to the configured DBCPConnectionPool controller service | -
Minimum Gpffist Server Threads | false | Minimum number of threads used to run the gpfdist server | 1
Maximum Gpffist Server Threads | false | Maximum number of threads used to run the gpfdist server | 4
The Maximum Gpffist Server Threads Idle Timeout | false | Maximum idle time of gpfdist server threads, in milliseconds | 60000
Write buffer size in bytes | false | Size of the write byte buffer for serialized writes | 1 MB
Maximum Record Processor Threads | false | Maximum number of threads used to process records | 8
Maximum Gpfdist Request Processor Threads | false | Maximum number of threads used to process a gpfdist request | 8
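The defaults above can be captured in code when service settings are prepared or reviewed outside the NiFi user interface. This is a minimal sketch under stated assumptions: the `effective_settings` helper is hypothetical, the check that the minimum thread count does not exceed the maximum is a sanity check added here rather than documented behavior, and the write buffer default is read as 1024 * 1024 bytes.

```python
# Defaults copied from the StandartGpfdistService parameter table.
GPFDIST_SERVICE_DEFAULTS = {
    "Minimum Gpffist Server Threads": 1,
    "Maximum Gpffist Server Threads": 4,
    "The Maximum Gpffist Server Threads Idle Timeout": 60000,  # milliseconds
    "Write buffer size in bytes": 1024 * 1024,  # assumption: default read as 1 MiB
    "Maximum Record Processor Threads": 8,
    "Maximum Gpfdist Request Processor Threads": 8,
}

# Parameters the table marks as required; they have no default.
REQUIRED = ("Listening Port", "Database Connection Pooling Service")


def effective_settings(overrides):
    """Merge overrides into the defaults and run basic sanity checks."""
    settings = {**GPFDIST_SERVICE_DEFAULTS, **overrides}
    missing = [name for name in REQUIRED if name not in settings]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    low = settings["Minimum Gpffist Server Threads"]
    high = settings["Maximum Gpffist Server Threads"]
    if low > high:  # hypothetical check, not documented connector behavior
        raise ValueError("minimum server threads exceeds maximum server threads")
    return settings


# Hypothetical port and service name, for illustration only.
settings = effective_settings({
    "Listening Port": 8080,
    "Database Connection Pooling Service": "GreenplumDBCPConnectionPool",
})
print(settings["Maximum Gpffist Server Threads"])
```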

GreenplumDBCPConnectionPool

GreenplumDBCPConnectionPool is a controller service that provides a pool of database connections to ADB.

Supported data types

The mapping of ADB data types to NiFi record field types is given below.

ADB data type | NiFi record field data type | Comment
BIT | BOOLEAN | -
BOOLEAN | BOOLEAN | -
SMALLINT | SHORT, INT | The value must fit in 2 bytes (from -32768 to 32767)
INTEGER | INT | -
BIGINT | BIGINT, LONG | -
REAL | FLOAT | -
DOUBLE | DOUBLE | -
NUMERIC(p, s) | DECIMAL | -
CHAR(n) | STRING | -
VARCHAR(n) | STRING | -
ENUM | STRING | -
BYTEA | BYTE[], STRING | For the STRING type, the value should be written in hexadecimal format, for example: \xd078
DATE | DATE | -
TIME(n) | TIME | -
TIMESTAMP(n) | TIMESTAMP | -
TIMESTAMPTZ(n) | TIMESTAMP | -
MONEY | DECIMAL, STRING, DOUBLE, FLOAT | -
UUID | STRING | -
JSONB | STRING | -
HSTORE | STRING, MAP(STRING, STRING) | For the STRING type, the value should be in the format key=value, key1=value1
ARRAY | STRING, STRING[] | For the STRING type, the value should be in the format {value, value1}
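The STRING formats mentioned in the comments can be produced with a few small helpers. The sketch below follows only the formats shown in the table; the function names are hypothetical, and quoting or escaping of special characters inside values is not handled.

```python
def bytea_to_string(data: bytes) -> str:
    """Render bytes as a BYTEA hex string: a backslash, 'x', then hex digits."""
    return "\\x" + data.hex()


def hstore_to_string(pairs: dict) -> str:
    """Render a mapping in the key=value, key1=value1 form given for HSTORE."""
    return ", ".join(f"{key}={value}" for key, value in pairs.items())


def array_to_string(values) -> str:
    """Render a sequence in the {value, value1} form given for ARRAY."""
    return "{" + ", ".join(str(value) for value in values) + "}"


def fits_smallint(value: int) -> bool:
    """Check that an integer fits in the 2-byte signed range of SMALLINT."""
    return -32768 <= value <= 32767


print(bytea_to_string(bytes([0xD0, 0x78])))
print(hstore_to_string({"key": "value", "key1": "value1"}))
print(array_to_string(["value", "value1"]))
print(fits_smallint(40000))
```

A check such as `fits_smallint` is useful when an INT record field is loaded into a SMALLINT column, since INT can hold values outside the 2-byte range.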
