NiFi ADB Connector usage example
Overview
To illustrate the operation of NiFi ADB Connector, the article shows the implementation of loading data from PostgreSQL database tables into ADB tables (based on Greenplum).
The creation of NiFi ADB Connector is performed in NiFi user interface and is available starting from ADS 3.9.0.1.b1.
Prerequisites
The following describes the environment used to create the NiFi ADB Connector.
ADS
-
The ADS cluster is installed according to the Online installation guide. The minimum ADS version is 3.9.0.1.b1.
-
The NiFi and ZooKeeper services are installed in the ADS cluster.
-
Hosts where NiFi is installed have a binary JAR file of a PostgreSQL JDBC driver that allows programs to connect to a PostgreSQL database using standard Java code. The path to the file is used when configuring DBCPConnectionPool services to connect to PostgreSQL (ADPG) and Greenplum (ADB) servers.
The command below downloads the required version of JAR file (in this example, version 42.7.5 is used) to the required directory:
$ wget https://jdbc.postgresql.org/download/postgresql-42.7.5.jar
-
When configuring the DBCPConnectionPool service, a link in the format
jdbc:postgresql://<hostname>:5432/<database>
is used, where:-
<hostname>
is the host on which the ADPG component (for connecting to the PostgreSQL server) or the ADB Master component (for connecting to the Greenplum server) is installed; -
<database>
is the name of the database in which the table being used is created.
-
ADPG
-
The ADPG cluster is installed according to the Online installation guide.
-
The following preparatory actions were performed on the installed PostgreSQL server (the ADPG cluster):
-
A user named
my_user
, withSUPERUSER
privileges and a password, was created in thepostgres
database. -
A
my_table
table was created in thepostgres
database and several rows with data were added to it. -
The pg_hba.conf file is configured to provide user access from the host where the NiFi service of the ADS cluster is installed. For this purpose, an entry about the host address and user is added to the PG_HBA field on the configuration parameters page of the ADPG service:
host postgres my_user 10.92.43.206/32 trust
-
NOTE
|
ADB
-
The ADB cluster is installed according to the Online installation guide.
-
The ADB service is installed in the ADB cluster.
-
A user with the
new_user
name,SUPERUSER
privileges and password has been created in theadb
database. -
A
my_table
table has been created in theadb
database to record data. -
The pg_hba.conf file is configured to provide user access from the host on which the ADS cluster NiFi service is installed. To do this, an entry about the host address and user has been added to the PG_HBA field on the configuration parameters page of the ADB service:
host adb new_user 10.92.40.128/24 trust
-
On the Interconnect network to which the cluster hosts are connected, you should specify a jumbo frame
MTU=9000
so that the packets generated by ADB (gp_max_packet_size
+ overhead) can fit into these frames. For more information about ADB cluster network requirements, see Network requirements.
NOTE
For information on working with ADB tables, see Tables. |
Connect to PostgreSQL server (ADPG)
-
Create a QueryDatabaseTable processor and open its configuration. This processor executes SQL queries to retrieve data from a PostgreSQL table. Fill in the parameters related to the PostgreSQL table used.
QueryDatabaseTable processor configurationQueryDatabaseTable processor configuration -
Go to the Database Connection Pooling Service parameter value field, select Create new service… in the pop-up list and in the window that opens, create an instance of the DBCPConnectionPool service to create a connection to the PostgreSQL server.
Creating an instance of the DBCPConnectionPool serviceCreating an instance of the DBCPConnectionPool service -
After saving the created instance, click
. In the next NiFi Flow Configuration → Controller Services window, open the service configuration and specify parameters related to the PostgreSQL database.
PostgressDBCPConnectionPool service configurationPostgressDBCPConnectionPool service configuration
Connect to Greenplum (ADB)
-
To connect to the Greenplum table, create a PutGreenplumRecord processor, open its configuration, and specify parameters related to the Greenplum table.
PutGreenplumRecord processor configurationPutGreenplumRecord processor configuration -
Go to the Gpfdist Service parameter value field, select Create new service… in the pop-up list, and create an instance of the StandartGpfdistService service in the window that opens.
Creating an instance of the StandartGpfdistService serviceCreating an instance of the StandartGpfdistService service -
After saving the created instance, click
. In the next NiFi Flow Configuration → Controller Services window, open the service configuration and specify necessary parameters.
StandartGpfdistService service configurationStandartGpfdistService service configuration -
Go to the Database Connection Pooling Service parameter value field, select Create new service… in the pop-up list and in the opened window, create an instance of the DBCPConnectionPool service to create a connection to Greenplum.
-
After saving the created instance, click
and in the opened NiFi Flow Configuration → Controller Services window, open the service configuration and specify parameters related to the ADB database.
GreenplumDBCPConnectionPool service configurationGreenplumDBCPConnectionPool service configuration -
Close the NiFi Flow Configuration → Controller Services window and go back to the PutGreenplumRecord processor configuration. In the Record Reader parameter value field, create an instance of the AvroReader service to read records from PostgreSQL in Avro format with built-in schema.
If you are using multiple PutGreenplumRecord processors to transfer data from different sources to a single ADB database, for the Gpfdist Service value, select the same StandartGpfdistService you created.
After all services are created, they are displayed on the NiFi Flow Configuration → Controller Services page.


The StandartGpfdistService service fails until its associated GreenplumDBCPConnectionPool service starts.
Start a data flow


Processors are displayed with errors because the services associated with them are not running.
To start the flow, perform the following:
-
Start the services one by one on the NiFi Flow Configuration → Controller Services page by clicking on the icon
.
-
Run the created data stream.
Using queries to the ADB database, you can read the received data.