NiFi ADB Connector usage example

Overview

To illustrate the operation of NiFi ADB Connector, the article shows the implementation of loading data from PostgreSQL database tables into ADB tables (based on Greenplum).

The creation of NiFi ADB Connector is performed in NiFi user interface and is available starting from ADS 3.9.0.1.b1.

Prerequisites

The following describes the environment used to create the NiFi ADB Connector.

ADS

  • The ADS cluster is installed according to the Online installation guide. The minimum ADS version is 3.9.0.1.b1.

  • The NiFi and ZooKeeper services are installed in the ADS cluster.

  • Hosts where NiFi is installed have a binary JAR file of a PostgreSQL JDBC driver that allows programs to connect to a PostgreSQL database using standard Java code. The path to the file is used when configuring DBCPConnectionPool services to connect to PostgreSQL (ADPG) and Greenplum (ADB) servers.

    The command below downloads the required version of JAR file (in this example, version 42.7.5 is used) to the required directory:

$ wget https://jdbc.postgresql.org/download/postgresql-42.7.5.jar
  • When configuring the DBCPConnectionPool service, a link in the format jdbc:postgresql://<hostname>:5432/<database> is used, where:

    • <hostname> is the host on which the ADPG component (for connecting to the PostgreSQL server) or the ADB Master component (for connecting to the Greenplum server) is installed;

    • <database> is the name of the database in which the table being used is created.

ADPG

  • The ADPG cluster is installed according to the Online installation guide.

  • The following preparatory actions were performed on the installed PostgreSQL server (the ADPG cluster):

    • A user named my_user, with SUPERUSER privileges and a password, was created in the postgres database.

    • A my_table table was created in the postgres database and several rows with data were added to it.

    • The pg_hba.conf file is configured to provide user access from the host where the NiFi service of the ADS cluster is installed. For this purpose, an entry about the host address and user is added to the PG_HBA field on the configuration parameters page of the ADPG service:

      host    postgres  my_user      10.92.43.206/32     trust
NOTE

ADB

  • The ADB cluster is installed according to the Online installation guide.

  • The ADB service is installed in the ADB cluster.

  • A user with the new_user name, SUPERUSER privileges and password has been created in the adb database.

  • A my_table table has been created in the adb database to record data.

  • The pg_hba.conf file is configured to provide user access from the host on which the ADS cluster NiFi service is installed. To do this, an entry about the host address and user has been added to the PG_HBA field on the configuration parameters page of the ADB service:

    host    adb  new_user       10.92.40.128/24     trust
  • On the Interconnect network to which the cluster hosts are connected, you should specify a jumbo frame MTU=9000 so that the packets generated by ADB (gp_max_packet_size + overhead) can fit into these frames. For more information about ADB cluster network requirements, see Network requirements.

NOTE

For information on working with ADB tables, see Tables.

Connect to PostgreSQL server (ADPG)

  1. Create a QueryDatabaseTable processor and open its configuration. This processor executes SQL queries to retrieve data from a PostgreSQL table. Fill in the parameters related to the PostgreSQL table used.

    QueryDatabaseTable processor configuration
    QueryDatabaseTable processor configuration
    QueryDatabaseTable processor configuration
    QueryDatabaseTable processor configuration
  2. Go to the Database Connection Pooling Service parameter value field, select Create new service…​ in the pop-up list and in the window that opens, create an instance of the DBCPConnectionPool service to create a connection to the PostgreSQL server.

    Creating an instance of the DBCPConnectionPool service
    Creating an instance of the DBCPConnectionPool service
    Creating an instance of the DBCPConnectionPool service
    Creating an instance of the DBCPConnectionPool service
  3. After saving the created instance, click arrow2 light. In the next NiFi Flow Configuration → Controller Services window, open the service configuration and specify parameters related to the PostgreSQL database.

    PostgressDBCPConnectionPool service configuration
    PostgressDBCPConnectionPool service configuration
    PostgressDBCPConnectionPool service configuration
    PostgressDBCPConnectionPool service configuration

Creating service instances can be done before creating the processor. This can be used to connect multiple processors to a single service. To create a service:

  1. Right-click in the empty flow field and select Configure in the context menu that opens.

  2. In the NiFi Flow Configuration window, go to the Controller Services tab and click +.

  3. Select the desired service from the list and create an instance of the service in the window that opens.

Connect to Greenplum (ADB)

  1. To connect to the Greenplum table, create a PutGreenplumRecord processor, open its configuration, and specify parameters related to the Greenplum table.

    PutGreenplumRecord processor configuration
    PutGreenplumRecord processor configuration
    PutGreenplumRecord processor configuration
    PutGreenplumRecord processor configuration
  2. Go to the Gpfdist Service parameter value field, select Create new service…​ in the pop-up list, and create an instance of the StandartGpfdistService service in the window that opens.

    Creating an instance of the StandartGpfdistService service
    Creating an instance of the StandartGpfdistService service
    Creating an instance of the StandartGpfdistService service
    Creating an instance of the StandartGpfdistService service
  3. After saving the created instance, click arrow2 light. In the next NiFi Flow Configuration → Controller Services window, open the service configuration and specify necessary parameters.

    StandartGpfdistService service configuration
    StandartGpfdistService service configuration
    StandartGpfdistService service configuration
    StandartGpfdistService service configuration
  4. Go to the Database Connection Pooling Service parameter value field, select Create new service…​ in the pop-up list and in the opened window, create an instance of the DBCPConnectionPool service to create a connection to Greenplum.

  5. After saving the created instance, click arrow2 light and in the opened NiFi Flow Configuration → Controller Services window, open the service configuration and specify parameters related to the ADB database.

    GreenplumDBCPConnectionPool service configuration
    GreenplumDBCPConnectionPool service configuration
    GreenplumDBCPConnectionPool service configuration
    GreenplumDBCPConnectionPool service configuration
  6. Close the NiFi Flow Configuration → Controller Services window and go back to the PutGreenplumRecord processor configuration. In the Record Reader parameter value field, create an instance of the AvroReader service to read records from PostgreSQL in Avro format with built-in schema.

If you are using multiple PutGreenplumRecord processors to transfer data from different sources to a single ADB database, for the Gpfdist Service value, select the same StandartGpfdistService you created.

After all services are created, they are displayed on the NiFi Flow Configuration → Controller Services page.

Created services
Created services
Created services
Created services

The StandartGpfdistService service fails until its associated GreenplumDBCPConnectionPool service starts.

Start a data flow

Create and configure a connection between processors.

Created and connected processors
Created and connected processors
Created and connected processors
Created and connected processors

Processors are displayed with errors because the services associated with them are not running.

To start the flow, perform the following:

  1. Start the services one by one on the NiFi Flow Configuration → Controller Services page by clicking on the icon nifi ui oper 02.

  2. Run the created data stream.

Using queries to the ADB database, you can read the received data.

Found a mistake? Seleсt text and press Ctrl+Enter to report it