ADB ClickHouse connector installation

Currently, ADB ClickHouse connector can be installed only manually. You should install RPM packages and create corresponding extensions in the ADB database. Follow the steps:

  1. Make sure the PXF service is installed. For information on how to add services to ADB clusters, see Add services.

  2. Get tkh-connector, gppxf-fdw, and tkh-fdw RPM packages from the Arenadata support team. Package names should contain a specific version number, in the example below they are listed without versions for simplification. Install RPM packages on all cluster hosts. Below is an example of installation via the YUM package manager:

    • tkh-connector. If you already use Tkhemali connector 1.X (the Tkhemali service is installed in your cluster), you do not need to install this package.

      $ sudo yum install -y tkh-connector.rpm
    • gppxf-fdw

      $ sudo yum install -y gppxf-fdw.rpm
    • tkh-fdw

      $ sudo yum install -y tkh-fdw.rpm
  3. Add the new custom profile Tkh_clickhouse:text to the PXF configuration file /var/lib/pxf/conf/pxf-profiles.xml on Master. To edit the file, you can use the vi or vim command:

    $ sudo vi /var/lib/pxf/conf/pxf-profiles.xml

    The content of the changed file can look as follows. In our case, the file does not contain other profiles:

    <profiles>
        <profile>
            <name>Tkh_clickhouse:text</name>
            <description>Clickhouse</description>
            <plugins>
                <accessor>io.arenadata.tkh.pxf.TkhFdwAccessor</accessor>
                <resolver>io.arenadata.tkh.pxf.TkhResolver</resolver>
            </plugins>
        </profile>
    </profiles>
    NOTE

    The directory, where PXF configuration files are located, is defined by the $PXF_BASE environment variable. The default value is /var/lib/pxf. The path may differ in your environment.

  4. Login under the default user name gpadmin. All the commands listed in the subsequent steps should be performed on Master:

    $ sudo su - gpadmin
  5. Synchronize the PXF configuration to all hosts in the ADB cluster:

    $ pxf cluster sync

    The result:

    Syncing PXF configuration files from master host to standby master host and 2 segment hosts...
    PXF configs synced successfully on 3 out of 3 hosts
  6. Restart PXF:

    $ pxf cluster restart

    The result:

    Restarting PXF on master host, standby master host, and 2 segment hosts...
    PXF restarted successfully on 4 out of 4 hosts
  7. Connect to the database via psql (or any other client program):

    $ psql adb
  8. Create two new extensions:

    • pxf_fdw

      CREATE EXTENSION pxf_fdw;
    • tkh_fdw

      CREATE EXTENSION tkh_fdw;

      In both cases, the result should be similar to:

      CREATE EXTENSION
  9. To check the existence of new extensions, you can use the psql meta-command \dx:

    \dx

    The output contains new extensions pxf_fdw and tkh_fdw:

                                          List of installed extensions
       Name    | Version |   Schema   |                             Description
    -----------+---------+------------+---------------------------------------------------------------------
     dblink    | 1.1     | public     | connect to other PostgreSQL databases from within a database
     diskquota | 2.2     | public     | Disk Quota Main Program
     gptkh     | 0.14    | public     | gptkh
     kadb_fdw  | 0.16    | kadb       | Kafka-ADB foreign data wrapper
     plpgsql   | 1.0     | pg_catalog | PL/pgSQL procedural language
     postgis   | 2.5.4   | public     | PostGIS geometry, geography, and raster spatial types and functions
     pxf       | 2.0     | public     | Extension which allows to access unmanaged data
     pxf_fdw   | 1.0     | public     | PXF Foreign Data Wrapper for Greenplum
     tkh_fdw   | 1.0     | public     | ClickHouse Foreign Data Wrapper for Greenplum
    (9 rows)
  10. You can also check that the new foreign data wrapper tkh_fdw is available using the following SQL query against the pg_catalog.pg_foreign_data_wrapper system table:

    SELECT * FROM pg_catalog.pg_foreign_data_wrapper;

    The result:

        fdwname    | fdwowner | fdwhandler | fdwvalidator | fdwacl |                                         fdwoptions
    ---------------+----------+------------+--------------+--------+--------------------------------------------------------------------------------------------
     dblink_fdw    |       10 |          0 |        16480 |        |
     kadb_fdw      |       10 |      20308 |        20310 |        | {"mpp_execute=all segments",k_initial_offset=0,k_automatic_offsets=true,k_latency_ms=2000}
     jdbc_pxf_fdw  |       10 |      20377 |        20378 |        | {protocol=jdbc,"mpp_execute=all segments"}
     hdfs_pxf_fdw  |       10 |      20377 |        20378 |        | {protocol=hdfs,"mpp_execute=all segments"}
     hive_pxf_fdw  |       10 |      20377 |        20378 |        | {protocol=hive,"mpp_execute=all segments"}
     hbase_pxf_fdw |       10 |      20377 |        20378 |        | {protocol=hbase,"mpp_execute=all segments"}
     s3_pxf_fdw    |       10 |      20377 |        20378 |        | {protocol=s3,"mpp_execute=all segments"}
     gs_pxf_fdw    |       10 |      20377 |        20378 |        | {protocol=gs,"mpp_execute=all segments"}
     adl_pxf_fdw   |       10 |      20377 |        20378 |        | {protocol=adl,"mpp_execute=all segments"}
     wasbs_pxf_fdw |       10 |      20377 |        20378 |        | {protocol=wasbs,"mpp_execute=all segments"}
     file_pxf_fdw  |       10 |      20377 |        20378 |        | {protocol=file,"mpp_execute=all segments"}
     tkh_fdw       |       10 |      20392 |        20391 |        | {protocol=tkh_clickhouse,"mpp_execute=all segments",format=text}
    (12 rows)
Found a mistake? Seleсt text and press Ctrl+Enter to report it