ADB ClickHouse Connector usage examples

This article describes how to transfer data between ADB and a distributed table in ADQM via ADB ClickHouse Connector. The following prerequisites are met:

  • The ADQM cluster is installed according to the Online installation guide on the following hosts: dev-adqm-01, dev-adqm-02, dev-adqm-03, dev-adqm-04.

  • The default database exists in ADQM.

  • The default user exists in ADQM.

  • The default_cluster cluster is defined in the remote_servers section of the config.xml file in ADQM. For more information, see Typical cluster in the ADQM documentation.

  • All ADB nodes have access to the 8123 HTTP port on all ADQM nodes through which you plan to load data.

Insert data

To check that you can add data to ADQM from ADB, follow the steps:

  1. Connect to the default database of the ADQM cluster under the default user (for example, via clickhouse-client). Create the following tables in the given order.

    • The test_local local table:

      CREATE TABLE test_local ON CLUSTER default_cluster
      (
          id Int32,
          value_string String
      )
      ENGINE = ReplicatedMergeTree('/clickhouse/test/shard-{shard}', 'replica-{replica}')
      ORDER BY id;
    • The test_distr distributed table:

      CREATE TABLE test_distr ON CLUSTER default_cluster AS default.test_local
      ENGINE = Distributed(default_cluster, default, test_local, rand());
  2. Connect to the adb database of the ADB cluster under the gpadmin user (for example, via psql). Create the following objects in the given order.

    • The clickhouse_server server:

      CREATE SERVER clickhouse_server
      FOREIGN DATA WRAPPER tkh_fdw
      OPTIONS (
      	database 'default',
      	hosts 'dev-adqm-01:8123, dev-adqm-02:8123, dev-adqm-03:8123, dev-adqm-04:8123',
      	distribution_type 'random',
      	lines_batch_size '100000',
      	send_compressed 'true',
      	send_delay '300',
      	insert_distributed_sync 'true',
      	use_staging 'true',
      	staging_table_name_format '$_tmp_$');
    • The user mapping for the gpadmin user:

      CREATE USER MAPPING FOR gpadmin
      SERVER clickhouse_server
      OPTIONS (clickhouse_username 'default', clickhouse_password '');
    • The tkhemali_test foreign table:

      CREATE FOREIGN TABLE tkhemali_test (id INT, value_string TEXT)
      SERVER clickhouse_server
      OPTIONS (resource 'test_distr',
      	hosts 'dev-adqm-01:8123',
              cluster 'default_cluster');
      IMPORTANT

      Since the cluster option is defined, the work will be essentially produced with the hosts obtained from the ADQM cluster metadata. Due to this, remember to overwrite the hosts option value defined for clickhouse_server at the server level and leave only one host in the option value at the table level.

  3. In ADB, insert some data into the tkhemali_test foreign table:

    INSERT INTO tkhemali_test VALUES
    (1, 'test1'),
    (2, 'test2'),
    (3, 'test3'),
    (4, 'test4'),
    (5, 'test5'),
    (6, 'test6'),
    (7, 'test7'),
    (8,'test8'),
    (9,'test9'),
    (10,'test10');

    Result:

    WARNING:  skipping "themali_test" --- cannot analyze this foreign table
    INSERT 0 10
  4. In ADQM, check that data from ADB has been successfully transferred:

    SELECT * FROM test_distr;

    Result:

    ┌─id─┬─value_string─┐
    │  2 │ test2        │
    │  5 │ test5        │
    │  8 │ test8        │
    │ 10 │ test10       │
    └────┴──────────────┘
    ┌─id─┬─value_string─┐
    │  1 │ test1        │
    │  3 │ test3        │
    │  4 │ test4        │
    │  6 │ test6        │
    │  7 │ test7        │
    │  9 │ test9        │
    └────┴──────────────┘
    10 rows in set. Elapsed: 0.005 sec.

Select data

NOTE

ADB ClickHouse Connector supports the ability to select data from ADQM/ClickHouse to ADB 6 starting with 6.24.3.47. However, to use a foreign table in ADB for reading, you should specify column names for this table that exactly match the corresponding names in a source table in ADQM/ClickHouse.

Correct example

To check that ADQM data can be read from ADB, do the following:

  1. Perform steps 1—​2 that are listed above (if you have not completed them yet).

  2. Delete data from the test_local table in ADQM:

    ALTER TABLE default.test_local ON CLUSTER default_cluster DELETE WHERE true;
  3. In ADQM, add some data to the test_distr table:

    INSERT INTO test_distr VALUES
    (1, 'test1'),
    (2, 'test2'),
    (3, 'test3'),
    (4, 'test4'),
    (5, 'test5'),
    (6, 'test6'),
    (7, 'test7'),
    (8,'test8'),
    (9,'test9'),
    (10,'test10');

    Result:

    Query id: 430a158f-1374-470d-a8c2-7b4dabdefbc7
    
    Ok.
    
    10 rows in set. Elapsed: 0.048 sec.
  4. Check that data added to ADQM can be successfully selected in ADB:

    SELECT * FROM tkhemali_test ORDER BY id;

    Result:

     id | value_string
    ----+--------------
      1 | test1
      2 | test2
      3 | test3
      4 | test4
      5 | test5
      6 | test6
      7 | test7
      8 | test8
      9 | test9
     10 | test10
    (10 rows)

Error example

The following example shows that reading data from ADQM/ClickHouse to ADB is impossible if column names in the foreign ADB table and the source ADQM/ClickHouse table do not match:

  1. Perform all steps from the previous example with correct data reading (if you have not passed them yet).

  2. Ensure that reading data based on the current foreign table structure is successful in ADB:

    SELECT * FROM tkhemali_test ORDER BY id;

    Result:

     id | value_string
    ----+--------------
      1 | test1
      2 | test2
      3 | test3
      4 | test4
      5 | test5
      6 | test6
      7 | test7
      8 | test8
      9 | test9
     10 | test10
    (10 rows)
  3. Rename some columns of the tkhemali_test table in ADB:

    ALTER FOREIGN TABLE tkhemali_test RENAME value_string TO wrong_string;
  4. Check again the ability to select data from tkhemali_test in ADB:

    SELECT * FROM tkhemali_test ORDER BY id;

    The following error is displayed:

    ERROR:  CH easy datum. HTTP error: 404; CH message: Code: 47. DB::Exception: Missing columns: 'wrong_string' while processing query: 'SELECT id, wrong_string FROM default.test_local', required columns: 'id' 'wrong_string', maybe you meant: 'id'. (UNKNOWN_IDENTIFIER) (version 23.3.2.37) (ch_easy_datum.c:142)  (seg0 slice1 192.0.2.182:10000 pid=6414) (ch_easy_datum.c:142)
NOTE

If names are the same only for part of columns in the foreign table in ADB and the source table in ADQM/ClickHouse — the SQL query succeeds while the resulting dataset uses only the specified columns.

Use SSL/TLS

NOTE

Prior to ADB 6.30.0, SSL was only supported for INSERT operations in ADB ClickHouse Connector. Starting with 6.30.0, you can also use SSL with SELECT.

Before using ADB ClickHouse Connector with SSL, follow the steps described in Insert data to try out a regular connection and to have a local table created on the ClickHouse side and a foreign server and a user mapping on the ADB side.

Set up SSL/TLS in ClickHouse

For information on how to configure TLS on the server side, refer to the ClickHouse documentation. If you use ADQM, refer to the ADQM documentation.

For demonstration purposes, the following commands create a server certificate signed by a self‑created Certificate Authority (CA). For production, use certificates signed by a trusted CA.

  1. Generate a private key and save it to a ca.key file:

    $ openssl genrsa -out ca.key 2048
  2. Create the self‑signed CA certificate:

    $ openssl req -x509 -subj "/CN=MyClickHouseCA" -nodes -key ca.key -days 1095 -out ca.crt
  3. Generate a server private key and a Certificate Signing Request (CSR). In the Common Name (CN) field, provide the ClickHouse server address that ADB hosts can access (val-adqm-1.ru-central1.internal in this example). The same address will be used as the target host in foreign ADB tables:

    $ openssl req -newkey rsa:2048 -nodes \
      -subj "/CN=val-adqm-1.ru-central1.internal" \
      -addext "subjectAltName = DNS:val-adqm-1.ru-central1.internal" \
      -keyout server.key -out server.csr
  4. Sign the server CSR with the CA to obtain the server certificate:

    $ openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 365
  5. Move the server certificate and key to a directory accessible by ClickHouse:

    $ mkdir /etc/clickhouse-server/certs
    $ mv ca.crt server.crt server.key /etc/clickhouse-server/certs/
  6. Update the certificate permissions:

    $ chown -R clickhouse:clickhouse /etc/clickhouse-server/certs
    $ chmod 600 /etc/clickhouse-server/certs/server.key
    $ chmod 644 /etc/clickhouse-server/certs/server.crt /etc/clickhouse-server/certs/ca.crt
  7. Create an SSL configuration file:

    $ vi /etc/clickhouse-server/config.d/ssl.xml
  8. Specify the port to be used for HTTPS connections and the path to the created certificate and key:

    <clickhouse>
        <https_port>8443</https_port>
        <openSSL>
            <server>
                <certificateFile>/etc/clickhouse-server/certs/server.crt</certificateFile>
                <privateKeyFile>/etc/clickhouse-server/certs/server.key</privateKeyFile>
            </server>
        </openSSL>
    </clickhouse>
  9. Restart ClickHouse:

    $ sudo systemctl restart clickhouse-server
  10. Copy the ca.crt file to the ADB master host and every segment host.

Before using ADB ClickHouse Connector with SSL, you can use cURL to verify that the above configuration is correct by establishing an HTTPS channel between an ADB host and the ClickHouse server. On any ADB host, run the following command:

$ curl --cacert /home/gpadmin/certs/ca.crt -u default: 'https://val-adqm-1.ru-central1.internal:8443/?query=SELECT%20version()'

The server should return the ClickHouse version, for example:

25.8.16.34

SSL without certificate validation

With the following setup, TLS encryption is enabled for data transmission between ADB and ClickHouse, but the connector does not validate the server certificate.

  1. In ADB, create a foreign table with ssl=true and sslmode=none in clickhouse_properties:

    CREATE FOREIGN TABLE tkhemali_sslmode_none (id INT, value_string TEXT)
    SERVER clickhouse_server
    OPTIONS (
        resource 'test_local',
        hosts 'val-adqm-1.ru-central1.internal:8443',
        clickhouse_properties 'ssl=true;sslmode=none'
    );
  2. To verify that INSERT commands are using SSL, add some data to the test_local table using the created foreign table:

    INSERT INTO tkhemali_sslmode_none (id, value_string) VALUES
        (1, 'ssl none test 1'),
        (2, 'ssl none test 2'),
        (3, 'ssl none test 3');

    Result:

    WARNING:  skipping "tkhemali_sslmode_none" --- cannot analyze this foreign table
    INSERT 0 3
  3. To verify that SELECT commands are using SSL, query the added data:

    SELECT * FROM tkhemali_sslmode_none;

    Result:

     id |  value_string
    ----+-----------------
      1 | ssl none test 1
      2 | ssl none test 2
      3 | ssl none test 3
    (3 rows)

SSL with certificate verification

With this setup, TLS encryption is enabled and the connector performs certificate validation.

  1. Create a foreign table with clickhouse_properties specifying ssl=true, sslmode=strict, and sslrootcert that points to a certificate file:

    CREATE FOREIGN TABLE tkhemali_sslmode_strict (id INT, value_string TEXT)
    SERVER clickhouse_server
    OPTIONS (
        resource 'test_local',
        hosts 'val-adqm-1.ru-central1.internal:8443',
        clickhouse_properties 'ssl=true;sslmode=strict;sslrootcert=/home/gpadmin/certs/ca.crt'
    );
  2. To verify that INSERT commands are using SSL, add some data to the test_local table using the created foreign table:

    INSERT INTO tkhemali_sslmode_strict (id, value_string) VALUES
        (1, 'strict test 1'),
        (2, 'strict test 2'),
        (3, 'strict test 3');

    Result:

    WARNING:  skipping "tkhemali_sslmode_strict" --- cannot analyze this foreign table
    INSERT 0 3
  3. To verify that SELECT commands are using SSL, query the added data:

    SELECT * FROM tkhemali_sslmode_strict ORDER BY value_string DESC LIMIT 3;

    Result:

     id | value_string
    ----+---------------
      3 | strict test 3
      2 | strict test 2
      1 | strict test 1
    (3 rows)
Found a mistake? Seleсt text and press Ctrl+Enter to report it