ADB ClickHouse connector usage example

This article describes how to transfer data from ADB to the distributed table in ADQM via ADB ClickHouse connector. The following prerequisites are met:

  • The ADQM cluster is installed according to the Online installation guide on the following hosts: dev-adqm-01, dev-adqm-02, dev-adqm-03, dev-adqm-04.

  • The default database exists in ADQM.

  • The default user exists in ADQM.

  • The default_cluster cluster is defined in the remote_servers section of the config.xml file in ADQM. For more information, see Typical cluster in the ADQM documentation.

  • All ADB nodes have access to the 8123 HTTP port on all ADQM nodes through which you plan to load data.

 
Follow the steps:

  1. Connect to the default database of the ADQM cluster under the default user (for example, via clickhouse-client). Create the following tables in the given order.

    • The test_local local table:

      CREATE TABLE test_local ON CLUSTER default_cluster
      (
          id Int32,
          value_string String
      )
      ENGINE = ReplicatedMergeTree('/clickhouse/test/shard-{shard}', 'replica-{replica}')
      ORDER BY id;
    • The test_distr distributed table:

      CREATE TABLE test_distr ON CLUSTER default_cluster AS default.test_local
      ENGINE = Distributed(default_cluster, default, test_local, rand());
  2. Connect to the adb database of the ADB cluster under the gpadmin user (for example, via psql). Create the following objects in the given order.

    • The clickhouse_server server:

      CREATE SERVER clickhouse_server
      FOREIGN DATA WRAPPER tkh_fdw
      OPTIONS (
      	database 'default',
      	hosts 'dev-adqm-01:8123, dev-adqm-02:8123, dev-adqm-03:8123, dev-adqm-04:8123',
      	distribution_type 'random',
      	lines_batch_size '100000',
      	send_compressed 'true',
      	send_delay '300',
      	insert_distributed_sync 'true',
      	use_staging 'true',
      	staging_table_name_format '$_tmp_$');
    • The user mapping for the gpadmin user:

      CREATE USER MAPPING FOR gpadmin
      SERVER clickhouse_server
      OPTIONS (clickhouse_username 'default', clickhouse_password '');
    • The tkhemali_test foreign table:

      CREATE FOREIGN TABLE tkhemali_test (id INT, value_string TEXT)
      SERVER clickhouse_server
      OPTIONS (resource 'test_distr',
      	hosts 'dev-adqm-01:8123',
              cluster 'default_cluster');
      IMPORTANT

      Since the cluster option is defined, the work will be essentially produced with the hosts obtained from the ADQM cluster metadata. Due to this, remember to overwrite the hosts option value defined for clickhouse_server at the server level and leave only one host in the option value at the table level.

  3. In ADB, insert some data into the tkhemali_test foreign table:

    INSERT INTO tkhemali_test VALUES
    (1, 'test1'),
    (2, 'test2'),
    (3, 'test3'),
    (4, 'test4'),
    (5, 'test5'),
    (6, 'test6'),
    (7, 'test7'),
    (8,'test8'),
    (9,'test9'),
    (10,'test10');

    The result:

    WARNING:  skipping "themali_test" --- cannot analyze this foreign table
    INSERT 0 10
  4. In ADQM, check that data from ADB has been successfully transferred:

    SELECT * FROM test_distr;

    The result:

    ┌─id─┬─value_string─┐
    │  2 │ test2        │
    │  5 │ test5        │
    │  8 │ test8        │
    │ 10 │ test10       │
    └────┴──────────────┘
    ┌─id─┬─value_string─┐
    │  1 │ test1        │
    │  3 │ test3        │
    │  4 │ test4        │
    │  6 │ test6        │
    │  7 │ test7        │
    │  9 │ test9        │
    └────┴──────────────┘
    10 rows in set. Elapsed: 0.005 sec.
Found a mistake? Seleсt text and press Ctrl+Enter to report it