ADB ClickHouse Connector usage examples

This article describes how to transfer data between ADB and a distributed table in ADQM via ADB ClickHouse Connector. The following prerequisites are met:

  • The ADQM cluster is installed according to the Online installation guide on the following hosts: dev-adqm-01, dev-adqm-02, dev-adqm-03, dev-adqm-04.

  • The default database exists in ADQM.

  • The default user exists in ADQM.

  • The default_cluster cluster is defined in the remote_servers section of the config.xml file in ADQM. For more information, see Typical cluster in the ADQM documentation.

  • All ADB nodes have access to the 8123 HTTP port on all ADQM nodes through which you plan to load data.

Insert data

To check that you can add data to ADQM from ADB, follow the steps:

  1. Connect to the default database of the ADQM cluster under the default user (for example, via clickhouse-client). Create the following tables in the given order.

    • The test_local local table:

      CREATE TABLE test_local ON CLUSTER default_cluster
      (
          id Int32,
          value_string String
      )
      ENGINE = ReplicatedMergeTree('/clickhouse/test/shard-{shard}', 'replica-{replica}')
      ORDER BY id;
    • The test_distr distributed table:

      CREATE TABLE test_distr ON CLUSTER default_cluster AS default.test_local
      ENGINE = Distributed(default_cluster, default, test_local, rand());
  2. Connect to the adb database of the ADB cluster under the gpadmin user (for example, via psql). Create the following objects in the given order.

    • The clickhouse_server server:

      CREATE SERVER clickhouse_server
      FOREIGN DATA WRAPPER tkh_fdw
      OPTIONS (
      	database 'default',
      	hosts 'dev-adqm-01:8123, dev-adqm-02:8123, dev-adqm-03:8123, dev-adqm-04:8123',
      	distribution_type 'random',
      	lines_batch_size '100000',
      	send_compressed 'true',
      	send_delay '300',
      	insert_distributed_sync 'true',
      	use_staging 'true',
      	staging_table_name_format '$_tmp_$');
    • The user mapping for the gpadmin user:

      CREATE USER MAPPING FOR gpadmin
      SERVER clickhouse_server
      OPTIONS (clickhouse_username 'default', clickhouse_password '');
    • The tkhemali_test foreign table:

      CREATE FOREIGN TABLE tkhemali_test (id INT, value_string TEXT)
      SERVER clickhouse_server
      OPTIONS (resource 'test_distr',
      	hosts 'dev-adqm-01:8123',
              cluster 'default_cluster');
      IMPORTANT

      Since the cluster option is defined, the work will be essentially produced with the hosts obtained from the ADQM cluster metadata. Due to this, remember to overwrite the hosts option value defined for clickhouse_server at the server level and leave only one host in the option value at the table level.

  3. In ADB, insert some data into the tkhemali_test foreign table:

    INSERT INTO tkhemali_test VALUES
    (1, 'test1'),
    (2, 'test2'),
    (3, 'test3'),
    (4, 'test4'),
    (5, 'test5'),
    (6, 'test6'),
    (7, 'test7'),
    (8,'test8'),
    (9,'test9'),
    (10,'test10');

    The result:

    WARNING:  skipping "themali_test" --- cannot analyze this foreign table
    INSERT 0 10
  4. In ADQM, check that data from ADB has been successfully transferred:

    SELECT * FROM test_distr;

    The result:

    ┌─id─┬─value_string─┐
    │  2 │ test2        │
    │  5 │ test5        │
    │  8 │ test8        │
    │ 10 │ test10       │
    └────┴──────────────┘
    ┌─id─┬─value_string─┐
    │  1 │ test1        │
    │  3 │ test3        │
    │  4 │ test4        │
    │  6 │ test6        │
    │  7 │ test7        │
    │  9 │ test9        │
    └────┴──────────────┘
    10 rows in set. Elapsed: 0.005 sec.

Select data

NOTE

ADB ClickHouse Connector supports the ability to select data from ADQM/ClickHouse to ADB starting with ADB 6.24.3.47. However, to use a foreign table in ADB for reading, you should specify for this table column names that match exactly the corresponding names in a source table in ADQM/ClickHouse.

Correct example

To check that ADQM data can be read from ADB, do the following:

  1. Perform steps 1-2 that are listed above (if you have not passed them yet).

  2. Delete data from the test_local table in ADQM:

    ALTER TABLE default.test_local ON CLUSTER default_cluster DELETE WHERE true;
  3. In ADQM, add some data to the test_distr table:

    INSERT INTO test_distr VALUES
    (1, 'test1'),
    (2, 'test2'),
    (3, 'test3'),
    (4, 'test4'),
    (5, 'test5'),
    (6, 'test6'),
    (7, 'test7'),
    (8,'test8'),
    (9,'test9'),
    (10,'test10');

    The result:

    Query id: 430a158f-1374-470d-a8c2-7b4dabdefbc7
    
    Ok.
    
    10 rows in set. Elapsed: 0.048 sec.
  4. Check that data added to ADQM can be successfully selected in ADB:

    SELECT * FROM tkhemali_test ORDER BY id;

    The result:

     id | value_string
    ----+--------------
      1 | test1
      2 | test2
      3 | test3
      4 | test4
      5 | test5
      6 | test6
      7 | test7
      8 | test8
      9 | test9
     10 | test10
    (10 rows)

Error example

The following example shows that reading data from ADQM/ClickHouse to ADB is impossible if column names in the foreign ADB table and the source ADQM/ClickHouse table do not match:

  1. Perform all steps from the previous example with correct data reading (if you have not passed them yet).

  2. Ensure that reading data based on the current foreign table structure is successful in ADB:

    SELECT * FROM tkhemali_test ORDER BY id;

    The result:

     id | value_string
    ----+--------------
      1 | test1
      2 | test2
      3 | test3
      4 | test4
      5 | test5
      6 | test6
      7 | test7
      8 | test8
      9 | test9
     10 | test10
    (10 rows)
  3. Rename some columns of the tkhemali_test table in ADB:

    ALTER FOREIGN TABLE tkhemali_test RENAME value_string TO wrong_string;
  4. Check again the ability to select data from tkhemali_test in ADB:

    SELECT * FROM tkhemali_test ORDER BY id;

    The following error is displayed:

    ERROR:  CH easy datum. HTTP error: 404; CH message: Code: 47. DB::Exception: Missing columns: 'wrong_string' while processing query: 'SELECT id, wrong_string FROM default.test_local', required columns: 'id' 'wrong_string', maybe you meant: 'id'. (UNKNOWN_IDENTIFIER) (version 23.3.2.37) (ch_easy_datum.c:142)  (seg0 slice1 10.92.17.182:10000 pid=6414) (ch_easy_datum.c:142)
NOTE

If names are the same only for part of columns in the foreign table in ADB and the source table in ADQM/ClickHouse — the SQL query succeeds while the resulting dataset uses only the specified columns.

Found a mistake? Seleсt text and press Ctrl+Enter to report it