ADB ClickHouse Connector usage examples
This article describes how to transfer data between ADB and a distributed table in ADQM via ADB ClickHouse Connector. The following prerequisites are met:
-
The ADQM cluster is installed according to the Online installation guide on the following hosts: dev-adqm-01, dev-adqm-02, dev-adqm-03, dev-adqm-04.
-
The
default
database exists in ADQM. -
The
default
user exists in ADQM. -
The
default_cluster
cluster is defined in theremote_servers
section of the config.xml file in ADQM. For more information, see Typical cluster in the ADQM documentation. -
All ADB nodes have access to the
8123
HTTP port on all ADQM nodes through which you plan to load data.
Insert data
To check that you can add data to ADQM from ADB, follow the steps:
-
Connect to the
default
database of the ADQM cluster under thedefault
user (for example, via clickhouse-client). Create the following tables in the given order.-
The
test_local
local table:CREATE TABLE test_local ON CLUSTER default_cluster ( id Int32, value_string String ) ENGINE = ReplicatedMergeTree('/clickhouse/test/shard-{shard}', 'replica-{replica}') ORDER BY id;
-
The
test_distr
distributed table:CREATE TABLE test_distr ON CLUSTER default_cluster AS default.test_local ENGINE = Distributed(default_cluster, default, test_local, rand());
-
-
Connect to the
adb
database of the ADB cluster under thegpadmin
user (for example, via psql). Create the following objects in the given order.-
The
clickhouse_server
server:CREATE SERVER clickhouse_server FOREIGN DATA WRAPPER tkh_fdw OPTIONS ( database 'default', hosts 'dev-adqm-01:8123, dev-adqm-02:8123, dev-adqm-03:8123, dev-adqm-04:8123', distribution_type 'random', lines_batch_size '100000', send_compressed 'true', send_delay '300', insert_distributed_sync 'true', use_staging 'true', staging_table_name_format '$_tmp_$');
-
The user mapping for the
gpadmin
user:CREATE USER MAPPING FOR gpadmin SERVER clickhouse_server OPTIONS (clickhouse_username 'default', clickhouse_password '');
-
The
tkhemali_test
foreign table:CREATE FOREIGN TABLE tkhemali_test (id INT, value_string TEXT) SERVER clickhouse_server OPTIONS (resource 'test_distr', hosts 'dev-adqm-01:8123', cluster 'default_cluster');
IMPORTANTSince the
cluster
option is defined, the work will be essentially produced with the hosts obtained from the ADQM cluster metadata. Due to this, remember to overwrite thehosts
option value defined forclickhouse_server
at the server level and leave only one host in the option value at the table level.
-
-
In ADB, insert some data into the
tkhemali_test
foreign table:INSERT INTO tkhemali_test VALUES (1, 'test1'), (2, 'test2'), (3, 'test3'), (4, 'test4'), (5, 'test5'), (6, 'test6'), (7, 'test7'), (8,'test8'), (9,'test9'), (10,'test10');
The result:
WARNING: skipping "themali_test" --- cannot analyze this foreign table INSERT 0 10
-
In ADQM, check that data from ADB has been successfully transferred:
SELECT * FROM test_distr;
The result:
┌─id─┬─value_string─┐ │ 2 │ test2 │ │ 5 │ test5 │ │ 8 │ test8 │ │ 10 │ test10 │ └────┴──────────────┘ ┌─id─┬─value_string─┐ │ 1 │ test1 │ │ 3 │ test3 │ │ 4 │ test4 │ │ 6 │ test6 │ │ 7 │ test7 │ │ 9 │ test9 │ └────┴──────────────┘ 10 rows in set. Elapsed: 0.005 sec.
Select data
NOTE
ADB ClickHouse Connector supports the ability to select data from ADQM/ClickHouse to ADB 6 starting with 6.24.3.47. However, to use a foreign table in ADB for reading, you should specify for this table column names that match exactly the corresponding names in a source table in ADQM/ClickHouse. |
Correct example
To check that ADQM data can be read from ADB, do the following:
-
Perform steps 1-2 that are listed above (if you have not passed them yet).
-
Delete data from the
test_local
table in ADQM:ALTER TABLE default.test_local ON CLUSTER default_cluster DELETE WHERE true;
-
In ADQM, add some data to the
test_distr
table:INSERT INTO test_distr VALUES (1, 'test1'), (2, 'test2'), (3, 'test3'), (4, 'test4'), (5, 'test5'), (6, 'test6'), (7, 'test7'), (8,'test8'), (9,'test9'), (10,'test10');
The result:
Query id: 430a158f-1374-470d-a8c2-7b4dabdefbc7 Ok. 10 rows in set. Elapsed: 0.048 sec.
-
Check that data added to ADQM can be successfully selected in ADB:
SELECT * FROM tkhemali_test ORDER BY id;
The result:
id | value_string ----+-------------- 1 | test1 2 | test2 3 | test3 4 | test4 5 | test5 6 | test6 7 | test7 8 | test8 9 | test9 10 | test10 (10 rows)
Error example
The following example shows that reading data from ADQM/ClickHouse to ADB is impossible if column names in the foreign ADB table and the source ADQM/ClickHouse table do not match:
-
Perform all steps from the previous example with correct data reading (if you have not passed them yet).
-
Ensure that reading data based on the current foreign table structure is successful in ADB:
SELECT * FROM tkhemali_test ORDER BY id;
The result:
id | value_string ----+-------------- 1 | test1 2 | test2 3 | test3 4 | test4 5 | test5 6 | test6 7 | test7 8 | test8 9 | test9 10 | test10 (10 rows)
-
Rename some columns of the
tkhemali_test
table in ADB:ALTER FOREIGN TABLE tkhemali_test RENAME value_string TO wrong_string;
-
Check again the ability to select data from
tkhemali_test
in ADB:SELECT * FROM tkhemali_test ORDER BY id;
The following error is displayed:
ERROR: CH easy datum. HTTP error: 404; CH message: Code: 47. DB::Exception: Missing columns: 'wrong_string' while processing query: 'SELECT id, wrong_string FROM default.test_local', required columns: 'id' 'wrong_string', maybe you meant: 'id'. (UNKNOWN_IDENTIFIER) (version 23.3.2.37) (ch_easy_datum.c:142) (seg0 slice1 10.92.17.182:10000 pid=6414) (ch_easy_datum.c:142)
NOTE
If names are the same only for part of columns in the foreign table in ADB and the source table in ADQM/ClickHouse — the SQL query succeeds while the resulting dataset uses only the specified columns. |