ADB ClickHouse Connector overview
Unlike the previous version Tkhemali Connector 1.X based on the mechanism of external tables, ADB ClickHouse Connector is implemented on the basis of a foreign data wrapper and foreign tables. Consequently, the new version of the ADQM/ClickHouse connector has the following features:
-
More convenient and secure user management. While in the previous connector version you had to write user credentials in the
LOCATION
option at the level of each external table or in a separate file explicitly, in ADB ClickHouse Connector the user login and password are stored in a separate ADB database object user mapping. It is enough to create a user mapping object once for one ClickHouse cluster with which you plan to connect. If necessary, the data can be modified later via theALTER USER MAPPING
command. -
Multi-level connection settings. General settings that will be the same for all or most ClickHouse target tables (load distribution strategy, need to use a staging layer, restrictions for batch data, and so on) can be specified at the server object level, and private settings (e.g. specific table name) — at the foreign table level. In the previous version of the connector, all settings had to be specified at the external table level.
-
New load balancing mechanism. In the new connector version, the list of ClickHouse hosts (through which you plan to load data from ADB) is specified in the
hosts
option. The separate optiondistribution_type
is used to declare the load distribution strategy. Two types of distribution are supported now: random and round-robin. The second method is preferable in terms of uniformity of load. At the same time, when writing to distributed tables, it is possible to specify the ClickHouse cluster name in thecluster
option at the foreign table level. In this case, it is not necessary to define explicitly all hosts: one entry point is sufficient, and the full list of hosts will be automatically obtained from the cluster metadata. In the previous version of the connector, it was also possible to get a list of hosts for distributed tables, but the load on ClickHouse cluster hosts could be uneven. -
Flexible management of custom settings. Custom settings of connection and execution of queries to ClickHouse can now be defined in the
clickhouse_properties
option in the following format:<custom_option>=<value>;[…]
. To change the settings, it is enough to execute theALTER SERVER
or theALTER FOREIGN TABLE
command. In the previous version of the connector, the custom settings for ClickHouse were stored in the XML file that needed to be copied to all ADB segment hosts. -
More control over the resource release. A foreign data wrapper supports "out of box" transactions and allows subscribing to transaction control hooks. Due to this, the new connector version provides more control over the progress of requests and resource release, which is especially relevant for transactional data loading via staging tables. In the previous version of the connector, a staging layer might not have been removed under certain circumstances and it was necessary to remove it manually.
-
Simple query syntax. There is no need to use the
txn
function to enable a transaction mode (staging layer usage) anymore. In the new connector version for data transmission from ADB to ClickHouse, it is enough to execute the ordinaryINSERT
query against the corresponding foreign table. Staging tables are now managed by a separate flaguse_staging
. In addition, in the previous connector version, it was necessary to specify full names of staging tables when creating external tables. In ADB ClickHouse Connector, only the target table name in ClickHouse is required to be filled in theresource
option at the foreign table level. Names for staging tables are automatically generated based on the template that is specified in thestaging_table_name_format
option.
IMPORTANT
|
The high-level architecture view of the new connector version is shown below.