ClickHouse Keeper
To use ClickHouse Keeper for data replication and query distribution in ADQM, you can:
-
install ClickHouse Keeper in an ADQM cluster as a separate service;
-
use ClickHouse Keeper integrated into a ClickHouse server.
ClickHouse Keeper as a service of ADQM
If you need to locate ClickHouse Keeper on dedicated hosts, install ClickHouse Keeper as a service in an ADQM cluster. For this, do the following in the ADCM interface:
-
Add the Clickhousekeeper service to the ADQM cluster.
-
Install the Clickhouse Keeper Server component to an odd number of hosts.
-
Configure the Clickhousekeeper service (for descriptions of available parameters, see the Configuration parameters article).
Set up the Clickhousekeeper service
Click Save to save settings.
-
If you are migrating your cluster from ZooKeeper to Clickhouse Keeper, copy data from ZooKeeper to Clickhouse Keeper at this stage. Formats of ClickHouse Keeper snapshots and logs are incompatible with ZooKeeper, so you should first convert ZooKeeper data to ClickHouse Keeper snapshots (for details, see the Migration from ZooKeeper section in the ClickHouse documentation).
-
In the Engine section on the ADQMDB service configuration page, set Clickhouse_keeper (allocated) as the System parameter value.
Enable the Clickhousekeeper service
-
Click Save and execute the Reconfig and restart action for the ADQMDB service to use the installed ClickHouse Keeper as a coordination service for the ADQM cluster.
NOTE
To change the hosts on which ClickHouse Keeper should be installed, you can use the corresponding actions of the Clickhousekeeper service. After these actions are completed, run the Reconfig and restart action for the ADQMDB service to apply the changes.
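Once the ADQMDB service has been reconfigured and restarted, you can check from any ClickHouse host that the server reaches the new coordination service, for example by listing znodes through the system.zookeeper system table (the query returns an error if no coordination service is configured or reachable):
-- List znodes visible to this server through the configured coordination service
SELECT name, numChildren FROM system.zookeeper WHERE path = '/';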
Integrated ClickHouse Keeper
If there is no requirement to control on which hosts ClickHouse Keeper is installed (for example, in test or small clusters), you can use the integrated ClickHouse Keeper — ADQM will automatically configure and deploy it on hosts where the Clickhouse Server component is installed. To do this, perform the following in the ADCM interface:
-
Activate the Clickhouse Keeper (integrated) option on the configuration page of the ADQMDB service. In the expanded section, settings are filled in automatically based on the ClickHouse server parameters. You can change parameter values if necessary (see parameter descriptions in the Configuration parameters article).
Configuration parameters of integrated ClickHouse Keeper
If you are migrating the cluster from ZooKeeper to ClickHouse Keeper, follow these steps:
-
Click Save and execute the Reconfig and restart action for the ADQMDB service to configure and deploy the internal ClickHouse Keeper.
-
Convert and copy data from ZooKeeper to ClickHouse Keeper.
-
In the Engine section of the ADQMDB service’s configuration page, set Clickhouse_keeper (integrated) as the System parameter value.
Enable integrated ClickHouse Keeper
-
Click Save and execute the Reconfig and restart action for the ADQMDB service to enable the use of the internal ClickHouse Keeper as a coordination service for the ADQM cluster.
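To confirm which coordination service endpoint the server is connected to after the restart, you can also query the system.zookeeper_connection table (available in recent ClickHouse versions; if your version does not have it, use a system.zookeeper query as shown in the previous section):
-- Host and port of the coordination service this server is connected to
SELECT name, host, port FROM system.zookeeper_connection;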
Test replicated and distributed tables
Configure ADQM cluster with ClickHouse Keeper
-
Configure a test ADQM cluster — two shards with two replicas each (hosts used in this example are host-1, host-2, host-3, and host-4). To learn how to do this via the ADCM interface, read Configure logical clusters in the ADCM interface. In this case, all necessary sections described below are added to the configuration file of each server automatically.
-
Logical cluster configuration
In the config.xml file, the cluster configuration is defined in the remote_servers section:
<remote_servers>
    <default_cluster>
        <shard>
            <internal_replication>true</internal_replication>
            <weight>1</weight>
            <replica>
                <host>host-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>host-2</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <weight>1</weight>
            <replica>
                <host>host-3</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>host-4</host>
                <port>9000</port>
            </replica>
        </shard>
    </default_cluster>
</remote_servers>
In this example, a ReplicatedMergeTree table is used, so the internal_replication parameter is set to true for each shard — in this case, the replicated table manages data replication itself (data is written to any available replica, and the other replica receives the data asynchronously). If using regular tables, set the internal_replication parameter to false — in this case, data is replicated by a Distributed table (data is written to all replicas of a shard).
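To check how a particular server sees this topology, you can query the system.clusters system table, for example:
-- Shard and replica layout of default_cluster as seen by the current server
SELECT cluster, shard_num, replica_num, host_name, is_local
FROM system.clusters
WHERE cluster = 'default_cluster';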
-
Macros
Each server configuration should also contain the macros section, which defines the shard and replica identifiers to be replaced with values corresponding to a specific host when creating replicated tables on the cluster (ON CLUSTER). For example, macros for host-1 contained in the default_cluster cluster:
<macros>
    <replica>1</replica>
    <shard>1</shard>
</macros>
If you configure a logical cluster in the ADCM interface via the Cluster Configuration setting and specify the cluster name, for example, abc, ADQM will write macros to config.xml as follows (for host-1 in the abc cluster with the same topology as provided above):
<macros>
    <abc_replica>1</abc_replica>
    <abc_shard>1</abc_shard>
</macros>
In this case, use the {abc_shard} and {abc_replica} variables within ReplicatedMergeTree parameters when creating replicated tables.
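To verify which macro values a specific host will substitute, you can query the system.macros table on that host, for example:
-- Macro names and values defined in this host's configuration
SELECT macro, substitution FROM system.macros;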
-
Install the Clickhousekeeper service to three hosts (host-1, host-2, host-3). After that, the following sections will be added to the config.xml file:
-
zookeeper — list of ClickHouse Keeper nodes;
-
distributed_ddl — path in ClickHouse Keeper to the queue with DDL queries (if multiple clusters use the same ClickHouse Keeper, this path should be unique for each cluster).
<zookeeper>
    <node>
        <host>host-1</host>
        <port>2129</port>
    </node>
    <node>
        <host>host-2</host>
        <port>2129</port>
    </node>
    <node>
        <host>host-3</host>
        <port>2129</port>
    </node>
    <root>/clickhouse</root>
</zookeeper>
<distributed_ddl>
    <path>/clickhouse/task_queue/ddl</path>
</distributed_ddl>
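Later, when you execute ON CLUSTER queries, you can monitor this DDL queue directly from ClickHouse, for example via the system.distributed_ddl_queue table (the set of available columns depends on the ClickHouse version):
-- Distributed DDL queries queued in ClickHouse Keeper and their execution status
SELECT cluster, query, status FROM system.distributed_ddl_queue;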
Replicated tables
-
Execute the following query on one of the hosts (for example, on host-1) to create a replicated table:
CREATE TABLE test_local ON CLUSTER default_cluster (id Int32, value_string String)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test_local', '{replica}')
ORDER BY id;
This query creates the test_local table on all hosts of the default_cluster cluster.
-
Insert data into the table on host-2:
INSERT INTO test_local VALUES (1, 'a');
-
Make sure data is replicated. To do this, select data from the test_local table on host-1:
SELECT * FROM test_local;
If replication works correctly, the table on host-1 will automatically receive data written to the replica on host-2:
┌─id─┬─value_string─┐
│  1 │ a            │
└────┴──────────────┘
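You can also check the replication state of the table and its metadata paths in ClickHouse Keeper via the system.replicas table on any replica, for example:
-- Replication metadata of test_local stored in ClickHouse Keeper
SELECT database, table, replica_name, zookeeper_path, total_replicas, active_replicas
FROM system.replicas
WHERE table = 'test_local';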
Distributed tables
Note that the SELECT query returns data only from the table on the host where the query is run.
Insert data into any replica on the second shard (for example, into a table on host-3):
INSERT INTO test_local VALUES (2, 'b');
Repeat the SELECT query on host-1 or host-2 — the result will not include data from the second shard:
┌─id─┬─value_string─┐
│  1 │ a            │
└────┴──────────────┘
You can use distributed tables to get data from all shards (for details, see the Distributed tables section).
-
Create a table using the Distributed engine:
CREATE TABLE test_distr ON CLUSTER default_cluster AS default.test_local
ENGINE = Distributed(default_cluster, default, test_local, rand());
-
Run the following query on any host:
SELECT * FROM test_distr;
The output includes data from both shards:
┌─id─┬─value_string─┐
│  1 │ a            │
└────┴──────────────┘
┌─id─┬─value_string─┐
│  2 │ b            │
└────┴──────────────┘
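To see which host each row was read from, you can additionally select the hostName function (when querying a Distributed table, it is evaluated on the remote servers that return the data), for example:
-- Show the host that returned each row through the Distributed table
SELECT hostName() AS host, id, value_string FROM test_distr;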