Typical cluster
An ADQM cluster has neither a master node nor a single entry point: queries can be sent to any host in the cluster.
Consider a typical ADQM cluster — three shards, each consisting of two replicas (in other words, a distributed cluster of 3 nodes with the replication factor set to 2). Shards are servers or groups of servers that store different parts of the same database. Replicas are duplicating servers within a shard that store the same data.
Local tables are replicated tables responsible for storing data. Distributed tables do not store data; they allow querying multiple local tables distributed across hosts that are grouped into a virtual cluster.
Each server configuration (config.xml) should include the following sections:
- remote_servers — clusters used for creating distributed tables and executing distributed queries;
- macros — macros used to substitute values corresponding to a specific host when creating replicated tables.
The remote_servers configuration for the presented cluster:
<remote_servers>
    <default_cluster>
        <shard>
            <replica>
                <host>host_1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>host_4</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <replica>
                <host>host_2</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>host_5</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <replica>
                <host>host_3</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>host_6</host>
                <port>9000</port>
            </replica>
        </shard>
    </default_cluster>
</remote_servers>
This configuration specifies a distributed cluster (default_cluster) that consists of three shards, where each shard includes two replicas. The parameters of each server are host (the remote server address) and port (the TCP port for inter-server communication, usually 9000).
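To verify that a server sees this cluster, you can query the system.clusters system table (standard in ClickHouse); the cluster name below refers to the configuration above:
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'default_cluster';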
The macros section contains the shard and replica identifiers for each server. For example, macros for host_1:
<macros>
    <replica>1</replica>
    <shard>1</shard>
</macros>
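To check which values a specific host actually uses, query the system.macros table (a standard ClickHouse system table) on that host. On host_1 it should return the values shown above; on host_4, the second replica of the first shard, you would expect shard = 1 and a different replica value, assuming replicas within a shard are numbered consecutively:
SELECT macro, substitution FROM system.macros;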
NOTE
To group ADQM hosts into a cluster, use the ADQMDB service’s configuration parameters in ADCM — for details, see Configure logical clusters in the ADCM interface.
Create replicated tables
An example of a query that creates replicated tables:
CREATE TABLE test_local ON CLUSTER default_cluster
(
id Int32,
value_string String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/test_local', '{replica}')
ORDER BY id;
The ReplicatedMergeTree table engine is used for creating replicated tables. Meta information related to the replication process is stored in ZooKeeper or ClickHouse Keeper, so the ReplicatedMergeTree engine requires the following parameters to build a unique path to each replicated table in ZooKeeper/ClickHouse Keeper:
- path to a table in ZooKeeper/ClickHouse Keeper;
- replica name in ZooKeeper/ClickHouse Keeper (it identifies different replicas of the same table).
In this example, replicated tables are created via a distributed DDL query (ON CLUSTER) — the query is executed once on a single server, and the necessary tables are created on all hosts in the cluster. The shard identifier and replica name specified as the {shard} and {replica} macros within the ReplicatedMergeTree parameters are replaced with specific values from the macros section of each server’s configuration.
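To see the ZooKeeper path and replica name that a particular host ended up with after macro substitution, you can query the system.replicas table (standard in ClickHouse), for example:
SELECT database, table, zookeeper_path, replica_name
FROM system.replicas
WHERE table = 'test_local';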
If you do not use the ON CLUSTER clause, run the CREATE TABLE query on each server separately. In this case, you can define the parameters explicitly instead of using macros. However, it is recommended to use macros to reduce the probability of errors when working with large clusters.
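For example, on host_1 (shard 1, replica 1 according to the macros example above), the explicit form of the query would substitute those values into the ZooKeeper path and replica name:
CREATE TABLE test_local
(
    id Int32,
    value_string String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/1/test_local', '1')
ORDER BY id;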
Create distributed tables
The Distributed table engine is used for creating distributed tables, for example:
CREATE TABLE test_distr ON CLUSTER default_cluster AS default.test_local
ENGINE = Distributed(default_cluster, default, test_local, rand());
In this example, the Distributed table engine accepts the following parameters:
- cluster on which the query will be run (cluster name in the remote_servers section of the server’s config file);
- database for the replicated table that the distributed table will access;
- replicated table that the distributed table will access;
- sharding key for inserting data into the distributed table (the rand() expression is used to randomly distribute data across shards).
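As a quick check, you can insert a few test rows through the distributed table and read them back. The hostName() function (standard in ClickHouse) returns the name of the host that served each row, which makes the distribution across shards visible; the values below are arbitrary test data:
INSERT INTO test_distr (id, value_string) VALUES (1, 'a'), (2, 'b'), (3, 'c');
SELECT hostName() AS host, id, value_string FROM test_distr ORDER BY id;
Note that inserts into a Distributed table are asynchronous by default, so the new rows may appear on the shards with a short delay.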