Geo-distributed cluster

A geo-distributed cluster is a cluster whose nodes are located in several geographically distant data centers. This solution ensures high reliability of the system, which will remain operational even if one of the data centers fails.

This article contains examples of geo-distributed ADQM cluster topologies. The most suitable topology depends on the required balance between reliability, low latency, disk storage, and maintainability. For example, the Stretched cluster, Stretched cluster with local reads, and Stretched cluster with local writes topologies are often used to maintain cluster performance in a high-latency environment.

In each of the examples below, cluster servers are located in two data centers — West and East.

Stretched cluster

Topology features:

  • SELECT and INSERT queries can be routed to East or West, but it is not guaranteed that all reads are local (it is possible that a query will be executed by replicas in different data centers).

  • ZooKeeper nodes are placed in the East and West data centers, as well as in a third data center (Nord), as ZooKeeper requires an odd number of nodes.

Configure a logical cluster

In each server configuration (config.xml), define the stretched_cluster cluster that contains two shards, each with two replicas located in different data centers. For shards, set the internal_replication parameter to true to replicate data through the replicated table (data is written to any available replica, and the other replica receives a copy of the data automatically).

config.xml
<remote_servers>
  <stretched_cluster>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard1.west</host>
        <port>9000</port>
      </replica>
      <replica>
        <host>shard1.east</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard2.west</host>
        <port>9000</port>
      </replica>
      <replica>
        <host>shard2.east</host>
        <port>9000</port>
      </replica>
    </shard>
  </stretched_cluster>
</remote_servers>

Create distributed tables

On all nodes of the cluster, create a replicated table (replicated_table) based on the ReplicatedMergeTree engine.
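
For reference, below is a minimal sketch of what replicated_table could look like. The column set is hypothetical, and the {shard} and {replica} macros are assumed to be defined in the configuration of each server:

-- Hypothetical schema: adjust the columns, ZooKeeper path, and ORDER BY to your data.
CREATE TABLE replicated_table ON CLUSTER stretched_cluster
(
    id UInt64,
    event_date Date,
    value String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/replicated_table', '{replica}')
ORDER BY id;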

Use the Distributed engine to create a distributed table used for both reads and writes on all nodes of the cluster. To do this, run the following query on any node:

CREATE TABLE table_distributed ON CLUSTER stretched_cluster AS replicated_table
ENGINE = Distributed(stretched_cluster, default, replicated_table);
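
As an illustration, both reads and writes then go through this single table from any node. The values below use the hypothetical columns from the sketch above; it is also assumed that the distributed table is defined with a sharding key (for example, rand() as the fourth argument of the Distributed engine), which INSERT queries into a multi-shard distributed table require:

-- Write through the distributed table. With internal_replication = true, the peer
-- replica of the target shard receives the data via ReplicatedMergeTree.
INSERT INTO table_distributed VALUES (1, '2024-01-01', 'example');

-- Read through the same table; the query may be served by replicas in either data center.
SELECT count() FROM table_distributed;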

Stretched cluster with local reads

Topology features:

  • SELECT and INSERT queries can be routed to East or West, and it is guaranteed that all reads are local.

  • ZooKeeper nodes are placed in the East and West data centers, as well as in a third data center (Nord), as ZooKeeper requires an odd number of nodes.

Configure logical clusters

In each server configuration (config.xml), define the following clusters:

  • stretched_cluster_writes — hosts of both West and East data centers for writes (two shards with two replicas in each one);

  • stretched_cluster_west_reads — hosts of the West data center for reads (two shards with one replica in each one);

  • stretched_cluster_east_reads — hosts of the East data center for reads (two shards with one replica in each one).

For shards, set the internal_replication parameter to true to replicate data through the replicated table (data is written to any available replica, and the other replica receives a copy of the data automatically).

config.xml
<remote_servers>
  <stretched_cluster_writes>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard1.west</host>
        <port>9000</port>
      </replica>
      <replica>
        <host>shard1.east</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard2.west</host>
        <port>9000</port>
      </replica>
      <replica>
        <host>shard2.east</host>
        <port>9000</port>
      </replica>
    </shard>
  </stretched_cluster_writes>
  <stretched_cluster_west_reads>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard1.west</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard2.west</host>
        <port>9000</port>
      </replica>
    </shard>
  </stretched_cluster_west_reads>
  <stretched_cluster_east_reads>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard1.east</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard2.east</host>
        <port>9000</port>
      </replica>
    </shard>
  </stretched_cluster_east_reads>
</remote_servers>

Create distributed tables

On all nodes of the cluster, create a replicated table (replicated_table) based on the ReplicatedMergeTree engine.

Create distributed tables for reads separately in each data center:

  • West:

    CREATE TABLE table_distributed_west_reads ON CLUSTER stretched_cluster_west_reads AS replicated_table
    ENGINE = Distributed(stretched_cluster_west_reads, default, replicated_table);
  • East:

    CREATE TABLE table_distributed_east_reads ON CLUSTER stretched_cluster_east_reads AS replicated_table
    ENGINE = Distributed(stretched_cluster_east_reads, default, replicated_table);

Create identical distributed tables for writes in a logical cluster that combines both data centers:

CREATE TABLE table_distributed_writes ON CLUSTER stretched_cluster_writes AS replicated_table
ENGINE = Distributed(stretched_cluster_writes, default, replicated_table);
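
As an illustration, under the same assumptions as the usage sketch in the previous topology (hypothetical columns, a sharding key on the write table), applications in each data center read from their local table and write through the common write table:

-- Executed on a West node: the read is served by West replicas only.
SELECT count() FROM table_distributed_west_reads;

-- Executed on any node: the write goes to one available replica per shard, and
-- ReplicatedMergeTree copies it to the peer replica in the other data center.
INSERT INTO table_distributed_writes VALUES (1, '2024-01-01', 'example');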

Stretched cluster with local writes

Topology features:

  • SELECT and INSERT queries can be routed to East or West, and it is guaranteed that all writes are local.

  • ZooKeeper nodes are placed in the East and West data centers, as well as in a third data center (Nord), as ZooKeeper requires an odd number of nodes.

Configure logical clusters

In each server configuration (config.xml), define the following clusters:

  • stretched_cluster_reads — hosts of both West and East data centers for reads (two shards with two replicas in each one);

  • stretched_cluster_west_writes — hosts of the West data center for writes (two shards with one replica in each one);

  • stretched_cluster_east_writes — hosts of the East data center for writes (two shards with one replica in each one).

For shards, set the internal_replication parameter to true to replicate data through the replicated table (data is written to any available replica, and the other replica receives a copy of the data automatically).

config.xml
<remote_servers>
  <stretched_cluster_reads>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard1.west</host>
        <port>9000</port>
      </replica>
      <replica>
        <host>shard1.east</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard2.west</host>
        <port>9000</port>
      </replica>
      <replica>
        <host>shard2.east</host>
        <port>9000</port>
      </replica>
    </shard>
  </stretched_cluster_reads>
  <stretched_cluster_west_writes>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard1.west</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard2.west</host>
        <port>9000</port>
      </replica>
    </shard>
  </stretched_cluster_west_writes>
  <stretched_cluster_east_writes>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard1.east</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard2.east</host>
        <port>9000</port>
      </replica>
    </shard>
  </stretched_cluster_east_writes>
</remote_servers>

Create distributed tables

On all nodes of the cluster, create a replicated table (replicated_table) based on the ReplicatedMergeTree engine.

Create distributed tables for writes separately in each data center:

  • West:

    CREATE TABLE table_distributed_west_writes ON CLUSTER stretched_cluster_west_writes AS replicated_table
    ENGINE = Distributed(stretched_cluster_west_writes, default, replicated_table);
  • East:

    CREATE TABLE table_distributed_east_writes ON CLUSTER stretched_cluster_east_writes AS replicated_table
    ENGINE = Distributed(stretched_cluster_east_writes, default, replicated_table);

Create identical distributed tables for reads in a logical cluster that combines both data centers:

CREATE TABLE table_distributed_reads ON CLUSTER stretched_cluster_reads AS replicated_table
ENGINE = Distributed(stretched_cluster_reads, default, replicated_table);
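
As an illustration, under the same assumptions as the earlier usage sketches, applications write through their local write table and read through the common read table:

-- Executed on an East node: the write lands on East replicas only, and
-- ReplicatedMergeTree then copies it to the peer replicas in West.
INSERT INTO table_distributed_east_writes VALUES (1, '2024-01-01', 'example');

-- Executed on any node: the read may be served by replicas in either data center.
SELECT count() FROM table_distributed_reads;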

Asymmetric cluster

Topology features:

  • Reads are possible in both the West and East data centers, while writes are possible in West only.

  • Servers of a single ZooKeeper ensemble are located in two data centers (the third data center is not needed), while only servers in West participate in the quorum. Replicas in the East data center cannot be leaders (the replicated_can_become_leader=0 setting in the merge_tree section of config.xml controls this behavior).

  • The insert_quorum parameter is set to 1, as the number of servers in the quorum should be equal to or less than the number of replicas in the main data center (in this example, the West data center, to which data is inserted).

Configure logical clusters

In each server configuration (config.xml), define the following clusters:

  • asymmetric_cluster_west — hosts of the West data center (two shards with one replica in each one);

  • asymmetric_cluster_east — hosts of the East data center (two shards with one replica in each one).

For shards, set the internal_replication parameter to true to replicate data through the replicated table (data is written to any available replica, and the other replica receives a copy of the data automatically).

config.xml
<remote_servers>
  <asymmetric_cluster_west>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard1.west</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard2.west</host>
        <port>9000</port>
      </replica>
    </shard>
  </asymmetric_cluster_west>
  <asymmetric_cluster_east>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard1.east</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard2.east</host>
        <port>9000</port>
      </replica>
    </shard>
  </asymmetric_cluster_east>
</remote_servers>

Create distributed tables

On all nodes of the cluster, create a replicated table (replicated_table) based on the ReplicatedMergeTree engine.

Create distributed tables for reads separately in each data center:

  • West:

    CREATE TABLE table_distributed_reads ON CLUSTER asymmetric_cluster_west AS replicated_table
    ENGINE = Distributed(asymmetric_cluster_west, default, replicated_table);
  • East:

    CREATE TABLE table_distributed_reads ON CLUSTER asymmetric_cluster_east AS replicated_table
    ENGINE = Distributed(asymmetric_cluster_east, default, replicated_table);

Create a distributed table for writes in the West data center only:

CREATE TABLE table_distributed_writes ON CLUSTER asymmetric_cluster_west AS replicated_table
ENGINE = Distributed(asymmetric_cluster_west, default, replicated_table);
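
As an illustration, under the same assumptions as the earlier usage sketches, all writes go through the write table in the West data center, while each data center reads from its local read table:

-- Executed on a West node, the only place where writes are accepted
-- (insert_quorum = 1 is assumed to be set in the configuration, as described above).
-- ReplicatedMergeTree then copies the inserted data to the East replicas.
INSERT INTO table_distributed_writes VALUES (1, '2024-01-01', 'example');

-- Executed in either data center: the read is served by the local replicas.
SELECT count() FROM table_distributed_reads;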

Cluster with local reads, without native geo replication

Topology features:

  • Native ClickHouse replication between data centers is not supported; replication between data centers is provided by distributed tables instead.

  • The insert_quorum parameter is set to 2: when executing an INSERT query, a distributed table writes data to two replicas located in different data centers.

  • SELECT and INSERT queries can be routed to East or West, and it is guaranteed that all reads are local.

  • High latency between West and East is possible.

  • ZooKeeper is available in one data center only.

Configure logical clusters

In each server configuration (config.xml), define the following clusters:

  • cluster_writes — includes hosts of the West and East data centers for writes (two shards with two replicas in each one);

  • cluster_west_reads — includes hosts of the West data center for reads (two shards with one replica in each one);

  • cluster_east_reads — includes hosts of the East data center for reads (two shards with one replica in each one).

The internal_replication parameter for shards of the cluster_writes cluster, to which data is written, is set to false so that the distributed table itself writes data to both replicas located in different data centers.

config.xml
<remote_servers>
  <cluster_writes>
    <shard>
      <internal_replication>false</internal_replication>
      <replica>
        <host>shard1.west</host>
        <port>9000</port>
      </replica>
      <replica>
        <host>sh1r1.east</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <internal_replication>false</internal_replication>
      <replica>
        <host>shard2.west</host>
        <port>9000</port>
      </replica>
      <replica>
        <host>sh2r1.east</host>
        <port>9000</port>
      </replica>
    </shard>
  </cluster_writes>
  <cluster_west_reads>
    <shard>
      <replica>
        <host>shard1.west</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <replica>
        <host>shard2.west</host>
        <port>9000</port>
      </replica>
    </shard>
  </cluster_west_reads>
  <cluster_east_reads>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard1r2.east</host>
        <port>9000</port>
      </replica>
    </shard>
    <shard>
      <internal_replication>true</internal_replication>
      <replica>
        <host>shard2r2.east</host>
        <port>9000</port>
      </replica>
    </shard>
  </cluster_east_reads>
</remote_servers>

Create distributed tables

On all nodes of the cluster, create a replicated table (replicated_table) based on the ReplicatedMergeTree engine.

Create a distributed table for reads on each node of the West data center individually (ZooKeeper is required for ON CLUSTER to work):

CREATE TABLE table_distributed_reads AS replicated_table
ENGINE = Distributed(cluster_west_reads, default, replicated_table);

To create a distributed table for reads in East, it is enough to run the following query on any node:

CREATE TABLE table_distributed_reads ON CLUSTER cluster_east_reads AS replicated_table
ENGINE = Distributed(cluster_east_reads, default, replicated_table);

A distributed table for writes should be created on the shard1.west, shard2.west, sh1r1.east, and sh2r1.east hosts:

CREATE TABLE table_distributed_writes AS replicated_table
ENGINE = Distributed(cluster_writes, default, replicated_table);
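
As an illustration, under the same assumptions as the earlier usage sketches, writes go through table_distributed_writes, which sends each row to both data centers itself, while each data center reads from its local table_distributed_reads. The insert_quorum = 2 setting mentioned in the topology notes is shown per session here as an assumption, in case it is not already set in the user profile:

-- Assumed to be run on one of the hosts where table_distributed_writes exists.
SET insert_quorum = 2;

-- With internal_replication = false, the distributed table itself writes the row
-- to the replicas of the target shard in both West and East.
INSERT INTO table_distributed_writes VALUES (1, '2024-01-01', 'example');

-- Executed in either data center: the read is served by the local replicas only.
SELECT count() FROM table_distributed_reads;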