Table engines

Table engine in ADQM is a table type that determines:

  • how and where data is stored;

  • supported queries;

  • concurrent data access;

  • whether indexes can be used;

  • whether multi-threaded query execution is supported;

  • data replication parameters.

Table engine families

ADQM supports a variety of table engines that are optimized for different tasks. All engines are grouped into families:

  • MergeTree — the most universal and functional table engines for storing big data and high-performance search. The main feature of these engines is fast data insertion followed by background data processing. For details, see the MergeTree family section below.

  • Log — simple engines for scenarios when it is required to quickly write a lot of small tables (up to 1 million rows) and then read them in full.

  • Integration — engines for communicating with external data storages (ODBC, JDBC, MySQL, MongoDB, HDFS, S3, Kafka, EmbeddedRocksDB, RabbitMQ, PostgreSQL). You can find detailed information about these engines with examples in the Table Engines for Integrations section of the ClickHouse documentation.

  • Special — engines for special tasks, for example:

    • Distributed — engine for distributed query processing on multiple servers. Read more about creating and using Distributed tables in the Sharding article.

    • Dictionary — engine that displays a dictionary data as a table.

    • Merge (not to be confused with the MergeTree engine) — engine that allows reading data from any number of other tables simultaneously. A Merge table does not store data itself and does not support writing data.

    • Other engines — read more in the Special Table Engines section of the ClickHouse documentation.

You can get information on features supported by table engines from the system.table_engines system table, for example:

SELECT * FROM system.table_engines WHERE name in ('MergeTree', 'TinyLog', 'MySQL');

The query result:

│ MySQL     │                 1 │                         0 │                    0 │                   0 │            0 │                    0 │                      0 │                        0 │
│ MergeTree │                 1 │                         1 │                    1 │                   1 │            1 │                    0 │                      0 │                        1 │
│ TinyLog   │                 1 │                         0 │                    0 │                   0 │            0 │                    0 │                      0 │                        0 │

MergeTree family

Features of tables of the MergeTree family

MergeTree is the main table engine for working with big data in ADQM. This engine and all its special versions support:

  • partitioning — combining table data into partitions by the specified criterion (in other words, a partition is a logical group of data rows that you can manipulate as needed — detach/attach, move, etc.);

  • data replication;

  • data sampling;

  • sparse index for optimized data search.

Data insertion and storage

Features related to data storage in tables of the MergeTree family:

  • The MergeTree table engine inserts data as a set of separate parts and then merges them in the background.

  • Data in each part is sorted by the primary key.

  • Data that belongs to different partitions is split into different parts. Data parts of different partitions are not merged.

  • The merge process is managed by the system (the system tends to a state when a partition stores only one sorted data part), but you can force merging with the OPTIMIZE query.

  • Each data part is logically divided into granules. A granule is the smallest indivisible data set that is read when selecting data. The size of granules is set by the engine’s index_granularity (the maximum number of data rows in a granule) and index_granularity_bytes (the maximum size of a granule in bytes) settings.

  • Each data part is stored in a separate folder inside the table folder. Data parts can be stored both in the column format (each column is stored in a separate file) or in the compact format (all columns are stored in one file).

  • At a particular moment in time, a table can store more than one copy of data — unmerged parts and their merged version.

  • Old parts, whose data was merged into a new part, are removed after the time specified by the table’s old_parts_lifetime setting.

  • You can manipulate data parts as partitions — move, detach/attach, etc.

Special versions of the MergeTree engine

The MergeTree family also includes the following table engines that inherit the MergeTree functionality and also add extra functionality for specific use cases:

  • ReplacingMergeTree — removes duplicate rows with the same sorting key value. This table engine is suitable for cleaning of duplicate data in the background to save space, however it does not guarantee the absence of duplicates.

  • SummingMergeTree — replaces all rows with the same sorting key with one row that stores summarized values in numeric columns.

  • AggregatingMergeTree — used for creating materialized views.

  • CollapsingMergeTree and VersionedCollapsingMergeTree — add the logic of collapsing (deleting) rows to the algorithm for merging data parts.

  • GraphiteMergeTree — thins and aggregates/averages Graphite data.

Replicated table engines

Engines of the MergeTree family with the Replicated prefix are used for creating replicated tables:

  • ReplicatedMergeTree

  • ReplicatedReplacingMergeTree

  • ReplicatedSummingMergeTree

  • ReplicatedAggregatingMergeTree

  • ReplicatedCollapsingMergeTree

  • ReplicatedVersionedCollapsingMergeTree

  • ReplicatedGraphiteMergeTree

For an example on how to create replicated tables using the ReplicatedMergeTree engine, see the Typical cluster article.

Found a mistake? Seleсt text and press Ctrl+Enter to report it