Indexes

Overview

Indexing is a technique to improve database performance. Indexes are special data structures that allow a database server to quickly find requested rows by values ​​of a key column (or column set) without a full table scan.

ADQM supports the following types of indexes for MergeTree tables:

  • Primary index. A primary index in ADQM differs from the index in traditional relational databases to perform analytical queries on large amounts of data most efficiently — it stores key column values ​​not per table row but per range of N rows ordered by key columns (each index entry stores key column values in the first row of the corresponding range). When a primary index is defined correctly for a table, it reduces the amount of data to be read from disk for query execution and provides quick searches for ranges of rows with the desired values. A key to build the primary index should be specified on a table creation and cannot be changed after that.

  • Data skipping indexes (secondary indexes). When a primary index is not enough to ensure high query performance, you can add data skipping indexes to your table — secondary indexes that enable ADQM to skip reading blocks of data (they can be quite large) that are guaranteed not to contain the requested values. For example, such indexes can be effective for full-text search or for checking whether values ​​are included in specific lists. Unlike a primary index, skipping indexes can be added to or removed from existing tables.

Each type of ADQM index has its own advantages and features. When designing table indexes, take into account which queries will be sent to a table most often, characteristics of columns used in queries (data type, cardinality), desired balance between read/write performance and system resource consumption. To select the most effective index, it is often necessary to test different index types, granularity sizes, and other parameters.

TIP
In the Improve query performance article, you can find examples of applying indexes of different types to tables and explore how they affect query execution speed.

Sparse primary index

MergeTree table data is divided into data parts. Rows within a data part are sorted by primary key columns. Each data part is logically divided into granules. A granule is the smallest indivisible data set (always contains an integer number of rows) that is read when selecting data. The first row of a granule is marked with values of primary key columns for that row — this is a mark. For each data part, ADQM creates an index file that stores marks. Thus, instead of indexing each row, the primary index for a data part has one index entry (mark) per group of rows (granule) — this technique is called sparse index. A sparse primary index allows it to avoid scanning the entire table for individual rows and quickly (with a binary search over index entries) identify granules that could possibly include rows matching a query filter. Found groups of potentially matching rows are then loaded into memory in parallel in order to find exact matches.

This type of indexing makes a primary index small (it completely fits into memory) and still allows data to be searched on large tables in seconds, skipping terabytes or even petabytes of irrelevant data (although a sparse index allows extra data to be read).

Specify a primary key

For a primary index to work most effectively, a primary key should include a column that will filter the majority of queries accessing the table. A primary key can contain multiple columns — when defining a compound primary key, follow the recommendations below.

To assign a primary key to a MergeTree table, use the ORDER BY or PRIMARY KEY clause in the CREATE TABLE query when creating the table:

CREATE TABLE <table_name> (<column_name> <column_type>, ...)
ENGINE = MergeTree
ORDER BY <sorting_key_expr>
[PRIMARY KEY <primary_key_expr>];
  • ORDER BY — sorting key defined as one or more columns. ADQM uses the sorting key as a primary key if the primary key is not specified explicitly via the PRIMARY KEY clause.

  • PRIMARY KEY — primary key. By default, a primary key is the same as a sorting key (ORDER BY), and it is not necessary to specify the PRIMARY KEY clause separately in most cases. However, it is possible to set a primary key (to write key column values as marks in the index file) that differs from the sorting key (to use columns for ordering rows in data parts). In this case, the tuple of primary key columns must be a prefix of the tuple of sorting key columns (for example, if the primary key is (a, b), then the sorting key must be (a, b, c…​)).

Each table can have only one primary key. You can also create a table without a primary key — to do this, use the ORDER BY tuple() syntax.

Values in primary key columns do not have to be unique — ADQM allows multiple rows with the same values in key columns to be inserted into a table.

Compound primary key

In a compound primary key, the order of key columns is important and can significantly affect both:

  • efficiency of filtering by secondary key columns in queries;

  • compression ratio of table’s data files.

When a query filters data by the first key column, ADQM runs a binary search algorithm over the index marks of that column. Filtering by other key columns uses the generic exclusion search algorithm that is less efficient. For example, the (a, b) primary key is useful for speeding up queries filtering by the a column, but it does not improve the performance of queries filtering by the b column, despite the b column is a part of the compound primary key. Conversely, the primary index of a table with the (b, a) compound primary key speeds up queries filtering by the b column, but does not optimize queries filtering by the a column.

If you want to speed up both types of queries (filtering by the first key column and filtering by the secondary key column) and the cardinality (number of distinct values) of columns in the compound primary key is equally high, it makes sense to remove the second key column from the primary index (this will result in less memory consumption of the index), and use multiple primary indexes instead. To implement this, you can use one of the following approaches:

  • create a second table with a different primary key;

  • create a materialized view on the existing table;

  • add a projection to the existing table.

If columns in a compound primary key have large differences in cardinality, it is recommended to order the primary key columns by cardinality in ascending order to speed up queries. The higher the difference in cardinality between key columns, the more important the order of those columns in the key is.

Data skipping indexes

For MergeTree tables, you can also set data skipping indexes. A data skipping index aggregates some information about the specified expression (column) on data blocks. This information is then used in SELECT queries to reduce the amount of data to be read from disk by skipping blocks of data that are guaranteed not to match the query’s filter criteria. A skipped block consists of data granules in an amount equal to the granularity of the given index.

Manage indexes

Describe skipping indexes using the INDEX clause within the column description section of the CREATE TABLE query:

CREATE TABLE <table_name>
(   <column_name> <column_type>, ...
    INDEX <index_name> <index_expr> TYPE <index_type> [GRANULARITY <granularity_value>], ...)
ENGINE = MergeTree
...;

where:

  • <index_name> — index name;

  • <index_expr> — expression by which the information will be aggregated (often it is just a column name);

  • <index_type> — index type that defines which information about a data block the index should store (this information will allow finding the necessary blocks quickly and skipping all the others);

  • <granularity_value> — index granularity that is the number of data granules in a skipped block (the default value is 1).

You can add a skipping index to an existing table using the following query:

ALTER TABLE <table_name> ADD INDEX <index_name> <index_expr> TYPE <index_type> [GRANULARITY <granularity_value>];

When an index is added to an existing table, it is not automatically updated. ALTER TABLE changes metadata, and the index will only be calculated for new data inserted into the table. To apply the index to existing data, run the additional command:

ALTER TABLE <table_name> MATERIALIZE INDEX <index_name>;

To delete indexes, use the following queries:

  • ALTER TABLE <table_name> DROP INDEX <index_name> — removes index description from table metadata and deletes index files from disk;

  • ALTER TABLE <table_name> CLEAR INDEX <index_name> — removes index files from disk without removing the index description from metadata.

Index types

MinMax

The minmax index stores the minimum and maximum values of the column (or expression) for each block.

Set

The set(<max_rows>) index stores unique values of the column (or expression) per block up to <max_rows> (0 means an unlimited number of unique values). When an indexed column is used in the WHERE clause, ADQM reads a small set of values instead of the entire column. This type of index works well if a column has low cardinality (a small number of unique values) in each set of granules, but high cardinality overall.

Bloom Filter

ADQM supports three different types of Bloom filter index:

  • bloom_filter([<false_positive>]) — Bloom filter for the column. The optional <false_positive> parameter defines the probability of receiving a false positive response from the filter. Possible values are between 0 and 1. A default value is 0.025.

    This index type is supported for the following data types: Int*, UInt*, Float*, Enum, Date, DateTime, String, FixedString, Array, LowCardinality, Nullable, UUID and Map.

  • tokenbf_v1(<size_of_bloom_filter_in_bytes>, <number_of_hash_functions>, <random_seed>) — an input string is split into alphanumeric tokens (sequences of characters separated by non-alphanumeric characters) and then tokens are stored in a Bloom filter. It is suitable when an exact match on a string is searched (for example, when you want to find a specific part of the URL or a query parameter in the URL column).

    All three index parameters are related to the Bloom filter configuration:

    • <size_of_bloom_filter_in_bytes> — Bloom filter size in bytes (larger filters have fewer false positives);

    • <number_of_hash_functions> — number of hash functions used in the Bloom filter;

    • <random_seed> — seed for the Bloom filter hash functions.

    This index type works with the String, FixedString, and Map data types only.

  • ngrambf_v1(<n>, <size_of_bloom_filter_in_bytes>, <number_of_hash_functions>, <random_seed>) — an input string is split into n-grams (a substring of n characters) and then stored in a Bloom filter. It is suitable for full-text search, particularly in languages without word breaks, such as Chinese. The first parameter (<n>) sets the size of n-grams, the rest ones are similar to tokenbf_v1. This index type works with the String, FixedString, and Map data types.

Inverted

The inverted index inverted([<n>], [<max_rows_per_posting_list>]) stores a mapping of unique words (or n-grams) of a text column with pointers to their location in the table (the identifiers of rows in which they are contained). Use it to optimize full-text search across columns of the types: String, FixedString, Array(String), Array(FixedString), Map(String), and Map(String). Currently, it is in the experimental state. For more detailed information and an example of the use, see the Full-text search article.

Get information on indexes

The following system tables provide information about existing indexes:

  • system.tables — the primary_key field contains information about the table’s primary key.

    Example
    SELECT primary_key FROM system.tables WHERE table='table_compound_primary_key';
    ┌─primary_key─┐
    │ a, b        │
    └─────────────┘
  • system.data_skipping_indices — contains information about existing data skipping indexes in all tables.

    Example
    SELECT type_full,expr FROM system.data_skipping_indices WHERE table='table_skip_index';
    ┌─type_full─┬─expr─┐
    │ set(100)  │ b    │
    └───────────┴──────┘

You can also use queries:

  • SHOW CREATE TABLE <table_name>.

    Example
    SHOW CREATE TABLE table_skip_index;
    CREATE TABLE default.table_skip_index
    (
        `a` UInt64,
        `b` UInt64,
        INDEX my_index b TYPE set(100) GRANULARITY 2
    )
    ENGINE = MergeTree
    PRIMARY KEY a
    ORDER BY a
    SETTINGS index_granularity = 10000
  • SHOW {INDEX|INDEXES|INDICES|KEYS} FROM <table_name>.

    Example
    SHOW INDEXES FROM table_skip_index;
    ┌─table────────────┬─non_unique─┬─key_name─┬─seq_in_index─┬─column_name─┬     ┬─index_type─┬─comment─┬─index_comment─┬─visible─┬─expression─┐
    │ table_skip_index │          1 │ PRIMARY  │            1 │ a           │ ... │ PRIMARY    │         │               │ YES     │            │
    └──────────────────┴────────────┴──────────┴──────────────┴─────────────┴─   ─┴────────────┴─────────┴───────────────┴─────────┴────────────┘
    ┌─table────────────┬─non_unique─┬─key_name─┬─seq_in_index─┬─column_name─┬─    ┬─index_type─┬─comment─┬─index_comment─┬─visible─┬─expression─┐
    │ table_skip_index │          1 │ my_index │            1 │             │ ... │ SET        │         │               │ YES     │ b          │
    └──────────────────┴────────────┴──────────┴──────────────┴─────────────┴─    ┴────────────┴─────────┴───────────────┴─────────┴────────────┘

To examine the use of indexes for an individual query, you can analyze that query execution using the EXPLAIN command with the indexes parameter.

Example
EXPLAIN indexes = 1 SELECT * FROM table_skip_index WHERE b=100 OR b=555;
┌─explain──────────────────────────────────────────┐
│ Expression ((Project names + Projection))        │
│   Expression                                     │
│     ReadFromMergeTree (default.table_skip_index) │
│     Indexes:                                     │
│       PrimaryKey                                 │
│         Condition: true                          │
│         Parts: 6/6                               │
│         Granules: 10004/10004                    │
│       Skip                                       │
│         Name: my_index                           │
│         Description: set GRANULARITY 2           │
│         Parts: 1/6                               │
│         Granules: 4/10004                        │
└──────────────────────────────────────────────────┘
Found a mistake? Seleсt text and press Ctrl+Enter to report it