Home
Arenadata QuickMarts
How to
Data querying
Query types
Dictionaries

Dictionaries

Elena Dvoryadkina

Collapse content Expand content

Contents

Create and configure a dictionary
View information on dictionaries
Load dictionaries
Use dictionaries
Detach and attach a dictionary
Delete a dictionary

A dictionary is a key/value data store that is fully or partially stored in the RAM of the ADQM server and can be used as a reference to substitute data values by keys in the final sample. Dictionaries are an easier-to-use alternative to the JOIN operator.

A data source for a dictionary can be a local file (text or executable), an HTTP(s) resource, or another DBMS. ADQM automatically updates dictionaries and loads missing or changed values from the source. Access to dictionaries from ADQM is read-only.

You can configure ADQM to connect dictionaries from external sources using XML configuration files or DDL queries. This article contains details on the second approach (recommended) — it describes syntax and provides examples of queries to create/configure/delete dictionaries in ADQM, as well as to read data from dictionaries using special functions.

Create and configure a dictionary

To create and configure a dictionary in ADQM, use the CREATE DICTIONARY query. The basic syntax is:

CREATE DICTIONARY [OR REPLACE][IF NOT EXISTS] [<database_name>.]<dictionary_name> [ON CLUSTER <cluster_name>]
(   <key1_name> <key1_data_type>,
    <key2_name> <key2_data_type>,
    <attr1_name> <attr1_data_type> [DEFAULT <default_value1>] [EXPRESSION <expr1>] [HIERARCHICAL] [INJECTIVE] [IS_OBJECT_ID],
    <attr2_name> <attr2_data_type> [DEFAULT <default_value2>] [EXPRESSION <expr2>] [HIERARCHICAL] [INJECTIVE] [IS_OBJECT_ID])
PRIMARY KEY <key1_name>, <key2_name>
SOURCE(<SOURCE_TYPE>([<param1_name> <param1_value> ... <paramN_name> <paramN_value>]))
LAYOUT(<LAYOUT_NAME>([<param_name> <param_value>]))
LIFETIME({MIN <min_value> MAX <max_value> | <max_value>})
[SETTINGS(<setting_name> = <setting_value>, <setting_name> = <setting_value>, ...)]
[COMMENT '<comment_text>']

where:

PRIMARY KEY — dictionary key;
SOURCE — data source for a dictionary;
LAYOUT — way to store a dictionary in memory;
LIFETIME — frequency of dictionary updates;
SETTINGS — additional settings that can be specified for a dictionary if its data source is a local file, HTTP(s) resource, or ClickHouse table;
COMMENT — text comment.

Dictionary key and attributes

The dictionary structure is defined by a key and attributes specified as parameters of the CREATE DICTIONARY query.

A key is a column in the dictionary source with unique values that are used to identify records and search for data values.

ADQM supports the following types of keys:
- Numeric key — a column of the UInt64 type.
  
  CREATE DICTIONARY <dictionary_name> ( <key_name> UInt64, ... ) PRIMARY KEY <key_name> ...
  
  where <key_name> is a name of a column with keys.
- Composite key is a tuple of columns of arbitrary types. It can consist of a single element (for example, you can use a string as a key).
  
  CREATE DICTIONARY <dictionary_name> ( <key1_name> <key1_data_type>, <key2_name> <key2_data_type> ... ) PRIMARY KEY <key1_name>, <key2_name> ...
  
  where <key1_name>, <key2_name> are names of columns with keys, and <key1_data_type>, <key2_data_type> are key types.

Attributes are columns in the dictionary source that store data values to be retrieved by a key. An attribute is described as follows:

CREATE DICTIONARY <dictionary_name> (
    ...
    <attribute_name> <attribute_data_type> [DEFAULT <default_value>] [EXPRESSION <expr>] [HIERARCHICAL] [INJECTIVE] [IS_OBJECT_ID]
    )
...

Parameters and basic clauses

<attribute_name>

Name of a column in the data source

<attribute_data_type>

Data type to which a value from the dictionary should be cast.

Supported types: UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, Float64, UUID, Decimal32, Decimal64, Decimal128, Decimal256, String, Array.

The ability to specify Nullable types for attributes depends on how the dictionary is stored in memory (for example, Nullable types are currently not supported for attributes of IP_TRIE dictionaries)

DEFAULT

Default value for an element not found by key. NULL can only be specified if the attribute type is Nullable

EXPRESSION

Expression executed on a value

HIERARCHICAL

Indicates that the attribute contains a value of a parent key for the current key (use it to configure a hierarchical dictionary)

INJECTIVE

Indicates that the attribute is injective (different keys correspond to different values of the attribute). In this case, if GROUP BY uses a function that retrieves the attribute value by key, this function is automatically removed from GROUP BY. This can optimize the data aggregation process

IS_OBJECT_ID

Indicates that the query is executed for a MongoDB document by ObjectID

Dictionary source

Use the SOURCE clause in the CREATE DICTIONARY query to specify a data source for a dictionary:

CREATE DICTIONARY <dictionary_name> (...)
...
SOURCE(<SOURCE_TYPE>([<param1_name> <param1_value> ... <paramN_name> <paramN_value>]))
...

where <SOURCE_TYPE> is a data source type, <paramX_name> <paramX_value> — source settings.

Possible values of <SOURCE_TYPE>

Value

Data source

FILE

Local file (a source file should be located in the user_files folder)

HTTP

HTTP resource

MYSQL

Table from a MySQL database

CLICKHOUSE

Table from a ClickHouse database

MONGODB

Collection from a MongoDB database

REDIS

Redis database

CASSANDRA

Column family (table) from a Cassandra keyspace (database)

POSTGRESQL

Table from a PostgreSQL database (see the example in the Integration between ADQM and ADPG article)

ODBC

Table from any database that has the ODBC driver

You can find examples of connecting dictionaries to various types of sources in the Dictionary Sources section of the ClickHouse documentation.

Store dictionaries in memory

Use the LAYOUT clause in the CREATE DICTIONARY query to specify how to store a dictionary in memory:

CREATE DICTIONARY <dictionary_name> (...)
...
LAYOUT(<LAYOUT_TYPE>([<param_name> <param_value>]))
...

where <LAYOUT_TYPE> is a way to store a dictionary in memory.

Possible values of <LAYOUT_TYPE>

Value Way to store a dictionary

FLAT

A dictionary is completely stored in RAM as flat arrays

HASHED, COMPLEX_KEY_HASHED

A dictionary is completely stored in RAM as hash tables

SPARSE_HASHED, COMPLEX_KEY_SPARSE_HASHED

A dictionary is completely stored in RAM as hash tables, similar to HASHED/COMPLEX_KEY_HASHED, but uses less memory in favor higher CPU usage

HASHED_ARRAY, COMPLEX_KEY_HASHED_ARRAY

A dictionary is completely stored in RAM. Each attribute is stored in an array. The key is stored as a hash table where a value is an index in an array of attributes

RANGE_HASHED, COMPLEX_KEY_RANGE_HASHED

The dictionary is stored in RAM as a hash table with an ordered array of ranges and their corresponding values

CACHE, COMPLEX_KEY_CACHE

A dictionary is stored in a cache that has a fixed number of cells containing frequently used elements

SSD_CACHE, COMPLEX_KEY_SSD_CACHE

A dictionary is stored in a cache, similar to SSD_CACHE/COMPLEX_KEY_SSD_CACHE, but data is stored on SSD and index is in RAM

DIRECT, COMPLEX_KEY_DIRECT

A dictionary does not store data locally and interacts with the source while processing a query

IP_TRIE

This type of dictionary storage is for retrieving metadata (such as AS number or country code) from network prefixes (IP addresses). A dictionary should have a composite key that contains one element of the String type — IP address

Layouts with the COMPLEX_KEY prefix are used for dictionaries with composite keys.

For more details on all ways of storing dictionaries in memory, refer to the Storing Dictionaries in Memory section of the ClickHouse documentation.

Dictionary updates

Use the LIFETIME clause in the CREATE DICTIONARY query to specify a time interval for dictionary updates in seconds.

CREATE DICTIONARY <dictionary_name> (...)
...
LIFETIME(<max_value> | MIN <min_value> MAX <max_value>)
...

where:

<max_value> is a dictionary update interval. If you set LIFETIME(0), dictionary updates are disabled.
MIN <min_value> MAX <max_value> is an interval within which the time to update the dictionary is randomly selected. This setting can be useful to distribute the load on the dictionary source when updating on a large number of servers. If you set MIN 0 MAX 0, the dictionary is not reloaded by timeout. In this case, the dictionary data can be reloaded with the SYSTEM RELOAD DICTIONARY command.

Dictionary updates (except for loading on the first use) do not block queries — the old version of a dictionary is used during an update. If an error occurs during an update, the error is written to the server log, and queries continue using the old version of the dictionary.

The dictionary update logic depends on the data source type and how the dictionary is stored in memory — read the Dictionary Updates section of the ClickHouse documentation for details.

Example

Create a roles_dict table to be a data source for a dictionary:

CREATE TABLE roles_dict (`id` Int32, `role` String) ENGINE = MergeTree() ORDER BY id;

INSERT INTO roles_dict VALUES (10, 'admin'), (20, 'owner'), (30, 'author'), (40, 'reviewer'), (50, 'editor'), (60, 'view only');

--id─┬─role-------
│ 10 │ admin     │
│ 20 │ owner     │
│ 30 │ author    │
│ 40 │ reviewer  │
│ 50 │ editor    │
│ 60 │ view only │
------------------

Create a dictionary that reads data from the roles_dict table:

CREATE DICTIONARY roles_dict_clickhouse (`id` Int32, `role` String DEFAULT 'no role assigned')
PRIMARY KEY id
SOURCE(CLICKHOUSE(HOST 'localhost' USER 'default' PASSWORD '' DATABASE 'default' TABLE 'roles_dict'))
LIFETIME(MIN 10 MAX 20)
LAYOUT(FLAT());

View information on dictionaries

You can obtain information about dictionaries configured on the server from the system.dictionaries system table:

SELECT * FROM system.dictionaries WHERE name = '<dictionary_name>' [FORMAT <format_name>];

Example

Query:

SELECT * FROM system.dictionaries WHERE name = 'roles_dict_clickhouse' FORMAT Vertical;

If a dictionary is configured but not loaded, the output can look like:

Row 1:
──────
database:                           default
name:                               roles_dict_clickhouse
uuid:                               4e28febc-b472-423c-bc14-bd29f195390f
status:                             NOT_LOADED
origin:                             4e28febc-b472-423c-bc14-bd29f195390f
type:
key.names:                          ['id']
key.types:                          ['UInt64']
attribute.names:                    ['role']
attribute.types:                    ['String']
bytes_allocated:                    0
hierarchical_index_bytes_allocated: 0
query_count:                        0
hit_rate:                           0
found_rate:                         0
element_count:                      0
load_factor:                        0
source:
lifetime_min:                       0
lifetime_max:                       0
loading_start_time:                 1970-01-01 00:00:00
last_successful_update_time:        1970-01-01 00:00:00
loading_duration:                   0
last_exception:
comment:

You can view names of all dictionaries that exist in the specified database with the SHOW DICTIONARIES query:

SHOW DICTIONARIES FROM <database_name>;

Load dictionaries

The dictionaries_lazy_load server configuration parameter defines whether dictionaries should be loaded on server startup or on the first use of a dictionary.

To force reloading of all dictionaries at once or a single dictionary, execute the corresponding query:

SYSTEM RELOAD DICTIONARIES;

SYSTEM RELOAD DICTIONARY <dictionary_name>;

After a dictionary is loaded, the system.dictionaries table changes the dictionary status to LOADED, and provides values of the dictionary configuration parameters and various metrics (such as the amount of RAM allocated or the number of queries since the dictionary was successfully loaded). For example:

SELECT * FROM system.dictionaries WHERE name = 'roles_dict_clickhouse' FORMAT Vertical;

Row 1:
──────
database:                           default
name:                               roles_dict_clickhouse
uuid:                               4e28febc-b472-423c-bc14-bd29f195390f
status:                             LOADED
origin:                             4e28febc-b472-423c-bc14-bd29f195390f
type:                               Flat
key.names:                          ['id']
key.types:                          ['UInt64']
attribute.names:                    ['role']
attribute.types:                    ['String']
bytes_allocated:                    21048
hierarchical_index_bytes_allocated: 0
query_count:                        0
hit_rate:                           1
found_rate:                         0
element_count:                      6
load_factor:                        0.005859375
source:                             ClickHouse: default.roles_dict
lifetime_min:                       10
lifetime_max:                       20
loading_start_time:                 2023-03-14 11:05:13
last_successful_update_time:        2023-03-14 11:05:13
loading_duration:                   0.001
last_exception:
comment:

Use dictionaries

ADQM supports a set of special functions to work with dictionaries.

dictGet, dictGetOrDefault, dictGetOrNull

The dictGet, dictGetOrDefault, and dictGetOrNull functions retrieve values from a dictionary.

dictGet('<dictionary_name>', <attr_name>, <id_expr>)
dictGetOrDefault('<dictionary_name>', <attr_name>, <id_expr>, <default_value_expr>)
dictGetOrNull('<dictionary_name>', <attr_name>, <id_expr>)

Arguments

<dictionary_name>

Name of a dictionary. Specify it as a string literal

<attr_name>

Name of a column in the dictionary source (string literal) or a tuple of column names (tuple)

<id_expr>

Key value. Specify it as an expression returning a value of the UInt64 type or a tuple of expressions (tuple), depending on whether the dictionary key is numeric or composite

<default_value_expr>

Value returned if the dictionary does not contain a row with the <id_expr> key. Specify it as an expression returning a value of the data type configured for the <attr_name> attribute, or a tuple of expressions

Returned value:

value of the dictionary attribute corresponding to the <id_expr> key, if ADQM is able to cast this value to the attribute’s data type specified in the dictionary configuration;
if the <id_expr> key is not found in the dictionary:
- dictGet returns the value specified for the attribute via the DEFAULT clause in the dictionary configuration;
- dictGetOrDefault returns the <default_value_expr> parameter value;
- dictGetOrNull returns NULL.

For example, the following query returns the role with the 20 identifier from the roles_dict_clickhouse dictionary:

SELECT dictGet('roles_dict_clickhouse', 'role', 20);

┌─dictGet('roles_dict_clickhouse', 'role', 20)─┐
│ owner                                        │
└──────────────────────────────────────────────┘

ADQM also supports specialized functions that cast dictionary attribute values to a specific data type regardless of the dictionary configuration:

dictGetInt8, dictGetInt16, dictGetInt32, dictGetInt64;
dictGetUInt8, dictGetUInt16, dictGetUInt32, dictGetUInt64;
dictGetFloat32, dictGetFloat64;
dictGetDate;
dictGetDateTime;
dictGetUUID;
dictGetString.

The syntax of these functions is the same as dictGet syntax described above. You can use any of these functions with the OrDefault suffix (for example, dictGetInt8OrDefault) and specify the <default_value_expr> parameter value to be returned if the key is not found in the dictionary.

dictHas

The dictHas function checks whether the specified key exists in a dictionary.

dictHas('<dictionary_name>', <id_expr>)

Arguments

<dictionary_name>

Name of a dictionary. Specify it as a string literal

<id_expr>

Key value. Specify it as an expression returning a value of the UInt64 type or a tuple of expressions (tuple), depending on whether the dictionary key is numeric or composite

The function returns:

0 — if the key is not found in the dictionary;
1 — if the key exists.

For example, the following query checks if the roles_dict_clickhouse dictionary contains a role with the 100 identifier:

SELECT dictHas('roles_dict_clickhouse', 100);

┌─dictHas('roles_dict_clickhouse', 100)─┐
│                                     0 │
└───────────────────────────────────────┘

Example

Create the users table with users and identifiers of their assigned roles:

CREATE TABLE users (user_id Int32, name String, role_id Int32) ENGINE = MergeTree ORDER BY user_id;

INSERT INTO users VALUES (1, 'john', 10), (2, 'mary', 30), (3, 'andrew', 40), (4, 'harry', 70), (5, 'ann', 50);

--user_id---name-----role_id--
│       1 │ john   │      10 │
│       2 │ mary   │      30 │
│       3 │ andrew │      40 │
│       4 │ harry  │      70 │
│       5 │ ann    │      50 │
------------------------------

Run the following query to read usernames from the users table and find their role names in the roles_dict_clickhouse dictionary by role_id:

SELECT  name AS user, dictGet('roles_dict_clickhouse', 'role', toUInt64(role_id)) AS role FROM users;

Result:

--user-----role--------------
│ john   │ admin            │
│ mary   │ author           │
│ andrew │ reviewer         │
│ harry  │ no role assigned │
│ ann    │ editor           │
-----------------------------

Detach and attach a dictionary

You can detach a dictionary with the DETACH DICTIONARY query:

DETACH DICTIONARY [IF EXISTS] [<database_name.]<dictionary_name> [ON CLUSTER <cluster_name>] [PERMANENTLY];

When this query is executed, the dictionary becomes "invisible" to the server, but its metadata is not deleted. If the dictionary was not detached permanently (the PERMANENTLY clause was not used in the query), the next time the server starts, it will read the dictionary metadata and "see" the dictionary again. If a dictionary has been detached permanently, the server will not automatically re-enable it.

You can re-attach a dictionary with the ATTACH query regardless of whether the dictionary was detached permanently or not:

ATTACH DICTIONARY  [IF NOT EXISTS] [<database_name.]<dictionary_name> [ON CLUSTER <cluster_name>];

Delete a dictionary

To delete a dictionary, use the DROP DICTIONARY query:

DROP DICTIONARY [IF EXISTS] [<database_name>.]<dictionary_name> [ON CLUSTER <cluster_name>];

Found a mistake? Seleсt text and press Ctrl+Enter to report it