Dictionaries
A dictionary is a key/value data store that is fully or partially stored in the RAM of the ADQM server and can be used as a reference to substitute data values by keys in the final sample. Dictionaries are an easier-to-use alternative to the JOIN operator.
A data source for a dictionary can be a local file (text or executable), an HTTP(s) resource, or another DBMS. ADQM automatically updates dictionaries and loads missing or changed values from the source. Access to dictionaries from ADQM is read-only.
You can configure ADQM to connect dictionaries from external sources using XML configuration files or DDL queries. This article contains details on the second approach (recommended) — it describes syntax and provides examples of queries to create/configure/delete dictionaries in ADQM, as well as to read data from dictionaries using special functions.
Create and configure a dictionary
To create and configure a dictionary in ADQM, use the CREATE DICTIONARY
query. The basic syntax is:
CREATE DICTIONARY [OR REPLACE][IF NOT EXISTS] [<database_name>.]<dictionary_name> [ON CLUSTER <cluster_name>]
( <key1_name> <key1_data_type>,
<key2_name> <key2_data_type>,
<attr1_name> <attr1_data_type> [DEFAULT <default_value1>] [EXPRESSION <expr1>] [HIERARCHICAL] [INJECTIVE] [IS_OBJECT_ID],
<attr2_name> <attr2_data_type> [DEFAULT <default_value2>] [EXPRESSION <expr2>] [HIERARCHICAL] [INJECTIVE] [IS_OBJECT_ID])
PRIMARY KEY <key1_name>, <key2_name>
SOURCE(<SOURCE_TYPE>([<param1_name> <param1_value> ... <paramN_name> <paramN_value>]))
LAYOUT(<LAYOUT_NAME>([<param_name> <param_value>]))
LIFETIME({MIN <min_value> MAX <max_value> | <max_value>})
[SETTINGS(<setting_name> = <setting_value>, <setting_name> = <setting_value>, ...)]
[COMMENT '<comment_text>']
where:
-
PRIMARY KEY
— dictionary key; -
SOURCE
— data source for a dictionary; -
LAYOUT
— way to store a dictionary in memory; -
LIFETIME
— frequency of dictionary updates; -
SETTINGS
— additional settings that can be specified for a dictionary if its data source is a local file, HTTP(s) resource, or ClickHouse table; -
COMMENT
— text comment.
Dictionary key and attributes
The dictionary structure is defined by a key and attributes specified as parameters of the CREATE DICTIONARY
query.
-
A key is a column in the dictionary source with unique values that are used to identify records and search for data values.
ADQM supports the following types of keys:-
Numeric key — a column of the UInt64 type.
CREATE DICTIONARY <dictionary_name> ( <key_name> UInt64, ... ) PRIMARY KEY <key_name> ...
where
<key_name>
is a name of a column with keys. -
Composite key is a tuple of columns of arbitrary types. It can consist of a single element (for example, you can use a string as a key).
CREATE DICTIONARY <dictionary_name> ( <key1_name> <key1_data_type>, <key2_name> <key2_data_type> ... ) PRIMARY KEY <key1_name>, <key2_name> ...
where
<key1_name>
,<key2_name>
are names of columns with keys, and<key1_data_type>
,<key2_data_type>
are key types.
-
-
Attributes are columns in the dictionary source that store data values to be retrieved by a key. An attribute is described as follows:
CREATE DICTIONARY <dictionary_name> ( ... <attribute_name> <attribute_data_type> [DEFAULT <default_value>] [EXPRESSION <expr>] [HIERARCHICAL] [INJECTIVE] [IS_OBJECT_ID] ) ...
Parameters and basic clauses<attribute_name>
Name of a column in the data source
<attribute_data_type>
Data type to which a value from the dictionary should be cast.
Supported types: UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, Float32, Float64, UUID, Decimal32, Decimal64, Decimal128, Decimal256, String, Array.
The ability to specify Nullable types for attributes depends on how the dictionary is stored in memory (for example, Nullable types are currently not supported for attributes of
IP_TRIE
dictionaries)DEFAULT
Default value for an element not found by key. NULL can only be specified if the attribute type is Nullable
EXPRESSION
Expression executed on a value
HIERARCHICAL
Indicates that the attribute contains a value of a parent key for the current key (use it to configure a hierarchical dictionary)
INJECTIVE
Indicates that the attribute is injective (different keys correspond to different values of the attribute). In this case, if
GROUP BY
uses a function that retrieves the attribute value by key, this function is automatically removed fromGROUP BY
. This can optimize the data aggregation processIS_OBJECT_ID
Indicates that the query is executed for a MongoDB document by
ObjectID
Dictionary source
Use the SOURCE
clause in the CREATE DICTIONARY
query to specify a data source for a dictionary:
CREATE DICTIONARY <dictionary_name> (...)
...
SOURCE(<SOURCE_TYPE>([<param1_name> <param1_value> ... <paramN_name> <paramN_value>]))
...
where <SOURCE_TYPE>
is a data source type, <paramX_name> <paramX_value>
— source settings.
Value | Data source |
---|---|
FILE |
Local file (a source file should be located in the user_files folder) |
HTTP |
HTTP resource |
MYSQL |
Table from a MySQL database |
CLICKHOUSE |
Table from a ClickHouse database |
MONGODB |
Collection from a MongoDB database |
REDIS |
Redis database |
CASSANDRA |
Column family (table) from a Cassandra keyspace (database) |
POSTGRESQL |
Table from a PostgreSQL database |
ODBC |
Table from any database that has the ODBC driver |
You can find examples of connecting dictionaries to various types of sources in the Dictionary Sources section of the ClickHouse documentation.
Store dictionaries in memory
Use the LAYOUT
clause in the CREATE DICTIONARY
query to specify how to store a dictionary in memory:
CREATE DICTIONARY <dictionary_name> (...)
...
LAYOUT(<LAYOUT_TYPE>([<param_name> <param_value>]))
...
where <LAYOUT_TYPE>
is a way to store a dictionary in memory.
Value | Way to store a dictionary |
---|---|
FLAT |
A dictionary is completely stored in RAM as flat arrays |
HASHED, COMPLEX_KEY_HASHED |
A dictionary is completely stored in RAM as hash tables |
SPARSE_HASHED, COMPLEX_KEY_SPARSE_HASHED |
A dictionary is completely stored in RAM as hash tables, similar to |
HASHED_ARRAY, COMPLEX_KEY_HASHED_ARRAY |
A dictionary is completely stored in RAM. Each attribute is stored in an array. The key is stored as a hash table where a value is an index in an array of attributes |
RANGE_HASHED, COMPLEX_KEY_RANGE_HASHED |
The dictionary is stored in RAM as a hash table with an ordered array of ranges and their corresponding values |
CACHE, COMPLEX_KEY_CACHE |
A dictionary is stored in a cache that has a fixed number of cells containing frequently used elements |
SSD_CACHE, COMPLEX_KEY_SSD_CACHE |
A dictionary is stored in a cache, similar to |
DIRECT, COMPLEX_KEY_DIRECT |
A dictionary does not store data locally and interacts with the source while processing a query |
IP_TRIE |
This type of dictionary storage is for retrieving metadata (such as |
Layouts with the COMPLEX_KEY
prefix are used for dictionaries with composite keys.
For more details on all ways of storing dictionaries in memory, refer to the Storing Dictionaries in Memory section of the ClickHouse documentation.
Dictionary updates
Use the LIFETIME
clause in the CREATE DICTIONARY
query to specify a time interval for dictionary updates in seconds.
CREATE DICTIONARY <dictionary_name> (...)
...
LIFETIME(<max_value> | MIN <min_value> MAX <max_value>)
...
where:
-
<max_value>
is a dictionary update interval. If you setLIFETIME(0)
, dictionary updates are disabled. -
MIN <min_value> MAX <max_value>
is an interval within which the time to update the dictionary is randomly selected. This setting can be useful to distribute the load on the dictionary source when updating on a large number of servers. If you setMIN 0 MAX 0
, the dictionary is not reloaded by timeout. In this case, the dictionary data can be reloaded with the SYSTEM RELOAD DICTIONARY command.
Dictionary updates (except for loading on the first use) do not block queries — the old version of a dictionary is used during an update. If an error occurs during an update, the error is written to the server log, and queries continue using the old version of the dictionary.
The dictionary update logic depends on the data source type and how the dictionary is stored in memory — read the Dictionary Updates section of the ClickHouse documentation for details.
Example
-
Create a
roles_dict
table to be a data source for a dictionary:CREATE TABLE roles_dict (`id` Int32, `role` String) ENGINE = MergeTree() ORDER BY id;
INSERT INTO roles_dict VALUES (10, 'admin'), (20, 'owner'), (30, 'author'), (40, 'reviewer'), (50, 'editor'), (60, 'view only');
--id─┬─role------- │ 10 │ admin │ │ 20 │ owner │ │ 30 │ author │ │ 40 │ reviewer │ │ 50 │ editor │ │ 60 │ view only │ ------------------
-
Create a dictionary that reads data from the
roles_dict
table:CREATE DICTIONARY roles_dict_clickhouse (`id` Int32, `role` String DEFAULT 'no role assigned') PRIMARY KEY id SOURCE(CLICKHOUSE(HOST 'localhost' USER 'default' PASSWORD '' DATABASE 'default' TABLE 'roles_dict')) LIFETIME(MIN 10 MAX 20) LAYOUT(FLAT());
View information on dictionaries
You can obtain information about dictionaries configured on the server from the system.dictionaries system table:
SELECT * FROM system.dictionaries WHERE name = '<dictionary_name>' [FORMAT <format_name>];
Query:
SELECT * FROM system.dictionaries WHERE name = 'roles_dict_clickhouse' FORMAT Vertical;
If a dictionary is configured but not loaded, the output can look like:
Row 1: ────── database: default name: roles_dict_clickhouse uuid: 4e28febc-b472-423c-bc14-bd29f195390f status: NOT_LOADED origin: 4e28febc-b472-423c-bc14-bd29f195390f type: key.names: ['id'] key.types: ['UInt64'] attribute.names: ['role'] attribute.types: ['String'] bytes_allocated: 0 hierarchical_index_bytes_allocated: 0 query_count: 0 hit_rate: 0 found_rate: 0 element_count: 0 load_factor: 0 source: lifetime_min: 0 lifetime_max: 0 loading_start_time: 1970-01-01 00:00:00 last_successful_update_time: 1970-01-01 00:00:00 loading_duration: 0 last_exception: comment:
You can view names of all dictionaries that exist in the specified database with the SHOW DICTIONARIES query:
SHOW DICTIONARIES FROM <database_name>;
Load dictionaries
The dictionaries_lazy_load server configuration parameter defines whether dictionaries should be loaded on server startup or on the first use of a dictionary.
To force reloading of all dictionaries at once or a single dictionary, execute the corresponding query:
SYSTEM RELOAD DICTIONARIES;
SYSTEM RELOAD DICTIONARY <dictionary_name>;
After a dictionary is loaded, the system.dictionaries table changes the dictionary status to LOADED
, and provides values of the dictionary configuration parameters and various metrics (such as the amount of RAM allocated or the number of queries since the dictionary was successfully loaded). For example:
SELECT * FROM system.dictionaries WHERE name = 'roles_dict_clickhouse' FORMAT Vertical;
Row 1: ────── database: default name: roles_dict_clickhouse uuid: 4e28febc-b472-423c-bc14-bd29f195390f status: LOADED origin: 4e28febc-b472-423c-bc14-bd29f195390f type: Flat key.names: ['id'] key.types: ['UInt64'] attribute.names: ['role'] attribute.types: ['String'] bytes_allocated: 21048 hierarchical_index_bytes_allocated: 0 query_count: 0 hit_rate: 1 found_rate: 0 element_count: 6 load_factor: 0.005859375 source: ClickHouse: default.roles_dict lifetime_min: 10 lifetime_max: 20 loading_start_time: 2023-03-14 11:05:13 last_successful_update_time: 2023-03-14 11:05:13 loading_duration: 0.001 last_exception: comment:
Use dictionaries
ADQM supports a set of special functions to work with dictionaries.
dictGet, dictGetOrDefault, dictGetOrNull
The dictGet
, dictGetOrDefault
, and dictGetOrNull
functions retrieve values from a dictionary.
dictGet('<dictionary_name>', <attr_name>, <id_expr>)
dictGetOrDefault('<dictionary_name>', <attr_name>, <id_expr>, <default_value_expr>)
dictGetOrNull('<dictionary_name>', <attr_name>, <id_expr>)
<dictionary_name> |
Name of a dictionary. Specify it as a string literal |
<attr_name> |
Name of a column in the dictionary source (string literal) or a tuple of column names (tuple) |
<id_expr> |
Key value. Specify it as an expression returning a value of the UInt64 type or a tuple of expressions (tuple), depending on whether the dictionary key is numeric or composite |
<default_value_expr> |
Value returned if the dictionary does not contain a row with the |
Returned value:
-
value of the dictionary attribute corresponding to the
<id_expr>
key, if ADQM is able to cast this value to the attribute’s data type specified in the dictionary configuration; -
if the
<id_expr>
key is not found in the dictionary:-
dictGet
returns the value specified for the attribute via theDEFAULT
clause in the dictionary configuration; -
dictGetOrDefault
returns the<default_value_expr>
parameter value; -
dictGetOrNull
returns NULL.
-
For example, the following query returns the role with the 20
identifier from the roles_dict_clickhouse
dictionary:
SELECT dictGet('roles_dict_clickhouse', 'role', 20);
┌─dictGet('roles_dict_clickhouse', 'role', 20)─┐ │ owner │ └──────────────────────────────────────────────┘
ADQM also supports specialized functions that cast dictionary attribute values to a specific data type regardless of the dictionary configuration:
-
dictGetInt8
,dictGetInt16
,dictGetInt32
,dictGetInt64
; -
dictGetUInt8
,dictGetUInt16
,dictGetUInt32
,dictGetUInt64
; -
dictGetFloat32
,dictGetFloat64
; -
dictGetDate
; -
dictGetDateTime
; -
dictGetUUID
; -
dictGetString
.
The syntax of these functions is the same as dictGet
syntax described above. You can use any of these functions with the OrDefault
suffix (for example, dictGetInt8OrDefault
) and specify the <default_value_expr>
parameter value to be returned if the key is not found in the dictionary.
dictHas
The dictHas
function checks whether the specified key exists in a dictionary.
dictHas('<dictionary_name>', <id_expr>)
<dictionary_name> |
Name of a dictionary. Specify it as a string literal |
<id_expr> |
Key value. Specify it as an expression returning a value of the UInt64 type or a tuple of expressions (tuple), depending on whether the dictionary key is numeric or composite |
The function returns:
-
0
— if the key is not found in the dictionary; -
1
— if the key exists.
For example, the following query checks if the roles_dict_clickhouse
dictionary contains a role with the 100
identifier:
SELECT dictHas('roles_dict_clickhouse', 100);
┌─dictHas('roles_dict_clickhouse', 100)─┐ │ 0 │ └───────────────────────────────────────┘
Example
-
Create the
users
table with users and identifiers of their assigned roles:CREATE TABLE users (user_id Int32, name String, role_id Int32) ENGINE = MergeTree ORDER BY user_id;
INSERT INTO users VALUES (1, 'john', 10), (2, 'mary', 30), (3, 'andrew', 40), (4, 'harry', 70), (5, 'ann', 50);
--user_id---name-----role_id-- │ 1 │ john │ 10 │ │ 2 │ mary │ 30 │ │ 3 │ andrew │ 40 │ │ 4 │ harry │ 70 │ │ 5 │ ann │ 50 │ ------------------------------
-
Run the following query to read usernames from the
users
table and find their role names in theroles_dict_clickhouse
dictionary byrole_id
:SELECT name AS user, dictGet('roles_dict_clickhouse', 'role', toUInt64(role_id)) AS role FROM users;
Result:
--user-----role-------------- │ john │ admin │ │ mary │ author │ │ andrew │ reviewer │ │ harry │ no role assigned │ │ ann │ editor │ -----------------------------
Detach and attach a dictionary
You can detach a dictionary with the DETACH DICTIONARY
query:
DETACH DICTIONARY [IF EXISTS] [<database_name.]<dictionary_name> [ON CLUSTER <cluster_name>] [PERMANENTLY];
When this query is executed, the dictionary becomes "invisible" to the server, but its metadata is not deleted. If the dictionary was not detached permanently (the PERMANENTLY
clause was not used in the query), the next time the server starts, it will read the dictionary metadata and "see" the dictionary again. If a dictionary has been detached permanently, the server will not automatically re-enable it.
You can re-attach a dictionary with the ATTACH
query regardless of whether the dictionary was detached permanently or not:
ATTACH DICTIONARY [IF NOT EXISTS] [<database_name.]<dictionary_name> [ON CLUSTER <cluster_name>];