Analyze queries
To analyze how ADQM transforms and executes queries, use the EXPLAIN
clause. Basic syntax:
EXPLAIN [AST | SYNTAX | QUERY TREE | PLAN | PIPELINE | ESTIMATE | TABLE OVERRIDE] [<setting_name> = <setting_value>, ...]
<query_to_analyze>
[FORMAT <output_format>]
where:
-
EXPLAIN [<explain_type>] [<explain_settings>]
— type of theEXPLAIN
analyzer (see the table below) and its additional settings; -
<query_to_analyze>
— query to be inspected; -
FORMAT <output_format>
— output format for analysis result.
EXPLAIN type | EXPLAIN result |
---|---|
Abstract syntax tree of the query |
|
Optimized version of the query syntax with which ADQM will replace the original text of the query when executing it |
|
Query tree after optimization |
|
Query execution plan. This |
|
Query execution pipeline |
|
Estimated number of rows, marks, and data parts that ADQM should process when executing the query |
|
Pre-validation of table schema overrides to be specified for a MaterializedMySQL database |
EXPLAIN AST
The EXPLAIN AST
query returns an abstract syntax tree (AST) of the analyzed query. This analyzer supports all types of queries, not only SELECT
.
Example
EXPLAIN AST SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;
┌─explain─────────────────────────────────────────┐ │ SelectWithUnionQuery (children 1) │ │ ExpressionList (children 1) │ │ SelectQuery (children 3) │ │ ExpressionList (children 1) │ │ Function sum (alias test_sums) (children 1) │ │ ExpressionList (children 1) │ │ Identifier number │ │ TablesInSelectQuery (children 1) │ │ TablesInSelectQueryElement (children 1) │ │ TableExpression (children 1) │ │ Function numbers (children 1) │ │ ExpressionList (children 1) │ │ Literal UInt64_5 │ │ ExpressionList (children 1) │ │ Function modulo (children 1) │ │ ExpressionList (children 2) │ │ Identifier number │ │ Literal UInt64_2 │ └─────────────────────────────────────────────────┘
Settings
Enable the graph
option for an EXPLAIN AST
query to get an abstract syntax tree of the query as a graph described in the DOT graph description language. Then you can load this graph description into a special application that is able to draw it as a diagram (for example, use Graphviz — open-source software for graph visualization).
Below is an example of a query abstract syntax tree presented as a graph:
EXPLAIN AST graph=1 SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;
┌─explain────────────────────────────────────────────────────────────────────┐ │ digraph { │ │ rankdir="UD"; │ │ n140208872268568[label="SelectWithUnionQuery (children 1)"]; │ │ n140208877065784[label="ExpressionList (children 1)"]; │ │ n140205632168472[label="SelectQuery (children 3)"]; │ │ n140206404082776[label="ExpressionList (children 1)"]; │ │ n140206025898776[label="Function sum (alias test_sums) (children 1)"]; │ │ n140206026032952[label="ExpressionList (children 1)"]; │ │ n140205801742360[label="Identifier number"]; │ │ ... │ │ } │ └────────────────────────────────────────────────────────────────────────────┘
EXPLAIN SYNTAX
When ADQM executes queries, it can automatically rewrite their syntax to improve performance. The EXPLAIN SYNTAX
analyzer allows you to view syntax optimizations that ADQM applies — then you can use them to manually optimize complex queries for faster execution.
Example
-
Create a MergeTree table with 1 million rows and specify the partition key so that data is divided into 10 partitions:
CREATE TABLE test_table (value UInt64) ENGINE = MergeTree PARTITION BY value % 10 ORDER BY value AS SELECT number FROM numbers(1000000);
-
Run a query that calculates the arithmetic mean in the
value
column for the first partition:SELECT avg(value) FROM (SELECT value FROM test_table) WHERE (value % 10) = 1;
As you can see in the query result, ADQM processes only 100 thousand rows (not a million):
┌─avg(value)─┐ │ 499996 │ └────────────┘ 1 row in set. Elapsed: 0.006 sec. Processed 100.00 thousand rows, 800.00 KB (16.40 million rows/s., 131.16 MB/s.)
-
Use
EXPLAIN SYNTAX
to analyze the query above:EXPLAIN SYNTAX SELECT avg(value) FROM (SELECT value FROM test_table) WHERE (value % 10) = 1;
-
The analysis shows that ADQM automatically optimizes the query by moving the
WHERE
filter inside the subquery — so it is enough to process only 1/10 part of the table data:┌─explain────────────────────┐ │ SELECT avg(value) │ │ FROM │ │ ( │ │ SELECT value │ │ FROM test_table │ │ WHERE (value % 10) = 1 │ │ ) │ │ WHERE (value % 10) = 1 │ └────────────────────────────┘
Thus, you can use this optimization in more complex queries and manually specify the necessary filters in subqueries.
EXPLAIN QUERY TREE
The EXPLAIN QUERY TREE
analyzer dumps the query tree. This analyzer type is currently experimental. To use it, enable the option: SET allow_experimental_analyzer = 1
.
Examples of EXPLAIN QUERY TREE
queries below use a test table that stores randomly generated pageview data. To reproduce these examples, prepare the page_views
table:
CREATE TABLE page_views (time DateTime, page_name String, time_on_page UInt64)
ENGINE = MergeTree
ORDER BY time;
INSERT INTO page_views
SELECT
toDate('2023-01-01 00:00:00') + (rand() % 182) AS time,
['landing', 'products', 'docs', 'about'][toInt32(randBinomial(4, 0.01)) + 1],
rand() % 15
FROM numbers(1000000);
Example
SET allow_experimental_analyzer = 1;
EXPLAIN QUERY TREE
SELECT page_name, toDate(toStartOfMonth(time)) AS month, count() AS page_view_count
FROM page_views
WHERE page_name = 'landing'
GROUP BY page_name, month
ORDER BY page_name, month
LIMIT 3;
┌─explain───────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY id: 0 │ │ PROJECTION COLUMNS │ │ page_name String │ │ month Date │ │ page_view_count UInt64 │ │ PROJECTION │ │ LIST id: 1, nodes: 3 │ │ COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3 │ │ FUNCTION id: 4, function_name: toDate, function_type: ordinary, result_type: Date │ │ ARGUMENTS │ │ LIST id: 5, nodes: 1 │ │ FUNCTION id: 6, function_name: toStartOfMonth, function_type: ordinary, result_type: Date │ │ ARGUMENTS │ │ LIST id: 7, nodes: 1 │ │ COLUMN id: 8, column_name: time, result_type: DateTime, source_id: 3 │ │ FUNCTION id: 9, function_name: count, function_type: aggregate, result_type: UInt64 │ │ JOIN TREE │ │ TABLE id: 3, table_name: default.page_views │ │ WHERE │ │ FUNCTION id: 10, function_name: equals, function_type: ordinary, result_type: UInt8 │ │ ARGUMENTS │ │ LIST id: 11, nodes: 2 │ │ COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3 │ │ CONSTANT id: 12, constant_value: 'landing', constant_value_type: String │ │ GROUP BY │ │ LIST id: 13, nodes: 2 │ │ COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3 │ │ FUNCTION id: 4, function_name: toDate, function_type: ordinary, result_type: Date │ │ ARGUMENTS │ │ LIST id: 5, nodes: 1 │ │ FUNCTION id: 6, function_name: toStartOfMonth, function_type: ordinary, result_type: Date │ │ ARGUMENTS │ │ LIST id: 7, nodes: 1 │ │ COLUMN id: 8, column_name: time, result_type: DateTime, source_id: 3 │ │ ORDER BY │ │ LIST id: 14, nodes: 2 │ │ SORT id: 15, sort_direction: ASCENDING, with_fill: 0 │ │ EXPRESSION │ │ COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3 │ │ SORT id: 16, sort_direction: ASCENDING, with_fill: 0 │ │ EXPRESSION │ │ FUNCTION id: 4, function_name: toDate, function_type: ordinary, result_type: Date │ │ ARGUMENTS │ │ LIST id: 5, nodes: 1 │ │ FUNCTION id: 6, function_name: toStartOfMonth, function_type: ordinary, result_type: Date │ │ ARGUMENTS │ │ LIST id: 7, nodes: 1 │ │ COLUMN id: 8, column_name: time, result_type: DateTime, source_id: 3 │ │ LIMIT │ │ CONSTANT id: 17, constant_value: UInt64_3, constant_value_type: UInt64 │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────┘
TIP
To speed up the execution of queries that calculate aggregates (for example, like the query above calculates the number of page views by month), you can use materialized views that automatically calculate intermediate data aggregate states — see an example in the Use materialized views article.
|
Settings
To customize the EXPLAIN QUERY TREE
output, use the following parameters (set a parameter value to 1
or 0
to enable or disable the corresponding setting respectively):
-
run_passes
— runs all query tree passes before dumping the query tree (the default is1
); -
passes
— specifies how many passes to run (the default is-1
that runs all the passes); -
dump_passes
— dumps information about used passes before dumping the query tree (the default is0
).
The following analysis runs three query tree passes and outputs information about them before dumping the query tree:
EXPLAIN QUERY TREE passes=3, dump_passes=1
SELECT page_name, toDate(toStartOfMonth(time)) AS month, count() AS page_view_count
FROM page_views
WHERE page_name = 'landing'
GROUP BY page_name, month
ORDER BY page_name, month
LIMIT 3
FORMAT TSVRaw;
Pass 1 QueryAnalysis - Resolve type for each query expression. Replace identifiers, matchers with query expressions. Perform constant folding. Evaluate scalar subqueries. Pass 2 FunctionToSubcolumns - Rewrite function to subcolumns, for example tupleElement(column, subcolumn) into column.subcolumn Pass 3 CountDistinct - Optimize single countDistinct into count over subquery QUERY id: 0 PROJECTION COLUMNS page_name String month Date page_view_count UInt64 PROJECTION LIST id: 1, nodes: 3 COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3 FUNCTION id: 4, function_name: toDate, function_type: ordinary, result_type: Date ARGUMENTS LIST id: 5, nodes: 1 FUNCTION id: 6, function_name: toStartOfMonth, function_type: ordinary, result_type: Date ARGUMENTS LIST id: 7, nodes: 1 COLUMN id: 8, column_name: time, result_type: DateTime, source_id: 3 FUNCTION id: 9, function_name: count, function_type: aggregate, result_type: UInt64 JOIN TREE TABLE id: 3, table_name: default.page_views WHERE FUNCTION id: 10, function_name: equals, function_type: ordinary, result_type: UInt8 ARGUMENTS LIST id: 11, nodes: 2 COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3 CONSTANT id: 12, constant_value: 'landing', constant_value_type: String GROUP BY LIST id: 13, nodes: 2 COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3 FUNCTION id: 4, function_name: toDate, function_type: ordinary, result_type: Date ARGUMENTS LIST id: 5, nodes: 1 FUNCTION id: 6, function_name: toStartOfMonth, function_type: ordinary, result_type: Date ARGUMENTS LIST id: 7, nodes: 1 COLUMN id: 8, column_name: time, result_type: DateTime, source_id: 3 ORDER BY LIST id: 14, nodes: 2 SORT id: 15, sort_direction: ASCENDING, with_fill: 0 EXPRESSION COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3 SORT id: 16, sort_direction: ASCENDING, with_fill: 0 EXPRESSION FUNCTION id: 4, function_name: toDate, function_type: ordinary, result_type: Date ARGUMENTS LIST id: 5, nodes: 1 FUNCTION id: 6, function_name: toStartOfMonth, function_type: ordinary, result_type: Date ARGUMENTS LIST id: 7, nodes: 1 COLUMN id: 8, column_name: time, result_type: DateTime, source_id: 3 LIMIT CONSTANT id: 17, constant_value: UInt64_3, constant_value_type: UInt64
EXPLAIN PLAN
The EXPLAIN PLAN
analyzer displays the query execution plan as a tree that shows in which order clauses from the query are actually executed. Read query execution plan stages from bottom to top. This analyzer type is used by default if another type is not explicitly specified for EXPLAIN
.
Example
The following analysis shows that when executing a SELECT
query, ADQM first reads data (ReadFromStorage
), then filters (Filter
) and aggregates (Aggregating
) it.
EXPLAIN SELECT sum(number) AS test_sums FROM numbers(10000) WHERE number > 100 GROUP BY number % 2;
┌─explain─────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Aggregating │ │ Expression (Before GROUP BY) │ │ Filter (WHERE) │ │ ReadFromStorage (SystemNumbers) │ └─────────────────────────────────────────────┘
Settings
In EXPLAIN PLAN
queries, you can use the following parameters to specify information the analysis output should include (if parameter value is 1
— the corresponding setting is enabled, if 0
— it is disabled):
-
header
— adds theHeader
key to the query execution plan as an array of columns (column name and type are printed) for which values are calculated at each step.ExampleEXPLAIN header = 0 (default) EXPLAIN header = 1 EXPLAIN SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;
┌─explain─────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Aggregating │ │ Expression (Before GROUP BY) │ │ ReadFromStorage (SystemNumbers) │ └─────────────────────────────────────────────┘
EXPLAIN header = 1 SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;
┌─explain─────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Header: test_sums UInt64 │ │ Aggregating │ │ Header: modulo(number, 2) UInt8 │ │ sum(number) UInt64 │ │ Expression (Before GROUP BY) │ │ Header: number UInt64 │ │ modulo(number, 2) UInt8 │ │ ReadFromStorage (SystemNumbers) │ │ Header: number UInt64 │ └─────────────────────────────────────────────┘
-
description
— prints a description for each step of the query execution plan.ExampleEXPLAIN description = 0 EXPLAIN description = 1 (default) EXPLAIN description = 0 SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;
┌─explain───────────────┐ │ Expression │ │ Aggregating │ │ Expression │ │ ReadFromStorage │ └───────────────────────┘
EXPLAIN SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;
┌─explain─────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Aggregating │ │ Expression (Before GROUP BY) │ │ ReadFromStorage (SystemNumbers) │ └─────────────────────────────────────────────┘
-
indexes
— adds theIndexes
key to the query execution plan as an array of used indexes. Each index contains theType
key (MinMax
,Partition
,PrimaryKey
, orSkip
) and optional keys:-
Name
— index name (forSkip
indexes only); -
Keys
— array of columns used by the index; -
Condition
— used condition; -
Description
— index description (forSkip
indexes only); -
Parts
— number of parts filtered by the applied index; -
Granules
— number of granules filtered by the applied index.
This parameter is only supported for analyzing queries that access MergeTree tables.
ExampleEXPLAIN actions = 0 (default) EXPLAIN actions = 1 EXPLAIN indexes = 0 SELECT sum(value) FROM test_table WHERE (value % 10) = 1;
┌─explain────────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Aggregating │ │ Expression (Before GROUP BY) │ │ Filter (WHERE) │ │ ReadFromMergeTree (default.test_table) │ └────────────────────────────────────────────────┘
EXPLAIN indexes = 1 SELECT sum(value) FROM test_table WHERE (value % 10) = 1;
┌─explain────────────────────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Aggregating │ │ Expression (Before GROUP BY) │ │ Filter (WHERE) │ │ ReadFromMergeTree (default.test_table) │ │ Indexes: │ │ MinMax │ │ Condition: true │ │ Parts: 10/10 │ │ Granules: 120/120 │ │ Partition │ │ Keys: │ │ moduloLegacy(value, 10) │ │ Condition: (moduloLegacy(value, 10) in [1, 1]) │ │ Parts: 1/10 │ │ Granules: 12/120 │ │ PrimaryKey │ │ Condition: true │ │ Parts: 1/1 │ │ Granules: 12/12 │ └────────────────────────────────────────────────────────────┘
-
-
actions
— prints detailed information about the actions performed at each step of the query execution plan (theActions
key).ExampleEXPLAIN actions = 0 (default) EXPLAIN actions = 1 EXPLAIN SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;
┌─explain─────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Aggregating │ │ Expression (Before GROUP BY) │ │ ReadFromStorage (SystemNumbers) │ └─────────────────────────────────────────────┘
EXPLAIN actions = 1 SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;
┌─explain─────────────────────────────────────────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Actions: INPUT : 0 -> sum(number) UInt64 : 0 │ │ ALIAS sum(number) :: 0 -> test_sums UInt64 : 1 │ │ Positions: 1 │ │ Aggregating │ │ Keys: modulo(number, 2) │ │ Aggregates: │ │ sum(number) │ │ Function: sum(UInt64) → UInt64 │ │ Arguments: number │ │ Skip merging: 0 │ │ Expression (Before GROUP BY) │ │ Actions: INPUT : 0 -> number UInt64 : 0 │ │ COLUMN Const(UInt8) -> 2 UInt8 : 1 │ │ FUNCTION modulo(number : 0, 2 :: 1) -> modulo(number, 2) UInt8 : 2 │ │ Positions: 0 2 │ │ ReadFromStorage (SystemNumbers) │ └─────────────────────────────────────────────────────────────────────────────────┘
-
json
— outputs the query execution plan in JSON format. Each node is a dictionary, which always contains theNode Type
(a step name) andPlans
(an array with child step descriptions) keys. Other optional keys may be added depending on a node type and settings.To avoid unnecessary escaping, it is recommended to use the TSVRaw format for analysis result output.
ExampleEXPLAIN json = 0 (default) EXPLAIN json = 1 EXPLAIN SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;
┌─explain─────────────────────────────────────┐ │ Expression ((Projection + Before ORDER BY)) │ │ Aggregating │ │ Expression (Before GROUP BY) │ │ ReadFromStorage (SystemNumbers) │ └─────────────────────────────────────────────┘
EXPLAIN json = 1 SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2 FORMAT TSVRaw;
[ { "Plan": { "Node Type": "Expression", "Description": "(Projection + Before ORDER BY)", "Plans": [ { "Node Type": "Aggregating", "Plans": [ { "Node Type": "Expression", "Description": "Before GROUP BY", "Plans": [ { "Node Type": "ReadFromStorage", "Description": "SystemNumbers" } ] } ] } ] } } ]
EXPLAIN PIPELINE
The EXPLAIN PIPELINE
analyzer displays the query execution pipeline that shows how many parallel threads are used to execute each step of the query plan.
Example
EXPLAIN PIPELINE SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;
┌─explain───────────────────┐ │ (Expression) │ │ ExpressionTransform × 8 │ │ (Aggregating) │ │ Resize 1 → 8 │ │ AggregatingTransform │ │ (Expression) │ │ ExpressionTransform │ │ (ReadFromStorage) │ │ Limit │ │ Numbers 0 → 1 │ └───────────────────────────┘
Settings
To configure the EXPLAIN PIPELINE
query output, use the following parameters (set the parameter value to 1
or 0
to enable or disable the corresponding setting respectively):
-
header
— prints a header for each output port (the default is0
).ExampleEXPLAIN PIPELINE header = 1 SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;
┌─explain─────────────────────────────────────────────────────────────────────────┐ │ (Expression) │ │ ExpressionTransform × 8 │ │ Header: test_sums UInt64: test_sums UInt64 UInt64(size = 0) │ │ (Aggregating) │ │ Resize 1 → 8 │ │ Header × 8 : modulo(number, 2) UInt8: modulo(number, 2) UInt8 UInt8(size = 0) │ │ sum(number) UInt64: sum(number) UInt64 UInt64(size = 0) │ │ AggregatingTransform │ │ Header: modulo(number, 2) UInt8: modulo(number, 2) UInt8 UInt8(size = 0) │ │ sum(number) UInt64: sum(number) UInt64 UInt64(size = 0) │ │ (Expression) │ │ ExpressionTransform │ │ Header: number UInt64: number UInt64 UInt64(size = 0) │ │ modulo(number, 2) UInt8: modulo(number, 2) UInt8 UInt8(size = 0) │ │ (ReadFromStorage) │ │ Limit │ │ Header: number UInt64: number UInt64 UInt64(size = 0) │ │ Numbers 0 → 1 │ │ Header: number UInt64: number UInt64 UInt64(size = 0) │ └─────────────────────────────────────────────────────────────────────────────────┘
-
graph
— prints the query execution pipeline as a graph described in the DOT graph description language (the default is0
). Load the provided syntax into an application for graph visualization (for example, Graphviz) to view the visual representation of the graph.ExampleEXPLAIN PIPELINE graph = 1 SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2 FORMAT TSV;
digraph { rankdir="LR"; { node [shape = rect] subgraph cluster_0 { label ="Expression"; style=filled; color=lightgrey; node [style=filled,color=white]; { rank = same; n6 [label="ExpressionTransform × 8"]; } } subgraph cluster_1 { label ="ReadFromStorage"; style=filled; color=lightgrey; node [style=filled,color=white]; { rank = same; n2 [label="Limit"]; n1 [label="Numbers"]; } } subgraph cluster_2 { label ="Expression"; style=filled; color=lightgrey; node [style=filled,color=white]; { rank = same; n3 [label="ExpressionTransform"]; } } subgraph cluster_3 { label ="Aggregating"; style=filled; color=lightgrey; node [style=filled,color=white]; { rank = same; n4 [label="AggregatingTransform"]; n5 [label="Resize"]; } } } n2 -> n3 [label=""]; n1 -> n2 [label=""]; n3 -> n4 [label=""]; n4 -> n5 [label=""]; n5 -> n6 [label="× 8"]; }
-
compact
— prints a graph in the compact mode if thegraph
setting is enabled (the default is1
).ExampleEXPLAIN PIPELINE graph = 1, compact = 0 SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2 FORMAT TSV;
digraph { rankdir="LR"; { node [shape = rect] n140206024061464[label="Numbers"]; n140205465170584[label="Limit"]; n140205620507928[label="ExpressionTransform"]; n140205801688600[label="AggregatingTransform"]; n140205649781464[label="Resize"]; n140205630520728[label="ExpressionTransform"]; n140205788048536[label="ExpressionTransform"]; n140205620512152[label="ExpressionTransform"]; n140206023882392[label="ExpressionTransform"]; n140206024060696[label="ExpressionTransform"]; n140205620509080[label="ExpressionTransform"]; n140205717371544[label="ExpressionTransform"]; n140206023882008[label="ExpressionTransform"]; } n140206024061464 -> n140205465170584; n140205465170584 -> n140205620507928; n140205620507928 -> n140205801688600; n140205801688600 -> n140205649781464; n140205649781464 -> n140205630520728; n140205649781464 -> n140205788048536; n140205649781464 -> n140205620512152; n140205649781464 -> n140206023882392; n140205649781464 -> n140206024060696; n140205649781464 -> n140205620509080; n140205649781464 -> n140205717371544; n140205649781464 -> n140206023882008; }
EXPLAIN ESTIMATE
The EXPLAIN ESTIMATE
analyzer shows the estimated number of rows, marks, and data parts to be read from a table when processing the query. This type of analysis can only be applied to queries that access MergeTree tables.
Example
-
Create a MergeTree table using options that specify how to divide data parts into granules:
-
index_granularity
— specifies the maximum number of data rows between marks of an index; -
write_final_mark
— indicates whether to write the last index mark at the end of a data part (after the last byte).
CREATE TABLE test_table_estimate (i Int64) ENGINE = MergeTree ORDER BY i SETTINGS index_granularity = 16, write_final_mark = 0;
-
-
Insert data into the table by running the following query for three times:
INSERT INTO test_table_estimate SELECT number FROM numbers(128);
-
Run
EXPLAIN ESTIMATE
for a query that reads data from the table:EXPLAIN ESTIMATE SELECT * FROM test_table_estimate;
The result of the query analysis:
┌─database─┬─table───────────────┬─parts─┬─rows─┬─marks─┐ │ default │ test_table_estimate │ 3 │ 384 │ 24 │ └──────────┴─────────────────────┴───────┴──────┴───────┘
EXPLAIN TABLE OVERRIDE
The EXPLAIN TABLE OVERRIDE
query allows you to pre-check the correctness of table schema overrides, which can then be specified for a MaterializedMySQL database creation (see Table Overrides for details). If a table schema override would result in any error, an appropriate message will be displayed in the EXPLAIN TABLE OVERRIDE
output (for example, when using Nullable columns for the PARTITION BY
and ORDER BY
clauses, for which it is not recommended).
To execute an EXPLAIN TABLE OVERRIDE
query, use the mysql table function to specify a connection to a remote MySQL table.
Example
-
On a remote MySQL server, create a table in the
mysql_db
database:CREATE TABLE mysql_table (id INT PRIMARY KEY, date DATETIME DEFAULT now());
-
In ADQM, run the following query to check if the table can be properly partitioned by the
date
column as specified (replace<host>:<port>
with the MySQL server address, and<user_name>
and<user_password>
with a name and password of a user that accesses the MySQL database):EXPLAIN TABLE OVERRIDE mysql('<host>:<port>', 'mysql_db', 'mysql_table', '<user_name>', '<user_password>') PARTITION BY toYYYYMM(date);
In the analysis output, ADQM will throw an exception with a message about incorrect partitioning key override and recommendation on how to fix it:
Exception: PARTITION BY override refers to nullable column `date` (use assumeNotNull() if the column does not in fact contain NULL values). (INVALID_TABLE_OVERRIDE) ....`
-
Change the
PARTITION BY
override according to the provided recommendation:EXPLAIN TABLE OVERRIDE mysql('<host>:<port>', 'mysql_db', 'mysql_table', 'root', 'my-secret-pw') PARTITION BY toYYYYMM(assumeNotNull(date));
Now the validation is passed:
┌─explain──────────────────────────────────────────────┐ │ PARTITION BY uses columns: `date` Nullable(DateTime) │ └──────────────────────────────────────────────────────┘