Analyze queries

Elena Dvoryadkina

Contents

EXPLAIN AST
- Example
- Settings
EXPLAIN SYNTAX
- Example
EXPLAIN QUERY TREE
- Example
- Settings
EXPLAIN PLAN
- Example
- Settings
EXPLAIN PIPELINE
- Example
- Settings
EXPLAIN ESTIMATE
- Example
EXPLAIN TABLE OVERRIDE
- Example

To analyze how ADQM transforms and executes queries, use the EXPLAIN clause. Basic syntax:

EXPLAIN [AST | SYNTAX | QUERY TREE | PLAN | PIPELINE | ESTIMATE | TABLE OVERRIDE] [<setting_name> = <setting_value>, ...]
<query_to_analyze>
[FORMAT <output_format>]

where:

EXPLAIN [<explain_type>] [<explain_settings>] — type of the EXPLAIN analyzer (see the table below) and its additional settings;
<query_to_analyze> — query to be inspected;
FORMAT <output_format> — output format for analysis result.

EXPLAIN analyzer types
EXPLAIN type	EXPLAIN result
AST	Abstract syntax tree of the query
SYNTAX	Optimized version of the query syntax with which ADQM will replace the original text of the query when executing it
QUERY TREE	Query tree after optimization
PLAN	Query execution plan. This `EXPLAIN` type is used by default
PIPELINE	Query execution pipeline
ESTIMATE	Estimated number of rows, marks, and data parts that ADQM should process when executing the query
TABLE OVERRIDE	Pre-validation of table schema overrides to be specified for a MaterializedMySQL database

EXPLAIN AST

The EXPLAIN AST query returns an abstract syntax tree (AST) of the analyzed query. This analyzer supports all types of queries, not only SELECT.

Example

EXPLAIN AST SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;

┌─explain─────────────────────────────────────────┐
│ SelectWithUnionQuery (children 1)               │
│  ExpressionList (children 1)                    │
│   SelectQuery (children 3)                      │
│    ExpressionList (children 1)                  │
│     Function sum (alias test_sums) (children 1) │
│      ExpressionList (children 1)                │
│       Identifier number                         │
│    TablesInSelectQuery (children 1)             │
│     TablesInSelectQueryElement (children 1)     │
│      TableExpression (children 1)               │
│       Function numbers (children 1)             │
│        ExpressionList (children 1)              │
│         Literal UInt64_5                        │
│    ExpressionList (children 1)                  │
│     Function modulo (children 1)                │
│      ExpressionList (children 2)                │
│       Identifier number                         │
│       Literal UInt64_2                          │
└─────────────────────────────────────────────────┘

Settings

Enable the graph option for an EXPLAIN AST query to get an abstract syntax tree of the query as a graph described in the DOT graph description language. Then you can load this graph description into a special application that is able to draw it as a diagram (for example, use Graphviz — open-source software for graph visualization).

Below is an example of a query abstract syntax tree presented as a graph:

EXPLAIN AST graph=1 SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;

┌─explain────────────────────────────────────────────────────────────────────┐
│ digraph {                                                                  │
│     rankdir="UD";                                                          │
│     n140208872268568[label="SelectWithUnionQuery (children 1)"];           │
│     n140208877065784[label="ExpressionList (children 1)"];                 │
│     n140205632168472[label="SelectQuery (children 3)"];                    │
│     n140206404082776[label="ExpressionList (children 1)"];                 │
│     n140206025898776[label="Function sum (alias test_sums) (children 1)"]; │
│     n140206026032952[label="ExpressionList (children 1)"];                 │
│     n140205801742360[label="Identifier number"];                           │
│     ...                                                                    │
│ }                                                                          │
└────────────────────────────────────────────────────────────────────────────┘

EXPLAIN SYNTAX

When ADQM executes queries, it can automatically rewrite their syntax to improve performance. The EXPLAIN SYNTAX analyzer allows you to view syntax optimizations that ADQM applies — then you can use them to manually optimize complex queries for faster execution.

Example

Create a MergeTree table with 1 million rows and specify the partition key so that data is divided into 10 partitions:

CREATE TABLE test_table (value UInt64)
ENGINE = MergeTree
PARTITION BY value % 10
ORDER BY value AS
SELECT number
FROM numbers(1000000);

Run a query that calculates the arithmetic mean in the value column for the first partition:

SELECT avg(value)
FROM (SELECT value FROM test_table)
WHERE (value % 10) = 1;

As you can see in the query result, ADQM processes only 100 thousand rows (not a million):

┌─avg(value)─┐
│     499996 │
└────────────┘

1 row in set. Elapsed: 0.006 sec. Processed 100.00 thousand rows, 800.00 KB (16.40 million rows/s., 131.16 MB/s.)

Use EXPLAIN SYNTAX to analyze the query above:

EXPLAIN SYNTAX
SELECT avg(value)
FROM (SELECT value FROM test_table)
WHERE (value % 10) = 1;

The analysis shows that ADQM automatically optimizes the query by moving the WHERE filter inside the subquery — so it is enough to process only 1/10 part of the table data:

┌─explain────────────────────┐
│ SELECT avg(value)          │
│ FROM                       │
│ (                          │
│     SELECT value           │
│     FROM test_table        │
│     WHERE (value % 10) = 1 │
│ )                          │
│ WHERE (value % 10) = 1     │
└────────────────────────────┘

Thus, you can use this optimization in more complex queries and manually specify the necessary filters in subqueries.

EXPLAIN QUERY TREE

The EXPLAIN QUERY TREE analyzer dumps the query tree. This analyzer type is currently experimental. To use it, enable the option: SET allow_experimental_analyzer = 1.

Create a test table to run examples below

Examples of EXPLAIN QUERY TREE queries below use a test table that stores randomly generated pageview data. To reproduce these examples, prepare the page_views table:

CREATE TABLE page_views (time DateTime, page_name String, time_on_page UInt64)
ENGINE = MergeTree
ORDER BY time;

INSERT INTO page_views
SELECT
    toDate('2023-01-01 00:00:00') + (rand() % 182) AS time,
    ['landing', 'products', 'docs', 'about'][toInt32(randBinomial(4, 0.01)) + 1],
    rand() % 15
FROM numbers(1000000);

Example

SET allow_experimental_analyzer = 1;

EXPLAIN QUERY TREE
SELECT page_name, toDate(toStartOfMonth(time)) AS month, count() AS page_view_count
FROM page_views
WHERE page_name = 'landing'
GROUP BY page_name, month
ORDER BY page_name, month
LIMIT 3;

┌─explain───────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY id: 0                                                                                               │
│   PROJECTION COLUMNS                                                                                      │
│     page_name String                                                                                      │
│     month Date                                                                                            │
│     page_view_count UInt64                                                                                │
│   PROJECTION                                                                                              │
│     LIST id: 1, nodes: 3                                                                                  │
│       COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3                             │
│       FUNCTION id: 4, function_name: toDate, function_type: ordinary, result_type: Date                   │
│         ARGUMENTS                                                                                         │
│           LIST id: 5, nodes: 1                                                                            │
│             FUNCTION id: 6, function_name: toStartOfMonth, function_type: ordinary, result_type: Date     │
│               ARGUMENTS                                                                                   │
│                 LIST id: 7, nodes: 1                                                                      │
│                   COLUMN id: 8, column_name: time, result_type: DateTime, source_id: 3                    │
│       FUNCTION id: 9, function_name: count, function_type: aggregate, result_type: UInt64                 │
│   JOIN TREE                                                                                               │
│     TABLE id: 3, table_name: default.page_views                                                           │
│   WHERE                                                                                                   │
│     FUNCTION id: 10, function_name: equals, function_type: ordinary, result_type: UInt8                   │
│       ARGUMENTS                                                                                           │
│         LIST id: 11, nodes: 2                                                                             │
│           COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3                         │
│           CONSTANT id: 12, constant_value: 'landing', constant_value_type: String                         │
│   GROUP BY                                                                                                │
│     LIST id: 13, nodes: 2                                                                                 │
│       COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3                             │
│       FUNCTION id: 4, function_name: toDate, function_type: ordinary, result_type: Date                   │
│         ARGUMENTS                                                                                         │
│           LIST id: 5, nodes: 1                                                                            │
│             FUNCTION id: 6, function_name: toStartOfMonth, function_type: ordinary, result_type: Date     │
│               ARGUMENTS                                                                                   │
│                 LIST id: 7, nodes: 1                                                                      │
│                   COLUMN id: 8, column_name: time, result_type: DateTime, source_id: 3                    │
│   ORDER BY                                                                                                │
│     LIST id: 14, nodes: 2                                                                                 │
│       SORT id: 15, sort_direction: ASCENDING, with_fill: 0                                                │
│         EXPRESSION                                                                                        │
│           COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3                         │
│       SORT id: 16, sort_direction: ASCENDING, with_fill: 0                                                │
│         EXPRESSION                                                                                        │
│           FUNCTION id: 4, function_name: toDate, function_type: ordinary, result_type: Date               │
│             ARGUMENTS                                                                                     │
│               LIST id: 5, nodes: 1                                                                        │
│                 FUNCTION id: 6, function_name: toStartOfMonth, function_type: ordinary, result_type: Date │
│                   ARGUMENTS                                                                               │
│                     LIST id: 7, nodes: 1                                                                  │
│                       COLUMN id: 8, column_name: time, result_type: DateTime, source_id: 3                │
│   LIMIT                                                                                                   │
│     CONSTANT id: 17, constant_value: UInt64_3, constant_value_type: UInt64                                │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────┘

TIP

To speed up the execution of queries that calculate aggregates (for example, like the query above calculates the number of page views by month), you can use materialized views that automatically calculate intermediate data aggregate states — see an example in the Use materialized views article.

Settings

To customize the EXPLAIN QUERY TREE output, use the following parameters (set a parameter value to 1 or 0 to enable or disable the corresponding setting respectively):

run_passes — runs all query tree passes before dumping the query tree (the default is 1);
passes — specifies how many passes to run (the default is -1 that runs all the passes);
dump_passes — dumps information about used passes before dumping the query tree (the default is 0).

Example

The following analysis runs three query tree passes and outputs information about them before dumping the query tree:

EXPLAIN QUERY TREE passes=3, dump_passes=1
SELECT page_name, toDate(toStartOfMonth(time)) AS month, count() AS page_view_count
FROM page_views
WHERE page_name = 'landing'
GROUP BY page_name, month
ORDER BY page_name, month
LIMIT 3
FORMAT TSVRaw;

Pass 1 QueryAnalysis - Resolve type for each query expression. Replace identifiers, matchers with query expressions. Perform constant folding. Evaluate scalar subqueries.
Pass 2 FunctionToSubcolumns - Rewrite function to subcolumns, for example tupleElement(column, subcolumn) into column.subcolumn
Pass 3 CountDistinct - Optimize single countDistinct into count over subquery
QUERY id: 0
  PROJECTION COLUMNS
    page_name String
    month Date
    page_view_count UInt64
  PROJECTION
    LIST id: 1, nodes: 3
      COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3
      FUNCTION id: 4, function_name: toDate, function_type: ordinary, result_type: Date
        ARGUMENTS
          LIST id: 5, nodes: 1
            FUNCTION id: 6, function_name: toStartOfMonth, function_type: ordinary, result_type: Date
              ARGUMENTS
                LIST id: 7, nodes: 1
                  COLUMN id: 8, column_name: time, result_type: DateTime, source_id: 3
      FUNCTION id: 9, function_name: count, function_type: aggregate, result_type: UInt64
  JOIN TREE
    TABLE id: 3, table_name: default.page_views
  WHERE
    FUNCTION id: 10, function_name: equals, function_type: ordinary, result_type: UInt8
      ARGUMENTS
        LIST id: 11, nodes: 2
          COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3
          CONSTANT id: 12, constant_value: 'landing', constant_value_type: String
  GROUP BY
    LIST id: 13, nodes: 2
      COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3
      FUNCTION id: 4, function_name: toDate, function_type: ordinary, result_type: Date
        ARGUMENTS
          LIST id: 5, nodes: 1
            FUNCTION id: 6, function_name: toStartOfMonth, function_type: ordinary, result_type: Date
              ARGUMENTS
                LIST id: 7, nodes: 1
                  COLUMN id: 8, column_name: time, result_type: DateTime, source_id: 3
  ORDER BY
    LIST id: 14, nodes: 2
      SORT id: 15, sort_direction: ASCENDING, with_fill: 0
        EXPRESSION
          COLUMN id: 2, column_name: page_name, result_type: String, source_id: 3
      SORT id: 16, sort_direction: ASCENDING, with_fill: 0
        EXPRESSION
          FUNCTION id: 4, function_name: toDate, function_type: ordinary, result_type: Date
            ARGUMENTS
              LIST id: 5, nodes: 1
                FUNCTION id: 6, function_name: toStartOfMonth, function_type: ordinary, result_type: Date
                  ARGUMENTS
                    LIST id: 7, nodes: 1
                      COLUMN id: 8, column_name: time, result_type: DateTime, source_id: 3
  LIMIT
    CONSTANT id: 17, constant_value: UInt64_3, constant_value_type: UInt64

EXPLAIN PLAN

The EXPLAIN PLAN analyzer displays the query execution plan as a tree that shows in which order clauses from the query are actually executed. Read query execution plan stages from bottom to top. This analyzer type is used by default if another type is not explicitly specified for EXPLAIN.

Example

The following analysis shows that when executing a SELECT query, ADQM first reads data (ReadFromStorage), then filters (Filter) and aggregates (Aggregating) it.

EXPLAIN SELECT sum(number) AS test_sums FROM numbers(10000) WHERE number > 100 GROUP BY number % 2;

┌─explain─────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│   Aggregating                               │
│     Expression (Before GROUP BY)            │
│       Filter (WHERE)                        │
│         ReadFromStorage (SystemNumbers)     │
└─────────────────────────────────────────────┘

Settings

In EXPLAIN PLAN queries, you can use the following parameters to specify information the analysis output should include (if parameter value is 1 — the corresponding setting is enabled, if 0 — it is disabled):

header — adds the Header key to the query execution plan as an array of columns (column name and type are printed) for which values are calculated at each step.

Example

EXPLAIN header = 0 (default) EXPLAIN header = 1

EXPLAIN
SELECT sum(number) AS test_sums
FROM numbers(5) GROUP BY number % 2;

┌─explain─────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│   Aggregating                               │
│     Expression (Before GROUP BY)            │
│       ReadFromStorage (SystemNumbers)       │
└─────────────────────────────────────────────┘

EXPLAIN header = 1
SELECT sum(number) AS test_sums
FROM numbers(5) GROUP BY number % 2;

┌─explain─────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│ Header: test_sums UInt64                    │
│   Aggregating                               │
│   Header: modulo(number, 2) UInt8           │
│           sum(number) UInt64                │
│     Expression (Before GROUP BY)            │
│     Header: number UInt64                   │
│             modulo(number, 2) UInt8         │
│       ReadFromStorage (SystemNumbers)       │
│       Header: number UInt64                 │
└─────────────────────────────────────────────┘

description — prints a description for each step of the query execution plan.

Example

EXPLAIN description = 0 EXPLAIN description = 1 (default)

EXPLAIN description = 0
SELECT sum(number) AS test_sums
FROM numbers(5) GROUP BY number % 2;

┌─explain───────────────┐
│ Expression            │
│   Aggregating         │
│     Expression        │
│       ReadFromStorage │
└───────────────────────┘

EXPLAIN
SELECT sum(number) AS test_sums
FROM numbers(5) GROUP BY number % 2;

┌─explain─────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│   Aggregating                               │
│     Expression (Before GROUP BY)            │
│       ReadFromStorage (SystemNumbers)       │
└─────────────────────────────────────────────┘

indexes — adds the Indexes key to the query execution plan as an array of used indexes. Each index contains the Type key (MinMax, Partition, PrimaryKey, or Skip) and optional keys:

Name — index name (for Skip indexes only);
Keys — array of columns used by the index;
Condition — used condition;
Description — index description (for Skip indexes only);
Parts — number of parts filtered by the applied index;
Granules — number of granules filtered by the applied index.

This parameter is only supported for analyzing queries that access MergeTree tables.

Example

EXPLAIN actions = 0 (default) EXPLAIN actions = 1

EXPLAIN indexes = 0
SELECT sum(value)
FROM test_table
WHERE (value % 10) = 1;

┌─explain────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))    │
│   Aggregating                                  │
│     Expression (Before GROUP BY)               │
│       Filter (WHERE)                           │
│         ReadFromMergeTree (default.test_table) │
└────────────────────────────────────────────────┘

EXPLAIN indexes = 1
SELECT sum(value)
FROM test_table
WHERE (value % 10) = 1;

┌─explain────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))                │
│   Aggregating                                              │
│     Expression (Before GROUP BY)                           │
│       Filter (WHERE)                                       │
│         ReadFromMergeTree (default.test_table)             │
│         Indexes:                                           │
│           MinMax                                           │
│             Condition: true                                │
│             Parts: 10/10                                   │
│             Granules: 120/120                              │
│           Partition                                        │
│             Keys:                                          │
│               moduloLegacy(value, 10)                      │
│             Condition: (moduloLegacy(value, 10) in [1, 1]) │
│             Parts: 1/10                                    │
│             Granules: 12/120                               │
│           PrimaryKey                                       │
│             Condition: true                                │
│             Parts: 1/1                                     │
│             Granules: 12/12                                │
└────────────────────────────────────────────────────────────┘

actions — prints detailed information about the actions performed at each step of the query execution plan (the Actions key).

Example

EXPLAIN actions = 0 (default) EXPLAIN actions = 1

EXPLAIN
SELECT sum(number) AS test_sums
FROM numbers(5) GROUP BY number % 2;

┌─explain─────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│   Aggregating                               │
│     Expression (Before GROUP BY)            │
│       ReadFromStorage (SystemNumbers)       │
└─────────────────────────────────────────────┘

EXPLAIN actions = 1
SELECT sum(number) AS test_sums
FROM numbers(5) GROUP BY number % 2;

┌─explain─────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))                                     │
│ Actions: INPUT : 0 -> sum(number) UInt64 : 0                                    │
│          ALIAS sum(number) :: 0 -> test_sums UInt64 : 1                         │
│ Positions: 1                                                                    │
│   Aggregating                                                                   │
│   Keys: modulo(number, 2)                                                       │
│   Aggregates:                                                                   │
│       sum(number)                                                               │
│         Function: sum(UInt64) → UInt64                                          │
│         Arguments: number                                                       │
│   Skip merging: 0                                                               │
│     Expression (Before GROUP BY)                                                │
│     Actions: INPUT : 0 -> number UInt64 : 0                                     │
│              COLUMN Const(UInt8) -> 2 UInt8 : 1                                 │
│              FUNCTION modulo(number : 0, 2 :: 1) -> modulo(number, 2) UInt8 : 2 │
│     Positions: 0 2                                                              │
│       ReadFromStorage (SystemNumbers)                                           │
└─────────────────────────────────────────────────────────────────────────────────┘

json — outputs the query execution plan in JSON format. Each node is a dictionary, which always contains the Node Type (a step name) and Plans (an array with child step descriptions) keys. Other optional keys may be added depending on a node type and settings.

To avoid unnecessary escaping, it is recommended to use the TSVRaw format for analysis result output.

Example

EXPLAIN json = 0 (default) EXPLAIN json = 1

EXPLAIN
SELECT sum(number) AS test_sums
FROM numbers(5) GROUP BY number % 2;

┌─explain─────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│   Aggregating                               │
│     Expression (Before GROUP BY)            │
│       ReadFromStorage (SystemNumbers)       │
└─────────────────────────────────────────────┘

EXPLAIN json = 1
SELECT sum(number) AS test_sums
FROM numbers(5) GROUP BY number % 2 FORMAT TSVRaw;

[
  {
    "Plan": {
      "Node Type": "Expression",
      "Description": "(Projection + Before ORDER BY)",
      "Plans": [
        {
          "Node Type": "Aggregating",
          "Plans": [
            {
              "Node Type": "Expression",
              "Description": "Before GROUP BY",
              "Plans": [
                {
                  "Node Type": "ReadFromStorage",
                  "Description": "SystemNumbers"
                }
              ]
            }
          ]
        }
      ]
    }
  }
]

EXPLAIN PIPELINE

The EXPLAIN PIPELINE analyzer displays the query execution pipeline that shows how many parallel threads are used to execute each step of the query plan.

Example

EXPLAIN PIPELINE SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;

┌─explain───────────────────┐
│ (Expression)              │
│ ExpressionTransform × 8   │
│   (Aggregating)           │
│   Resize 1 → 8            │
│     AggregatingTransform  │
│       (Expression)        │
│       ExpressionTransform │
│         (ReadFromStorage) │
│         Limit             │
│           Numbers 0 → 1   │
└───────────────────────────┘

Settings

To configure the EXPLAIN PIPELINE query output, use the following parameters (set the parameter value to 1 or 0 to enable or disable the corresponding setting respectively):

header — prints a header for each output port (the default is 0).

Example

EXPLAIN PIPELINE header = 1
SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2;

┌─explain─────────────────────────────────────────────────────────────────────────┐
│ (Expression)                                                                    │
│ ExpressionTransform × 8                                                         │
│ Header: test_sums UInt64: test_sums UInt64 UInt64(size = 0)                     │
│   (Aggregating)                                                                 │
│   Resize 1 → 8                                                                  │
│   Header × 8 : modulo(number, 2) UInt8: modulo(number, 2) UInt8 UInt8(size = 0) │
│                 sum(number) UInt64: sum(number) UInt64 UInt64(size = 0)         │
│     AggregatingTransform                                                        │
│     Header: modulo(number, 2) UInt8: modulo(number, 2) UInt8 UInt8(size = 0)    │
│             sum(number) UInt64: sum(number) UInt64 UInt64(size = 0)             │
│       (Expression)                                                              │
│       ExpressionTransform                                                       │
│       Header: number UInt64: number UInt64 UInt64(size = 0)                     │
│               modulo(number, 2) UInt8: modulo(number, 2) UInt8 UInt8(size = 0)  │
│         (ReadFromStorage)                                                       │
│         Limit                                                                   │
│         Header: number UInt64: number UInt64 UInt64(size = 0)                   │
│           Numbers 0 → 1                                                         │
│           Header: number UInt64: number UInt64 UInt64(size = 0)                 │
└─────────────────────────────────────────────────────────────────────────────────┘

graph — prints the query execution pipeline as a graph described in the DOT graph description language (the default is 0). Load the provided syntax into an application for graph visualization (for example, Graphviz) to view the visual representation of the graph.

Example

EXPLAIN PIPELINE graph = 1
SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2 FORMAT TSV;

digraph
{
  rankdir="LR";
  { node [shape = rect]
    subgraph cluster_0 {
      label ="Expression";
      style=filled;
      color=lightgrey;
      node [style=filled,color=white];
      { rank = same;
        n6 [label="ExpressionTransform × 8"];
      }
    }
    subgraph cluster_1 {
      label ="ReadFromStorage";
      style=filled;
      color=lightgrey;
      node [style=filled,color=white];
      { rank = same;
        n2 [label="Limit"];
        n1 [label="Numbers"];
      }
    }
    subgraph cluster_2 {
      label ="Expression";
      style=filled;
      color=lightgrey;
      node [style=filled,color=white];
      { rank = same;
        n3 [label="ExpressionTransform"];
      }
    }
    subgraph cluster_3 {
      label ="Aggregating";
      style=filled;
      color=lightgrey;
      node [style=filled,color=white];
      { rank = same;
        n4 [label="AggregatingTransform"];
        n5 [label="Resize"];
      }
    }
  }
  n2 -> n3 [label=""];
  n1 -> n2 [label=""];
  n3 -> n4 [label=""];
  n4 -> n5 [label=""];
  n5 -> n6 [label="× 8"];
}

compact — prints a graph in the compact mode if the graph setting is enabled (the default is 1).

Example

EXPLAIN PIPELINE graph = 1, compact = 0
SELECT sum(number) AS test_sums FROM numbers(5) GROUP BY number % 2 FORMAT TSV;

digraph
{
  rankdir="LR";
  { node [shape = rect]
    n140206024061464[label="Numbers"];
    n140205465170584[label="Limit"];
    n140205620507928[label="ExpressionTransform"];
    n140205801688600[label="AggregatingTransform"];
    n140205649781464[label="Resize"];
    n140205630520728[label="ExpressionTransform"];
    n140205788048536[label="ExpressionTransform"];
    n140205620512152[label="ExpressionTransform"];
    n140206023882392[label="ExpressionTransform"];
    n140206024060696[label="ExpressionTransform"];
    n140205620509080[label="ExpressionTransform"];
    n140205717371544[label="ExpressionTransform"];
    n140206023882008[label="ExpressionTransform"];
  }
  n140206024061464 -> n140205465170584;
  n140205465170584 -> n140205620507928;
  n140205620507928 -> n140205801688600;
  n140205801688600 -> n140205649781464;
  n140205649781464 -> n140205630520728;
  n140205649781464 -> n140205788048536;
  n140205649781464 -> n140205620512152;
  n140205649781464 -> n140206023882392;
  n140205649781464 -> n140206024060696;
  n140205649781464 -> n140205620509080;
  n140205649781464 -> n140205717371544;
  n140205649781464 -> n140206023882008;
}

EXPLAIN ESTIMATE

The EXPLAIN ESTIMATE analyzer shows the estimated number of rows, marks, and data parts to be read from a table when processing the query. This type of analysis can only be applied to queries that access MergeTree tables.

Example

Create a MergeTree table using options that specify how to divide data parts into granules:
- index_granularity — specifies the maximum number of data rows between marks of an index;
- write_final_mark — indicates whether to write the last index mark at the end of a data part (after the last byte).
CREATE TABLE test_table_estimate (i Int64) ENGINE = MergeTree ORDER BY i SETTINGS index_granularity = 16, write_final_mark = 0;
Insert data into the table by running the following query for three times:
INSERT INTO test_table_estimate SELECT number FROM numbers(128);

Run EXPLAIN ESTIMATE for a query that reads data from the table:

EXPLAIN ESTIMATE SELECT * FROM test_table_estimate;

The result of the query analysis:

┌─database─┬─table───────────────┬─parts─┬─rows─┬─marks─┐
│ default  │ test_table_estimate │     3 │  384 │    24 │
└──────────┴─────────────────────┴───────┴──────┴───────┘

EXPLAIN TABLE OVERRIDE

The EXPLAIN TABLE OVERRIDE query allows you to pre-check the correctness of table schema overrides, which can then be specified for a MaterializedMySQL database creation (see Table Overrides for details). If a table schema override would result in any error, an appropriate message will be displayed in the EXPLAIN TABLE OVERRIDE output (for example, when using Nullable columns for the PARTITION BY and ORDER BY clauses, for which it is not recommended).

To execute an EXPLAIN TABLE OVERRIDE query, use the mysql table function to specify a connection to a remote MySQL table.

Example

On a remote MySQL server, create a table in the mysql_db database:

CREATE TABLE mysql_table (id INT PRIMARY KEY, date DATETIME DEFAULT now());

In ADQM, run the following query to check if the table can be properly partitioned by the date column as specified (replace <host>:<port> with the MySQL server address, and <user_name> and <user_password> with a name and password of a user that accesses the MySQL database):
EXPLAIN TABLE OVERRIDE mysql('<host>:<port>', 'mysql_db', 'mysql_table', '<user_name>', '<user_password>') PARTITION BY toYYYYMM(date);
In the analysis output, ADQM will throw an exception with a message about incorrect partitioning key override and recommendation on how to fix it:
Exception: PARTITION BY override refers to nullable column `date` (use assumeNotNull() if the column does not in fact contain NULL values). (INVALID_TABLE_OVERRIDE) ....`

Change the PARTITION BY override according to the provided recommendation:

EXPLAIN TABLE OVERRIDE mysql('<host>:<port>', 'mysql_db', 'mysql_table', 'root', 'my-secret-pw')
PARTITION BY toYYYYMM(assumeNotNull(date));

Now the validation is passed:

┌─explain──────────────────────────────────────────────┐
│ PARTITION BY uses columns: `date` Nullable(DateTime) │
└──────────────────────────────────────────────────────┘

Found a mistake? Seleсt text and press Ctrl+Enter to report it