Use flameGraph

Elena Dvoryadkina

Contents

The flameGraph function syntax
Build a flame graph
Example

For in-depth performance analysis and query debugging, ADQM/ClickHouse provides the system.trace_log system table, which stores stack traces collected by the query profiler. For example, it can help you understand why a query is running slowly or not as expected. By examining the sequence of operations in the trace, you can detect specific query functions that cause problems (requiring the most time or resources to execute). This information can be useful for optimizing your query and improving its performance.

To visually represent stack traces as a flame graph, you can use the flameGraph aggregate function. This function returns an array of strings based on data from the system.trace_log table, which you can then pass to the flamegraph.pl utility (or a similar tool) to create a flame graph.

The flameGraph function syntax

The basic syntax of the flameGraph function:

flameGraph(<trace>, [<size>], [<ptr>])

where:

<trace> — stack trace;
<size> — allocation size for memory profiling (the Memory, MemorySample, or MemoryPeak trace type), the default value is 1;
<ptr> — allocation address, the default value is 0.

Build a flame graph

To get a query profile as a flame graph based on information from the system.trace_log table, follow the instructions below.

First, set up the ADQM query profiler and prepare tools for rendering flame graphs on your ADQM hosts:

Enable the query profiler and configure it using the query_profiler_* and memory_profiler_* parameters (see Session Settings in the ClickHouse documentation). You can set these parameters at the ADQM user level, the clickhouse-client session level, or for an individual query to be profiled (for details, refer to the Configure user settings article). For example, it is recommended to set query_profiler_real_time_period_ns and/or query_profiler_cpu_time_period_ns to 10000000 to trace C++ functions that are running.
Install the adqm-clickhouse-debuginfo package. Make sure the allow_introspection_functions setting is enabled for a user who will connect to ADQM and use the flameGraph function. You can also enable it at the clickhouse-client session level:
- in the batch mode:
  
  $ clickhouse-client --allow_introspection_functions=1 -q "..." ...
- in the interactive mode:
  
  SET allow_introspection_functions = 1;
Download the flamegraph.pl script to ADQM hosts and grant it execution permission (with the chmod +x command). Check also that Perl is installed on the hosts.

Now, you can use the flameGraph function and the flamegraph.pl utility to generate a flame graph for a query to profile it. For example, the following command creates a flame graph (an interactive SVG file named <flame_cpu>) visualizing stack traces by CPU time for the query with the <query_id> identifier (<path_to> in this example is a path to the flamegraph.pl script on the ADQM host):

$ clickhouse-client \
    -q "SELECT arrayJoin(flameGraph(arrayReverse(trace))) \
        FROM system.trace_log \
        WHERE trace_type = 'CPU' AND query_id = '<query_id>'" \
    | <path_to>/flamegraph.pl  > <flame_cpu>.svg

Example

This example shows how you can use a flame graph to identify which functions are executed and in what order during the execution of a query.

Create a table without a primary key that has two columns — the first column is compressed with the LZ4 algorithm, and the second one stores data without compression:
CREATE TABLE test_table (a Int64 CODEC(LZ4), b Int64 CODEC(NONE)) ENGINE = MergeTree ORDER BY tuple();

Insert 100 million rows with test values into the table:

INSERT INTO test_table SELECT 1, 1 FROM numbers(100000000);

Using the following query, you can see the difference between columns in the size of stored data:

SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed_size,
    formatReadableSize(data_compressed_bytes) AS compressed_size,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE table = 'test_table';

   ┌─name─┬─uncompressed_size─┬─compressed_size─┬─ratio─┐
1. │ a    │ 762.94 MiB        │ 3.45 MiB        │ 221.4 │
2. │ b    │ 762.94 MiB        │ 763.23 MiB      │     1 │
   └──────┴───────────────────┴─────────────────┴───────┘

Run two separate queries to calculate the sum of the values in each column:

SELECT sum(a) FROM test_table;

Query id: adde6caf-60c5-4820-8bb3-5f75a8edb98e

   ┌────sum(a)─┐
1. │ 100000000 │ -- 100.00 million
   └───────────┘

1 row in set. Elapsed: 0.072 sec. Processed 100.00 million rows, 800.00 MB (1.39 billion rows/s., 11.14 GB/s.)
Peak memory usage: 7.92 MiB.

SELECT sum(b) FROM test_table;

Query id: 8491a039-be0d-4a17-9130-ec286dfee039

   ┌────sum(b)─┐
1. │ 100000000 │ -- 100.00 million
   └───────────┘

1 row in set. Elapsed: 0.363 sec. Processed 100.00 million rows, 800.00 MB (275.73 million rows/s., 2.21 GB/s.)
Peak memory usage: 14.51 MiB.

For the first query, build a flame graph based on a CPU query profiler:
$ clickhouse-client \ -q "SELECT arrayJoin(flameGraph(arrayReverse(trace))) \ FROM system.trace_log \ WHERE trace_type = 'CPU' AND query_id = 'adde6caf-60c5-4820-8bb3-5f75a8edb98e'" \ | flamegraph.pl > flamegraph_cpu_1.svg
CPU flame graph for the query calculating the sum of the values in a column for which a data compression codec is used

The flame graph shows the full stack of C++ functions that were called. For example, note that decompression was performed during data reading (the decompress functions were called).
Create a flame graph for the second query:
$ clickhouse-client \ -q "SELECT arrayJoin(flameGraph(arrayReverse(trace))) \ FROM system.trace_log \ WHERE trace_type = 'CPU' AND query_id = '8491a039-be0d-4a17-9130-ec286dfee039'" \ | flamegraph.pl > flamegraph_cpu_2.svg
The graph shows that the CompressedReadBufferBase class is also used in this case, but it does not call the decompression functions (since data is read from a column where compression is not applied).

CPU flame graph for the query that reads data from the column without compression

Found a mistake? Seleсt text and press Ctrl+Enter to report it