ADQM monitoring metrics

This article describes metrics for monitoring an ADQM cluster. For information on how to install monitoring, refer to the sections:

Metric types

Two groups of metrics are available for an ADQM cluster: system metrics and ClickHouse server metrics. If monitoring is installed via the Monitoring service (available in ADQM starting with version 23.8.2.7), metrics are collected from all ADQM services, including ZooKeeper, ClickHouse Keeper, and Chproxy.

System metrics indicate general characteristics of cluster hosts, usually related to resource consumption. Available system metrics are listed in the table below.

System metrics
Metrics group Description

cpu

CPU utilization

diskspace

Disk capacity

files

File statistics

iostat

Input/output operation performance

loadavg

System load averages

memory

Memory usage

netstat

Network connection statistics

network

Network interface performance

ClickHouse server metrics available for an ADQM cluster include:

  • Metrics — metrics that are calculated instantly and have up-to-date current values (for example, the number of simultaneously processed queries or the current replica delay value).

  • ProfileEvents — information about the number of events that have occurred in the system (for example, the number of SELECT queries processed since the start of the ClickHouse server).

  • AsynchronousMetrics — metrics that are periodically calculated in the background (for example, the amount of RAM in use).
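In ClickHouse, these three groups are exposed through the system.metrics, system.events, and system.asynchronous_metrics tables. The minimal sketch below reads one group from Python. It assumes direct access to the ClickHouse HTTP interface on its default port 8123 with a user that needs no password; the ADQM Monitoring service may collect metrics differently, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Name column of each ClickHouse system table that backs a metric group.
NAME_COLUMNS = {
    "system.metrics": "metric",               # Metrics
    "system.events": "event",                 # ProfileEvents
    "system.asynchronous_metrics": "metric",  # AsynchronousMetrics
}


def build_query(table: str) -> str:
    """Return a SELECT that lists name/value pairs from one system table."""
    return f"SELECT {NAME_COLUMNS[table]}, value FROM {table}"


def parse_tab_separated(text: str) -> dict[str, float]:
    """Parse 'name<TAB>value' lines (TabSeparated, the HTTP default format)."""
    values = {}
    for line in text.splitlines():
        if line:
            name, value = line.split("\t")
            values[name] = float(value)
    return values


def fetch_metrics(table: str, host: str = "localhost", port: int = 8123) -> dict[str, float]:
    """Read one metric group over HTTP (assumption: no authentication required)."""
    params = urlencode({"query": build_query(table)})
    url = f"http://{host}:{port}/?{params}"
    with urlopen(url, timeout=10) as response:
        return parse_tab_separated(response.read().decode("utf-8"))
```

For example, fetch_metrics("system.events").get("Query") would return the Query counter accumulated since the server started.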

The tables below list ClickHouse server metrics.

Metrics
Metric name Description

ActiveAsyncDrainedConnections

Number of active connections drained asynchronously

ActiveSyncDrainedConnections

Number of active connections drained synchronously

AsyncDrainedConnections

Number of connections drained asynchronously

AsynchronousReadWait

Number of threads waiting for asynchronous read

BackgroundBufferFlushSchedulePoolTask

Number of active tasks in BackgroundBufferFlushSchedulePool (this pool is used for periodic Buffer flushes)

BackgroundCommonPoolTask

Number of active tasks in an associated background pool

BackgroundDistributedSchedulePoolTask

Number of active tasks in BackgroundDistributedSchedulePool (this pool is used for distributed sends that are done in background)

BackgroundFetchesPoolTask

Number of active fetches in an associated background pool

BackgroundMergesAndMutationsPoolTask

Number of active merges and mutations in an associated background pool

BackgroundMessageBrokerSchedulePoolTask

Number of active tasks in BackgroundProcessingPool for message streaming

BackgroundMovePoolTask

Number of active tasks in BackgroundProcessingPool for moves

BackgroundSchedulePoolTask

Number of active tasks in BackgroundSchedulePool (this pool is used for periodic ReplicatedMergeTree tasks, like cleaning old data parts, altering data parts, replica re-initialization, etc.)

BrokenDistributedFilesToInsert

Number of files for asynchronous insertion into Distributed tables that have been marked as broken. This metric starts from 0 when the server starts. The number of files for each shard is summed

CacheDetachedFileSegments

Number of existing detached cache file segments

CacheDictionaryUpdateQueueBatches

Number of "batches" (sets of keys) in the update queue of CacheDictionaries

CacheDictionaryUpdateQueueKeys

Exact number of keys in the update queue of CacheDictionaries

CacheFileSegments

Number of existing cache file segments

ContextLockWait

Number of threads waiting for the lock in Context (this is a global lock)

DelayedInserts

Number of INSERT queries that are throttled due to a high number of active data parts for a partition in a MergeTree table

DictCacheRequests

Number of in-flight requests to data sources of cache-type dictionaries

DiskSpaceReservedForMerge

Disk space reserved for currently running background merges. It is slightly more than the total size of currently merging parts

DistributedFilesToInsert

Number of pending files to process for asynchronous insertion into Distributed tables. The number of files for each shard is summed

DistributedSend

Number of connections to remote servers sending data that was INSERTed into Distributed tables. Covers both synchronous and asynchronous modes

EphemeralNode

Number of ephemeral nodes held in ZooKeeper

FilesystemCacheElements

Filesystem cache elements (file segments)

FilesystemCacheReadBuffers

Number of active cache buffers

FilesystemCacheSize

Filesystem cache size in bytes

GlobalThread

Number of threads in the global thread pool

GlobalThreadActive

Number of threads in the global thread pool running a task

HTTPConnection

Number of connections to an HTTP server

InterserverConnection

Number of connections from other replicas to fetch data parts

KafkaAssignedPartitions

Number of partitions currently assigned to Kafka tables

KafkaBackgroundReads

Number of background reads currently working (populating materialized views from Kafka)

KafkaConsumers

Number of active Kafka consumers

KafkaConsumersInUse

Number of consumers which are currently used by direct or background reads

KafkaConsumersWithAssignment

Number of active Kafka consumers which have some partitions assigned

KafkaLibrdkafkaThreads

Number of active librdkafka threads

KafkaProducers

Number of active Kafka producers created

KafkaWrites

Number of currently running inserts to Kafka

KeeperAliveConnections

Number of alive connections

KeeperOutstandingRequets

Number of outstanding requests

LocalThread

Number of threads in local thread pools. Threads in local thread pools are taken from the global thread pool

LocalThreadActive

Number of threads in local thread pools running a task

MMappedFileBytes

Total size of mmapped file regions

MMappedFiles

Total number of mmapped files

MaxDDLEntryID

Max processed DDL entry of DDLWorker

MaxPushedDDLEntryID

Max DDL entry of DDLWorker pushed to ZooKeeper

MemoryTracking

Total amount of memory (in bytes) allocated by the server

Merge

Number of executing background merges

MySQLConnection

Number of client connections using the MySQL protocol

NetworkReceive

Number of threads receiving data from network. Only ClickHouse-related network interaction is included; interaction by third-party libraries is not

NetworkSend

Number of threads sending data to network. Only ClickHouse-related network interaction is included; interaction by third-party libraries is not

OpenFileForRead

Number of files open for reading

OpenFileForWrite

Number of files open for writing

PartMutation

Number of mutations (ALTER DELETE/UPDATE)

PartsActive

Active data parts (used by current and upcoming SELECTs)

PartsCommitted

Deprecated. See PartsActive

PartsCompact

Compact parts

PartsDeleteOnDestroy

Parts that were moved to another disk and should be deleted in their own destructors

PartsDeleting

Inactive data parts with identity refcounters that are currently being deleted by a cleaner

PartsInMemory

In-memory parts

PartsOutdated

Inactive data parts that can still be used by current SELECTs; they can be deleted after those SELECTs finish

PartsPreActive

Data parts that are in data_parts but are not used for SELECTs

PartsPreCommitted

Deprecated. See PartsPreActive

PartsTemporary

Data parts that are being generated now and are not in the data_parts list

PartsWide

Wide parts

PendingAsyncInsert

Number of asynchronous inserts that are waiting for flush

PostgreSQLConnection

Number of client connections using the PostgreSQL protocol

Query

Number of executing queries

QueryPreempted

Number of queries that are stopped and waiting due to the priority setting

QueryThread

Number of query processing threads

RWLockActiveReaders

Number of threads holding read lock in a table RWLock

RWLockActiveWriters

Number of threads holding write lock in a table RWLock

RWLockWaitingReaders

Number of threads waiting for read on a table RWLock

RWLockWaitingWriters

Number of threads waiting for write on a table RWLock

Read

Number of in-flight read (read, pread, io_getevents, etc.) syscalls

ReadonlyReplica

Number of Replicated tables that are currently in the readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured

ReplicatedChecks

Number of data parts being checked for consistency

ReplicatedFetch

Number of data parts being fetched from replicas

ReplicatedSend

Number of data parts being sent to replicas

Revision

Revision of the server. It is a number incremented for every release or release candidate except patch releases

S3Requests

S3 requests

SendExternalTables

Number of connections that are sending data for external tables to remote servers. External tables are used to implement the GLOBAL IN and GLOBAL JOIN operators with distributed subqueries

SendScalars

Number of connections that are sending data for scalars to remote servers

StorageBufferBytes

Number of bytes in buffers of Buffer tables

StorageBufferRows

Number of rows in buffers of Buffer tables

SyncDrainedConnections

Number of connections drained synchronously

TCPConnection

Number of connections to a TCP server (clients with the native interface), including server-to-server connections for distributed queries

TablesToDropQueueSize

Number of dropped tables that are waiting for background data removal

VersionInteger

Version of the server as a single integer number in base-1000. For example, version 11.22.33 is translated to 11022033

Write

Number of in-flight write (write, pwrite, io_getevents, etc.) syscalls

ZooKeeperRequest

Number of in-flight requests to ZooKeeper

ZooKeeperSession

Number of sessions (connections) to ZooKeeper. Should be no more than one, because using more than one connection to ZooKeeper may lead to bugs due to the lack of linearizability (stale reads) that the ZooKeeper consistency model allows

ZooKeeperWatch

Number of watches (event subscriptions) in ZooKeeper
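Two metrics in the table above come with rules that are simple to check in a script: VersionInteger encodes the server version in base 1000, and ZooKeeperSession should be no more than one. The following is a minimal sketch with hypothetical helper names. It assumes the metric values have already been collected per host into Python dictionaries, and that VersionInteger encodes only the first three version components, as the 11.22.33 example suggests.

```python
def decode_version_integer(value: int) -> str:
    """Decode a base-1000 VersionInteger value, e.g. 11022033 -> '11.22.33'."""
    major, rest = divmod(value, 1_000_000)
    minor, patch = divmod(rest, 1_000)
    return f"{major}.{minor}.{patch}"


def encode_version_integer(version: str) -> int:
    """Encode a version string in base 1000.

    Assumption: only major.minor.patch are encoded; further components
    (such as a fourth build number) are ignored.
    """
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    if not (0 <= minor < 1000 and 0 <= patch < 1000):
        raise ValueError(f"component out of range in {version!r}")
    return major * 1_000_000 + minor * 1_000 + patch


def hosts_with_extra_zookeeper_sessions(sessions: dict[str, int]) -> list[str]:
    """Return hosts whose ZooKeeperSession value exceeds one (hypothetical alert rule)."""
    return sorted(host for host, count in sessions.items() if count > 1)


if __name__ == "__main__":
    print(decode_version_integer(11022033))  # 11.22.33
    print(hosts_with_extra_zookeeper_sessions({"host-1": 1, "host-2": 2}))  # ['host-2']
```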

ProfileEvents
Event Description

AIORead

Number of reads with the Linux or FreeBSD AIO interface

AIOReadBytes

Number of bytes read with the Linux or FreeBSD AIO interface

AIOWrite

Number of writes with the Linux or FreeBSD AIO interface

AIOWriteBytes

Number of bytes written with the Linux or FreeBSD AIO interface

AggregationHashTablesInitializedAsTwoLevel

Number of hash tables initialized as two-level for aggregation

AggregationPreallocatedElementsInHashTables

Number of elements preallocated in hash tables for aggregation

ArenaAllocBytes

Total bytes allocated in the internal arena used for small objects

ArenaAllocChunks

Total number of memory chunks allocated in the arena used for small objects

AsyncInsertBytes

Data size in bytes of asynchronous INSERT queries

AsyncInsertQuery

Same as InsertQuery, but only for asynchronous INSERT queries

AsynchronousReadWaitMicroseconds

Time spent waiting for asynchronous reads

CachedReadBufferCacheWriteBytes

Bytes written from source (remote fs, etc.) to filesystem cache

CachedReadBufferCacheWriteMicroseconds

Time spent writing data into filesystem cache

CachedReadBufferReadFromCacheBytes

Bytes read from filesystem cache

CachedReadBufferReadFromCacheMicroseconds

Time spent reading from filesystem cache

CachedReadBufferReadFromSourceBytes

Bytes read from filesystem cache source (from remote fs, etc.)

CachedReadBufferReadFromSourceMicroseconds

Time spent reading from filesystem cache source (from remote filesystem, etc.)

CachedWriteBufferCacheWriteBytes

Bytes written from source (remote fs, etc.) to filesystem cache

CachedWriteBufferCacheWriteMicroseconds

Time spent writing data into filesystem cache

CannotRemoveEphemeralNode

Number of times an error occurred while trying to remove an ephemeral node

CannotWriteToWriteBufferDiscard

Number of stack traces dropped by a query profiler or signal handler because the pipe is full or cannot be written to

CompileExpressionsBytes

Number of bytes used for compilation of expressions

CompileExpressionsMicroseconds

Total time spent for compilation of expressions to LLVM code

CompileFunction

Number of times a compilation of generated LLVM code (to create fused function for complex expressions) was initiated

CompiledFunctionExecute

Number of times a compiled function was executed

CompressedReadBufferBlocks

Number of compressed blocks (blocks of data that are compressed independently of each other) read from compressed sources (files, network)

CompressedReadBufferBytes

Number of uncompressed bytes (the number of bytes after decompression) read from compressed sources (files, network)

ContextLock

Number of times the lock of Context (global lock) was acquired or an attempt was made to acquire it

CreatedHTTPConnections

Total number of created HTTP connections (the counter increases each time a connection is created)

CreatedLogEntryForMerge

Number of successfully created log entries to merge parts in ReplicatedMergeTree tables

CreatedLogEntryForMutation

Number of successfully created log entries to mutate parts in ReplicatedMergeTree tables

CreatedReadBufferDirectIO

Number of times a read buffer with O_DIRECT was created for reading data (while choosing among other read methods)

CreatedReadBufferDirectIOFailed

Number of times a read buffer with O_DIRECT was attempted to be created for reading data (while choosing among other read methods), but the OS did not allow it (due to lack of filesystem support or other reasons), and the server fell back to the ordinary reading method

CreatedReadBufferMMap

Number of times a read buffer using mmap was created for reading data (while choosing among other read methods)

CreatedReadBufferMMapFailed

Number of times a read buffer with mmap was attempted to be created for reading data (while choosing among other read methods), but the OS did not allow it (due to lack of filesystem support or other reasons), and the server fell back to the ordinary reading method

CreatedReadBufferOrdinary

Number of times a read buffer for ordinary files was created

DNSError

Total count of errors in DNS resolution

DataAfterMergeDiffersFromReplica

Number of times data after a merge is not byte-identical to the data on other replicas. There could be several reasons:

  • using newer version of a compression library after server update;

  • using another compression method;

  • non-deterministic compression algorithm (highly unlikely);

  • non-deterministic merge algorithm due to a logical error in code;

  • data corruption in memory due to a bug in code;

  • data corruption in memory due to a hardware issue;

  • manual modification of source data after server startup;

  • manual modification of checksums stored in ZooKeeper;

  • settings related to a data part format (for example, enable_mixed_granularity_parts) are different on different replicas.

The server detects this situation and downloads the merged part from a replica to force a byte-identical result

DataAfterMutationDiffersFromReplica

Number of times data after a mutation is not byte-identical to the data on other replicas. In addition to the reasons described in DataAfterMergeDiffersFromReplica, this can also be caused by a non-deterministic mutation

DelayedInserts

Number of times the INSERT of a block to a MergeTree table was throttled due to the high number of active data parts for a partition

DelayedInsertsMilliseconds

Total number of milliseconds spent while the INSERT of a block to a MergeTree table was throttled due to the high number of active data parts for a partition

DictCacheKeysExpired

Number of keys looked up in the dictionaries of cache types and found in the cache but they were obsolete

DictCacheKeysHit

Number of keys looked up in the dictionaries of cache types and found in the cache

DictCacheKeysNotFound

Number of keys looked up in the dictionaries of cache types and not found

DictCacheKeysRequested

Number of keys requested from the data source for the dictionaries of cache types

DictCacheKeysRequestedFound

Number of keys requested from the data source for dictionaries of cache types and found in the data source

DictCacheKeysRequestedMiss

Number of keys requested from the data source for dictionaries of cache types but not found in the data source

DictCacheLockReadNs

Number of nanoseconds spent waiting for the read lock to look up the data for the dictionaries of cache types

DictCacheLockWriteNs

Number of nanoseconds spent waiting for the write lock to update the data for the dictionaries of cache types

DictCacheRequestTimeNs

Number of nanoseconds spent querying the external data sources for the dictionaries of cache types

DictCacheRequests

Number of bulk requests to the external data sources for the dictionaries of cache types

DirectorySync

Number of times the F_FULLFSYNC/fsync/fdatasync function was called for directories

DirectorySyncElapsedMicroseconds

Total time spent waiting for F_FULLFSYNC/fsync/fdatasync syscall for directories

DiskReadElapsedMicroseconds

Total time spent waiting for read syscall. This includes reads from page cache

DiskWriteElapsedMicroseconds

Total time spent waiting for write syscall. This includes writes to page cache

DistributedConnectionFailAtAll

Total number of times a distributed connection failed after all retries were exhausted

DistributedConnectionFailTry

Total number of times a distributed connection failed and was retried

DistributedConnectionMissingTable

Number of times a replica was rejected from a distributed query, because it did not contain a table needed for the query

DistributedConnectionStaleReplica

Number of times a replica was rejected from a distributed query, because some table needed for a query had replication lag higher than the configured threshold

DistributedDelayedInserts

Number of times the INSERT of a block to a Distributed table was throttled due to the high number of pending bytes

DistributedDelayedInsertsMilliseconds

Total number of milliseconds spent while the INSERT of a block to a Distributed table was throttled due to the high number of pending bytes

DistributedRejectedInserts

Number of times the INSERT of a block to a Distributed table was rejected with the Too many bytes exception due to the high number of pending bytes

DistributedSyncInsertionTimeoutExceeded

Number of times a timeout was exceeded while waiting for shards during synchronous insertion into a Distributed table (with insert_distributed_sync = 1)

DuplicatedInsertedBlocks

Number of times a block INSERTed to a ReplicatedMergeTree table was deduplicated

ExecuteShellCommand

Number of shell command executions

ExternalAggregationCompressedBytes

Number of bytes written to a disk for aggregation in external memory

ExternalAggregationMerge

Number of times temporary files were merged for aggregation in external memory

ExternalAggregationUncompressedBytes

Amount of data (uncompressed, before compression) written to a disk for aggregation in external memory

ExternalAggregationWritePart

Number of times a temporary file was written to a disk for aggregation in external memory

ExternalDataSourceLocalCacheReadBytes

Bytes read from local cache buffer in RemoteReadBufferCache

ExternalSortMerge

Number of times temporary files were merged for sorting in external memory

ExternalSortWritePart

Number of times a temporary file was written to a disk for sorting in external memory

FailedInsertQuery

Same as FailedQuery, but only for INSERT queries

FailedQuery

Number of failed queries

FailedSelectQuery

Same as FailedQuery, but only for SELECT queries

FileOpen

Number of files opened

FileSegmentCacheWriteMicroseconds

Metric per file segment. Time spent writing data to the cache

FileSegmentPredownloadMicroseconds

Metric per file segment. Time spent predownloading data to the cache. Predownloading means finishing a file segment download (after another reader failed to do so) up to the point requested by the current thread

FileSegmentReadMicroseconds

Metric per file segment. Time spent reading from a file

FileSegmentUsedBytes

Metric per file segment. How many bytes were actually used from the current file segment

FileSegmentWaitReadBufferMicroseconds

Metric per file segment. Time spent waiting for the internal read buffer (includes cache waiting)

FileSync

Number of times the F_FULLFSYNC/fsync/fdatasync function was called for files

FileSyncElapsedMicroseconds

Total time spent waiting for F_FULLFSYNC/fsync/fdatasync syscall for files

FunctionExecute

Number of times a function was executed

HardPageFaults

Number of hard page faults. These are the page faults that required IO activity to handle

HedgedRequestsChangeReplica

Total number of times the timeout for changing a replica expired in hedged requests

IOBufferAllocBytes

Total number of bytes allocated in IO buffers

IOBufferAllocs

Total number of allocations made in IO buffers

InsertQuery

Same as Query, but only for INSERT queries

InsertQueryTimeMicroseconds

Total time of INSERT queries

InsertedBytes

Number of bytes (uncompressed; for columns as they are stored in memory) INSERTed to all tables

InsertedCompactParts

Number of parts inserted in the Compact format

InsertedInMemoryParts

Number of parts inserted in the InMemory format

InsertedRows

Number of rows INSERTed to all tables

InsertedWideParts

Number of parts inserted in the Wide format

KafkaBackgroundReads

Number of background reads populating materialized views from Kafka since server start

KafkaCommitFailures

Number of failed commits of consumed offsets to Kafka (usually a sign of some data duplication)

KafkaCommits

Number of successful commits of consumed offsets to Kafka (normally should be the same as KafkaBackgroundReads)

KafkaConsumerErrors

Number of errors reported by librdkafka during polls

KafkaDirectReads

Number of direct selects from Kafka tables since server start

KafkaMessagesFailed

Number of Kafka messages ClickHouse failed to parse

KafkaMessagesPolled

Number of Kafka messages polled from librdkafka to ClickHouse

KafkaMessagesProduced

Number of messages produced to Kafka

KafkaMessagesRead

Number of Kafka messages already processed by ClickHouse

KafkaProducerErrors

Number of errors while producing messages to Kafka

KafkaProducerFlushes

Number of explicit flushes to Kafka producer

KafkaRebalanceAssignments

Number of partition assignments (the final stage of consumer group rebalance)

KafkaRebalanceErrors

Number of failed consumer group rebalances

KafkaRebalanceRevocations

Number of partition revocations (the first stage of consumer group rebalance)

KafkaRowsRead

Number of rows parsed from Kafka messages

KafkaRowsRejected

Number of parsed rows which were later rejected (due to rebalances/errors or similar reasons). Those rows will be consumed again after the rebalance

KafkaRowsWritten

Number of rows inserted into Kafka tables

KafkaWrites

Number of writes (inserts) to Kafka tables

KeeperCommits

Number of successful commits

KeeperCommitsFailed

Number of failed commits

KeeperLatency

Keeper latency

KeeperPacketsReceived

Number of packets received by the Keeper server

KeeperPacketsSent

Number of packets sent by the Keeper server

KeeperReadSnapshot

Number of snapshot reads (serialization)

KeeperRequestTotal

Total number of requests on the Keeper server

KeeperSaveSnapshot

Number of snapshot saves

KeeperSnapshotApplys

Number of snapshot applications

KeeperSnapshotApplysFailed

Number of failed snapshot applications

KeeperSnapshotCreations

Number of snapshot creations

KeeperSnapshotCreationsFailed

Number of failed snapshot creations

MMappedFileCacheHits

Number of times a file has been found in the MMap cache (for the mmap read_method), so we didn’t have to mmap it again

MMappedFileCacheMisses

Number of times a file has not been found in the MMap cache (for the mmap read_method), so we had to mmap it again

MainConfigLoads

Number of times the main configuration was reloaded

MarkCacheHits

Number of times an entry has been found in the mark cache, so we didn’t have to load a mark file

MarkCacheMisses

Number of misses in the mark cache (this cache is used in the MergeTree engine for faster data reading)

MemoryOvercommitWaitTimeMicroseconds

Total time spent waiting for memory to be freed in OvercommitTracker

Merge

Number of launched background merges

MergeTreeDataProjectionWriterBlocks

Number of blocks INSERTed to projections of MergeTree tables. Each block forms a data part of level zero

MergeTreeDataProjectionWriterBlocksAlreadySorted

Number of blocks INSERTed to projections of MergeTree tables that appeared to be already sorted

MergeTreeDataProjectionWriterCompressedBytes

Bytes written to filesystem for data INSERTed to projections of MergeTree tables

MergeTreeDataProjectionWriterRows

Number of rows INSERTed to projections of MergeTree tables

MergeTreeDataProjectionWriterUncompressedBytes

Uncompressed bytes (for columns as they are stored in memory) INSERTed to projections of MergeTree tables

MergeTreeDataWriterBlocks

Number of blocks INSERTed to MergeTree tables. Each block forms a data part of level zero

MergeTreeDataWriterBlocksAlreadySorted

Number of blocks INSERTed to MergeTree tables that appeared to be already sorted

MergeTreeDataWriterCompressedBytes

Bytes written to filesystem for data INSERTed to MergeTree tables

MergeTreeDataWriterRows

Number of rows INSERTed to MergeTree tables

MergeTreeDataWriterUncompressedBytes

Uncompressed bytes (for columns as they are stored in memory) INSERTed to MergeTree tables

MergeTreeMetadataCacheDelete

Number of rocksdb deletes (used for the merge tree metadata cache)

MergeTreeMetadataCacheGet

Number of rocksdb reads (used for the merge tree metadata cache)

MergeTreeMetadataCacheHit

Number of times a metadata file was read from the MergeTree metadata cache

MergeTreeMetadataCacheMiss

Number of times a metadata file could not be read from the MergeTree metadata cache

MergeTreeMetadataCachePut

Number of rocksdb puts (used for the merge tree metadata cache)

MergeTreeMetadataCacheSeek

Number of rocksdb seeks (used for the merge tree metadata cache)

MergedIntoCompactParts

Number of parts merged into the Compact format

MergedIntoInMemoryParts

Number of parts merged into the InMemory format

MergedIntoWideParts

Number of parts merged into the Wide format

MergedRows

Rows read for background merges (the number of rows before merge)

MergedUncompressedBytes

Uncompressed bytes (for columns as they are stored in memory) that were read for background merges. This is the number before merge

MergesTimeMilliseconds

Total time spent for background merges

NetworkReceiveBytes

Total number of bytes received from network. Only ClickHouse-related network interaction is included; interaction by third-party libraries is not

NetworkReceiveElapsedMicroseconds

Total time spent waiting for data to receive or receiving data from network. Only ClickHouse-related network interaction is included; interaction by third-party libraries is not

NetworkSendBytes

Total number of bytes sent to network. Only ClickHouse-related network interaction is included; interaction by third-party libraries is not

NetworkSendElapsedMicroseconds

Total time spent waiting for data to send to network or sending data to network. Only ClickHouse-related network interaction is included; interaction by third-party libraries is not

NotCreatedLogEntryForMerge

Log entry to merge parts in ReplicatedMergeTree is not created due to concurrent log update by another replica

NotCreatedLogEntryForMutation

Log entry to mutate parts in ReplicatedMergeTree is not created due to concurrent log update by another replica

OSCPUVirtualTimeMicroseconds

CPU time spent, as seen by the OS. Does not include involuntary waits due to virtualization

OSCPUWaitMicroseconds

Total time a thread was ready for execution but waiting to be scheduled by OS, from the OS point of view

OSIOWaitMicroseconds

Total time a thread spent waiting for a result of IO operation, from the OS point of view. This is real IO that doesn’t include the page cache

OSReadBytes

Number of bytes read from disks or block devices. Doesn’t include bytes read from the page cache. May include excessive data due to block size, readahead, etc.

OSReadChars

Number of bytes read from filesystem, including the page cache

OSWriteBytes

Number of bytes written to disks or block devices. Doesn’t include bytes that are in page cache dirty pages. May not include data that was written by OS asynchronously

OSWriteChars

Number of bytes written to filesystem, including the page cache

ObsoleteReplicatedParts

Number of times a data part was covered by another data part that has been fetched from a replica (so, we have marked a covered data part as obsolete and no longer needed)

OpenedFileCacheHits

Number of times a file has been found in the opened file cache, so we didn’t have to open it again

OpenedFileCacheMisses

Number of misses in the opened file cache

OtherQueryTimeMicroseconds

Total time of queries that are not SELECT or INSERT

OverflowAny

Number of times approximate GROUP BY was in effect: when aggregation was performed only on top of first max_rows_to_group_by unique keys and other keys were ignored due to group_by_overflow_mode = any

OverflowBreak

Number of times when data processing was cancelled by query complexity limitation with the overflow_mode = break setting and the result was incomplete

OverflowThrow

Number of times when data processing was cancelled by query complexity limitation with the overflow_mode = throw setting and an exception was thrown

PerfAlignmentFaults

Number of alignment faults. These happen when unaligned memory accesses happen; the kernel can handle these but it reduces performance. This happens only on some architectures (never on x86)

PerfBranchInstructions

Retired branch instructions. Prior to Linux 2.6.35, this used the wrong event on AMD processors

PerfBranchMisses

Mispredicted branch instructions

PerfBusCycles

Bus cycles which can be different from total cycles

PerfCacheMisses

Cache misses. Usually this indicates Last Level Cache misses. This event is intended to be used in conjunction with the PerfCacheReferences event to calculate cache miss rates

PerfCacheReferences

Cache accesses. Usually this event indicates Last Level Cache accesses but may vary depending on your CPU. The metric may include prefetches and coherency messages, and this also depends on the design of your CPU

PerfContextSwitches

Number of context switches

PerfCpuClock

The CPU clock, a high-resolution per-CPU timer

PerfCpuCycles

Total cycles. Be wary of what happens during CPU frequency scaling

PerfCpuMigrations

Number of times the process has migrated to a new CPU

PerfDataTLBMisses

Data TLB misses

PerfDataTLBReferences

Data TLB references

PerfEmulationFaults

Number of emulation faults. The kernel sometimes traps on unimplemented instructions and emulates them for user space. This can negatively impact performance

PerfInstructionTLBMisses

Instruction TLB misses

PerfInstructionTLBReferences

Instruction TLB references

PerfInstructions

Retired instructions. Be careful, these can be affected by various issues, most notably hardware interrupt counts

PerfLocalMemoryMisses

Local NUMA node memory read misses

PerfLocalMemoryReferences

Local NUMA node memory reads

PerfMinEnabledRunningTime

Running time for event with minimum enabled time. Used to track the amount of event multiplexing

PerfMinEnabledTime

For all events, minimum time that an event was enabled. Used to track event multiplexing influence

PerfRefCpuCycles

Total cycles (not affected by CPU frequency scaling)

PerfStalledCyclesBackend

Stalled cycles during retirement

PerfStalledCyclesFrontend

Stalled cycles during issue

PerfTaskClock

A clock count specific to the task that is running

PolygonsAddedToPool

Number of times a polygon was added to the cache (pool) for the pointInPolygon function

PolygonsInPoolAllocatedBytes

The number of bytes for polygons added to the cache (pool) for the pointInPolygon function

Query

Number of queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries

QueryMaskingRulesMatch

Number of times when query masking rules were successfully matched

QueryMemoryLimitExceeded

Number of times the memory limit was exceeded for a query

QueryProfilerRuns

Number of times QueryProfiler has been run

QueryProfilerSignalOverruns

Number of times processing of a query profiler signal was dropped due to overrun, plus the number of signals that the OS has not delivered due to overrun

QueryTimeMicroseconds

Total time of all queries

RWLockAcquiredReadLocks

Number of times a read lock was acquired on a RWLock

RWLockAcquiredWriteLocks

Number of times a write lock was acquired (in a heavy RWLock)

RWLockReadersWaitMilliseconds

Total time in milliseconds that readers waited for acquiring a lock on a RWLock

RWLockWritersWaitMilliseconds

Total time spent waiting for a write lock to be acquired (in a heavy RWLock)

ReadBackoff

Number of times the number of query processing threads was lowered due to slow reads

ReadBufferFromFileDescriptorRead

Number of reads (read/pread) from a file descriptor. Does not include sockets

ReadBufferFromFileDescriptorReadBytes

Number of bytes read from file descriptors. If the file is compressed, this will show the compressed data size

ReadBufferFromFileDescriptorReadFailed

Number of times a read (read/pread) from a file descriptor has failed

ReadBufferFromS3Bytes

Bytes read from S3

ReadBufferFromS3Microseconds

Time spent on reading from S3

ReadBufferFromS3RequestsErrors

Number of exceptions while reading from S3

ReadBufferSeekCancelConnection

Number of seeks that led to a new connection (S3, HTTP)

ReadCompressedBytes

Number of bytes (before decompression) read from compressed sources (files, network)

RealTimeMicroseconds

Total (wall clock) time spent in processing (queries and other tasks) threads (note that this is a sum)

RegexpCreated

Compiled regular expressions. Identical regular expressions are compiled just once and cached forever

RejectedInserts

Number of times the INSERT of a block to a MergeTree table was rejected with the Too many parts exception due to the high number of active data parts for a partition

RemoteFSBuffers

Number of buffers created for asynchronous reading from a remote filesystem

RemoteFSCacheDownloadBytes

Number of bytes downloaded from a remote filesystem cache

RemoteFSCacheReadBytes

Number of bytes read from a remote filesystem cache

RemoteFSCancelledPrefetches

Number of cancelled prefetches (because of a seek)

RemoteFSLazySeeks

Number of lazy seeks

RemoteFSPrefetchedReads

Number of reads from a prefetched buffer

RemoteFSPrefetches

Number of prefetches made with asynchronous reading from a remote filesystem

RemoteFSReadBytes

Number of bytes read from a remote filesystem

RemoteFSReadMicroseconds

Time spent reading from a remote filesystem

RemoteFSSeeks

Total number of seeks for the asynchronous buffer

RemoteFSSeeksWithReset

Number of seeks that led to a new connection

RemoteFSUnprefetchedReads

Number of reads from an unprefetched buffer

RemoteFSUnusedPrefetches

Number of prefetches pending at buffer destruction

ReplicaPartialShutdown

Number of times a Replicated table had to deinitialize its state due to ZooKeeper session expiration. The state is reinitialized every time ZooKeeper becomes available again

ReplicatedDataLoss

Number of times a required data part did not exist on any replica (even on replicas that are currently offline). Such data parts are definitely lost. This is normal with asynchronous replication (if quorum inserts were not enabled): the replica to which the data part was written failed, and when it came back online, it no longer contained that data part

ReplicatedPartChecks

Number of times an advanced search for a data part on replicas had to be performed, or the need for an existing data part had to be clarified

ReplicatedPartChecksFailed

Number of times the advanced search for a data part on replicas did not return a result, or an unexpected part was found and moved away

ReplicatedPartFailedFetches

Number of times a data part failed to download from a replica of a ReplicatedMergeTree table

ReplicatedPartFetches

Number of times a data part was downloaded from a replica of a ReplicatedMergeTree table

ReplicatedPartFetchesOfMerged

Number of times an already merged part was downloaded from a replica of a ReplicatedMergeTree table instead of performing the merge locally (a local merge is usually preferred to save network traffic). This happens when not all source parts needed for the merge are available locally or when the data part is old enough

ReplicatedPartMerges

Number of times data parts of ReplicatedMergeTree tables were successfully merged

ReplicatedPartMutations

Number of times data parts of ReplicatedMergeTree tables were successfully mutated

S3ReadBytes

Number of bytes read from S3 storage

S3ReadMicroseconds

Time of GET and HEAD requests to S3 storage

S3ReadRequestsCount

Number of GET and HEAD requests to S3 storage

S3ReadRequestsErrors

Number of non-throttling errors in GET and HEAD requests to S3 storage

S3ReadRequestsRedirects

Number of redirects in GET and HEAD requests to S3 storage

S3ReadRequestsThrottling

Number of 429 and 503 errors in GET and HEAD requests to S3 storage

S3WriteBytes

Number of bytes written to S3 storage

S3WriteMicroseconds

Time of POST, DELETE, PUT, and PATCH requests to S3 storage

S3WriteRequestsCount

Number of POST, DELETE, PUT, and PATCH requests to S3 storage

S3WriteRequestsErrors

Number of non-throttling errors in POST, DELETE, PUT, and PATCH requests to S3 storage

S3WriteRequestsRedirects

Number of redirects in POST, DELETE, PUT, and PATCH requests to S3 storage

S3WriteRequestsThrottling

Number of 429 and 503 errors in POST, DELETE, PUT, and PATCH requests to S3 storage

ScalarSubqueriesCacheMiss

Number of times a read from a scalar subquery was not cached and had to be calculated completely

ScalarSubqueriesGlobalCacheHit

Number of times a read from a scalar subquery was done using the global cache

ScalarSubqueriesLocalCacheHit

Number of times a read from a scalar subquery was done using the local cache

SchemaInferenceCacheEvictions

Number of times a schema was evicted from the cache due to overflow

SchemaInferenceCacheHits

Number of times a schema from the cache was used for schema inference

SchemaInferenceCacheInvalidations

Number of times a schema in the cache became invalid due to changes in data

SchemaInferenceCacheMisses

Number of times a schema was not found in the cache during schema inference

Seek

Number of times the lseek function was called

SelectQuery

Same as Query, but only for SELECT queries

SelectQueryTimeMicroseconds

Total time of SELECT queries

SelectedBytes

Number of bytes (uncompressed; for columns as they are stored in memory) SELECTed from all tables

SelectedMarks

Number of marks (index granules) selected to read from a MergeTree table

SelectedParts

Number of data parts selected to read from a MergeTree table

SelectedRanges

Number of non-adjacent ranges in all data parts selected to read from a MergeTree table

SelectedRows

Number of rows SELECTed from all tables

SleepFunctionCalls

Number of times a sleep function (sleep, sleepEachRow) has been called

SleepFunctionMicroseconds

Time spent sleeping due to a sleep function call

SlowRead

Number of reads from a file that were slow. This indicates system overload. Thresholds are controlled by read_backoff_* settings

SoftPageFaults

Number of soft page faults. These are the page faults that were resolved without disk IO activity

StorageBufferErrorOnFlush

Number of times a buffer in a Buffer table could not be flushed due to an error writing to the destination table

StorageBufferFlush

Number of times a buffer in a Buffer table was flushed

StorageBufferLayerLockReadersWaitMilliseconds

Time spent waiting for a Buffer layer during reading

StorageBufferLayerLockWritersWaitMilliseconds

Time spent waiting for a free Buffer layer to write to (can be used to tune Buffer layers)

StorageBufferPassedAllMinThresholds

Number of times all minimum thresholds were reached to flush a buffer in a Buffer table

StorageBufferPassedBytesFlushThreshold

Number of times the background-only flush threshold on bytes has been reached to flush a buffer in a Buffer table (this is an expert-only metric)

StorageBufferPassedBytesMaxThreshold

Number of times the maximum bytes threshold has been reached to flush a buffer in a Buffer table

StorageBufferPassedRowsFlushThreshold

Number of times the background-only flush threshold on rows has been reached to flush a buffer in a Buffer table (this is an expert-only metric)

StorageBufferPassedRowsMaxThreshold

Number of times the maximum rows threshold has been reached to flush a buffer in a Buffer table

StorageBufferPassedTimeFlushThreshold

Number of times the background-only flush threshold on time has been reached to flush a buffer in a Buffer table (this is an expert-only metric)

StorageBufferPassedTimeMaxThreshold

Number of times the maximum time threshold has been reached to flush a buffer in a Buffer table

SystemTimeMicroseconds

Total time spent in processing (queries and other tasks) threads executing CPU instructions in OS kernel space. This includes the time the CPU pipeline was stalled due to cache misses, branch mispredictions, hyper-threading, etc.

TableFunctionExecute

Number of times a table function was executed

ThreadPoolReaderPageCacheHit

Number of times the read inside ThreadPoolReader was done from the page cache

ThreadPoolReaderPageCacheHitBytes

Number of bytes read inside ThreadPoolReader when it was done from the page cache

ThreadPoolReaderPageCacheHitElapsedMicroseconds

Time spent reading data from the page cache in ThreadPoolReader

ThreadPoolReaderPageCacheMiss

Number of times the read inside ThreadPoolReader was not done from the page cache and was handed off to the thread pool

ThreadPoolReaderPageCacheMissBytes

Number of bytes read inside ThreadPoolReader when the read was not done from the page cache and was handed off to the thread pool

ThreadPoolReaderPageCacheMissElapsedMicroseconds

Time spent reading data inside the asynchronous job in ThreadPoolReader when the read was not done from the page cache

ThreadpoolReaderReadBytes

Bytes read from a threadpool task in asynchronous reading

ThreadpoolReaderTaskMicroseconds

Time spent getting the data in asynchronous reading

ThrottlerSleepMicroseconds

Total time a query was sleeping to conform to all throttling settings

UncompressedCacheHits

Number of times a block of data has been found in the uncompressed cache (and decompression was avoided)

UncompressedCacheMisses

Number of times a block of data has not been found in the uncompressed cache (and decompression was required)

UncompressedCacheWeightLost

Number of bytes evicted from the uncompressed cache

UserTimeMicroseconds

Total time spent in processing (queries and other tasks) threads executing CPU instructions in user space. This includes the time the CPU pipeline was stalled due to cache misses, branch mispredictions, hyper-threading, etc.

WriteBufferFromFileDescriptorWrite

Number of writes (write/pwrite) to a file descriptor. Does not include sockets

WriteBufferFromFileDescriptorWriteBytes

Number of bytes written to file descriptors. If the file is compressed, this will show the compressed data size

WriteBufferFromFileDescriptorWriteFailed

Number of times a write (write/pwrite) to a file descriptor has failed

WriteBufferFromS3Bytes

Bytes written to S3

ZooKeeperBytesReceived

Total number of bytes received from ZooKeeper

ZooKeeperBytesSent

Total number of bytes sent to ZooKeeper

ZooKeeperCheck

Number of check requests to ZooKeeper. Usually they don’t make sense in isolation, only as part of a complex transaction

ZooKeeperClose

Number of times the connection with ZooKeeper has been closed voluntarily

ZooKeeperCreate

Number of times a node was created in ZooKeeper

ZooKeeperExists

Number of times an existence check was performed in ZooKeeper

ZooKeeperGet

Number of times data was fetched from ZooKeeper

ZooKeeperHardwareExceptions

Number of exceptions while working with ZooKeeper related to network (connection loss or similar)

ZooKeeperInit

Number of times a ZooKeeper session was initialized

ZooKeeperList

Number of times a list command was executed on ZooKeeper

ZooKeeperMulti

Number of times a multi command was executed on ZooKeeper

ZooKeeperOtherExceptions

Number of exceptions while working with ZooKeeper other than ZooKeeperUserExceptions and ZooKeeperHardwareExceptions

ZooKeeperRemove

Number of remove requests to ZooKeeper

ZooKeeperSet

Number of set requests to ZooKeeper

ZooKeeperSync

Number of sync requests to ZooKeeper. These requests are rarely needed or usable

ZooKeeperTransactions

Number of transactions performed on ZooKeeper

ZooKeeperUserExceptions

Number of exceptions while working with ZooKeeper related to the data (no node, bad version, or similar)

ZooKeeperWaitMicroseconds

Total time spent waiting for ZooKeeper in microseconds

ZooKeeperWatchResponse

Number of times a watch notification has been received from ZooKeeper
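ProfileEvents values are cumulative counters (for example, SelectQuery counts SELECT queries processed since the ClickHouse server started), so for monitoring it is usually the change between two readings that matters. The following Python sketch shows one possible way to derive rates and ratios from two snapshots of these counters. It assumes each snapshot is a dictionary that maps event names to values, for example read from the system.events table; the helper functions are hypothetical and are not part of ADQM or ClickHouse.

```python
"""Sketch: derive rates and ratios from two snapshots of ProfileEvents counters.

Assumptions (not part of ADQM): each snapshot is a dict that maps an event
name to its cumulative value, e.g. as read from the system.events table, and
the helper names below are hypothetical.
"""

from typing import Mapping, Optional

Snapshot = Mapping[str, int]


def delta(prev: Snapshot, curr: Snapshot, event: str) -> Optional[int]:
    """Increase of a cumulative counter between two snapshots.

    Returns None if the counter decreased, which usually means the server
    restarted and its counters were reset, so the difference is meaningless.
    """
    d = curr.get(event, 0) - prev.get(event, 0)
    return d if d >= 0 else None


def rate_per_second(prev: Snapshot, curr: Snapshot, event: str,
                    interval_s: float) -> Optional[float]:
    """Average number of events per second over the interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    d = delta(prev, curr, event)
    return None if d is None else d / interval_s


def hit_ratio(prev: Snapshot, curr: Snapshot, hits: str,
              misses: str) -> Optional[float]:
    """Fraction of cache lookups that were hits during the interval."""
    h, m = delta(prev, curr, hits), delta(prev, curr, misses)
    if h is None or m is None or h + m == 0:
        return None
    return h / (h + m)


def average_per_event(prev: Snapshot, curr: Snapshot, total: str,
                      count: str) -> Optional[float]:
    """Average of a 'total' counter per event, e.g. microseconds per SELECT."""
    t, c = delta(prev, curr, total), delta(prev, curr, count)
    if t is None or c is None or c == 0:
        return None
    return t / c


if __name__ == "__main__":
    # Illustrative values only, not real measurements; taken 60 s apart.
    t0 = {"SelectQuery": 1_000, "SelectQueryTimeMicroseconds": 5_000_000,
          "UncompressedCacheHits": 900, "UncompressedCacheMisses": 100}
    t1 = {"SelectQuery": 1_600, "SelectQueryTimeMicroseconds": 8_000_000,
          "UncompressedCacheHits": 1_700, "UncompressedCacheMisses": 300}
    print(rate_per_second(t0, t1, "SelectQuery", 60.0))  # 10.0
    print(hit_ratio(t0, t1, "UncompressedCacheHits", "UncompressedCacheMisses"))  # 0.8
    print(average_per_event(t0, t1, "SelectQueryTimeMicroseconds", "SelectQuery"))  # 5000.0
```

Treat the derived values as approximations: for example, Query and SelectQuery may include internal queries initiated by ClickHouse itself, and the sketch treats a decreasing counter as a server restart rather than as a real change.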

AsynchronousMetrics
Metric name Description

AsynchronousMetricsCalculationTimeSpent

Time in seconds spent for calculation of asynchronous metrics (this is the overhead of asynchronous metrics)

BlockActiveTime_vda

Time in seconds the block device had IO requests queued. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockDiscardBytes_vda

Number of discarded bytes on the block device. These operations are relevant for SSDs. Discard operations are not used by ClickHouse but can be used by other processes on the system. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockDiscardMerges_vda

Number of discard operations requested from the block device and merged together by the OS IO scheduler. These operations are relevant for SSDs. Discard operations are not used by ClickHouse but can be used by other processes on the system. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockDiscardOps_vda

Number of discard operations requested from the block device. These operations are relevant for SSDs. Discard operations are not used by ClickHouse but can be used by other processes on the system. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockDiscardTime_vda

Time in seconds spent in discard operations requested from the block device, summed across all the operations. These operations are relevant for SSDs. Discard operations are not used by ClickHouse but can be used by other processes on the system. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockInFlightOps_vda

Number of IO requests that have been issued to the device driver but have not yet completed. It does not include IO requests that are in the queue but not yet issued to the device driver. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockQueueTime_vda

Number of milliseconds that IO requests have waited on this block device. If multiple IO requests are waiting, this value increases by the number of milliseconds multiplied by the number of waiting requests. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockReadBytes_vda

Number of bytes read from the block device. It can be lower than the number of bytes read from the filesystem due to the use of the OS page cache, which saves IO. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockReadMerges_vda

Number of read operations requested from the block device and merged together by the OS IO scheduler. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockReadOps_vda

Number of read operations requested from the block device. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockReadTime_vda

Time in seconds spent in read operations requested from the block device, summed across all the operations. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockWriteBytes_vda

Number of bytes written to the block device. It can be lower than the number of bytes written to the filesystem due to the use of the OS page cache, which saves IO. A write to the block device may happen later than the corresponding write to the filesystem due to write-back caching. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockWriteMerges_vda

Number of write operations requested from the block device and merged together by the OS IO scheduler. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockWriteOps_vda

Number of write operations requested from the block device. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

BlockWriteTime_vda

Time in seconds spent in write operations requested from the block device, summed across all the operations. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

CPUFrequencyMHz_N

The current frequency of the CPU, in MHz. Most modern CPUs adjust the frequency dynamically for power saving and Turbo Boost

CompiledExpressionCacheBytes

Total bytes used for the cache of JIT-compiled code

CompiledExpressionCacheCount

Total entries in the cache of JIT-compiled code

DiskAvailable_default

Available bytes on the disk (virtual filesystem). Remote filesystems can show a large value like 16 EiB

DiskTotal_default

The total size in bytes of the disk (virtual filesystem). Remote filesystems can show a large value like 16 EiB

DiskUnreserved_default

Available bytes on the disk (virtual filesystem) without the reservations for merges, fetches, and moves. Remote filesystems can show a large value like 16 EiB

DiskUsed_default

Used bytes on the disk (virtual filesystem). Remote filesystems do not always provide this information

FilesystemLogsPathAvailableBytes

Available bytes on the volume where the ClickHouse logs path is mounted. If this value approaches zero, you should tune the log rotation in the configuration file

FilesystemLogsPathAvailableINodes

The number of available inodes on the volume where the ClickHouse logs path is mounted

FilesystemLogsPathTotalBytes

The size of the volume where the ClickHouse logs path is mounted, in bytes. It is recommended to have at least 10 GB for logs

FilesystemLogsPathTotalINodes

The total number of inodes on the volume where the ClickHouse logs path is mounted

FilesystemLogsPathUsedBytes

Used bytes on the volume where the ClickHouse logs path is mounted

FilesystemLogsPathUsedINodes

The number of used inodes on the volume where the ClickHouse logs path is mounted

FilesystemMainPathAvailableBytes

Available bytes on the volume where the main ClickHouse path is mounted

FilesystemMainPathAvailableINodes

The number of available inodes on the volume where the main ClickHouse path is mounted. If it is close to zero, it indicates a misconfiguration, and you will get 'no space left on device' even when the disk is not full

FilesystemMainPathTotalBytes

The size of the volume where the main ClickHouse path is mounted, in bytes

FilesystemMainPathTotalINodes

The total number of inodes on the volume where the main ClickHouse path is mounted. If it is less than 25 million, it indicates a misconfiguration

FilesystemMainPathUsedBytes

Used bytes on the volume where the main ClickHouse path is mounted

FilesystemMainPathUsedINodes

The number of used inodes on the volume where the main ClickHouse path is mounted. This value mostly corresponds to the number of files

HTTPThreads

Number of threads in the server of the HTTP interface (without TLS)

InterserverThreads

Number of threads in the server of the replicas communication protocol (without TLS)

Jitter

The difference between the time the thread that calculates asynchronous metrics was scheduled to wake up and the time it was actually woken up. A proxy indicator of overall system latency and responsiveness

LoadAverageN (LoadAverage1, LoadAverage15, LoadAverage5)

The whole system load, averaged with exponential smoothing over N minutes (1, 5, or 15). The load represents the number of threads across all the processes (the scheduling entities of the OS kernel) that are currently being run by the CPU, waiting for IO, or ready to run but not scheduled at this point in time. This number includes all the processes, not only clickhouse-server. The number can be greater than the number of CPU cores if the system is overloaded and many processes are ready to run but waiting for CPU or IO

MMapCacheCells

The number of files opened with mmap (mapped in memory). This is used for queries with the setting local_filesystem_read_method set to mmap. The files opened with mmap are kept in the cache to avoid costly TLB flushes.

MarkCacheBytes

Total size of mark cache in bytes

MarkCacheFiles

Total number of mark files cached in the mark cache

MaxPartCountForPartition

Maximum number of parts per partition across all partitions of all tables of the MergeTree family. Values larger than 300 indicate misconfiguration, overload, or massive data loading

MemoryCode

The amount of virtual memory mapped for the pages of machine code of the server process, in bytes

MemoryDataAndStack

The amount of virtual memory mapped for the use of stack and for the allocated memory, in bytes. It is unspecified whether it includes the per-thread stacks and most of the allocated memory, which is allocated with the mmap system call. This metric exists only for completeness. It is recommended to use the MemoryResident metric for monitoring

MemoryResident

The amount of physical memory used by the server process, in bytes

MemoryShared

The amount of memory used by the server process that is also shared by other processes, in bytes. ClickHouse does not use shared memory, but the OS can label some memory as shared for its own reasons. This metric is not very useful to watch and exists only for completeness

MemoryVirtual

The size of the virtual address space allocated by the server process, in bytes. The size of the virtual address space is usually much greater than the physical memory consumption and should not be used as an estimate of memory consumption. Large values of this metric are normal and have only technical meaning

NetworkReceiveBytes_eth0

Number of bytes received via the network interface. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

NetworkReceiveDrop_eth0

Number of times a packet was dropped while being received via the network interface. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

NetworkReceiveErrors_eth0

Number of times an error occurred while receiving via the network interface. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

NetworkReceivePackets_eth0

Number of network packets received via the network interface. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

NetworkSendBytes_eth0

Number of bytes sent via the network interface. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

NetworkSendDrop_eth0

Number of times a packet was dropped while being sent via the network interface. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

NetworkSendErrors_eth0

Number of times an error (e.g., a TCP retransmit) occurred while sending via the network interface. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

NetworkSendPackets_eth0

Number of network packets sent via the network interface. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

NumberOfDatabases

Total number of databases on the server

NumberOfTables

Total number of tables summed across the databases on the server, excluding the databases that cannot contain MergeTree tables. The excluded database engines are those that generate the set of tables on the fly, such as Lazy, MySQL, PostgreSQL, and SQLite

OSContextSwitches

The number of context switches that the system underwent on the host machine. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

OSGuestNiceTime

The ratio of time spent running a virtual CPU for guest operating systems under the control of the Linux kernel, when a guest was set to a higher priority (see man procfs). This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server. This metric is irrelevant for ClickHouse but still exists for completeness. The value for a single CPU core will be in the [0..1] interval. The value for all CPU cores is calculated as a sum across them [0..num cores]

OSGuestNiceTimeCPUN

The ratio of time spent running a virtual CPU for guest operating systems under the control of the Linux kernel, when a guest was set to a higher priority (see man procfs). This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server. This metric is irrelevant for ClickHouse but still exists for completeness. The value for a single CPU core will be in the [0..1] interval. The value for all CPU cores is calculated as a sum across them [0..num cores]

OSGuestNiceTimeNormalized

The value is similar to OSGuestNiceTime but divided by the number of CPU cores, so it is measured in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric

OSGuestTime

The ratio of time spent running a virtual CPU for guest operating systems under the control of the Linux kernel (see man procfs). This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server. This metric is irrelevant for ClickHouse but still exists for completeness. The value for a single CPU core will be in the [0..1] interval. The value for all CPU cores is calculated as a sum across them [0..num cores]

OSGuestTimeCPUN

The ratio of time spent running a virtual CPU for guest operating systems under the control of the Linux kernel (see man procfs). This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server. This metric is irrelevant for ClickHouse but still exists for completeness. The value for a single CPU core will be in the [0..1] interval. The value for all CPU cores is calculated as a sum across them [0..num cores]

OSGuestTimeNormalized

The value is similar to OSGuestTime but divided by the number of CPU cores, so it is measured in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric

OSIOWaitTime

The ratio of time the CPU core was not running code while the OS kernel did not run any other process on this CPU because the processes were waiting for IO. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server. The value for a single CPU core will be in the [0..1] interval. The value for all CPU cores is calculated as a sum across them [0..num cores]

OSIOWaitTimeCPUN

The ratio of time the CPU core was not running code while the OS kernel did not run any other process on this CPU because the processes were waiting for IO. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server. The value for a single CPU core will be in the [0..1] interval. The value for all CPU cores is calculated as a sum across them [0..num cores]

OSIOWaitTimeNormalized

The value is similar to OSIOWaitTime but divided by the number of CPU cores, so it is measured in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric

OSIdleTime

The ratio of time the CPU core was idle (not even ready to run a process waiting for IO) from the OS kernel standpoint. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server. This does not include the time when the CPU was under-utilized for reasons internal to the CPU (memory loads, pipeline stalls, branch mispredictions, running another SMT core). The value for a single CPU core will be in the [0..1] interval. The value for all CPU cores is calculated as a sum across them [0..num cores]

OSIdleTimeCPUN

The ratio of time the CPU core was idle (not even ready to run a process waiting for IO) from the OS kernel standpoint. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server. This does not include the time when the CPU was under-utilized for reasons internal to the CPU (memory loads, pipeline stalls, branch mispredictions, running another SMT core). The value for a single CPU core will be in the [0..1] interval. The value for all CPU cores is calculated as a sum across them [0..num cores]

OSIdleTimeNormalized

The value is similar to OSIdleTime but divided by the number of CPU cores, so it is measured in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric

OSInterrupts

The number of interrupts on the host machine. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server

OSIrqTime

The ratio of time spent running hardware interrupt requests on the CPU. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server. A high value of this metric may indicate hardware misconfiguration or a very high network load. The value for a single CPU core will be in the [0..1] interval. The value for all CPU cores is calculated as a sum across them [0..num cores]

OSIrqTimeCPUN

The ratio of time spent running hardware interrupt requests on the CPU. This is a system-wide metric: it includes all the processes on the host machine, not just clickhouse-server. A high value of this metric may indicate hardware misconfiguration or a very high network load. The value for a single CPU core will be in the [0..1] interval. The value for all CPU cores is calculated as a sum across them [0..num cores]

OSIrqTimeNormalized

The value is similar to OSIrqTime but divided by the number of CPU cores, so it is measured in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster even if the number of cores is non-uniform, and still get the average resource utilization metric

OSMemoryAvailable

The amount of memory available to programs, in bytes. This is very similar to the OSMemoryFreePlusCached metric. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server

OSMemoryBuffers

The amount of memory used by OS kernel buffers, in bytes. This value is typically small; large values may indicate an OS misconfiguration. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server

OSMemoryCached

The amount of memory used by the OS page cache, in bytes. Typically, almost all available memory is used by the OS page cache, so high values of this metric are normal and expected. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server

OSMemoryFreePlusCached

The amount of free memory plus OS page cache memory on the host system, in bytes. This memory is available to programs. The value should be very close to OSMemoryAvailable. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server

OSMemoryFreeWithoutCached

The amount of free memory on the host system, in bytes, excluding the memory used by the OS page cache. Because page cache memory is also available to programs, the value of this metric can be misleading; use the OSMemoryAvailable metric instead. For convenience, the OSMemoryFreePlusCached metric is also provided, and its value should be close to OSMemoryAvailable. See also https://www.linuxatemyram.com/. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server

OSMemoryTotal

The total amount of memory on the host system, in bytes

OSNiceTime

The ratio of time the CPU core was running userspace code with lower priority (processes with a positive nice value). This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server. For a single CPU core, the value is in the [0..1] interval; for all CPU cores, it is calculated as a sum across them and is in the [0..num cores] interval

OSNiceTimeCPUN

The same as OSNiceTime, but for a single CPU core N: the ratio of time that core was running userspace code with lower priority (processes with a positive nice value), in the [0..1] interval. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server

OSNiceTimeNormalized

The same as OSNiceTime, but divided by the number of CPU cores, so the value is in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster, even if they have different numbers of cores, and still get a meaningful average resource utilization metric

OSOpenFiles

The total number of open files on the host machine. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server

OSProcessesBlocked

The number of threads blocked waiting for IO to complete (see man procfs). This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server

OSProcessesCreated

The number of processes created. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server

OSProcessesRunning

The number of runnable (running or ready to run) threads, as seen by the operating system. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server

OSSoftIrqTime

The ratio of time spent running software interrupt requests on the CPU. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server. A high value may indicate inefficient software running on the system. For a single CPU core, the value is in the [0..1] interval; for all CPU cores, it is calculated as a sum across them and is in the [0..num cores] interval

OSSoftIrqTimeCPUN

The same as OSSoftIrqTime, but for a single CPU core N: the ratio of time that core spent running software interrupt requests, in the [0..1] interval. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server. A high value may indicate inefficient software running on the system

OSSoftIrqTimeNormalized

The same as OSSoftIrqTime, but divided by the number of CPU cores, so the value is in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster, even if they have different numbers of cores, and still get a meaningful average resource utilization metric

OSStealTime

The ratio of time the CPU spent in other operating systems when running in a virtualized environment. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server. Not every virtualized environment presents this metric, and most of them do not. For a single CPU core, the value is in the [0..1] interval; for all CPU cores, it is calculated as a sum across them and is in the [0..num cores] interval

OSStealTimeCPUN

The same as OSStealTime, but for a single CPU core N: the ratio of time that core spent in other operating systems when running in a virtualized environment, in the [0..1] interval. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server. Not every virtualized environment presents this metric, and most of them do not

OSStealTimeNormalized

The same as OSStealTime, but divided by the number of CPU cores, so the value is in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster, even if they have different numbers of cores, and still get a meaningful average resource utilization metric

OSSystemTime

The ratio of time the CPU core was running OS kernel (system) code. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server. For a single CPU core, the value is in the [0..1] interval; for all CPU cores, it is calculated as a sum across them and is in the [0..num cores] interval

OSSystemTimeCPUN

The same as OSSystemTime, but for a single CPU core N: the ratio of time that core was running OS kernel (system) code, in the [0..1] interval. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server

OSSystemTimeNormalized

The same as OSSystemTime, but divided by the number of CPU cores, so the value is in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster, even if they have different numbers of cores, and still get a meaningful average resource utilization metric

OSThreadsRunnable

The total number of runnable threads, as seen by the OS kernel scheduler

OSThreadsTotal

The total number of threads, as seen by the OS kernel scheduler

OSUptime

The uptime of the host server (the machine where ClickHouse is running), in seconds

OSUserTime

The ratio of time the CPU core was running userspace code. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server. This also includes the time when the CPU was under-utilized due to reasons internal to the CPU (memory loads, pipeline stalls, branch mispredictions, running another SMT core). For a single CPU core, the value is in the [0..1] interval; for all CPU cores, it is calculated as a sum across them and is in the [0..num cores] interval

OSUserTimeCPUN

The same as OSUserTime, but for a single CPU core N: the ratio of time that core was running userspace code, in the [0..1] interval. This is a system-wide metric: it covers all processes on the host machine, not just clickhouse-server. This also includes the time when the CPU was under-utilized due to reasons internal to the CPU (memory loads, pipeline stalls, branch mispredictions, running another SMT core)

OSUserTimeNormalized

The same as OSUserTime, but divided by the number of CPU cores, so the value is in the [0..1] interval regardless of the number of cores. This allows you to average the values of this metric across multiple servers in a cluster, even if they have different numbers of cores, and still get a meaningful average resource utilization metric

ReplicasMaxAbsoluteDelay

Maximum difference in seconds between the freshest replicated part and the freshest data part still to be replicated, across Replicated tables. A very high value indicates a replica with no data

ReplicasMaxInsertsInQueue

Maximum number of INSERT operations in the queue (still to be replicated) across Replicated tables

ReplicasMaxMergesInQueue

Maximum number of merge operations in the queue (still to be applied) across Replicated tables

ReplicasMaxQueueSize

Maximum queue size (in the number of operations such as get and merge) across Replicated tables

ReplicasMaxRelativeDelay

Maximum difference between the replica delay and the delay of the most up-to-date replica of the same table, across Replicated tables

ReplicasSumInsertsInQueue

Sum of INSERT operations in the queue (still to be replicated) across Replicated tables

ReplicasSumMergesInQueue

Sum of merge operations in the queue (still to be applied) across Replicated tables

ReplicasSumQueueSize

Sum of queue sizes (in the number of operations such as get and merge) across Replicated tables

TCPThreads

Number of threads in the TCP protocol server (without TLS)

TotalBytesOfMergeTreeTables

Total size in bytes (compressed, including data and indices) of all tables of the MergeTree family

TotalPartsOfMergeTreeTables

Total number of data parts in all tables of the MergeTree family. Values larger than 10000 negatively affect server startup time and may indicate a poor choice of the partition key

TotalRowsOfMergeTreeTables

Total number of rows (records) stored in all tables of the MergeTree family

UncompressedCacheBytes

Total size of the uncompressed cache, in bytes. The uncompressed cache does not usually improve performance and should generally be avoided

UncompressedCacheCells

Total number of entries in the uncompressed cache. Each entry represents a decompressed block of data. The uncompressed cache does not usually improve performance and should generally be avoided

Uptime

The server uptime in seconds. It includes the time spent on server initialization before accepting connections

jemalloc.arenas.all.dirty_purged

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.arenas.all.muzzy_purged

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.arenas.all.pactive

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.arenas.all.pdirty

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.arenas.all.pmuzzy

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.background_thread.num_runs

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.background_thread.num_threads

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.background_thread.run_intervals

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.active

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.allocated

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.epoch

An internal update counter (epoch) of jemalloc statistics, used by all other jemalloc metrics

jemalloc.mapped

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.metadata

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.metadata_thp

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.resident

An internal metric of the low-level memory allocator (see jemalloc)

jemalloc.retained

An internal metric of the low-level memory allocator (see jemalloc)

View metrics

You can view monitoring metrics for ADQM in the system.metrics, system.events, and system.asynchronous_metrics system tables.

A query example:

SELECT * FROM system.metrics LIMIT 5;

The output:

┌─metric──────────┬─value─┬─description─────────────────────────────────────┐
│ Query           │     1 │ Number of executing queries                     │
│ Merge           │     0 │ Number of executing background merges           │
│ PartMutation    │     0 │ Number of mutations (ALTER DELETE/UPDATE)       │
│ ReplicatedFetch │     0 │ Number of data parts being fetched from replica │
│ ReplicatedSend  │     0 │ Number of data parts being sent to replicas     │
└─────────────────┴───────┴─────────────────────────────────────────────────┘
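The other two system tables can be queried the same way. The sketch below reads a few standard ClickHouse ProfileEvents counters from system.events and several metrics from system.asynchronous_metrics; the exact set of rows returned may vary between ADQM versions.

```sql
-- Cumulative event counters since server start (ProfileEvents).
-- Events that have not occurred yet are omitted from system.events by default.
SELECT event, value
FROM system.events
WHERE event IN ('Query', 'SelectQuery', 'InsertQuery');

-- Metrics calculated periodically in the background (AsynchronousMetrics)
SELECT metric, value
FROM system.asynchronous_metrics
WHERE metric IN ('Uptime', 'OSUptime', 'TCPThreads')
ORDER BY metric;
```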
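The *Normalized CPU metrics (for example, OSIdleTimeNormalized) are scaled to the [0..1] interval on every host, so they can be averaged across servers with different core counts. A minimal sketch, assuming the clusterAllReplicas table function is available in your ADQM version; my_cluster is a hypothetical cluster name to be replaced with one defined in your configuration.

```sql
-- Per-host idle ratio in the [0..1] interval.
-- 'my_cluster' is a hypothetical cluster name: replace it with your own.
SELECT
    hostName() AS host,
    round(value, 3) AS idle_ratio
FROM clusterAllReplicas('my_cluster', system.asynchronous_metrics)
WHERE metric = 'OSIdleTimeNormalized'
ORDER BY host;

-- Cluster-wide average that stays meaningful when hosts have different core counts
SELECT round(avg(value), 3) AS avg_idle_ratio
FROM clusterAllReplicas('my_cluster', system.asynchronous_metrics)
WHERE metric = 'OSIdleTimeNormalized';
```

In a distributed query, hostName() is evaluated on each remote server, so the first query returns one row per replica.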
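Per-core values (the CPUN metrics) and normalized values can also be listed for a single host. This sketch assumes that per-core metric names end with the core index, as in OSUserTimeCPU0, OSUserTimeCPU1, and so on; check the actual names in system.asynchronous_metrics on your server before relying on this pattern.

```sql
-- Per-core user time; assumes names like OSUserTimeCPU0, OSUserTimeCPU1, ...
SELECT
    metric,
    round(value, 3) AS user_time_ratio
FROM system.asynchronous_metrics
WHERE match(metric, '^OSUserTimeCPU[0-9]+$')
-- Sort by the numeric core index so that CPU2 comes before CPU10
ORDER BY toUInt32OrZero(replaceOne(metric, 'OSUserTimeCPU', ''));

-- Normalized CPU time breakdown on this host, each value in the [0..1] interval
SELECT metric, round(value, 3) AS ratio
FROM system.asynchronous_metrics
WHERE match(metric, '^OS[A-Za-z]+TimeNormalized$')
ORDER BY ratio DESC;
```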
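The OSMemory* metrics described in the table above overlap in meaning, so it can help to read them together. A minimal sketch that shows them in a human-readable form with the formatReadableSize function:

```sql
-- Compare host memory metrics side by side (all values are in bytes)
SELECT
    metric,
    formatReadableSize(value) AS size
FROM system.asynchronous_metrics
WHERE metric IN (
    'OSMemoryTotal',
    'OSMemoryAvailable',
    'OSMemoryFreePlusCached',
    'OSMemoryFreeWithoutCached',
    'OSMemoryCached',
    'OSMemoryBuffers'
)
ORDER BY metric;
```

OSMemoryAvailable and OSMemoryFreePlusCached are expected to be close to each other, while OSMemoryFreeWithoutCached is usually much smaller on a busy host.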
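The Replicas* metrics are aggregated across all Replicated tables on a server. A sketch of reading the summary and then narrowing it down to individual tables through the system.replicas table:

```sql
-- Server-wide replication summary (maximums and sums across Replicated tables)
SELECT metric, value
FROM system.asynchronous_metrics
WHERE metric LIKE 'Replicas%'
ORDER BY metric;

-- Tables that contribute most to the replication delay and queue size
SELECT database, table, absolute_delay, queue_size, inserts_in_queue, merges_in_queue
FROM system.replicas
ORDER BY absolute_delay DESC, queue_size DESC
LIMIT 10;
```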
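To check TotalPartsOfMergeTreeTables against the 10000 guideline and find which tables hold the most parts, a query like the following can be used. This is a sketch: it counts active parts in system.parts, so the per-table numbers may not add up exactly to the asynchronous metric value.

```sql
-- Total number of parts compared with the 10000 guideline
SELECT
    value AS total_parts,
    value > 10000 AS exceeds_guideline
FROM system.asynchronous_metrics
WHERE metric = 'TotalPartsOfMergeTreeTables';

-- Tables with the most active parts (candidates for a partition key review)
SELECT database, table, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY active_parts DESC
LIMIT 10;
```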
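The jemalloc.* metrics are easiest to read as a group. In this sketch, the list of byte-valued metrics (active, allocated, mapped, metadata, resident, retained) follows the jemalloc documentation and is an assumption to verify for your version; the other metrics are page counts or counters and are shown as plain numbers.

```sql
-- jemalloc allocator statistics; only byte-valued metrics are formatted as sizes
SELECT
    metric,
    if(metric IN ('jemalloc.active', 'jemalloc.allocated', 'jemalloc.mapped',
                  'jemalloc.metadata', 'jemalloc.resident', 'jemalloc.retained'),
       formatReadableSize(value),
       toString(value)) AS readable_value
FROM system.asynchronous_metrics
WHERE metric LIKE 'jemalloc.%'
ORDER BY metric;
```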

Depending on the way ADQM monitoring is installed, you can also view metrics in web interfaces:
