ADQM Control monitoring metrics

The Monitoring service installed in an ADQM Control cluster collects two groups of metrics: system metrics from hosts and ADQM Control service metrics.

You can view metrics in the Prometheus format in your browser (ports and endpoints for accessing metrics are described below) and also use the Prometheus and Grafana web interfaces.

System metrics

System metrics indicate general characteristics of cluster hosts, usually related to resource consumption: for example, CPU utilization, disk capacity, memory usage, I/O performance, and other parameters.

To view system metrics of ADQM Control hosts in the Prometheus format, open http://<adqmc_host_ip>:11203/metrics in your browser. Here, 11203 is the port number and /metrics is the endpoint on which system metrics are served. Both are set in the Node Exporter settings section of the Monitoring service configuration.
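For scripted checks, the same endpoint can also be read outside the browser. The following minimal Python sketch, using only the standard library, downloads a metrics page and splits the Prometheus text format into (name, labels, value) samples. The function names `parse_metrics` and `fetch_metrics` are illustrative, and the parser is deliberately simplified: it skips `# HELP`/`# TYPE` comment lines, ignores timestamps, and does not decode escape sequences in label values.

```python
import re
import urllib.request

# Matches: metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
# Matches one label pair; escaped characters inside the value are kept as-is.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # skip lines this simplified parser does not understand
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        # float() also accepts NaN, +Inf and -Inf as written by Prometheus
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def fetch_metrics(host, port=11203, path="/metrics", timeout=5.0):
    """Download and parse metrics, by default from the Node Exporter endpoint."""
    url = f"http://{host}:{port}{path}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Pass the address of an ADQM Control host to `fetch_metrics`; the port and path default to 11203 and /metrics and should be changed if the Node Exporter settings differ.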

In the Grafana interface, system metrics for ADQM Control hosts are displayed in the System dashboard and in the System row of the General dashboard.

The General dashboard in Grafana UI

ADQM Control service metrics

ADQM Control service metrics allow you to monitor and analyze various parameters of ADQM Control operation and performance: for example, availability of service components, types and frequency of errors, response time for user requests, frequency of alert generation and alert types, and other metrics.

Use the addresses listed in the table below to explore monitoring metrics of the ADQM Control service components in the Prometheus format. The addresses contain the default ports for accessing metrics of each component. You can change these ports in the Network configuration section of the ADQM Control service configuration.

| ADQM Control service component | Address to access component metrics |
|---|---|
| Agents | http://<host_ip>:5002/api/v1/metrics |
| Alert Generator | http://<host_ip>:5001/api/v1/metrics |
| Alert Receiver | http://<host_ip>:12322/api/v1/metrics |
| Alertmanager | http://<host_ip>:9093/metrics |
| Backend | http://<host_ip>:5555/api/v1/metrics |
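If you monitor several hosts or have changed the default ports, it can help to build these addresses programmatically. The sketch below assumes the default ports and paths from the table; `DEFAULT_ENDPOINTS` and `metric_urls` are hypothetical names, and `port_overrides` stands in for any ports you reconfigured in the Network configuration section.

```python
# Default metric endpoints of ADQM Control service components (port, path),
# taken from the table of default addresses.
DEFAULT_ENDPOINTS = {
    "Agents": (5002, "/api/v1/metrics"),
    "Alert Generator": (5001, "/api/v1/metrics"),
    "Alert Receiver": (12322, "/api/v1/metrics"),
    "Alertmanager": (9093, "/metrics"),
    "Backend": (5555, "/api/v1/metrics"),
}


def metric_urls(host_ip, port_overrides=None):
    """Return {component: URL} for one host, applying any port overrides."""
    overrides = port_overrides or {}
    urls = {}
    for component, (port, path) in DEFAULT_ENDPOINTS.items():
        # Use the reconfigured port if one was given for this component
        port = overrides.get(component, port)
        urls[component] = f"http://{host_ip}:{port}{path}"
    return urls
```

For example, `metric_urls("192.0.2.10")["Alertmanager"]` yields `http://192.0.2.10:9093/metrics`.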

The tables below describe the ADQM Control service metrics, grouped as they are presented on dashboards in the Grafana interface.

Adqm_agent

Metrics of the Agents component show which information about ADQM clusters is collected and stored by ADQM Control.

| Metric name | Description |
|---|---|
| adqm_agent_hosts | Number of ADQM cluster hosts |
| adqm_agent_databases | Number of ADQM databases |
| adqm_agent_tables | Number of ADQM tables |
| adqm_agent_columns | Number of columns in ADQM tables |
| adqm_agent_queries | Number of queries run in ADQM |
| adqm_agent_queries_normalized | Number of normalized (linked to tables) queries |
| jobs_total | Number of jobs performed by a service. Job statuses: success, failed. Job types: alerts_cleanup, check_query_log, hosts_collection, job_log_cleanup, queries_collection, queries_duration_aggregation, queries_duration_collection, queries_normalization, tables_collection |
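Because jobs_total is broken down by job status and type, you can, for example, estimate the share of failed jobs for each job type. The sketch below assumes the metric carries labels named `type` and `status`; these label names are a hypothetical guess, not documented here, so check the actual output of the Agents endpoint and adjust `TYPE_LABEL` and `STATUS_LABEL`. It takes samples as (metric name, labels dict, value) tuples. Assuming jobs_total is a cumulative counter (as the `_total` suffix conventionally indicates), the result covers all jobs since the component last started, not a recent time window.

```python
from collections import defaultdict

# ASSUMPTION: hypothetical label names; adjust to the real jobs_total labels.
TYPE_LABEL = "type"
STATUS_LABEL = "status"


def job_failure_ratios(samples):
    """Return {job_type: failed / (success + failed)} from jobs_total samples.

    samples: iterable of (metric_name, labels_dict, value) tuples.
    """
    counts = defaultdict(lambda: {"success": 0.0, "failed": 0.0})
    for name, labels, value in samples:
        if name != "jobs_total":
            continue  # ignore all other metrics
        status = labels.get(STATUS_LABEL)
        if status in ("success", "failed"):
            counts[labels.get(TYPE_LABEL, "unknown")][status] += value
    ratios = {}
    for job_type, c in counts.items():
        total = c["success"] + c["failed"]
        ratios[job_type] = c["failed"] / total if total else 0.0
    return ratios
```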

The Adqm_agent dashboard in Grafana UI

Alerts

| Metric name | Description |
|---|---|
| alertgenerator_alerts_lifetime_seconds | Alert lifetime (the period between the time an alert is sent and the time the alert is no longer considered valid) |
| alertgenerator_alerts_total | Number of generated alerts |
| alertgenerator_fired_alerts_total | Number of alerts sent to Alertmanager |
| alertreceiver_alerts_received_total | Number of alerts received by Alert Receiver |
| alertgenerator_alerts_resend_total | Number of alerts resent to Alertmanager |
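Metrics ending in `_total` are conventionally Prometheus counters, which only grow until the process restarts, so a raw value such as alertgenerator_alerts_total says little on its own; the rate between two scrapes is usually more informative. The minimal sketch below computes a per-second rate from two samples and treats a decrease as a counter reset. It is a simplified form of what the PromQL `rate()` function does; the real function also extrapolates to the edges of its range window. The function name `counter_rate` is illustrative.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Per-second rate of a counter between two scrapes (times in seconds).

    If the counter went down, the component is assumed to have restarted
    (counter reset), so the current value is taken as the increase.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("curr_time must be later than prev_time")
    increase = curr_value - prev_value
    if increase < 0:  # counter reset
        increase = curr_value
    return increase / elapsed
```

For example, if alertgenerator_alerts_total grows from 100 to 160 over 60 seconds, `counter_rate(100, 0, 160, 60)` returns 1.0 alert per second.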

Rest

The Services availability panel of the Rest dashboard visualizes information about service availability based on the following metrics.

| Metric name | Description |
|---|---|
| http_client_error_total | Number of errors in connecting a service to Prometheus |
| zookeeper_client_error_total | Number of errors in connecting a service to the ZooKeeper client |
| postgres_client_error_total | Number of errors in connecting a service to the PostgreSQL client |
| chcpp_client_error_total | Number of errors in connecting a service to the ClickHouse client |
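A simple availability check is to compare these error counters between two scrapes of the same component and report which of them grew. The sketch below assumes each counter is exposed as a single series without labels (if they carry labels, sum them per metric first) and takes two dicts that map metric names to values; `new_connection_errors` is a hypothetical helper.

```python
CLIENT_ERROR_METRICS = (
    "http_client_error_total",
    "zookeeper_client_error_total",
    "postgres_client_error_total",
    "chcpp_client_error_total",
)


def new_connection_errors(previous, current):
    """Return {metric: increase} for error counters that grew between scrapes.

    previous, current: dicts mapping metric name to counter value, taken
    from two consecutive scrapes of the same component. A missing metric
    is treated as 0.
    """
    grown = {}
    for metric in CLIENT_ERROR_METRICS:
        before = previous.get(metric, 0.0)
        after = current.get(metric, 0.0)
        # A decrease means the counter was reset (e.g. the component restarted).
        increase = after - before if after >= before else after
        if increase > 0:
            grown[metric] = increase
    return grown
```

An empty result means no new connection errors were counted in the interval.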

The Rest dashboard also contains the Backend, Alert receiver, and Alert generator rows, where panels display the following metrics for the corresponding service components.

| Metric name | Description |
|---|---|
| http_request_count_total | Number of requests |
| http_request_duration_seconds | Duration of requests in seconds |
| http_request_size_bytes | Size of requests in bytes |
| http_response_size_bytes | Size of responses in bytes |
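The average request duration over an interval can be derived from http_request_duration_seconds, assuming it is exposed as a Prometheus histogram or summary. Both metric types publish `<name>_sum` and `<name>_count` series, and the average is the change in `_sum` divided by the change in `_count`. The type of this metric is not stated here, so treat the sketch as an assumption; it also assumes a single (or already aggregated) series. The function name `average_duration` is illustrative.

```python
def average_duration(prev, curr, metric="http_request_duration_seconds"):
    """Average request duration in seconds between two scrapes.

    ASSUMPTION: the metric is a histogram or summary that exposes
    <metric>_sum and <metric>_count. prev and curr map series names to
    values from two consecutive scrapes. Returns None if no requests
    completed in the interval or a counter reset made the delta non-positive.
    """
    d_sum = curr[f"{metric}_sum"] - prev[f"{metric}_sum"]
    d_count = curr[f"{metric}_count"] - prev[f"{metric}_count"]
    if d_count <= 0:
        return None
    return d_sum / d_count
```

In Prometheus itself, the same average is typically written as `rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])`.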

REST API metrics in Grafana UI

REST API metrics, grouped differently, are also visualized in the API section of the General dashboard.
