ADB monitoring metrics

Anton Vasilev

Contents

Collect metrics
- Diamond
- Custom scripts
View metrics
- Graphite
- Grafana
Grafana dashboards

This article describes metrics for monitoring an ADB cluster. For information on how to install monitoring, refer to the sections:

Collect metrics

The monitoring cluster collects metrics from all cluster hosts on which the Monitoring Clients service is installed. Refer to the ADCM Mapping tab to check which hosts are currently monitored. Metrics are collected on hosts using Diamond and custom scripts. Both tools send metrics to Graphite, which is installed on the monitoring cluster.
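
As an illustration of that last step, the sketch below formats a metric as one line of Graphite's plaintext protocol (`path value timestamp`) and sends it over TCP. Port 2003 (the usual Carbon plaintext listener), the metric path, and the function names are assumptions made for this example. They are not taken from the ADB monitoring configuration.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Return one plaintext-protocol line: path, value, timestamp, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host, port=2003, timeout=5.0):
    """Send preformatted lines to a Carbon plaintext listener (port is assumed)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Hypothetical metric path; printing instead of sending keeps the example offline.
    print(format_metric("example.host1.uptime", 12345, timestamp=1700000000), end="")
```

Diamond and the monitoring scripts handle this delivery themselves. The sketch only shows what a metric looks like on the wire.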

Diamond

Diamond is a Python daemon that collects system metrics. The Diamond configuration file is /etc/diamond/diamond.conf.

Diamond uses program components, called collectors, to gather system metrics such as CPU utilization, disk space usage, and input/output load. Collector configuration files reside in the /etc/diamond/collectors directory on each monitoring client host. Refer to the Diamond documentation for more information about Diamond and collectors. Besides common collectors, additional collectors can be used:

If the PXF service is installed on a host, the PXFCollector component is used to collect metrics related to PXF as a Java service.
If ADBM or ADB Control is monitored, the DockerCollector component is used to collect metrics related to the ADBM and ADB Control Docker containers.

Custom scripts

Custom ADB scripts extend monitoring capabilities by providing data on the ADB cluster and database activity. By default, monitoring scripts are located in the /home/gpadmin/arenadata_configs directory on the master host. These are:

arenadata_segments_monitor.sh — collects information about the cluster, such as the number of segments and their mirrors, their state, and replication lag.
db_datfrozenxid_alerter.sh — monitors transaction ID wraparound risk.
pxf-monitor.sh — monitors the status and uptime of the PXF service. It is available only on hosts where the PXF service is installed.

View metrics

Graphite

Graphite gets metrics from Diamond and monitoring scripts. To view metrics in the Graphite UI, enter the address of the host with the monitoring cluster and the Graphite port (80 by default) in a browser URL bar, for example, 192.0.2.5:80. You can check and modify this IP address and port in the ADCM user interface, in the Graphite service configuration, using the ip_and_ports → Host IP address and ip_and_ports → Web-interface TCP port parameters. These values are set when you configure the Graphite service during installation of the monitoring cluster.
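
Besides the UI, Graphite exposes a render API that returns stored datapoints, which can be handy for scripted checks. The following is a minimal sketch that builds such a request URL. The target name used in the usage line is hypothetical, and `render_url` is a helper invented for this example.

```python
from urllib.parse import urlencode


def render_url(host, target, port=80, period="-1h"):
    """Build a Graphite render API URL that returns datapoints as JSON."""
    query = urlencode({"target": target, "from": period, "format": "json"})
    return f"http://{host}:{port}/render?{query}"


if __name__ == "__main__":
    # Hypothetical target; the real name depends on the metric tree in your Graphite.
    print(render_url("192.0.2.5", "example.cluster.database.available.is_available"))
```

Opening the resulting URL in a browser, or fetching it with any HTTP client, should return a JSON list of series, provided that the target exists in your Graphite instance.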

On the left side of the window that opens, expand the Metrics → Arenadata → DB → <Cluster_id> node. Two groups of metrics are available:

System_metrics — shows general characteristics of hosts, usually related to resource consumption.

System metrics
Metrics group	Description
cpu	CPU utilization
diskspace	Disk space usage
docker	Metrics related to Docker containers. They are available if ADBM or ADB Control are being monitored
files	File statistics
iostat	Input/output operation performance
loadavg	System load averages
memory	Memory usage
netstat	Network connection statistics
network	Network interface performance
pxfjson	PXF service metrics. The group is available only for hosts where the PXF service is installed
uptime	How long the system has been on since it was last restarted

database — provides data on the database, segments, and transactions.

Database metrics
Metric group	Metric name	Description
available	is_available	Whether a database is available
db_datfrozenxid	Existing database names	The oldest transaction age
replication	REPLICATION_LAG	Sync delay (in bytes) between the master and the standby master
replication	REPLICATION_STATE	The state of the WAL streaming replication process. Possible values: `streaming` (the master is streaming changes after its connected standby server has caught up with the primary); `startup` (the master is starting up); `catchup` (the master's connected standby is catching up with the primary); `backup` (the master is sending a backup); `inactive` (replication is disabled)
segments	MIRRORS_AS_PRIMARY	The number of mirror segments that are currently running in the primary role
segments	TOTAL_PRIMARY_SEGMENTS	The number of segments that are configured to operate in the primary role (preferred_role = 'p' in gp_segment_configuration)
segments	TOTAL_SEGMENTS	The total number of configured primary and mirror segments
segments	UP_SEGMENTS	The number of segments that are currently online and operational (status = 'u' in gp_segment_configuration)
sessions	LONGEST_XACT_SESS_ID	The session ID of the longest-running active transaction
sessions	LONGEST_XACT_TIME	The duration (in seconds) of the longest-running active transaction

Metrics available in Graphite
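
To make the definitions of the segments metrics concrete, the sketch below derives the four counters from rows shaped like gp_segment_configuration. It is not the code of arenadata_segments_monitor.sh. The function name and the row format are assumptions, and the sketch expects the caller to pass only segment rows (whether the real script counts master entries is not stated here).

```python
def segment_metrics(rows):
    """Compute segment counters from gp_segment_configuration-like rows.

    Each row is a dict with the keys 'role', 'preferred_role', and 'status'.
    """
    return {
        "TOTAL_SEGMENTS": len(rows),
        "TOTAL_PRIMARY_SEGMENTS": sum(r["preferred_role"] == "p" for r in rows),
        "UP_SEGMENTS": sum(r["status"] == "u" for r in rows),
        # Configured as a mirror, but currently acting in the primary role.
        "MIRRORS_AS_PRIMARY": sum(
            r["preferred_role"] == "m" and r["role"] == "p" for r in rows
        ),
    }


if __name__ == "__main__":
    # Two segment pairs; in the second pair the primary is down and its mirror took over.
    rows = [
        {"role": "p", "preferred_role": "p", "status": "u"},
        {"role": "m", "preferred_role": "m", "status": "u"},
        {"role": "m", "preferred_role": "p", "status": "d"},
        {"role": "p", "preferred_role": "m", "status": "u"},
    ]
    print(segment_metrics(rows))
```

In a healthy cluster, MIRRORS_AS_PRIMARY is expected to be 0 and UP_SEGMENTS to equal TOTAL_SEGMENTS, so a deviation in either counter is a reasonable alert condition.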

Grafana

Grafana allows you to visualize metrics stored in Graphite, create your own dashboards, or modify existing ones.

To view Grafana dashboards, enter the address of the host with the monitoring cluster and the Grafana port (3000 by default) in a browser URL bar, for example, 192.0.2.5:3000. You can check and modify this IP address and port in the ADCM user interface, in the Grafana service configuration, using the ip_and_ports → Host IP address and ip_and_ports → Port parameters. To log in to Grafana, use the values of the security → Username and security → Password parameters of the Grafana service configuration. These values are set when you configure the Grafana service during installation of the monitoring cluster.

Grafana dashboards

By default, the following dashboards are available in Grafana:

Arenadata DB system cluster <Cluster name>
Arenadata DB cluster <Cluster name> ADCC
Arenadata DB cluster <Cluster name> PXF
Arenadata System metrics

Arenadata DB system cluster <Cluster name>

The Arenadata DB system cluster <Cluster name> dashboard consists of two sections: Database and System.

The Database section contains panels that show information about the cluster, such as the master replication state and segment statuses.

Arenadata DB System cluster dashboard in Grafana. The Database section

The table below describes the Grafana dashboard panels available in the Database section.

Database
Panel name	Description	Graphite source
Database is	The database state: `Up` or `Down`	<Cluster_id>/database/available/is_available
Mirrors as primaries	The number of mirror segments that are running in the primary role	<Cluster_id>/database/segments/MIRRORS_AS_PRIMARY
Database segments	The total number of segments and their status	<Cluster_id>/database/segments/TOTAL_SEGMENTS
Longest Transaction (sec)	The duration (in seconds) of the longest-running active transaction	<Cluster_id>/database/sessions/LONGEST_XACT_TIME
Longest transaction (sess_id)	The session ID of the longest-running active transaction	<Cluster_id>/database/sessions/LONGEST_XACT_SESS_ID
Master replication state is	The state of the WAL streaming replication process	<Cluster_id>/database/replication/REPLICATION_STATE
Replication delay	Sync delay (in bytes) between the master and the standby master	<Cluster_id>/database/replication/REPLICATION_LAG
Wraparound warn percentage	Shows how close the current transaction ID age is to the warning limit	<Cluster_id>/database/db_datfrozenxid

The System section shows the system performance metrics.

Arenadata DB System cluster dashboard in Grafana. The System section

The table below describes the Grafana dashboard panels available in the System section.

System
Panel name	Description	Graphite source
CPU usage	CPU utilization rate in percent	System_metrics/<host>/cpu
IOPS	The number of input/output operations per second	System_metrics/<host>/iostat/<disk_device>/iops
IO %	Percentage of time the disk is performing I/O operations	System_metrics/<host>/iostat/<disk_device>/util_percentage
Mb per sec	Read/write data transfer rate	System_metrics/<host>/iostat/<disk_device>/read_byte_per_second System_metrics/<host>/iostat/<disk_device>/write_byte_per_second
Await	The average time (in milliseconds) for I/O requests issued to the device to be served	System_metrics/<host>/iostat/<disk_device>/await
Service time	The average service time (in milliseconds) for I/O requests that were issued to the device	System_metrics/<host>/iostat/<disk_device>/service_time
Network receive bytes	Bytes received by a network interface	System_metrics/<host>/network/<network_interface>/rx_byte
Network transmit bytes	Bytes transmitted by a network interface	System_metrics/<host>/network/<network_interface>/tx_byte
Available memory	How much memory is available for starting new applications without triggering swapping	System_metrics/<host>/memory/MemAvailable
Memory free	Absolute amount of unused physical memory	System_metrics/<host>/memory/MemFree
Disk Space Usage - datadirs	Available disk space as a percentage of the total	System_metrics/<host>/diskspace/<data_directory>/byte_percentfree
Disk Space Usage - /	Available disk space in the root (/) file system as a percentage of the total	System_metrics/<host>/diskspace/root/byte_percentfree
LoadAVG	1-minute load average	System_metrics/<host>/loadavg/01
Processes running	The number of processes running	System_metrics/<host>/loadavg/processes_running
Processes total	The total number of processes on the system	System_metrics/<host>/loadavg/processes_total
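
The Graphite source column uses slash-separated paths with placeholders such as `<host>`. The hypothetical helper below shows one way to turn such an entry into a dotted Graphite target, assuming that each slash corresponds to a node separator in the metric tree. The `prefix` argument and the example values are illustrative only.

```python
def to_target(source, prefix="", **placeholders):
    """Turn a 'Graphite source' table entry into a dotted Graphite target."""
    for name, value in placeholders.items():
        source = source.replace(f"<{name}>", value)
    target = source.replace("/", ".")
    return f"{prefix}.{target}" if prefix else target


if __name__ == "__main__":
    # "*" is a Graphite wildcard, so this target matches every host and disk device.
    print(to_target("System_metrics/<host>/iostat/<disk_device>/iops",
                    host="*", disk_device="*"))
```

Placeholder values should not contain dots themselves, because Graphite would treat every dot as a separate tree level.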

Arenadata DB cluster <Cluster name> ADCC

This dashboard shows information about ADB Control and ADBM agents that are installed on the master host and on every segment host. The dashboard also includes information about ADB Control and ADBM Docker containers.

ADCC dashboard in Grafana

ADCC
Panel name	Description	Graphite source
Agents uptime	How long ADB Control and ADBM agents have been on since they were last restarted	System_metrics/<host>/adbm_agent_uptime System_metrics/<host>/adcc_agent_uptime
Agent CPU usage	CPU utilization rate (in percent) by ADB Control and ADBM agents	System_metrics/<host>/adbm_agent_cpu System_metrics/<host>/adcc_agent_cpu
Agent memory usage	Agent memory utilization rate in MB	System_metrics/<host>/adbm_agent_mem System_metrics/<host>/adcc_agent_mem
Containers uptime	Uptime of ADBM and ADB Control Docker containers	System_metrics/<host>/docker/containers/<container_name>/uptime
Container memory usage	Memory utilization rate by ADBM and ADB Control Docker containers	System_metrics/<host>/docker/containers/<container_name>/RSS_byte
Container CPU usage	CPU utilization rate by ADBM and ADB Control Docker containers	System_metrics/<host>/docker/containers/<container_name>/cpu/cpuperc

Arenadata DB cluster <Cluster name> PXF

This dashboard monitors the uptime of PXF hosts and performance metrics for the PXF application.

PXF dashboard in Grafana

PXF
Panel name	Description	Graphite source
PXF UPTIME	How long the PXF service has been on since it was last restarted	System_metrics/<host>/pxfjson/pxf/pxf_uptime
Active threads	The number of threads that are actively executing tasks	System_metrics/<host>/pxfjson/pxf/executor/active
Queue capacity	The maximum number of threads to be added to the queue. Configured using the `pxf.task.pool.queue-capacity` parameter in the /etc/pxf/conf/pxf-application.properties file. The default is `0`, meaning that no queue is used	System_metrics/<host>/pxfjson/pxf/executor/queue/capacity
Bytes received	The number of bytes per second received by PXF from ADB	System_metrics/<host>/pxfjson/pxf/bytes/received
Records received	The number of records per second received by PXF from ADB	System_metrics/<host>/pxfjson/pxf/records/received
Bytes sent	The number of bytes per second sent by PXF to ADB	System_metrics/<host>/pxfjson/pxf/bytes/sent
Records sent	The number of records per second sent by PXF to ADB	System_metrics/<host>/pxfjson/pxf/records/sent
JVM memory committed	The amount of committed memory (in bytes) for the Java virtual machine that runs the PXF application	System_metrics/<host>/pxfjson/jvm/memory/committed
JVM memory max	The maximum amount of memory in bytes that can be used for memory management by the JVM that runs the PXF application	System_metrics/<host>/pxfjson/jvm/memory/max
JVM memory used	The amount of memory used by the JVM that runs the PXF application	System_metrics/<host>/pxfjson/jvm/memory/used
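
The three JVM memory panels are often easier to interpret as a ratio. The following sketch computes the used memory as a percentage of the maximum. It is an illustration, not part of the dashboard. The function name is hypothetical, and treating a non-positive maximum as "undefined" is an assumption based on the JVM convention of reporting -1 when no maximum is set.

```python
def jvm_memory_used_percent(used_bytes, max_bytes):
    """Return used memory as a percentage of the maximum, or None if no maximum is defined."""
    if max_bytes is None or max_bytes <= 0:
        return None
    return 100.0 * used_bytes / max_bytes


if __name__ == "__main__":
    # 512 MiB used out of a 2 GiB maximum.
    print(jvm_memory_used_percent(512 * 1024**2, 2 * 1024**3))
```

A value that stays close to 100 for a long time may indicate that the PXF JVM needs a larger heap, but the appropriate threshold depends on the workload.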

Arenadata System metrics

This dashboard contains the same set of panels as the System section of the Arenadata DB system cluster <Cluster name> dashboard described above, but allows you to monitor hosts from multiple clusters and Arenadata products. Use the Arenadata product list to select which products to display.

Arenadata System metrics dashboard in Grafana
