ADB monitoring metrics

This article describes metrics for monitoring an ADB cluster. For information on how to install monitoring, refer to the monitoring installation sections of the documentation.

Collect metrics

The monitoring cluster collects metrics from all cluster hosts on which the Monitoring Clients service is installed. Refer to the ADCM Mapping tab to check which hosts are currently monitored. Metrics are collected on hosts using Diamond and custom scripts. Both tools send metrics to Graphite, which is installed on the monitoring cluster.
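To make the data flow more concrete, the sketch below shows how a single data point could be delivered to Graphite over its plaintext protocol, where each data point is one line of the form `path value timestamp`. This is an illustration only: the metric path, the host address, and port 2003 (the usual default of the Graphite plaintext listener) are assumptions, and Diamond and the ADB scripts may be configured differently in a real installation.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Build one plaintext-protocol line: path, value, and timestamp, ending with a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(line, host, port=2003, timeout=5.0):
    """Send a formatted line to a Graphite plaintext listener (the port is an assumption)."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("ascii"))


# Hypothetical metric path and value, used only for illustration.
line = format_graphite_line("Arenadata.DB.1.database.segments.UP_SEGMENTS", 16, 1700000000)
print(line, end="")
# send_to_graphite(line, "192.0.2.5")  # uncomment to send to a real monitoring host
```

The network call is left commented out so that the example can be run without a monitoring cluster.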

Diamond

Diamond is a Python daemon that collects system metrics. The Diamond configuration file is /etc/diamond/diamond.conf.

Diamond uses program components, called collectors, to gather system metrics such as CPU utilization, disk space usage, and input/output load. Collector configuration files reside in the /etc/diamond/collectors directory on each monitoring client host. Refer to the Diamond documentation for more information about Diamond and collectors. Besides common collectors, additional collectors can be used:

  • If the PXF service is installed on a host, the PXFCollector component is used to collect metrics related to PXF as a Java service.

  • If ADBM or ADB Control is monitored, the DockerCollector component is used to collect metrics related to the ADBM and ADB Control Docker containers.

Custom scripts

Custom ADB scripts extend monitoring capabilities by providing data on the ADB cluster and database activity. By default, monitoring scripts are located in the /home/gpadmin/arenadata_configs directory on the master host. These are:

  • arenadata_segments_monitor.sh — collects information about the cluster, such as the number of segments and their mirrors, their state, and replication lag.

  • db_datfrozenxid_alerter.sh — monitors transaction ID wraparound risk.

  • pxf-monitor.sh — monitors the status and uptime of the PXF service. It is available only on hosts where the PXF service is installed.
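The idea behind a wraparound check can be sketched as follows: take the age of the oldest transaction ID in each database and express it as a percentage of a warning limit. This is not the logic of db_datfrozenxid_alerter.sh itself; the function name, the sample ages, and the `WARN_LIMIT` value are hypothetical and serve only to show the arithmetic.

```python
def wraparound_warn_percentage(xid_age, warn_limit):
    """Return how close a transaction ID age is to the warning limit, in percent."""
    if warn_limit <= 0:
        raise ValueError("warn_limit must be positive")
    return round(100.0 * xid_age / warn_limit, 2)


# Hypothetical ages per database, for example the result of:
#   SELECT datname, age(datfrozenxid) FROM pg_database;
ages = {"adb": 150_000_000, "postgres": 300_000_000}
WARN_LIMIT = 1_500_000_000  # assumption for illustration, not the script's real threshold

# Report the database that is closest to the limit.
worst_db = max(ages, key=ages.get)
print(worst_db, wraparound_warn_percentage(ages[worst_db], WARN_LIMIT))
```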

View metrics

Graphite

Graphite gets metrics from Diamond and monitoring scripts. To view metrics in the Graphite UI, enter the address of the host with the monitoring cluster and the Graphite port (80 by default) in a browser URL bar, for example, 192.0.2.5:80. You can check and modify this IP address and port in the ADCM user interface, in the Graphite service configuration, using the ip_and_ports → Host IP address and ip_and_ports → Web-interface TCP port parameters. These values are set when you configure the Graphite service during installation of the monitoring cluster.

On the left side of the window that opens, expand the Metrics → Arenadata → DB → <Cluster_id> node. Two groups of metrics are available:

  • System_metrics — shows general characteristics of hosts, usually related to resource consumption.

    System metrics
    Metric group | Description
    cpu | CPU utilization
    diskspace | Disk space usage
    docker | Metrics related to Docker containers. Available if ADBM or ADB Control is monitored
    files | File statistics
    iostat | Input/output operation performance
    loadavg | System load averages
    memory | Memory usage
    netstat | Network connection statistics
    network | Network interface performance
    pxfjson | PXF service metrics. Available only for hosts where the PXF service is installed
    uptime | Time the system has been running since the last restart

  • database — provides data on the database, segments, and transactions.

    Database metrics
    Metric group | Metric name | Description
    available | is_available | Whether the database is available
    db_datfrozenxid | Names of existing databases | Age of the oldest transaction in each database
    replication | REPLICATION_LAG | Sync delay (in bytes) between the master and the standby master
    replication | REPLICATION_STATE | The state of the WAL streaming replication process. Possible values: streaming (the master is streaming changes after its connected standby has caught up with the primary); startup (the master is starting up); catchup (the master's connected standby is catching up with the primary); backup (the master is sending a backup); inactive (replication is disabled)
    segments | MIRRORS_AS_PRIMARY | The number of mirror segments that are currently running in the primary role
    segments | TOTAL_PRIMARY_SEGMENTS | The number of segments that are configured to operate in the primary role (preferred_role = 'p' in gp_segment_configuration)
    segments | TOTAL_SEGMENTS | The total number of configured primary and mirror segments
    segments | UP_SEGMENTS | The number of segments that are currently online and operational (status = 'u' in gp_segment_configuration)
    sessions | LONGEST_XACT_SESS_ID | The session ID of the longest-running active transaction
    sessions | LONGEST_XACT_TIME | The duration (in seconds) of the longest-running active transaction

Metrics available in Graphite
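The segment metrics can be combined into a simple health check. The sketch below is a hypothetical helper, not part of the monitoring scripts: it treats the difference between TOTAL_SEGMENTS and UP_SEGMENTS as the number of segments that are down and flags any non-zero MIRRORS_AS_PRIMARY value, which indicates that at least one mirror has taken over the primary role.

```python
def assess_segments(total_segments, up_segments, mirrors_as_primary):
    """Return a list of human-readable problems derived from the segment metrics."""
    problems = []
    down = total_segments - up_segments
    if down > 0:
        problems.append(f"{down} segment(s) are down")
    if mirrors_as_primary > 0:
        problems.append(f"{mirrors_as_primary} mirror(s) are acting as primary")
    return problems


# Sample values for illustration: 16 configured segments, 14 online, 2 mirrors promoted.
for problem in assess_segments(16, 14, 2):
    print(problem)
```

An empty list means that all configured segments are online and running in their preferred roles.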

Grafana

Grafana allows you to visualize metrics stored in Graphite, create your own dashboards, or modify existing ones.

To view Grafana dashboards, enter the address of the host with the monitoring cluster and the Grafana port (3000 by default) in a browser URL bar, for example, 192.0.2.5:3000. You can check and modify this IP address and port in the ADCM user interface, in the Grafana service configuration, using the ip_and_ports → Host IP address and ip_and_ports → Port parameters. To log in to Grafana, use the values of the security → Username and security → Password parameters of the Grafana service configuration. These values are set when you configure the Grafana service during installation of the monitoring cluster.
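Besides the web interface, the list of dashboards can also be requested over the Grafana HTTP API. The sketch below only prepares such a request with basic authentication; it assumes that basic authentication is enabled in Grafana, and the host address and credentials are placeholders to be replaced with the values from the Grafana service configuration.

```python
import base64
import urllib.request


def grafana_search_request(host, username, password, port=3000):
    """Prepare a GET request to the Grafana dashboard search endpoint using basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"http://{host}:{port}/api/search?type=dash-db",
        headers={"Authorization": f"Basic {token}"},
    )


# Placeholder credentials; use the security -> Username and security -> Password values.
request = grafana_search_request("192.0.2.5", "admin", "secret")
print(request.full_url)
# with urllib.request.urlopen(request, timeout=5) as response:  # needs a reachable Grafana
#     print(response.read().decode("utf-8"))
```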

Grafana dashboards

By default, the following dashboards are available in Grafana:

Arenadata DB system cluster <Cluster name>

The Arenadata DB system cluster <Cluster name> dashboard consists of two sections: Database and System.

The Database section contains panels that show information about the cluster, such as the master replication state and segment statuses.

Arenadata DB System cluster dashboard in Grafana. The Database section

The table below describes the Grafana dashboard panels available in the Database section.

Database
Panel name | Description | Graphite source
Database is | The database state: Up or Down | <Cluster_id>/database/available/is_available
Mirrors as primaries | The number of mirror segments that are running in the primary role | <Cluster_id>/database/segments/MIRRORS_AS_PRIMARY
Database segments | The total number of segments and their status | <Cluster_id>/database/segments/TOTAL_SEGMENTS
Longest Transaction (sec) | The duration (in seconds) of the longest-running active transaction | <Cluster_id>/database/sessions/LONGEST_XACT_TIME
Longest transaction (sess_id) | The session ID of the longest-running active transaction | <Cluster_id>/database/sessions/LONGEST_XACT_SESS_ID
Master replication state is | The state of the WAL streaming replication process | <Cluster_id>/database/replication/REPLICATION_STATE
Replication delay | Sync delay (in bytes) between the master and the standby master | <Cluster_id>/database/replication/REPLICATION_LAG
Wraparound warn percentage | How close the current transaction ID age is to the warning limit | <Cluster_id>/database/db_datfrozenxid
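A Graphite source from the table can also be read programmatically through the Graphite render API. The sketch below converts a slash-separated source into a dot-separated Graphite target and builds a render URL that returns JSON. The `Arenadata.DB` prefix is inferred from the Metrics → Arenadata → DB → <Cluster_id> tree shown in the Graphite UI and should be treated as an assumption, as should the cluster ID and the host address.

```python
from urllib.parse import urlencode


def graphite_target(source, cluster_id, prefix="Arenadata.DB"):
    """Turn a '<Cluster_id>/group/metric' source into a dot-separated Graphite target."""
    path = source.replace("<Cluster_id>", str(cluster_id)).strip("/")
    return ".".join([prefix] + path.split("/"))


def render_url(host, target, port=80, period="-10min"):
    """Build a Graphite render API URL that returns the recent data points as JSON."""
    query = urlencode({"target": target, "from": period, "format": "json"})
    return f"http://{host}:{port}/render?{query}"


# Hypothetical cluster ID 1 and the example host address of the monitoring cluster.
target = graphite_target("<Cluster_id>/database/segments/TOTAL_SEGMENTS", 1)
print(render_url("192.0.2.5", target))
```

The resulting URL can be opened in a browser or fetched with any HTTP client to check which values a dashboard panel is based on.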

The System section shows the system performance metrics.

Arenadata DB System cluster dashboard in Grafana. The System section

The table below describes the Grafana dashboard panels available in the System section.

System
Panel name | Description | Graphite source
CPU usage | CPU utilization, in percent | System_metrics/<host>/cpu
IOPS | The number of input/output operations per second | System_metrics/<host>/iostat/<disk_device>/iops
IO % | Percentage of time the disk spends performing I/O operations | System_metrics/<host>/iostat/<disk_device>/util_percentage
Mb per sec | Read/write data transfer rate | System_metrics/<host>/iostat/<disk_device>/read_byte_per_second; System_metrics/<host>/iostat/<disk_device>/write_byte_per_second
Await | The average time (in milliseconds) for I/O requests issued to the device to be served | System_metrics/<host>/iostat/<disk_device>/await
Service time | The average service time (in milliseconds) for I/O requests issued to the device | System_metrics/<host>/iostat/<disk_device>/service_time
Network receive bytes | Bytes received by a network interface | System_metrics/<host>/network/<network_interface>/rx_byte
Network transmit bytes | Bytes transmitted by a network interface | System_metrics/<host>/network/<network_interface>/tx_byte
Available memory | Memory available for starting new applications without triggering swapping | System_metrics/<host>/memory/MemAvailable
Memory free | Amount of unused physical memory | System_metrics/<host>/memory/MemFree
Disk Space Usage - datadirs | Available disk space in the data directories as a percentage of the total | System_metrics/<host>/diskspace/<data_directory>/byte_percentfree
Disk Space Usage - / | Available disk space in the root (/) file system as a percentage of the total | System_metrics/<host>/diskspace/root/byte_percentfree
LoadAVG | 1-minute load average | System_metrics/<host>/loadavg/01
Processes running | The number of running processes | System_metrics/<host>/loadavg/processes_running
Processes total | The total number of processes on the system | System_metrics/<host>/loadavg/processes_total
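As an illustration of how the disk space values can be used outside Grafana, the sketch below selects the mount points whose byte_percentfree value falls below a threshold. The helper name, the sample values, and the 10 percent threshold are hypothetical and are not defaults of the monitoring cluster.

```python
def low_disk_space(percent_free_by_mount, threshold=10.0):
    """Return the mount points whose free space (in percent) is below the threshold."""
    return sorted(
        mount for mount, free in percent_free_by_mount.items() if free < threshold
    )


# Sample byte_percentfree values for one host, for illustration only.
samples = {"root": 42.5, "data1": 8.3, "data2": 15.0}
print(low_disk_space(samples))
```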

Arenadata DB cluster <Cluster name> ADCC

This dashboard shows information about ADB Control and ADBM agents that are installed on the master host and on every segment host. The dashboard also includes information about ADB Control and ADBM Docker containers.

ADCC dashboard in Grafana
ADCC
Panel name | Description | Graphite source
Agents uptime | Time the ADB Control and ADBM agents have been running since the last restart | System_metrics/<host>/adbm_agent_uptime; System_metrics/<host>/adcc_agent_uptime
Agent CPU usage | CPU utilization (in percent) by the ADB Control and ADBM agents | System_metrics/<host>/adbm_agent_cpu; System_metrics/<host>/adcc_agent_cpu
Agent memory usage | Memory used (in MB) by the ADB Control and ADBM agents | System_metrics/<host>/adbm_agent_mem; System_metrics/<host>/adcc_agent_mem
Containers uptime | Uptime of the ADBM and ADB Control Docker containers | System_metrics/<host>/docker/containers/<container_name>/uptime
Container memory usage | Memory used by the ADBM and ADB Control Docker containers | System_metrics/<host>/docker/containers/<container_name>/RSS_byte
Container CPU usage | CPU utilization by the ADBM and ADB Control Docker containers | System_metrics/<host>/docker/containers/<container_name>/cpu/cpuperc

Arenadata DB cluster <Cluster name> PXF

This dashboard monitors the uptime of PXF hosts and performance metrics for the PXF application.

PXF dashboard in Grafana
PXF
Panel name | Description | Graphite source
PXF UPTIME | Time the PXF service has been running since the last restart | System_metrics/<host>/pxfjson/pxf/pxf_uptime
Active threads | The number of threads that are actively executing tasks | System_metrics/<host>/pxfjson/pxf/executor/active
Queue capacity | The maximum number of tasks that can be added to the queue. Configured using the pxf.task.pool.queue-capacity parameter in the /etc/pxf/conf/pxf-application.properties file. The default is 0, which means that no queue is used | System_metrics/<host>/pxfjson/pxf/executor/queue/capacity
Bytes recieved | The number of bytes per second received by PXF from ADB | System_metrics/<host>/pxfjson/pxf/bytes/received
Records recieved | The number of records per second received by PXF from ADB | System_metrics/<host>/pxfjson/pxf/records/received
Bytes sent | The number of bytes per second sent by PXF to ADB | System_metrics/<host>/pxfjson/pxf/bytes/sent
Records sent | The number of records per second sent by PXF to ADB | System_metrics/<host>/pxfjson/pxf/records/sent
JVM memory committed | The amount of memory (in bytes) committed for the Java virtual machine (JVM) that runs the PXF application | System_metrics/<host>/pxfjson/jvm/memory/committed
JVM memory max | The maximum amount of memory (in bytes) that can be used for memory management by the JVM that runs the PXF application | System_metrics/<host>/pxfjson/jvm/memory/max
JVM memory used | The amount of memory used by the JVM that runs the PXF application | System_metrics/<host>/pxfjson/jvm/memory/used
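The JVM memory panels are easier to interpret together. As a minimal sketch, the used value can be expressed as a percentage of the max value; the helper below is hypothetical, and it assumes that a non-positive max means the limit is undefined, in which case no percentage is returned.

```python
def jvm_memory_used_percent(used_bytes, max_bytes):
    """Return JVM memory usage as a percentage of the maximum, or None if no maximum is set."""
    if max_bytes <= 0:
        return None  # assumption: a non-positive maximum means the limit is undefined
    return round(100.0 * used_bytes / max_bytes, 1)


# Sample values for illustration: 512 MiB used out of a 2 GiB maximum.
print(jvm_memory_used_percent(512 * 1024**2, 2 * 1024**3))
```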

Arenadata System metrics

This dashboard contains the same set of panels as the System section described above, but allows you to monitor hosts from multiple clusters and Arenadata products. Use the Arenadata product list to select which products to display.

Arenadata System metrics dashboard in Grafana