ADPG monitoring metrics

This article describes available monitoring metrics for ADPG. For information on monitoring and its installation, refer to the Install monitoring and Monitoring articles.

ADPG uses Grafana to visualize metrics. To view metric dashboards, open the address of the host where Grafana is deployed, followed by the port number set in the Grafana TCP port parameter (the default value is 12012). For example, http://10.92.6.91:12012. To log in, use admin as the username and the value of the Grafana admin’s password parameter as the password. You can find the Grafana parameters on the Configuration tab of the Metrics storage service.

The Grafana interface
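As a minimal sketch, the Grafana address can be assembled from the host and the Grafana TCP port value as shown below. The function name grafana_url is hypothetical and is used only for illustration; the host is the example address from this article.

```python
DEFAULT_GRAFANA_PORT = 12012  # default value of the Grafana TCP port parameter


def grafana_url(host: str, port: int = DEFAULT_GRAFANA_PORT) -> str:
    """Build the Grafana address from a host and the Grafana TCP port.

    Hypothetical helper for illustration; it only formats the address.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    return f"http://{host}:{port}"


# Example address from this article
print(grafana_url("10.92.6.91"))
```

If the Grafana TCP port parameter was changed on the Configuration tab, pass the new value as the second argument.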
NOTE
  • A host with HAProxy (the Balancer service) is the data source for Grafana. If this host fails, you need to manually change the Grafana data source. In the Grafana interface, open the Configuration page, switch to the Data sources tab, and add a new data source of the PostgreSQL type.

  • You need to create the pg_stat_statements extension to display some dashboard charts. See Work with extensions.

You can find the following dashboards in Grafana:

Global ADPG dashboard

The Global ADPG dashboard provides general database statistics and can alert you to critical issues in the cluster.

Global ADPG dashboard

The Global ADPG dashboard includes the following values:

  • Monitored PRIMARY DB-s.

  • Monitored REPLICA DB-s.

  • Offline nodes.

It also contains the tables listed below:

  • Top N by TPS.

  • Top N by QPS.

  • Top N by TX rollback.

  • Top N by shared buffers hit ratio.

  • Top N by replication lag.

  • Top N by DB size.

  • Top N by idle sessions %.

  • Top N by blocked sessions %.

  • Top N by longest TX time.

  • Top N by WAL rate.

  • Top N by WAL folder size.

  • Top N by longest session duration.

  • Top N by used connections.

  • Top N by CPU utilization %.

  • Top N by waiting time.

  • Top N by temp files.

  • Top N by lowest free disk %.

  • Top N by duration of running autovacuums.

  • Top N by autovacuum warn percent.

  • Top N by checkpoint write and sync duration.

Here, N is the number of rows displayed in each table. It is set by the top_limit filter; the default value is 3.
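The effect of the top_limit filter can be illustrated with a short sketch. This is not the query that the dashboard runs; the top_n function and the sample values are hypothetical and only show how a limit of 3 cuts a ranked list.

```python
import heapq


def top_n(values: dict[str, float], top_limit: int = 3, lowest: bool = False) -> list[tuple[str, float]]:
    """Return the top_limit entries ranked by value.

    lowest=True ranks ascending, as in "Top N by lowest free disk %".
    """
    pick = heapq.nsmallest if lowest else heapq.nlargest
    return pick(top_limit, values.items(), key=lambda item: item[1])


# Made-up TPS values per database, for illustration only
tps_by_db = {"db_a": 120.0, "db_b": 45.5, "db_c": 310.2, "db_d": 12.0}

print(top_n(tps_by_db))  # the three databases with the highest TPS
```

With the default limit, the fourth database (db_d) does not appear in the output. Increasing top_limit makes the table longer in the same way.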

ADPG Checkpointer (Bgwriter, Block IO Stats)

This dashboard contains checkpoint and background writer (bgwriter) statistics.

ADPG Checkpointer dashboard

ADPG Checkpointer includes the following graphs:

  • Checkpoints. It represents the number of checkpoints during the aggregation period.

  • Checkpointer Write / Sync durations.

  • Bgwriter Stats. It displays the buffers_checkpoint, buffers_clean, and buffers_backend values.

  • Backend Read / Write times. It is based on the pg_stat_database view. It requires that the track_io_timing option is set to on.

  • Table / Index / Toast Blocks Read. Note that reads can be served by the file system cache.

ADPG DB overview

This dashboard contains visual graphs for ADPG node characteristics and helps to analyze the weaknesses of a particular node.

ADPG DB overview dashboard

The ADPG DB overview dashboard includes the following states:

  • Instance state — PRIMARY/REPLICA.

  • Instance uptime.

  • TPS — transactions per second.

  • QPS — queries per second.

  • Query runtime — average query runtime.

  • DB size ch. 1h — DB size that is calculated for each hour.

  • Approx Table Bloat.

  • Tuples fetched vs returned.

The ADPG DB overview dashboard contains the following graphs:

  • Tuple ins. / upd. / del. statistics.

  • Shared Buffers hit ratio + Rollback ratio.

  • TPS / QPS avg.

  • WAL rate + DB size.

  • Seq. / Idx. scans.

  • Sessions by state — active, idle, total, waiting, idleintransaction, av_workers.

  • CPU load + avg.query runtime.

  • Temp bytes. Temporary files appear when large grouping and sorting operations require more memory than the work_mem value.
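The ratios and rates in these graphs are usually derived from cumulative counters. The sketch below assumes the conventional formulas based on the blks_hit, blks_read, xact_commit, and xact_rollback columns of the pg_stat_database view; the exact queries used by the dashboard may differ, and the function names are hypothetical.

```python
def hit_ratio_percent(blks_hit: int, blks_read: int) -> float:
    """Share of block requests served from shared buffers, in percent."""
    total = blks_hit + blks_read
    return 100.0 * blks_hit / total if total else 0.0


def rollback_ratio_percent(xact_commit: int, xact_rollback: int) -> float:
    """Share of transactions that were rolled back, in percent."""
    total = xact_commit + xact_rollback
    return 100.0 * xact_rollback / total if total else 0.0


def per_second(prev_counter: int, curr_counter: int, interval_seconds: float) -> float:
    """Average rate between two snapshots of a cumulative counter (for example, TPS)."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (curr_counter - prev_counter) / interval_seconds


# Made-up counter values, for illustration only
print(hit_ratio_percent(blks_hit=990, blks_read=10))
print(rollback_ratio_percent(xact_commit=95, xact_rollback=5))
print(per_second(prev_counter=1000, curr_counter=1600, interval_seconds=60))
```

Because the counters are cumulative, a rate is always computed from the difference between two snapshots, not from a single reading.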

ADPG Health-check

This dashboard displays state characteristics of a particular node.

ADPG Health-check dashboard

The ADPG Health-check dashboard includes the following states:

  • Instance state.

  • Instance uptime.

  • PG version number.

  • Longest query runtime.

  • Number of active connections.

  • Number of max. connections.

  • Number of blocked sessions.

  • Shared buffer hit percent.

  • Avg. TX rollback percent.

  • TPS(avg.).

  • QPS(avg.).

  • "idle" in TX count.

  • DB size(last).

  • DB size change(diff).

  • DATADIR disk space left.

  • Query runtime(avg.).

  • Config change events.

  • Table changes.

  • WAL archiving status.

  • WAL folder size.

  • Invalid/duplicate indexes.

  • Autovacuum issues.

  • Checkpoints requested.

  • Approx table bloat.

  • WAL per second(avg.).

  • Temp bytes per second(avg.).

  • Longest autovacuum duration.

  • Seq. scans on >100MB tables per minute(avg.).

  • INSERT-s per minute(avg.).

  • UPDATE-s per minute(avg.).

  • DELETE-s per minute(avg.).

  • Backup duration.

  • Max table FREEZE age.

  • Max. XMIN horizon age.

  • Inactive replication slots.

  • Max replication lag.

ADPG Replication

The ADPG Replication dashboard contains replication metrics.

ADPG Replication dashboard

The ADPG Replication dashboard includes the following states:

  • Inactive repl. slots.

  • Active repl. slots.

  • Active replicas.

  • Active "sync" replicas.

  • Slot max. restart_lsn lag.

  • Max. write lag.

  • Max. flush lag.

  • Max. replay lag.

The ADPG Replication dashboard contains the following graphs:

  • Replication slot restart_lsn lag (primary extra WAL size). It is calculated based on the pg_replication_slots view only for primary nodes.

  • Replication flush lag. It is calculated based on the pg_stat_replication view for primary nodes. Note that data is available only on connected replicas.

  • Replication replay lag. It is calculated based on the pg_stat_replication view for primary nodes. Note that data is available only on connected replicas.

  • Repl. slot XMIN age (in transactions). It is calculated based on the xmin field of the pg_replication_slots view.
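A lag expressed in bytes is the distance between two WAL positions (LSNs). The following sketch shows how such a distance can be computed from the textual LSN form, for example, between the current WAL position and the restart_lsn of a slot. It illustrates the arithmetic only and is not the dashboard query; the function names are hypothetical. PostgreSQL can perform the same calculation on the server side with the pg_wal_lsn_diff function.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN in the 'XXXXXXXX/YYYYYYYY' form to an absolute byte position.

    The part before the slash is the high 32 bits, the part after it is the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(current_lsn: str, restart_lsn: str) -> int:
    """WAL distance in bytes between the current position and a slot restart_lsn."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(restart_lsn)


# Made-up LSN values, for illustration only
print(lag_bytes("0/3000060", "0/3000000"))
```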

ADPG Sessions overview

The ADPG Sessions overview dashboard provides session statistics.

ADPG Sessions overview dashboard

The ADPG Sessions overview dashboard contains the following graphs:

  • Max. TPS/QPS.

  • Longest query duration.

  • Longest TX duration.

  • Longest wait duration.

  • Longest session duration.

  • Longest Autovacuum duration.

  • Sessions by state.

  • Instance total connections.

ADPG System metrics

The ADPG System metrics dashboard contains node system metrics (CPU load, network load, disk storage analysis, and others).

ADPG System metrics dashboard

The ADPG System metrics dashboard contains the following graphs:

  • CPU usage %.

  • LoadAVG 1m normalized.

  • IO Write, bytes/sec.

  • IO Read, bytes/sec.

  • Network receive bytes.

  • Network transmit bytes.

  • Memory cached.

  • Memory free.

  • Disk space usage %.

  • Disk space available bytes.

  • Processes total.
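The sketch below shows one common way to normalize the load average: the 1-minute load average divided by the number of CPU cores. This interpretation of "LoadAVG 1m normalized" is an assumption and is not confirmed by this article; the normalized_load function is hypothetical.

```python
import os


def normalized_load(load_1m: float, cpu_count: int) -> float:
    """1-minute load average per CPU core (assumed meaning of 'normalized')."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_1m / cpu_count


# Read the values of the local host where the platform supports it
if hasattr(os, "getloadavg"):
    try:
        load_1m, _, _ = os.getloadavg()
        print(round(normalized_load(load_1m, os.cpu_count() or 1), 2))
    except OSError:
        print("load average is not available on this host")
```

Under this assumption, a value close to or above 1 means that on average there are at least as many runnable processes as CPU cores.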

PgBouncer statistics

The PgBouncer statistics dashboard contains statistics that reflect the PgBouncer performance.

PgBouncer statistics dashboard

The PgBouncer statistics dashboard displays the following graphs:

  • TPS — transactions per second.

  • QPS — queries per second.

  • Avg. query runtime — the average query runtime, in microseconds.

  • Pool wait time per Query — the average waiting time for queries in the pool, in microseconds.

  • Incoming traffic rate — the incoming traffic rate, bytes/sec.

  • Outgoing traffic rate — the outgoing traffic rate, bytes/sec.

NOTE

To make PgBouncer monitoring work correctly, call the cluster action Reconfigure monitoring agents after changing settings from the Enable pgbouncer configuration section.
