ADPG monitoring metrics
This article describes available monitoring metrics for ADPG. For information on monitoring and its installation, refer to the Install monitoring and Monitoring articles.
ADPG uses Grafana to visualize metrics. To view metric dashboards, open the address of the host where Grafana is deployed in a browser and add the port number set by the Grafana TCP port parameter (the default value is 12012). For example, http://10.92.6.91:12012. To log in, use admin as the username and the value of the Grafana admin's password parameter as the password. You can find the Grafana parameters on the Configuration tab of the Metrics storage service.
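To confirm that the address and port are correct before opening a browser, a small script can build the URL and query the Grafana /api/health endpoint, which normally answers without a login. This is a minimal sketch: the function names are illustrative, and the host is the example address used in this article.

```python
import json
import urllib.request

DEFAULT_GRAFANA_PORT = 12012  # default value of the Grafana TCP port parameter


def grafana_url(host: str, port: int = DEFAULT_GRAFANA_PORT) -> str:
    """Build the base URL of the Grafana web interface."""
    return f"http://{host}:{port}"


def grafana_is_healthy(host: str, port: int = DEFAULT_GRAFANA_PORT,
                       timeout: float = 5.0) -> bool:
    """Return True if /api/health answers and reports database == "ok"."""
    url = f"{grafana_url(host, port)}/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection errors, timeouts, and malformed JSON all count as unhealthy.
        return False
    return isinstance(payload, dict) and payload.get("database") == "ok"


if __name__ == "__main__":
    # Replace the example address with the host where Grafana is deployed.
    print(grafana_url("10.92.6.91"))
```

Calling grafana_is_healthy("10.92.6.91") returns False rather than raising an exception if the host is unreachable, which makes it convenient for scripted checks.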
You can find the following dashboards in Grafana:
-
Global ADPG dashboard. Includes general information on the system state and is intended as the first place to look when detecting problems.
-
ADPG Checkpointer (Bgwriter, Block IO Stats) — displays statistics of checkpoints and bgwriter.
-
ADPG DB overview. Contains visual graphs of ADPG node characteristics.
-
ADPG Health-check — shows information on a selected cluster node.
-
ADPG Replication — displays replication parameters.
-
ADPG Sessions overview — contains visual graphs for session analysis.
-
ADPG System metrics — displays system metrics of all cluster nodes.
-
PgBouncer statistics — displays PgBouncer statistics.
Global ADPG dashboard
The Global ADPG dashboard provides general database statistics and can alert you to critical issues in the cluster.
The Global ADPG dashboard includes the following values:
-
Monitored PRIMARY DB-s.
-
Monitored REPLICA DB-s.
-
Offline nodes.
It also contains the tables listed below:
-
Top N by TPS.
-
Top N by QPS.
-
Top N by TX rollback.
-
Top N by shared buffers hit ratio.
-
Top N by replication lag.
-
Top N by DB size.
-
Top N by idle sessions %.
-
Top N by blocked sessions %.
-
Top N by longest TX time.
-
Top N by WAL rate.
-
Top N by WAL folder size.
-
Top N by longest session duration.
-
Top N by used connections.
-
Top N by CPU utilization %.
-
Top N by waiting time.
-
Top N by temp files.
-
Top N by lowest free disk %.
-
Top N by duration of running autovacuums.
-
Top N by autovacuum warn percent.
-
Top N by checkpoint write and sync duration.
Here, Top N is the number of rows shown in each table. It is set by the top_limit filter, and the default value is 3.
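The effect of the top_limit filter can be illustrated with a short sketch that keeps only the N largest values of a metric. The database names and TPS values are made-up sample data, and the function is hypothetical. It does not reproduce the actual dashboard query.

```python
import heapq


def top_n(values: dict[str, float], top_limit: int = 3) -> list[tuple[str, float]]:
    """Return the top_limit entries with the largest metric value, largest first."""
    return heapq.nlargest(top_limit, values.items(), key=lambda item: item[1])


# Made-up sample: transactions per second for four monitored databases.
tps_by_db = {"db1": 120.0, "db2": 45.5, "db3": 310.2, "db4": 80.1}

print(top_n(tps_by_db))
# [('db3', 310.2), ('db1', 120.0), ('db4', 80.1)]
```

With the default top_limit of 3, the database with the lowest value (db2) is dropped from the table.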
ADPG Checkpointer (Bgwriter, Block IO Stats)
This dashboard contains statistics of checkpoints and bgwriter.
ADPG Checkpointer includes the following graphs:
-
Checkpoints. It represents the number of checkpoints during the aggregation period.
-
Checkpointer Write / Sync durations.
-
Bgwriter Stats. It displays the buffers_checkpoint, buffers_clean, and buffers_backend values.
-
Backend Read / Write times. It is based on the pg_stat_database view and requires the track_io_timing option to be set to on.
-
Table / Index / Toast Blocks Read. Note that reads can be served by the file system cache.
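When reading the Bgwriter Stats graph, it can help to look at how buffer writes are split between the checkpointer, the background writer, and ordinary backends. The sketch below computes these shares from the three counter values named above. The function name and the sample numbers are illustrative, and this is not necessarily how the dashboard itself aggregates the data.

```python
def buffer_write_shares(buffers_checkpoint: int, buffers_clean: int,
                        buffers_backend: int) -> dict[str, float]:
    """Return the fraction of buffers written by each kind of process."""
    total = buffers_checkpoint + buffers_clean + buffers_backend
    if total == 0:
        # No buffers written in the period: report zero shares instead of dividing by zero.
        return {"checkpointer": 0.0, "bgwriter": 0.0, "backend": 0.0}
    return {
        "checkpointer": buffers_checkpoint / total,
        "bgwriter": buffers_clean / total,
        "backend": buffers_backend / total,
    }


# Made-up counter deltas for one aggregation period.
shares = buffer_write_shares(buffers_checkpoint=60, buffers_clean=30, buffers_backend=10)
print(shares)
```

A consistently large backend share usually means that backends have to write dirty buffers themselves, which is a common reason to review the checkpoint and bgwriter settings.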
ADPG DB overview
This dashboard contains visual graphs for ADPG node characteristics and helps to analyze the weaknesses of a particular node.
The ADPG DB overview dashboard includes the following states:
-
Instance state — PRIMARY/REPLICA.
-
Instance uptime.
-
TPS — transactions per second.
-
QPS — queries per second.
-
Query runtime — average query runtime.
-
DB size ch. 1h — DB size that is calculated for each hour.
-
Approx Table Bloat.
-
Tuples fetched vs returned.
The ADPG DB overview dashboard contains the following graphs:
-
Tuple ins. / upd. / del. statistics.
-
Shared Buffers hit ratio + Rollback ratio.
-
TPS / QPS avg.
-
WAL rate + DB size.
-
Seq. / Idx. scans.
-
Sessions by state: active, idle, total, waiting, idleintransaction, av_workers.
-
CPU load + avg. query runtime.
-
Temp bytes. Nonzero values appear when large grouping and sorting operations require more memory than the work_mem value.
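Several of the values above (TPS, the shared buffers hit ratio, and the rollback ratio) are commonly derived from cumulative counters of the pg_stat_database view, taken as the difference between two samples. The sketch below shows this arithmetic under that assumption. The class and function names are hypothetical, the sample numbers are invented, and the dashboard's own queries may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbStatSample:
    """One snapshot of cumulative pg_stat_database counters."""
    taken_at: float      # sample time, in seconds
    xact_commit: int
    xact_rollback: int
    blks_hit: int
    blks_read: int


def rates(prev: DbStatSample, curr: DbStatSample) -> dict[str, float]:
    """Derive TPS, rollback ratio, and buffer hit ratio from two samples."""
    elapsed = curr.taken_at - prev.taken_at
    if elapsed <= 0:
        raise ValueError("curr must be taken after prev")
    commits = curr.xact_commit - prev.xact_commit
    rollbacks = curr.xact_rollback - prev.xact_rollback
    hits = curr.blks_hit - prev.blks_hit
    reads = curr.blks_read - prev.blks_read
    transactions = commits + rollbacks
    blocks = hits + reads
    return {
        "tps": transactions / elapsed,
        "rollback_ratio": rollbacks / transactions if transactions else 0.0,
        "hit_ratio": hits / blocks if blocks else 0.0,
    }


# Invented samples taken 60 seconds apart.
before = DbStatSample(0.0, 1000, 10, 9000, 1000)
after = DbStatSample(60.0, 1570, 40, 18500, 1500)
print(rates(before, after))
```

For these samples, 600 transactions in 60 seconds give 10 TPS, 30 of them are rollbacks (5%), and 9500 of 10000 block requests are served from shared buffers (95%).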
ADPG Health-check
This dashboard displays state characteristics of a particular node.
The ADPG Health-check dashboard includes the following states:
-
Instance state.
-
Instance uptime.
-
PG version number.
-
Longest query runtime.
-
Number of active connections.
-
Number of max. connections.
-
Number of blocked sessions.
-
Shared buffer hit percent.
-
Avg. TX rollback percent.
-
TPS(avg.).
-
QPS(avg.).
-
"idle" in TX count.
-
DB size(last).
-
DB size change(diff).
-
DATADIR disk space left.
-
Query runtime(avg.).
-
Config change events.
-
Table changes.
-
WAL archiving status.
-
WAL folder size.
-
Invalid/duplicate indexes.
-
Autovacuum issues.
-
Checkpoints requested.
-
Approx table bloat.
-
WAL per second(avg.).
-
Temp bytes per second(avg.).
-
Longest autovacuum duration.
-
Seq. scans on >100MB tables per minute(avg.).
-
INSERT-s per minute(avg.).
-
UPDATE-s per minute(avg.).
-
DELETE-s per minute(avg.).
-
Backup duration.
-
Max table FREEZE age.
-
Max. XMIN horizon age.
-
Inactive replication slots.
-
Max replication lag.
ADPG Replication
The ADPG Replication dashboard contains replication metrics.
The ADPG Replication dashboard includes the following states:
-
Inactive repl. slots.
-
Active repl. slots.
-
Active replicas.
-
Active "sync" replicas.
-
Slot max. restart_lsn lag.
-
Max. write lag.
-
Max. flush lag.
-
Max. replay lag.
The ADPG Replication dashboard contains the following graphs:
-
Replication slot restart_lsn lag (primary extra WAL size). It is calculated based on the pg_replication_slots view only for primary nodes.
-
Replication flush lag. It is calculated based on the pg_stat_replication view for primary nodes. Note that data is available only on connected replicas.
-
Replication replay lag. It is calculated based on the pg_stat_replication view for primary nodes. Note that data is available only on connected replicas.
-
Repl. slot XMIN age (in transactions). It is calculated based on the xmin field of the pg_replication_slots view.
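The restart_lsn lag is a distance in bytes between two WAL positions. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash, and the server-side pg_wal_lsn_diff function subtracts two such values. The sketch below performs the same subtraction on the client side. The function names and sample LSNs are illustrative, and the dashboard's exact query is not shown here.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '0/3000060' to an absolute byte position."""
    high, low = lsn.split("/")
    # The part before the slash holds the upper 32 bits, the part after it the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def lsn_lag_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Return how many bytes of WAL lie between restart_lsn and current_lsn."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


# Made-up positions: the primary is at 1/0, the slot still needs WAL from 0/FF000000.
print(lsn_lag_bytes("1/0", "0/FF000000"))
# 16777216
```

In this example the slot holds back 16 MiB of WAL on the primary. A lag that keeps growing for an inactive slot means the WAL folder size grows with it.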
ADPG Sessions overview
The ADPG Sessions overview dashboard provides session statistics.
The ADPG Sessions overview dashboard contains the following graphs:
-
Max. TPS/QPS.
-
Longest query duration.
-
Longest TX duration.
-
Longest wait duration.
-
Longest session duration.
-
Longest Autovacuum duration.
-
Sessions by state.
-
Instance total connections.
ADPG System metrics
The ADPG System metrics dashboard contains node system metrics (CPU load, network load, disk storage analysis, and others).
The ADPG System metrics dashboard contains the following graphs:
-
CPU usage %.
-
LoadAVG 1m normalized.
-
IO Write, bytes/sec.
-
IO Read, bytes/sec.
-
Network receive bytes.
-
Network transmit bytes.
-
Memory cached.
-
Memory free.
-
Disk space usage %.
-
Disk space available bytes.
-
Processes total.
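The LoadAVG 1m normalized graph is easier to interpret with the arithmetic in mind. The sketch below assumes that "normalized" means the 1-minute load average divided by the number of CPU cores, which is a common convention but is not stated in this article. The function name is illustrative.

```python
import os


def normalized_load(load_1m: float, cpu_count: int) -> float:
    """Divide the 1-minute load average by the number of CPU cores."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_1m / cpu_count


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    load_1m, _, _ = os.getloadavg()
    print(normalized_load(load_1m, os.cpu_count() or 1))
```

Under this assumption, a value close to or above 1.0 indicates that the node has about as many runnable processes as CPU cores, or more.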
PgBouncer statistics
The PgBouncer statistics dashboard contains statistics that reflect the PgBouncer performance.
The PgBouncer statistics dashboard displays the following graphs:
-
TPS — transactions per second.
-
QPS — queries per second.
-
Avg. query runtime — the average query runtime, in microseconds.
-
Pool wait time per Query — the average waiting time for queries in the pool, in microseconds.
-
Incoming traffic rate — the incoming traffic rate, bytes/sec.
-
Outgoing traffic rate — the outgoing traffic rate, bytes/sec.
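The two time-based graphs are reported in microseconds. PgBouncer exposes cumulative totals such as total_query_time, total_wait_time, and total_query_count in its SHOW STATS output, so a per-query average over a period can be derived from the difference between two readings. The sketch below shows that calculation with invented numbers. The function names are hypothetical, and the dashboard may compute its values differently.

```python
def avg_per_query_us(prev_total_time_us: int, curr_total_time_us: int,
                     prev_query_count: int, curr_query_count: int) -> float:
    """Average time per query, in microseconds, between two cumulative readings."""
    queries = curr_query_count - prev_query_count
    if queries <= 0:
        # No queries finished in the period, so there is nothing to average.
        return 0.0
    return (curr_total_time_us - prev_total_time_us) / queries


def us_to_ms(microseconds: float) -> float:
    """Convert microseconds to milliseconds for easier reading."""
    return microseconds / 1000.0


# Invented readings: 3 seconds of query time spent on 600 queries.
avg_us = avg_per_query_us(1_000_000, 4_000_000, 100, 700)
print(avg_us, us_to_ms(avg_us))
# 5000.0 5.0
```

The same function works for the pool wait time if the total_wait_time readings are passed instead of total_query_time.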
NOTE
To make PgBouncer monitoring work correctly, run the Reconfigure monitoring agents cluster action after changing settings in the Enable pgbouncer configuration section.