Work with a cluster topology

Eugenia Kuzina

Contents

Overview
Display node details
Run actions

Overview

The cluster topology is displayed on the Topology page of the ADPG Control web interface. To choose which cluster's topology to display, use the Cluster combo box at the top of the page. The diagram shows all cluster nodes that run the ADPG service, connected by arrows that indicate the direction of replication streams.

The "Topology" page

The topology diagram displays the following information for each node:

Role in the cluster: Leader, Async replica, Sync replica.
Node name.
Status: Running, Stopping, Initializing, Stopped, Failed, Unknown, Out of patroni.

For replica nodes, the topology diagram also shows the lag: the number of bytes by which the replica's state lags behind the leader's state.
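The lag is the distance between two write-ahead log (WAL) positions. PostgreSQL reports WAL positions as LSNs written as two hexadecimal numbers separated by a slash (the upper and lower 32 bits of a 64-bit byte position), and its pg_wal_lsn_diff() function returns the difference in bytes. The minimal Python sketch below reproduces that arithmetic. The function names are hypothetical, and which replica position ADPG Control compares against the leader (received, flushed, or replayed WAL) is not specified here, so treat the inputs as assumptions.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to an absolute byte position."""
    high, low = lsn.split("/")
    # The part before the slash holds the upper 32 bits, the part after it the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def replica_lag_bytes(leader_lsn: str, replica_lsn: str) -> int:
    """Return how many bytes of WAL the replica is behind the leader.

    The result is clamped at zero in this sketch, in case the two positions
    were sampled at slightly different moments.
    """
    return max(parse_lsn(leader_lsn) - parse_lsn(replica_lsn), 0)


print(replica_lag_bytes("0/3000148", "0/3000060"))  # 232 (0x148 - 0x60)
print(replica_lag_bytes("1/0", "0/FFFFFF00"))       # 256, crossing the 4 GiB boundary
```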

The table below shows how Patroni node statuses map to node statuses on the diagram.

Status on the topology diagram | Patroni status
--- | ---
Running | running, streaming
Stopping | stopping
Initializing | starting, restarting, initializing new cluster, running custom bootstrap script, creating replica, in archive recovery
Stopped | stopped
Failed | stop failed, crashed, start failed, restart failed, initdb failed, custom bootstrap failed
Unknown | An instance with unknown status
Out of patroni | The node is not in the cluster, for example, if the Patroni service is stopped on the node
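The mapping in the table can also be expressed as a lookup function. This is a minimal Python sketch: the names DIAGRAM_STATUS and diagram_status are hypothetical, and representing a node that Patroni does not report as None is an assumption of this sketch, not documented behavior.

```python
from typing import Dict, Optional

# Patroni state -> status shown on the topology diagram, following the mapping table.
DIAGRAM_STATUS: Dict[str, str] = {
    "running": "Running",
    "streaming": "Running",
    "stopping": "Stopping",
    "starting": "Initializing",
    "restarting": "Initializing",
    "initializing new cluster": "Initializing",
    "running custom bootstrap script": "Initializing",
    "creating replica": "Initializing",
    "in archive recovery": "Initializing",
    "stopped": "Stopped",
    "stop failed": "Failed",
    "crashed": "Failed",
    "start failed": "Failed",
    "restart failed": "Failed",
    "initdb failed": "Failed",
    "custom bootstrap failed": "Failed",
}


def diagram_status(patroni_state: Optional[str]) -> str:
    """Return the diagram status for a Patroni state.

    None stands (by assumption) for a node that Patroni does not report,
    for example because the Patroni service on it is stopped.
    """
    if patroni_state is None:
        return "Out of patroni"
    # Any state missing from the table is shown as Unknown.
    return DIAGRAM_STATUS.get(patroni_state.strip().lower(), "Unknown")


print(diagram_status("streaming"))  # Running
print(diagram_status("crashed"))    # Failed
```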

Display node details

You can click a node on the diagram to display the following node details:

CPU — percentage of CPU usage;
RAM — percentage of RAM usage;
Disk — percentage of disk usage;
Lag — number of bytes by which the replica’s state lags behind the leader state;
Delay — the time delay between the creation of a WAL record on the leader and its replay on the replica, as set by the recovery_min_apply_delay parameter;
Timeline — PostgreSQL timeline;
Host — host IP address;
Tags — Patroni tags.


The Topology page has a toolbar for zooming the topology diagram and closing the node details window. The toolbar also shows the current zoom factor as a percentage.

A toolbar on the "Topology" page

The toolbar includes buttons to:

zoom in;
zoom out;
maximize the diagram to fill the entire screen;
reset the diagram size to 100%;
hide the node details window.

Run actions

On the Topology page, you can execute the following actions:

Switchover
Failover
Reinit

The actions can be run only for replicas.

To execute actions, click Actions at the top of the page.


Alternatively, you can run actions from the node menu.

Run actions from the node menu

In this case, the target node is preselected and cannot be changed in the action window: the Candidate field (for Switchover and Failover) or the Instance field (for Reinit) is filled in automatically.

All ADPG Control actions are logged on the Actions page, where you can see which actions succeeded and which errors occurred for the actions that failed.

IMPORTANT

The Failover and Switchover actions can cause data loss; the amount depends on how up-to-date the promoted replica is compared with the leader. Both actions also interrupt ongoing transactions and sessions on the leader.

Switchover

The Switchover action moves the leader role to the specified replica node, and the former leader becomes a replica. If the cluster contains synchronous replicas, you must select one of them as the switchover candidate. Otherwise, the action fails with the "candidate name does not match with sync_standby" error.
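The synchronous-replica rule amounts to a simple check on the candidate's role. The following Python sketch is illustrative only: switchover_error is a hypothetical helper, and the member dictionaries assume the role names (leader, sync_standby, replica) that Patroni's /cluster REST endpoint reports. The error text mirrors the message quoted in the paragraph above.

```python
from typing import Dict, List, Optional


def switchover_error(members: List[Dict[str, str]], candidate: str) -> Optional[str]:
    """Return why `candidate` cannot be a switchover target, or None if it can.

    Each member is a dict with "name" and "role" keys; roles follow Patroni's
    /cluster output: "leader", "sync_standby", or "replica".
    """
    roles = {m["name"]: m["role"] for m in members}
    if roles.get(candidate) not in ("sync_standby", "replica"):
        return "candidate must be an existing replica"
    has_sync = any(role == "sync_standby" for role in roles.values())
    # When synchronous replicas exist, only one of them may be promoted.
    if has_sync and roles[candidate] != "sync_standby":
        return "candidate name does not match with sync_standby"
    return None


cluster = [
    {"name": "node1", "role": "leader"},
    {"name": "node2", "role": "sync_standby"},
    {"name": "node3", "role": "replica"},
]
print(switchover_error(cluster, "node2"))  # None
print(switchover_error(cluster, "node3"))  # candidate name does not match with sync_standby
```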

You can use this action when the cluster is healthy:

The cluster has a leader.
In a cluster with synchronous replication, synchronous replicas are available.

If the cluster is unhealthy, use Failover instead.
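The health conditions listed above can be sketched as a helper that suggests which action to use. The function recommended_action is hypothetical. It assumes member dictionaries shaped like Patroni's /cluster output and a synchronous_mode flag that tells whether the cluster uses synchronous replication (Patroni has a setting with this name, but how ADPG Control determines it is not described here).

```python
from typing import Dict, List


def recommended_action(members: List[Dict[str, str]], synchronous_mode: bool) -> str:
    """Suggest Switchover for a healthy cluster and Failover for an unhealthy one."""
    has_leader = any(m["role"] == "leader" for m in members)
    has_sync_replica = any(m["role"] == "sync_standby" for m in members)
    # Healthy: a leader exists and, with synchronous replication, a sync replica is available.
    healthy = has_leader and (has_sync_replica or not synchronous_mode)
    return "Switchover" if healthy else "Failover"


print(recommended_action([{"name": "node2", "role": "replica"}], synchronous_mode=False))  # Failover
```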

To run the action, click Switchover in the action list, specify the candidate node to promote to leader (if it is not already set), and click Run.

The Switchover action’s window

Failover

The Failover action can move the leader role to an asynchronous or synchronous replica. The previous leader turns into a replica.

Use this action when the cluster is unhealthy, for example, when it has no leader or when no synchronous replica is available in a cluster with synchronous replication.

You can also run Failover in a healthy cluster, but Switchover is recommended in that case.

To run the action, click Failover in the action list, specify the candidate node to promote to leader (if it is not already set), and click Run. Instead of a specific node, you can select Autoselect in the combo box to have the candidate determined automatically.

The Failover action’s window
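This article does not describe how Autoselect chooses the candidate. Purely as an illustration, one reasonable heuristic is sketched below: since potential data loss depends on how up-to-date the promoted replica is, it prefers a running synchronous replica and otherwise the running replica with the smallest lag. The Member class and autoselect_candidate function are hypothetical and do not reflect ADPG Control's actual algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    role: str   # "leader", "sync_standby", or "replica"
    state: str  # Patroni state, for example "running" or "streaming"
    lag: int    # replication lag in bytes


def autoselect_candidate(members: List[Member]) -> Optional[str]:
    """Hypothetical heuristic: pick the most suitable running replica, or None."""
    replicas = [
        m for m in members
        if m.role != "leader" and m.state in ("running", "streaming")
    ]
    if not replicas:
        return None
    # Sort key: synchronous replicas first (False < True), then the smallest lag.
    best = min(replicas, key=lambda m: (m.role != "sync_standby", m.lag))
    return best.name


nodes = [
    Member("node1", "leader", "running", 0),
    Member("node2", "replica", "streaming", 4096),
    Member("node3", "replica", "streaming", 128),
    Member("node4", "replica", "stopped", 0),
]
print(autoselect_candidate(nodes))  # node3
```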

NOTE

Automatic failovers that occur in the cluster are not displayed on the Actions page: this page logs only manually initiated actions.

Reinit

The Reinit action reinitializes a cluster node. All data on this node will be overwritten.

Use Reinit when the PostgreSQL instance on a replica cannot catch up with the leader and Patroni cannot recover it automatically. Reinit removes the existing data directory and creates a new replica from the current leader.

To run the action, click Reinit in the action list, specify the instance to reinitialize (if it is not already set), and click Run.

The Reinit action’s window
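Outside the web interface, Patroni exposes a comparable operation through its REST API: a POST /reinitialize request sent to the replica that should be rebuilt, optionally with {"force": true} in the body. The sketch below only builds such a request with Python's standard library. The host name is hypothetical, port 8008 is an assumption (it is the port used in Patroni's sample configurations), REST API authentication is not handled, and this article does not state whether ADPG Control uses this endpoint internally.

```python
import json
import urllib.request


def build_reinit_request(host: str, port: int = 8008, force: bool = False) -> urllib.request.Request:
    """Build, but do not send, a POST /reinitialize request for the replica on `host`."""
    body = json.dumps({"force": force}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/reinitialize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_reinit_request("replica-1.example.internal", force=True)
print(req.get_method(), req.full_url)  # POST http://replica-1.example.internal:8008/reinitialize
# Sending it would wipe the replica's data directory, so it is left commented out:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.status, resp.read().decode())
```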
