Arenadata DB Backup Manager overview
Features
Arenadata DB Backup Manager (ADBM) is a fault-tolerant system for managing ADB binary backups, built on pgBackRest.
ADBM allows you to perform the following functions:
-
Configure backup policies, including separate schedules for different backup types.
-
Create database backups according to the configured schedule and on demand.
-
Maintain a list of backups with the ability to search for them and view their details (including the cluster topology).
-
Clean up backups according to existing policies, either on a schedule or manually.
-
Create restore points according to the configured schedule or manually.
-
Restore databases to their state at the selected restore point.
-
Log all system actions, with the ability to view details of failed actions.
-
Work with multiple ADB clusters of different types (with/without Standby and mirroring).
-
Use S3- and POSIX-compatible repositories to store backups.
Architecture
The figure below shows the high-level architecture of ADBM.
ADBM architecture is based on the following components:
-
Backup Manager. Orchestrates cluster actions and performs background work according to the configured schedules: creates backups and restore points, cleans up old and invalid data, etc. Backup Manager and ADB clusters interact via Backup Agents.
-
Backup Agent. One agent is installed on each ADB cluster host. Agents are responsible for managing segment data:
-
Run pgBackRest commands.
-
Manage pgBackRest configuration.
-
On the master host, manage ADB cluster operations (for example, cluster restart).
-
etcd. A consistent distributed key-value store that is used as a separate coordination service in distributed systems. In ADBM, etcd provides two main functions:
-
Store distributed locks at the cluster level to guarantee that launched actions run exclusively.
-
Store the state of the currently active action. If Backup Manager stops or fails, the metadata of the current action can be retrieved from etcd.
-
Service Registry. Responsible for service discovery in ADBM. ADBM uses Service Registry to discover available agents (ADBM has no pre-configured agent map). Backup Agents, in turn, use it to find the ADBM services to which they send responses. Since Backup Manager can work with multiple ADB clusters at the same time, the agents of a newly added cluster are also registered via Service Registry.
-
PostgreSQL. Used to store historical data on backups and configurations. By default, it is deployed in a Docker container, but an external PostgreSQL database can also be used.
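The two etcd functions described above can be sketched as a toy model. This is not ADBM code: an in-memory dictionary (`FakeKV`) stands in for etcd, the key names `/locks/<cluster>` and `/actions/<cluster>` are hypothetical, and the lock is modeled as a key with an expiring lease (with real etcd, this pattern is usually built from leases and transactions). The sketch shows that a second manager cannot take a cluster lock while it is held, and that once the first manager fails and its lease expires, another manager can acquire the lock and read the saved action state.

```python
from dataclasses import dataclass, field


@dataclass
class FakeKV:
    """In-memory stand-in for etcd (not the etcd client API): keys may carry an expiring lease."""
    data: dict = field(default_factory=dict)
    expiry: dict = field(default_factory=dict)  # key -> time at which its lease expires

    def _drop_if_expired(self, key, now):
        if key in self.expiry and self.expiry[key] <= now:
            self.data.pop(key, None)
            self.expiry.pop(key, None)

    def create(self, key, value, ttl, now):
        """Create key with a lease of ttl seconds; fail if a live key already exists."""
        self._drop_if_expired(key, now)
        if key in self.data:
            return False
        self.data[key] = value
        self.expiry[key] = now + ttl
        return True

    def put(self, key, value):
        self.data[key] = value

    def get(self, key, now):
        self._drop_if_expired(key, now)
        return self.data.get(key)


# Hypothetical key layout; not ADBM's actual schema.
def acquire_cluster_lock(kv, cluster, owner, now, ttl=30):
    # One lock key per cluster, so at most one action runs on a cluster at a time.
    return kv.create(f"/locks/{cluster}", owner, ttl, now)


def save_action_state(kv, cluster, state):
    # Persist the active action's progress so another manager instance can resume it.
    kv.put(f"/actions/{cluster}", state)


def load_action_state(kv, cluster, now):
    return kv.get(f"/actions/{cluster}", now)


kv = FakeKV()
print(acquire_cluster_lock(kv, "adb1", "manager-A", now=0))   # True
print(acquire_cluster_lock(kv, "adb1", "manager-B", now=10))  # False: lock still held
save_action_state(kv, "adb1", {"action": "backup", "type": "incr", "done_segments": 3})
# manager-A fails and stops renewing its lease; at t=31 the lease has expired.
print(acquire_cluster_lock(kv, "adb1", "manager-B", now=31))  # True
print(load_action_state(kv, "adb1", now=31))  # {'action': 'backup', 'type': 'incr', 'done_segments': 3}
```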
Concepts
ADBM implements Point-in-Time Recovery (PITR), that is, the ability to restore databases to a selected point in time. This approach relies on several concepts, which are summarized below.
Backup
A backup is a consistent copy of an ADB cluster that can be used to restore databases after hardware or other failures. ADBM supports the following backup types:
-
Full. A full backup copies the entire database content. The first backup in ADBM within each timeline is always full, even if the schedule (or the user, when launching a backup manually) specifies a differential or incremental backup. A full backup can be used to restore data on its own, since it does not depend on any files from other backups. The advantage of full backups is that restoring from them is faster than from other types. However, creating full backups frequently (hourly or daily) is not recommended, as they take a long time to create and occupy significant disk space.
-
Differential. A differential backup contains only those database files that have changed since the last full backup was started. Compared to full backups, differential backups are faster to create and require less disk space. However, restoring data from a differential backup requires both its contents and the files of the last full backup.
-
Incremental. An incremental backup contains only those database files that have changed since the last backup of any type (full, differential, or incremental) was started. Incremental backups are the fastest to create and occupy the least disk space of all types. However, restoring from them takes longer: files must be extracted from the last full backup and, if available, the last differential backup, and then all subsequent incremental backups must be applied in sequence.
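To make the difference between the backup types concrete, here is a minimal sketch that decides which files a new backup would copy. It assumes that a file counts as changed if its modification time is later than the start time of the reference backup; this is a simplification (real tools such as pgBackRest may also compare file sizes or checksums), and the function name `files_to_copy` is hypothetical.

```python
def files_to_copy(backup_type, history, files):
    """Return the names of files that a new backup of backup_type would copy.

    history: (type, start_time) of earlier backups in this timeline, oldest first.
    files:   file name -> last modification time.
    """
    # The first backup in a timeline is always full, whatever type was requested.
    if not history or backup_type == "full":
        return set(files)
    if backup_type == "diff":
        # Differential: changes since the start of the last full backup.
        since = max(start for kind, start in history if kind == "full")
    elif backup_type == "incr":
        # Incremental: changes since the start of the previous backup of any type.
        since = history[-1][1]
    else:
        raise ValueError(f"unknown backup type: {backup_type}")
    return {name for name, mtime in files.items() if mtime > since}


files = {"t1": 5, "t2": 12, "t3": 18, "idx": 25}
history = [("full", 10), ("incr", 20)]
print(sorted(files_to_copy("diff", history, files)))  # ['idx', 't2', 't3']
print(sorted(files_to_copy("incr", history, files)))  # ['idx']
```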
WAL
Write-Ahead Log (WAL) is a standard mechanism that ensures no committed changes are lost. All changes to data files (which contain tables, indexes, etc.) are first written sequentially to the WAL. Afterwards, a background process writes the changes into the main database cluster files. After a failure, the WAL can be replayed to bring the database to a consistent state.
WAL is divided into individual 64 MB files called segments. Every WAL record is identified by a log sequence number (LSN), a 64-bit position in the WAL written as up to 16 hexadecimal digits; the LSN is used to locate the record by its offset in the corresponding segment file. A segment file name, in turn, consists of the timeline number and the LSN at which the segment starts.
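The relationship between an LSN, a segment file name, and the offset within a segment can be sketched as follows. The sketch assumes PostgreSQL's naming convention, in which a segment file name has 24 hexadecimal digits: 8 for the timeline ID, then 16 for the segment number split into two 8-digit parts. The 64 MB segment size comes from the text above, and LSNs are parsed from PostgreSQL's `X/Y` text form. The helper names are hypothetical.

```python
WAL_SEGMENT_SIZE = 64 * 1024 * 1024  # 64 MB segments, as described above


def parse_lsn(text):
    """Parse an LSN in 'X/Y' hex form (high and low 32 bits) into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def segment_name(tli, lsn, seg_size=WAL_SEGMENT_SIZE):
    """Name of the WAL segment file that contains lsn on timeline tli."""
    segno = lsn // seg_size                    # global segment number
    segs_per_xlogid = 0x100000000 // seg_size  # segments per 4 GB "log id" (64 for 64 MB)
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid, segno % segs_per_xlogid)


def segment_offset(lsn, seg_size=WAL_SEGMENT_SIZE):
    """Byte offset of the record inside its segment file."""
    return lsn % seg_size


lsn = parse_lsn("1/8A000028")
print(segment_name(1, lsn))       # 000000010000000100000022
print(hex(segment_offset(lsn)))   # 0x2000028
```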
Using WAL significantly reduces the number of disk writes, since data pages do not need to be flushed to disk on every transaction commit: only the WAL has to be flushed to guarantee that a transaction is committed.
WAL plays a central role in Point-in-Time Recovery (PITR) as implemented in ADBM. Because WAL data is archived, ADBM can restore databases to any restore point covered by the available WAL data. The restore process consists of two steps:
-
Restore one or more prior backups.
-
Replay the WAL records written after this backup up to the specified named restore point.
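The two restore steps can be illustrated with a toy model in which the database is a dictionary and the WAL is a list of records, some of which are named restore-point markers. This is not how ADBM or pgBackRest represent data; the record format and the function `restore_to_point` are hypothetical.

```python
def restore_to_point(backup_state, wal, target):
    """Toy PITR: install a backup, then replay WAL up to the named restore point.

    backup_state: dict representing the database contents in the backup.
    wal: records written after the backup, oldest first; each is either
         ("set", key, value) for a data change or ("restore_point", name).
    """
    state = dict(backup_state)       # step 1: install the backup (a copy; the backup stays intact)
    for record in wal:               # step 2: replay WAL in the order it was written
        if record[0] == "restore_point":
            if record[1] == target:
                return state         # stop exactly at the named point
        else:
            _, key, value = record
            state[key] = value
    raise LookupError(f"restore point {target!r} is not covered by the available WAL")


backup = {"a": 1}
wal = [("set", "a", 2), ("restore_point", "rp1"),
       ("set", "b", 3), ("restore_point", "rp2"),
       ("set", "a", 99)]
print(restore_to_point(backup, wal, "rp1"))  # {'a': 2}
print(restore_to_point(backup, wal, "rp2"))  # {'a': 2, 'b': 3}
```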
Restore point
A restore point is a named recovery point; it is the minimum unit of data consistency granularity in ADB clusters. Once a restore point is created (on a schedule or manually), you can restore the database to its state at the moment the point was created. For each restore point, ADBM stores which backups should be used and which WAL segments should be replayed to return the database to that state.
The figure below illustrates how restore points work.
The following table lists the backups and WAL files that are used when you select each of the shown restore points for the Restore action.
| Selected restore point | Objects used for recovery |
|---|---|
| rp1 | full1, WAL1 |
| rp2 | full1, diff1, incr1, incr2, WAL2 |
| rp3 | full1, diff2, incr3, incr4, WAL3 |
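The backup column of this table can be reproduced by applying the dependency rules from the Backup section: an incremental backup depends on the previous backup of any type, a differential backup depends on the last full backup, and a full backup depends on nothing. The sketch assumes the chronological order of backups and restore points listed in `EVENTS`, which is consistent with the table but may differ in detail from the figure; `EVENTS` and `backups_for` are hypothetical names. After the returned backups are loaded, the WAL written between the last backup in the chain and the restore point (WAL1, WAL2, or WAL3 in the table) is replayed.

```python
EVENTS = [  # assumed order of backups and restore points, oldest first
    ("full1", "full"), ("rp1", "rp"),
    ("diff1", "diff"), ("incr1", "incr"), ("incr2", "incr"), ("rp2", "rp"),
    ("diff2", "diff"), ("incr3", "incr"), ("incr4", "incr"), ("rp3", "rp"),
]


def backups_for(events, restore_point):
    """Backups (oldest first) to load before replaying WAL up to restore_point."""
    position = [name for name, _ in events].index(restore_point)
    backups = [(name, kind) for name, kind in events[:position] if kind != "rp"]
    if not backups:
        raise ValueError(f"no backup precedes {restore_point}")
    chain = []
    i = len(backups) - 1                 # start from the latest backup before the point
    while True:
        name, kind = backups[i]
        chain.append(name)
        if kind == "full":               # a full backup is self-contained
            break
        if kind == "diff":               # a differential depends on the last full backup
            i = max(j for j in range(i) if backups[j][1] == "full")
        else:                            # an incremental depends on the previous backup
            if i == 0:
                raise ValueError(f"{name} has no preceding backup")
            i -= 1
    return chain[::-1]


for rp in ("rp1", "rp2", "rp3"):
    print(rp, backups_for(EVENTS, rp))
# rp1 ['full1']
# rp2 ['full1', 'diff1', 'incr1', 'incr2']
# rp3 ['full1', 'diff2', 'incr3', 'incr4']
```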
Timeline
A timeline is a mechanism that ADBM uses to distinguish the WAL series generated after a database is restored to a selected restore point from those created in the original database history (before the Restore action).
Suppose that, to recover the database to its state at the creation time of some restore point, you need to load one or more backups and then replay WAL1 (the number is simplified for illustration). After the archive recovery completes, new WAL continues to be generated. If these WAL archives were added to the same directory, the subsequent WAL records would overwrite those from the original database history (WAL2, WAL3, and so on). This, in turn, would make it impossible to restore the database to other restore points created before the recovery.
To avoid such conflicts, ADBM assigns a new timeline number after each data recovery to identify post-recovery WAL archives. Timelines are numbered starting from 0, and each successful Restore action increases the number by one. This number is included in the names of the per-segment directories (so-called stanzas) that store WAL archives and backups. With each data recovery, all necessary directories are created with the new timeline number in their names, and all subsequent WAL records are written to these new directories without overwriting files in the directories with the previous number.
IMPORTANT
Since newly created directories initially contain no backups, the first backup after each Restore action is always a full backup.
The following figure illustrates the timeline logic in a simplified way. After you restore data to the rp1 point, WAL1 is replayed and timeline1 is created. Subsequent WAL records are added to new directories. The next Restore action creates timeline2, and so on. Note that even after timeline1 and timeline2 are created, it is still possible to restore data to the rp2 point; without timelines, this would be impossible.
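The timeline logic can be modeled as a toy archive in which each timeline has its own list of WAL records, standing in for its own set of directories. Everything here is hypothetical (the class `TimelineArchive`, its methods, and the record names); it only shows why writing post-recovery WAL to a new timeline keeps the original history, and therefore points such as rp2, restorable.

```python
class TimelineArchive:
    """Toy WAL archive: each timeline has its own 'directory' (a list of records)."""

    def __init__(self):
        self.current = 0          # timelines are numbered from 0
        self.dirs = {0: []}       # timeline -> WAL records written on it
        self.parent = {0: None}   # timeline -> (parent timeline, records inherited from it)
        self.points = {}          # restore point name -> (timeline, record count)

    def write(self, record):
        self.dirs[self.current].append(record)

    def create_restore_point(self, name):
        self.points[name] = (self.current, len(self.dirs[self.current]))

    def history(self, timeline, upto):
        """WAL records leading to position upto on timeline, following parent links."""
        parent = self.parent[timeline]
        prefix = self.history(*parent) if parent else []
        return prefix + self.dirs[timeline][:upto]

    def restore(self, name):
        """Return the WAL to replay and switch to a fresh timeline for new WAL."""
        timeline, pos = self.points[name]
        replayed = self.history(timeline, pos)
        new_tl = max(self.dirs) + 1
        self.dirs[new_tl] = []    # new directory: older timelines are never overwritten
        self.parent[new_tl] = (timeline, pos)
        self.current = new_tl
        return replayed


arc = TimelineArchive()
arc.write("w1"); arc.create_restore_point("rp1")
arc.write("w2"); arc.create_restore_point("rp2")
arc.write("w3")
print(arc.restore("rp1"), arc.current)  # ['w1'] 1
arc.write("x1"); arc.create_restore_point("rp3")
print(arc.restore("rp3"), arc.current)  # ['w1', 'x1'] 2
print(arc.restore("rp2"), arc.current)  # ['w1', 'w2'] 3  (original history still intact)
```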
Quick start
The typical sequence of actions when working with ADBM via the web interface includes the following steps:
-
Create a configuration, which will be used to create backups and perform other related actions.
-
Run a backup manually or wait for a scheduled backup to start. If necessary, view the list of backups.
-
Create a restore point manually or wait for one to be created automatically on a schedule.
-
Run auxiliary operations if necessary:
-
Restore data from a backup if you need to return the database to its state at one of the existing restore points.
At each step, you can view the details of running actions, including the causes of failures (if any were recorded).