
Arenadata DB Backup Manager overview

Features

Arenadata DB Backup Manager (ADBM) is a fault-tolerant system for managing ADB binary backups, built on top of pgBackRest.

ADBM allows you to perform the following functions:

  • Configure backup policies including separate schedules for different backup types.

  • Create database backups according to the configured schedule and on demand.

  • Maintain a list of backups with the ability to search them and view their details (including the cluster topology).

  • Clean up backups according to the existing policies, either on a schedule or manually.

  • Create restore points on the configured schedule and manually.

  • Restore databases to their state at the selected restore point.

  • Log all system actions, with the ability to view details of failed actions.

  • Work with multiple ADB clusters of different types (with or without Standby and mirroring).

  • Use S3- and POSIX-compatible repositories to store backups.

NOTE
  • ADBM is available in the ADB Enterprise Edition.

  • Currently, ADBM can be installed only offline.

Architecture

The figure below shows the high-level architecture of ADBM.

ADBM architecture

ADBM architecture is based on the following components:

  • Backup Manager. Orchestrates cluster actions and performs background work according to the configured schedules: creates backups and restore points, cleans up old and invalid data, etc. Backup Manager interacts with ADB clusters via Backup Agents.

  • Backup Agent. One agent is installed on each ADB cluster host. Agents are responsible for segment data management:

    • Run pgBackRest commands.

    • Manage pgBackRest configuration.

    • On the Master host, manage ADB cluster operations (for example, cluster restart).

  • etcd. A consistent distributed key/value store that serves as a separate coordination service in distributed systems. In ADBM, etcd provides two main functions:

    • Store distributed cluster-level locks to ensure that actions launched on a cluster are mutually exclusive.

    • Store the state of the currently active action. If Backup Manager stops or fails, the metadata of the current action can be retrieved from etcd.

  • Service Registry. Responsible for service discovery in ADBM. ADBM has no pre-configured agent map; instead, it discovers available agents through Service Registry, and Backup Agents, in turn, use it to find the ADBM services to which they send responses. Since Backup Manager can work with multiple ADB clusters at the same time, the agents of each newly added cluster are also registered via Service Registry.

  • PostgreSQL. Stores historical data on backups and configurations. By default, it is deployed in a Docker container, but an external PostgreSQL database can also be used.
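To make the two etcd roles more concrete, here is a minimal Python sketch of a cluster-level lock combined with a stored action state. It does not use a real etcd client: `InMemoryKV` is a hypothetical stand-in whose `put_if_absent` mimics the "create only if the key is missing" transaction that etcd-based locks are typically built on, and the key layout (`/locks/<cluster>`, `/actions/<cluster>`) and helper names are assumptions, not ADBM's actual schema.

```python
from __future__ import annotations

import threading


class InMemoryKV:
    """Hypothetical stand-in for an etcd-like store (not a real etcd client)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}
        self._mutex = threading.Lock()

    def put_if_absent(self, key: str, value: str) -> bool:
        # Mimics an etcd transaction that succeeds only if the key does not exist yet.
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def put(self, key: str, value: str) -> None:
        with self._mutex:
            self._data[key] = value

    def get(self, key: str) -> str | None:
        with self._mutex:
            return self._data.get(key)

    def delete(self, key: str) -> None:
        with self._mutex:
            self._data.pop(key, None)


def try_start_action(kv: InMemoryKV, cluster: str, action: str) -> bool:
    """Take the cluster-level lock and record the action state; False if the cluster is busy."""
    if not kv.put_if_absent(f"/locks/{cluster}", action):
        return False  # another action already holds the lock for this cluster
    # Persist the running action so a restarted manager can find it (hypothetical key).
    kv.put(f"/actions/{cluster}", f"{action}:running")
    return True


def finish_action(kv: InMemoryKV, cluster: str) -> None:
    """Drop the action state, then release the cluster-level lock."""
    kv.delete(f"/actions/{cluster}")
    kv.delete(f"/locks/{cluster}")
```

A second action on the same cluster is refused while the first one holds the lock, whereas another cluster remains free; a restarted Backup Manager could read `/actions/<cluster>` to find the interrupted action. A real etcd lock would also be tied to a lease (TTL) so that it is released if its holder dies; that part is omitted here.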

NOTE

Concepts

ADBM implements Point-in-Time Recovery (PITR), the ability to restore databases to a selected point in time. This approach is based on several concepts, which are summarized below.

Backup

A backup is a consistent copy of an ADB cluster that can be used to restore databases in case of hardware and other failures. ADBM supports the following backup types:

  • Full. A full backup contains the entire database content. The first backup in ADBM within each timeline is always full, even if the schedule (or the user, when launching a backup manually) requests a differential or incremental backup. A full backup can be used to restore data on its own, as it does not depend on files from any other backups. Its advantage is that all files are restored faster than from other backup types. However, creating full backups frequently (hourly or daily) is not recommended, since they take a long time to create and occupy significant disk space.

  • Differential. A differential backup contains only those database files that have changed since the last full backup. Compared to full backups, differential backups are faster to create and require less disk space. However, restoring from a differential backup requires files from both this backup and the last full backup.

  • Incremental. An incremental backup contains only those database files that have changed since the last backup of any type (full, differential, or incremental). Of all types, incremental backups are the fastest to create and occupy the least disk space. However, restoring from them takes longer: files must be extracted from the last full backup and, if present, the last differential backup, and then all subsequent incremental backups must be applied in sequence.
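The dependency rules above can be sketched as a small planner that decides the effective type of a new backup and which earlier backup it builds on. This is an illustrative model, not ADBM or pgBackRest code: `Backup` and `plan_backup` are hypothetical names, and `history` is assumed to hold the backups already taken in the current timeline, oldest first.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Backup:
    name: str
    kind: str           # "full", "diff" or "incr"
    parent: str | None  # name of the backup this one depends on (None for full)


def plan_backup(history: list[Backup], requested: str, name: str) -> Backup:
    """Decide the effective type and parent of a new backup (hypothetical model)."""
    fulls = [b for b in history if b.kind == "full"]
    if not fulls or requested == "full":
        # No full backup in this timeline yet (or full requested): take a full backup.
        return Backup(name, "full", None)
    if requested == "diff":
        # Differential: changes since the last full backup.
        return Backup(name, "diff", fulls[-1].name)
    if requested == "incr":
        # Incremental: changes since the last backup of any type.
        return Backup(name, "incr", history[-1].name)
    raise ValueError(f"unknown backup type: {requested}")
```

For example, an incremental backup requested first in an empty timeline is promoted to full; after that, a differential backup always points back to the last full backup, while an incremental one points to whatever backup was taken most recently.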

WAL

Write-Ahead Log (WAL) is a standard mechanism to ensure that no committed changes are lost. All changes to data files (that contain tables, indexes, etc.) are written sequentially to the WAL. Afterwards, a background process writes the changes into the main database cluster files. In case of failures, the WAL can be replayed to make the database consistent.

WAL is divided into individual 64 MB files called segments. Every WAL record is identified by a 64-bit log sequence number (LSN), represented as 16 hexadecimal digits, which is used to locate the record by its offset in the segment file. A segment file name, in turn, consists of the timeline number and the LSN of the segment's first record.
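As a worked example of the naming scheme, the sketch below computes which segment file holds a given LSN. It assumes that ADB follows the PostgreSQL convention of a 24-hex-digit file name (8 digits of timeline ID, then the high 32 bits of the LSN, then the segment number within that 4 GB range) with the 64 MB segment size stated above; the helper names are hypothetical.

```python
WAL_SEGMENT_SIZE = 64 * 1024 * 1024  # 64 MB segments, as stated above
SEGMENTS_PER_XLOGID = 0x1_0000_0000 // WAL_SEGMENT_SIZE  # 64 segments per 4 GB range


def parse_lsn(text: str) -> int:
    """Parse an LSN written as 'HI/LO' (two hexadecimal halves) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_file_name(timeline: int, lsn: int) -> str:
    """PostgreSQL-style name of the WAL segment file that contains `lsn`."""
    segno = lsn // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, segno // SEGMENTS_PER_XLOGID, segno % SEGMENTS_PER_XLOGID)


def offset_in_segment(lsn: int) -> int:
    """Byte offset of the record inside its segment file."""
    return lsn % WAL_SEGMENT_SIZE
```

For instance, LSN `0/C000028` on timeline 1 falls into `000000010000000000000003` at byte offset 0x28, because 0xC000028 lies just past three full 64 MB (0x4000000-byte) segments.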

Using WAL significantly reduces the number of disk writes, since data pages do not have to be flushed to disk on every transaction commit. Only the WAL needs to be flushed to disk to guarantee that a transaction is committed.

WAL plays a central role in the Point-in-Time Recovery (PITR) mechanism used in ADBM. Because WAL data is archived, ADBM can restore databases to any restore point covered by the available WAL data. The restore process consists of two steps:

  1. Restore database files from the appropriate backup (or chain of backups).

  2. Replay the archived WAL up to the selected restore point.
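The two restore steps can be modelled in a few lines. This toy Python sketch assumes that a restore point is a named marker record in the WAL stream and that data files can be represented as a key/value dictionary; `WalRecord` and `point_in_time_restore` are hypothetical names, and the sketch does not reflect how ADB or pgBackRest replay WAL internally.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WalRecord:
    lsn: int
    change: tuple[str, str] | None = None  # (key, value) written to the data files
    restore_point: str | None = None       # set only for a named restore point marker


def point_in_time_restore(backup: dict[str, str], backup_lsn: int,
                          wal: list[WalRecord], target: str) -> dict[str, str]:
    """Step 1: start from the backup contents. Step 2: replay archived WAL
    written after the backup until the named restore point is reached."""
    state = dict(backup)  # step 1: a copy of the restored backup files
    for rec in wal:
        if rec.lsn <= backup_lsn:
            continue  # already contained in the backup
        if rec.restore_point == target:
            return state  # state at the moment the restore point was created
        if rec.change is not None:
            key, value = rec.change
            state[key] = value  # step 2: replay one change
    raise LookupError(f"restore point {target!r} not found in the archived WAL")
```

Records with an LSN at or below `backup_lsn` are skipped because their effects are already in the backup, and replay stops at the marker, so changes made after the restore point are not applied.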

Restore point

A restore point is a named recovery point, which is the minimum unit of data consistency granularity in ADB clusters. After a restore point is created (on a schedule or manually), you can restore the database to its state at the moment the point was created. For each existing restore point, ADBM stores information about which backups should be used and which WAL segments should be replayed to return the database to that state.

To better understand the mechanism of restore points, look at the figure below.

Example with restore points and different backup types

The following table shows which backups and WAL files are used when you select each of the restore points shown above for the Restore action.

Logic of restore points

| Selected restore point | Objects to be used for recovery  |
|------------------------|----------------------------------|
| rp1                    | full1, WAL1                      |
| rp2                    | full1, diff1, incr1, incr2, WAL2 |
| rp3                    | full1, diff2, incr3, incr4, WAL3 |
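The table can be reproduced with a short sketch that walks the backup history in order. The event sequence below is an assumption inferred from the table itself (the figure is not available here), the function name is hypothetical, and the returned WAL label is descriptive rather than an actual list of segments.

```python
from __future__ import annotations

# Event order inferred from the table above (an assumption about the figure).
EVENTS = [
    ("full1", "full"), ("rp1", "rp"),
    ("diff1", "diff"), ("incr1", "incr"), ("incr2", "incr"), ("rp2", "rp"),
    ("diff2", "diff"), ("incr3", "incr"), ("incr4", "incr"), ("rp3", "rp"),
]


def objects_for_restore_point(events: list[tuple[str, str]], rp: str) -> tuple[list[str], str]:
    """Return the backups to load (oldest first) and the WAL span to replay for `rp`."""
    last_full: str | None = None
    last_diff: str | None = None
    incrs: list[str] = []
    for name, kind in events:
        if kind == "full":
            last_full, last_diff, incrs = name, None, []  # a new full starts a new chain
        elif kind == "diff":
            last_diff, incrs = name, []  # a diff depends only on the last full backup
        elif kind == "incr":
            incrs.append(name)  # an incr depends on the previous backup of any type
        elif name == rp:
            if last_full is None:
                raise LookupError("no full backup precedes this restore point")
            backups = [last_full] + ([last_diff] if last_diff else []) + incrs
            return backups, f"WAL from {backups[-1]} up to {rp}"
    raise LookupError(f"unknown restore point: {rp}")
```

A new differential backup resets the list of incremental backups, which is why rp3 needs diff2 but not diff1, incr1, or incr2.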

Timeline

A timeline is the mechanism ADBM uses to distinguish the WAL series generated after a database is restored to a specific restore point from the WAL created in the original database history (before the Restore action).

Suppose that restoring the database to some restore point requires loading one or more backups and then replaying WAL1 (the numbering is simplified). After the archive recovery completes, the database continues to generate WAL. If the new WAL archives were written to the same directory, they would overwrite the WAL from the original database history (WAL2, WAL3, and so on). As a result, it would no longer be possible to restore the database to the other restore points created before the recovery.

To avoid such conflicts, ADBM starts a new timeline after each data recovery to identify post-recovery WAL archives. Timelines are numbered starting with 0, and each successful Restore action increases the number by one. This number is included in the names of the directories that store WAL archives and backups for each segment (so-called stanzas). After each data recovery, all the necessary directories are created with the new timeline number in their names, and all subsequent WAL records are placed into these new directories without overwriting files in the directories with the previous number.

IMPORTANT

Since the newly created directories initially contain no backups, the first backup after each Restore action is always a full backup, even if the schedule (or the user, when launching a backup manually) requests a differential or incremental backup first.

The following figure illustrates the timeline logic in a simplified way. After you restore data to the rp1 point, WAL1 is replayed and timeline1 is created. Subsequent WAL records are added to new directories. The next Restore action creates timeline2, and so on. Note that even after timeline1 and timeline2 have been created, it is still possible to restore data to the rp2 point; without timelines, this would be impossible.

Timeline logic
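The effect of timelines on the archive directories can be sketched as follows. The directory naming (`seg<N>-tl<T>`) and the `TimelineArchive` class are hypothetical, chosen only to show that each Restore action switches new WAL into fresh directories while the directories of earlier timelines stay untouched.

```python
from __future__ import annotations

from collections import defaultdict


class TimelineArchive:
    """Hypothetical model of per-timeline archive directories (not ADBM's real layout)."""

    def __init__(self) -> None:
        self.timeline = 0  # timelines are numbered starting with 0
        # directory name -> WAL labels archived there, in order
        self.dirs: dict[str, list[str]] = defaultdict(list)

    def directory(self, segment: int, timeline: int | None = None) -> str:
        tl = self.timeline if timeline is None else timeline
        return f"seg{segment}-tl{tl}"  # hypothetical naming scheme

    def archive_wal(self, segment: int, label: str) -> None:
        # New WAL always goes to the directory of the current timeline.
        self.dirs[self.directory(segment)].append(label)

    def restore(self) -> int:
        # Each successful Restore action starts a new timeline; old directories are kept.
        self.timeline += 1
        return self.timeline
```

In this model, after a restore to rp1 the post-restore WAL lands in `seg0-tl1`, while WAL2 (needed for rp2) remains in `seg0-tl0`, so rp2 can still be reached later.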

Quick start

The typical sequence of actions when working with ADBM via the web interface includes the following steps:

  1. Connect to ADBM.

  2. Create a configuration, which will be used to create backups and perform other related actions.

  3. Run a backup manually or wait for it to be launched automatically on a schedule. If necessary, view the list of backups.

  4. Create a restore point manually or wait for it to be created automatically on a schedule.

  5. Run the auxiliary operations if necessary:

  6. If you need to return the database to its state at one of the existing restore points, restore data from a backup.

At each step, you can view the details of running actions, including the causes of failures (if any were recorded).
