Continuous archiving and point-in-time recovery (PITR)

Eugenia Kuzina

Contents

Overview
Set up WAL archiving
Make a base backup
Use the low level API to make a base backup
Recover from a continuous archive backup

Overview

PostgreSQL writes every change made to the database data files to the write-ahead log (WAL). WAL files are located in the pg_wal directory. By default, ADPG places this directory at the following path: /pg_data1/adpg16/pg_wal, where pg_data1 is the Data directory parameter value. See Configuration parameters for details. If the system crashes, PostgreSQL "replays" WAL entries made since the last checkpoint.

It is possible to combine a file system level backup with WAL files for backup creation. The file system backup should be restored first, and after that WAL entries should be applied to a database to bring the system to a current state. This approach has the following advantages:

It does not require a perfectly consistent file system backup as the starting point. When PostgreSQL applies WAL entries, it corrects any inconsistency in the backup. You do not need a file system snapshot capability and can use tar or a similar archiving tool.
Since we can combine an indefinitely long sequence of WAL files for replay, continuous backup can be obtained by simply continuing to archive the WAL files. This is useful for large databases that require a significant amount of resources to create full backups frequently.
It is not necessary to repeat operations from all WAL entries to the end. You can stop at any point and have a consistent point-in-time snapshot of the database. Therefore, this technique supports point-in-time recovery: it is possible to restore the database to its state at any point in time since the base backup was taken.
If the server is continuously transferring WAL files to another machine that is loaded with the same base backup file, this machine has a current copy of the database and can replace the server in case of failure.

NOTE

The pg_dump and pg_dumpall tools do not produce file system level backups and cannot be used as part of a continuous-archiving solution.

Note that PITR supports restoring an entire database cluster, not a part of it. It also requires additional storage for the base backup and archived WAL files.

To successfully use continuous archiving, you need a continuous sequence of archived WAL files that starts no later than the backup start. You should set up and test the procedure for archiving WAL files before you take the first base backup.

Set up WAL archiving

PostgreSQL produces an indefinitely long sequence of WAL records. The system divides this sequence into WAL segment files, which are typically 16 MB in size. The segment files have numeric names that reflect their position in the abstract WAL sequence. If the server does not use WAL archiving, the system creates a few segment files and then "recycles" them by renaming segment files to higher segment numbers. It is assumed that segment files whose contents precede the last checkpoint can be recycled.

To archive WAL data, you need to capture the contents of each segment file once it is filled and save it somewhere before the segment file is recycled. For example, you can copy the segment files to an NFS-mounted directory on another machine or write them to any external drive. PostgreSQL lets the administrator specify a shell command for making a copy of a completed segment file. The command can be simple or invokes a complex shell script.

To enable WAL archiving, set the wal_level configuration parameter to replica or higher, archive_mode to on, and specify the shell command for the archive_command configuration parameter. Use the ADPG configurations section on the Clusters → ADPG cluster → Services → ADPG → Primary configuration tab in ADCM UI to specify these parameters. See Configuration parameters for details.

In archive_command, use %p instead of a path name of the file to archive, and %f instead of a file name. The path name is relative to the cluster data directory. For example, the following archive_command copies WAL segments to the /mnt/server/archivedir directory:

archive_command : 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

During execution the server replaces the %p and %f parameters with values and the command looks as follows:

test ! -f /mnt/server/archivedir/0000000600000000000000A0 && cp pg_wal/0000000600000000000000A0 /mnt/server/archivedir/0000000600000000000000A0

A similar command is generated for each new file to be archived. The archive command is executed on behalf of the same user that runs ADPG.

The archive command should not overwrite any existing archive file. This is an important security feature to preserve the archive integrity in case of administrator errors (for example, when sending the output of two different servers to the same archive directory). In the example above, the archive command includes the test command for this purpose.

An example of setting the archive_command parameter with a shell script:

archive_command : 'local_backup_script.sh "%p" "%f"'

Since archived WAL files contain information from your database, make sure that the directory with these files has appropriate access permissions.

The archive command should only return a zero exit status if it succeeds. When PostgreSQL gets a zero result, it indicates that the file has been successfully archived and can be recycled. If the command returns a non-zero status, PostgreSQL periodically tries to archive the file until it succeeds.

When a signal (other than SIGTERM that is used as part of a server shutdown) or a shell error terminates an archive command execution, the archive process is aborted and restarted. In such cases, the failure is not reported in pg_stat_archiver.

PostgreSQL invokes the archive command for completed WAL segments. If the server generates little WAL traffic, it can cause a long delay between the completion of a transaction and its recording in the archive storage. You can set the archive_timeout parameter to limit this delay. This parameter forces the server to switch to a new WAL segment file after the specified period of time.

You can also force the server to switch to a new WAL file with the pg_switch_wal function if you want to archive a completed transaction as soon as possible. Other utility functions related to WAL management are listed in Backup Control Functions.

Make a base backup

You can use the pg_basebackup tool as described in File system level backup to create a base backup.

To make use of the backup, you need to keep all the WAL segment files generated during and after the file system backup. The base backup process creates a backup history file that is stored in the WAL archive directory. This file is named after the first WAL segment file that you need for the file system backup. For example, if the starting WAL file is 0000000100001234000055CD the backup history file is named as 0000000100001234000055CD.007C9330.backup. The second part of the filename indicates the exact position in the WAL file and can be ignored.

It is necessary to keep all archived WAL files since the last base backup. That is why the base backup interval is usually chosen based on the size of storage for the WAL file archive. You should also consider the time required for recovering data. The system should replay all WAL segments that are archived since the last base backup to be restored.

Use the low level API to make a base backup

Low level base backups can be made in a non-exclusive or an exclusive way. The non-exclusive method is recommended and the exclusive is deprecated. Non-exclusive backup allows you to run other concurrent backup processes that can use the same backup API or the pg_basebackup tool.

The base backup creation with the low level API contains more steps than the method using pg_basebackup. It is very important that these steps are performed in order, and the next step is only executed if the previous one is successful.

Ensure that WAL archiving is enabled and working.
Connect to the server as a user with rights to run the pg_start_backup function and execute the command:
```
SELECT pg_start_backup('label', false, false);
```
The pg_start_backup function has the following syntax:
```
pg_start_backup(<unique_identifier>, <quick_backup>, <exclusive>)
```
Where:
- <unique_identifier> is a string that is a unique identifier for the backup operation.
- <quick_backup>. By default, pg_start_backup can take a long time to complete. The pg_start_backup function performs a checkpoint, and I/O operations required for the checkpoint are spread out over a significant period of time that equals to half the inter-checkpoint interval. See the checkpoint_completion_target configuration parameter. It minimizes the impact on query processing. If you want to start the backup as soon as possible, change the second parameter to true. It starts an immediate checkpoint using as much I/O as available.
- <exclusive> specifies if pg_start_backup initiates an exclusive base backup. In our example, a non-exclusive base backup is performed.
  
  The connection that calls pg_start_backup must be maintained until the backup is completed, otherwise, the process is automatically aborted.
Use any file system backup tool to create a backup. See Backing up the data directory for more information on what directories must be copied.
In the same connection as in step 2, execute pg_stop_backup:
```
SELECT * FROM pg_stop_backup(false, true);
```
The pg_stop_backup function has the following syntax:
```
pg_stop_backup(<exclusive>, <wait_for_archive>)
```
Where:
- <exclusive> specifies if the backup is created in the exclusive mode. This parameter must match the value from the pg_start_backup call.
- <wait_for_archive> specifies whether pg_stop_backup should wait until the WAL files are archived.
  
  This command terminates the backup mode. On a primary, it also performs an automatic switch to the next WAL segment. On a standby, it is not possible to automatically switch WAL segments, you can run pg_switch_wal on the primary to perform a manual switch. This operation makes the last WAL segment file written during the backup ready to archive.
  
  The pg_stop_backup function returns one row with three values. Write the second value to the backup_label file in the backup root directory. The third value should be written to the tablespace_map file unless the value field is empty. These files are critical and must be written byte by byte unchanged, which may require a file to be opened in the binary mode.
The WAL segment files used during the backup should be archived. The file identified by the pg_stop_backup first returned value is the last segment that is required to form a complete set of backup files. On a primary, if archive_mode is enabled and the wait_for_archive parameter of the pg_stop_backup function is true, pg_stop_backup does not return values until the last segment is archived. On a standby, archive_mode must be always to force pg_stop_backup to wait for archiving. Archiving WAL files occurs automatically if you configure archive_command.

Recover from a continuous archive backup

To recover from the backup, perform the following steps:

Stop the server.
Copy the cluster data directory and any tablespaces to a temporary location. This operation requires that you have enough free space on your system to hold two copies of your database. If you do not have enough space to save all the data, you must save the contents of the pg_wal directory, because it might contain logs that are not archived before the system failure.
Remove all existing files and subdirectories under the cluster data directory and root directories of any tablespaces.
Restore the database files from your file system backup. Make sure that they are restored with the correct ownership (the database system user, not root) and with the correct permissions. If you are using tablespaces, verify that the symbolic links in pg_tblspc/ are restored properly.
Remove any files present in pg_wal/. These files are from the file system backup and are probably obsolete. If you did not use pg_wal/, create this directory with proper permissions. If you use a symbolic link, recreate it.
If you have unarchived WAL segment files that you saved in step 2, copy them into pg_wal/. Do not move them, so you still have the unmodified files if a problem occurs, and you have to start all over again.
Set recovery configuration settings in ADPG configurations and create the recovery.signal file in the cluster data directory (the default data directory in ADPG is /pg_data1/adpg16). You can also temporarily modify the PG_HBA section in ADCM UI to prevent user connections until you recover the system.
Start the server. The server goes into recovery mode and reads the required archived WAL files. If the recovery is terminated because of an external error, the server can be restarted and the recovery will continue. When the restore process is complete, the server deletes recovery.signal to prevent an accidental re-entry into recovery mode and begins standard database operations.
Inspect the database contents to ensure that the desired state is restored. If so, edit the PG_HBA section to allow users to connect to the restored database. If not, return to step 1.

In step 7, you need to set up the recovery configuration that describes how the restore is performed and to what point. Use the restore_command parameter that specifies how to retrieve archived WAL file segments. This parameter is similar to archive_command. It contains a shell command string with %p instead of a path name to copy the log file to and %f instead of a log file name. The path name is relative to the cluster data directory.

Example:

restore_command : 'cp /mnt/server/archivedir/%f %p'

The restore_command parameter can also contain a shell script.

If the server cannot find WAL segments in the archive, it searches for them in the pg_wal/ directory. It allows using recent non-archived segments. Segments from the archive have a higher priority than files from pg_wal/.

The recovery process handles all available WAL segments and restores the database to the current point in time or as close as possible given the available WAL segments. Therefore, a recovery ends with a "file not found" message, the exact text of the error message depends on the current restore_command. You may also see an error message at the start of recovery for a file named similar to 00000001.history. This also does not indicate a problem in most cases. See Timelines for details.

If you need to recover to some previous point in time, set a stopping point. You can specify a stopping point ("recovery target") by date/time, a named restore point, or by completion of a specific transaction ID. At present, only the date/time and named restore point options are usable since there are no tools to help you identify which transaction ID to use.

If the recovery process finds corrupted WAL data, the recovery halts at that point and the server does not start. In this case, run the recovery process from the beginning and specify a "recovery target" before the point of corruption. The recovery can complete successfully. If the recovery fails for an external reason, such as a system crash or if the WAL archive becomes inaccessible, the recovery can be restarted, and it restarts from the point of the failure. The recovery restart works in the same way as checkpoints: the server periodically saves the current state to the disk and updates the pg_control file. This way, already processed WAL data is not rescanned.

Found a mistake? Seleсt text and press Ctrl+Enter to report it