Logical replication
Overview
Logical replication is a replication of data objects and their changes based on their replication identity (usually a primary key). We use the term "logical replication" as opposed to physical replication, which uses block addresses and byte-by-byte replication. Logical replication allows you to control both data replication and security.
The logical replication uses a publish/subscribe model with one or more subscribers that subscribe to one or more publications on the publishing node. Subscribers receive data from the publications and can republish the data for cascading replication or more complex configurations.
The logical replication of a table starts with taking a snapshot of the data in the publisher database and copying it to the subscriber. For more information, see Architecture. After that, the changes on the publisher are sent to the subscriber in real-time as they occur. The subscriber applies changes in the same order as the publishing node. Publications within the same subscription are guaranteed transactional integrity. The subscriber applies data changes in the same order as the publisher so that transactional consistency is guaranteed for publications within a single subscription. This method of data replication is also known as transactional replication.
The typical use cases for logical replication are:
-
sending incremental changes in a single database or a subset of databases to subscribers;
-
firing triggers for individual changes when a subscriber receives them;
-
combining multiple databases into one (for example, for analysis purposes);
-
replicating between different PostgreSQL major versions;
-
replicating between PostgreSQL instances on different platforms;
-
providing access to replicated data to different groups of users;
-
sharing a database subset across multiple databases.
The subscriber database behaves in the same way as any other PostgreSQL instance and can be a publisher if you create publications in it. When a subscriber acts as a read-only application, there are no conflicts with a single subscription. If any applications or other subscribers write to the same set of tables, conflicts can occur.
For detailed information about restrictions of the logical replication, refer to Restrictions.
To set up a logical replication, perform the following steps:
-
Configure the publishing and subscriber nodes.
Configuration settings
The logical replication requires several configuration options to be set. Use postgresql.conf custom section on the Clusters → ADPG cluster → Services → ADPG → Primary configuration tab in ADCM UI to specify these options. See Configuration parameters for details.
Specify the following parameters on the publisher side:
-
wal_level must be set to
logical
. -
max_replication_slots must be set to at least the number of subscriptions expected to connect, plus some reserve for table synchronization.
-
max_wal_senders should be set to at least the
max_replication_slots
value plus the number of physical replicas that are connected at the same time.
Set the following parameters on the subscriber side:
-
max_replication_slots should be set to at least the number of subscriptions added to the subscriber, plus some reserve for table synchronization.
-
max_logical_replication_workers should be also set to at least the number of subscriptions added to the subscriber, plus some reserve for table synchronization.
-
max_worker_processes should include additional worker processes for the replication, at least
max_logical_replication_workers
plus 1. Note, some extensions and parallel queries also occupy slots frommax_worker_processes
. Unlike other parameters, you can set this parameter in the ADPG configurations section on the Primary configuration tab. See Configuration parameters.
You also need to check if the settings specified in the PG_HBA section on the Clusters → ADPG cluster → Services → ADPG → Primary configuration tab in ADCM UI allow the replication. These settings depend on your network configuration and the user you want to utilize for the connection.
Publication
A publication can be defined on any physical replication primary. A node that contains a publication is called a publisher. A publication is a set of changes generated from a table or a group of tables. It is also can be described as a replication set. Each publication exists in only one database.
Publications are different from schemas and do not affect the table access. You can include each table in multiple publications. Currently, publications can only contain tables. Objects must be explicitly added to them unless the publication is created for all tables with the ALL TABLES
specification.
Publications can limit the set of changes they contain. It is possible to choose any combination of the INSERT
, UPDATE
, DELETE
, and TRUNCATE
operations, similar to how triggers can fire on different event types. By default, all types of operations are replicated.
To replicate the UPDATE
and DELETE
operations, the published table must have a replica identity. It allows the system to find appropriate rows to update or delete them on the subscriber side. By default, this is the primary key. You can also specify another unique index for a replica identity. If the table does not have a suitable key, the entire row can be used as a key. To do this, set the replica identity to full
. However, this approach is very inefficient and should be used if there is no other solution. If a replication identifier other than full
is selected on the publisher side, an identifier consisting of the same or fewer columns must also be defined on the subscriber side. For more information about assigning a replica identity, see REPLICA IDENTITY. If a table without a replica identity is added to a publication, the replicated UPDATE
or DELETE
operation causes an error on the publisher. The INSERT
operations can be processed without the specified replica identity.
Every publication can have multiple subscribers.
Use the CREATE PUBLICATION command to create a publication. The following command creates a publication that publishes all changes from the books
and orders
tables:
CREATE PUBLICATION publication1 FOR TABLE books, orders;
The code below creates a publication that publishes all changes from all tables:
CREATE PUBLICATION all_tables FOR ALL TABLES;
The following publication publishes the DELETE
operations performed on the books
table:
CREATE PUBLICATION delete_publication FOR TABLE books WITH (publish = 'delete');
You can use the ALTER PUBLICATION command to change the definition of a publication.
The command below changes the delete_publication
publication to publish deletes and updates:
ALTER PUBLICATION delete_publication SET (publish = 'update, delete');
The following code adds the authors
table to publication1
:
ALTER PUBLICATION publication1 ADD TABLE authors;
Both the ADD TABLE
and DROP TABLE
operations are transactional. A table replication only starts or stops after the transaction is committed.
To remove a publication, call the DROP PUBLICATION command:
DROP PUBLICATION all_tables;
The pg_publication catalog contains all publications created in a database. The pg_publication_tables
view provides information about the mapping between publications and the tables they contain. To retrieve this data, use the following commands:
SELECT * FROM pg_publication;
SELECT * FROM pg_publication_tables;
Subscription
A subscription is the downstream side of logical replication. The node on which the subscription is defined is called the subscriber. A subscription defines the connection to another database and set of publications to which the subscriber wants to subscribe.
A subscriber database works as any other PostgreSQL instance and can be used as a publisher for other databases if its own publications are defined.
A subscriber node can have multiple subscriptions. It is possible to define multiple subscriptions between a single publisher/subscriber pair. It is necessary to ensure that the publication objects do not overlap.
Each subscription receives changes via one replication slot. Additional replication slots may be required for the initial data synchronization. They are dropped after the data synchronization is finished. For more information, see Replication slot management.
A logical replication subscription can be a standby for synchronous replication.
Subscriptions can be archived when a dump-file is created. The pg_dump utility adds subscriptions to an SQL dump file, if it is run on behalf of a superuser. Otherwise, subscriptions are skipped, because non-superusers cannot read all subscription information from the pg_subscription catalog.
Use the CREATE SUBSCRIPTION command to add a subscription:
CREATE SUBSCRIPTION subscription1
CONNECTION 'host=10.92.6.36 port=5432 user=postgres dbname=book_store password=postgres'
PUBLICATION publication1, delete_publication;
Utilize the ALTER SUBSCRIPTION command to change a subscription. The following code changes the subscribed publication to all_tables
:
ALTER SUBSCRIPTION subscription1 SET PUBLICATION all_tables;
The command below disables (stops) the subscription:
ALTER SUBSCRIPTION subscription1 DISABLE;
To remove a subscription, use DROP SUBSCRIPTION:
DROP SUBSCRIPTION subscription1;
When you drop and recreate a subscription, the synchronization information is lost. You need to resynchronize data after the recreation.
Only tables can be the target of replication. For example, you cannot replicate to a view.
The tables are matched between the publisher and the subscriber using the fully qualified table name. The replication to a table with a different name is not supported.
Table columns are also matched by name. The order of columns does not have to match. The data types of the columns do not need to match if a text representation of the data can be converted to the target type. For example, you can replicate from a column of type integer
to a column of type bigint
. The target table can also have additional columns that the published table does not contain. These columns are filled with default values specified in the target table definition.
Conflicts
The logical replication behaves similarly to DML operations, the data is updated even if it was changed locally on the subscriber node. If incoming data violates any constraints, the replication stops. This is considered as a conflict.
A conflict produces an error and stops the replication. You have to resolve it manually. You can find details about the conflict in the subscriber server log. The default path is /pg_data1/adpg14/log.
Example:
2023-01-26 17:29:06.242 UTC [14525] LOG: background worker "logical replication worker" (PID 1389) exited with exit code 1 2023-01-26 17:29:06.274 UTC [1390] ERROR: duplicate key value violates unique constraint "test1_pkey" 2023-01-26 17:29:06.274 UTC [1390] DETAIL: Key (employee_id)=(1) already exists. 2023-01-26 17:29:06.274 UTC [1390] CONTEXT: COPY test1, line 1
To resolve the conflict, change the data on the subscriber to match the incoming change, or skip the problematic transaction. To omit the transaction, call the pg_replication_origin_advance function and pass a node name corresponding to the subscription name and a position as parameters. You can find the current position of origins in the pg_replication_origin_status system view.
When replicating an UPDATE
or DELETE
operation, missing data does not produce a conflict, and such an operation is skipped.
Security
If users can modify the schema of tables on the subscriber side, they also can execute arbitrary code as a superuser. Use roles to limit ownership and the TRIGGER
privilege on these tables. Moreover, if untrusted users can create tables, use publications that list tables explicitly. Create the FOR ALL TABLES
subscription only when you trust every user permitted to create a table on the publisher or subscriber.
The role used for the replication connection must have the REPLICATION
attribute or be a superuser. If a role does not have the SUPERUSER
or BYPASSRLS
attribute, the publisher row security policies can execute. If a role does not trust all table owners, include options=-crow_security=off
in the connection string. If a table owner adds a row security policy, this setting causes replication to halt rather than execute the policy. Use the PG_HBA section on the Clusters → ADPG cluster → Services → ADPG → Primary configuration tab in ADCM UI to configure role access. A role for replication must have the LOGIN
attribute.
To copy the initial table data, the role used for the replication connection must have the SELECT
privilege on a published table or be a superuser.
To create a publication, the user must have the CREATE
privilege in the database.
To add tables to a publication, the user must have ownership rights on the table. To create a publication that publishes all tables, the user must be a superuser.
To create a subscription, the user must be a superuser.
The process of applying subscription changes is performed in the local database with superuser rights.