PostgreSQL — Logical Replication and Long-Running Transactions

Virender Singla
6 min read · Dec 16, 2021


reference — highgo

This article summarizes my exploration of logical replication in Postgres, focusing on long-running transactions and various recovery scenarios. I conducted tests to understand how replication behaves under different conditions, such as:

  • Replicating long-running transactions.
  • Subscriber recovery after restarts.
  • Manually advancing the Log Sequence Number (LSN) for a replication slot.

Setup:

Spin up two small RDS Instances.

Postgres RDS 11.13
Instance1: test1 — m5.large (2vCPU), 100GB GP2 volumes (300 IOPS)
Instance2: test2 — m5.large (2vCPU), 100GB GP2 volumes (300 IOPS)

I established logical replication between the test1 instance (publisher/source) and the test2 instance (subscriber/target), and created two tables, test1 and test2, with autovacuum disabled.

Publisher/Subscriber:

create table test1(lsn pg_lsn, id int);
alter table test1 set (
  autovacuum_enabled = false, toast.autovacuum_enabled = false
);
create table test2(lsn pg_lsn, id int);
alter table test2 set (
  autovacuum_enabled = false, toast.autovacuum_enabled = false
);

Case 1: How replication happens for a long-running transaction

To begin, I’ll insert data into the publisher database and observe how it replicates to the subscriber.

Publisher:

postgres=> insert into test1 select pg_current_wal_insert_lsn(), r from generate_series(1,100000000) as r;
INSERT 0 100000000
Time: 203169.104 ms (03:23.169)

 Schema | Name  | Type  |  Owner   | Persistence |  Size
--------+-------+-------+----------+-------------+---------
 public | test1 | table | postgres | permanent   | 4224 MB

After observing the replication behavior of the aforementioned transaction, I’ve summarized my findings below:

* The Publisher streams data only after a transaction has been committed. If the transaction is aborted/rolled back, the replication logs (mined WAL logs) are discarded on the Publisher and never sent to the Subscriber.
* Replication logs are spilled to disk on the Publisher in case of huge data changes.
* There is a CPU spike on the Publisher (even after the INSERT has completed) because logical decoding of the WAL logs and the subsequent data transfer take CPU cycles.
* Storage space requirement: in the above case, the 4GB table generated around 4GB of WAL logs and then 12GB of replication logs, so in total 20GB of storage was consumed on the Publisher to replicate this 4GB table.
* The transaction took around 15 more minutes to appear on the Subscriber after it was committed on the Publisher. This includes generating the replication logs, transferring them over the network, and applying them on the Subscriber.
CPU — Publisher
Storage — Publisher
Network Receive Throughput — Subscriber
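The replication delay described above can also be watched numerically from the Publisher. As a quick sketch (assuming a PG 10+ server, where `pg_stat_replication` exposes `sent_lsn` and `replay_lsn`):

```sql
-- Publisher: how far the walsender has sent WAL,
-- and how far behind the subscriber's replay is
select application_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn))   as send_lag,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as replay_lag
from pg_stat_replication;
```

During a long-running transaction you would expect replay_lag to keep growing until the commit is finally decoded and streamed.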

So the next question comes up: with multiple transactions, in what order does logical replication replicate data to the Subscriber?

Trx1: Starts at 10.00 on test1 (long running)
Trx2: Starts at 10.01 on test2
.
.
.
Trx2: Ends at 10.02
Trx1: Ends at 10.30
Outcome: Trx2 appears on Subscriber quickly whereas Trx1 takes time.

Summary:

Logical replication in Postgres maintains transaction order by replicating them on the target database in the same order they were committed on the publisher. This ensures data integrity, especially when dependencies like foreign keys exist between transactions.

Essentially, a long-running transaction won’t hinder the replication of shorter, quicker transactions. However, keep in mind that a long transaction generates a large volume of WAL logs, which can potentially strain the walsender process responsible for transmitting those logs.

Case 2: How logical replication recovers after the Subscriber restarts

Publisher:

### truncate and reload the test table again
postgres=> truncate table test1;
TRUNCATE TABLE
postgres=> insert into test1 select pg_current_wal_insert_lsn(), r from generate_series(1,100000000) as r;
INSERT 0 100000000
Time: 213496.802 ms (03:33.497)

 Schema | Name  | Type  |  Owner   | Persistence |  Size
--------+-------+-------+----------+-------------+---------
 public | test1 | table | postgres | permanent   | 4224 MB

Also check for restart_lsn of the replication slot on the Publisher:

Publisher:

select slot_name, plugin, slot_type, restart_lsn, confirmed_flush_lsn from pg_replication_slots;

    slot_name     |  plugin  | slot_type | restart_lsn | confirmed_flush_lsn
------------------+----------+-----------+-------------+---------------------
 subscription_ec2 | pgoutput | logical   | 6F/A8004C68 | 70/416E1998

What is restart_lsn? It is the start LSN of the oldest transaction that has not yet been replicated to the target side. Postgres will not delete WAL from this LSN onward, because it is still required by the slot.
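A practical consequence is that restart_lsn tells you how much WAL the slot is forcing the Publisher to retain. A quick check (assuming PG 10+, where these functions exist under these names):

```sql
-- Publisher: bytes of WAL held back by each replication slot
select slot_name,
       restart_lsn,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) as retained_wal
from pg_replication_slots;
```

If a long-running transaction keeps restart_lsn pinned, retained_wal grows until the transaction is fully replicated.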

Once the above transaction started applying on the Subscriber (which can be checked via the table size) and the table reached 2000 MB (meaning the apply was about 50% done), I rebooted the test2 database to see how recovery happens.

Subscriber:

postgres=> \dt+ test1
                     List of relations
 Schema | Name  | Type  |  Owner   | Persistence |  Size
--------+-------+-------+----------+-------------+---------
 public | test1 | table | postgres | permanent   | 2033 MB

After the subscriber comes up, let’s take a look at the Postgres logs.

Subscriber Logs:

recovered replication state of node 1 to 6F/A8004C68

Publisher Logs:

postgres@postgres:[9024]:LOG:  logical decoding found consistent point at 6F/A8004C68
postgres@postgres:[9024]:STATEMENT:  START_REPLICATION SLOT "subscription_ec2" LOGICAL 6F/A8004C68 (proto_version '1', publication_names '"replication_ec2"')

Looking at the logs above, replication for the test1 table restarts from restart_lsn (the start LSN of the transaction) and replicates the interrupted transaction again from the beginning. This also resulted in a bigger table size (until vacuum runs): a total of 2033 MB (dead tuples) + 4224 MB (live tuples) ≈ 6GB.

Subscriber:

 Schema | Name  | Type  |  Owner   | Persistence |  Size
--------+-------+-------+----------+-------------+---------
 public | test1 | table | postgres | permanent   | 6252 MB

I observed an interesting behavior in logical replication: when the subscriber disconnects, the publisher discards all generated replication logs. Upon reconnection, the publisher has to re-analyze the WAL logs from the point the subscriber disconnected.

This suggests that logical replication lacks a checkpointing mechanism for transaction log mining. Consequently, the publisher must retain all WAL logs (tracked by restart_lsn) until the entire transaction is replicated to the subscriber.
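While the Publisher has to re-decode WAL after a disconnect, the Subscriber does persist its apply progress per replication origin, which is what the "recovered replication state of node 1" log line above refers to. A way to inspect it (catalog view as in PG 10+):

```sql
-- Subscriber: the publisher-side LSN up to which changes
-- have been durably applied for each replication origin
select local_id, external_id, remote_lsn, local_lsn
from pg_replication_origin_status;
```

remote_lsn is the position on the Publisher, local_lsn the corresponding local WAL position on the Subscriber.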

See the dip in replication slot disk usage graph:

Replication Slot Disk Usage (MB) — Publisher

So far so good. Now let’s do the above exercise for two overlapping transactions.

Trx1: Starts at 10.00 on test1 (long running)
Trx2: Starts at 10.01 on test2
.
.
Trx2: Ends at 10.02
Trx1: Ends at 10.30
Let's wait for Trx2 to appear on the target side and then reboot the test2 database (Subscriber) while Trx1 replication is still going on after commit.

Here I was curious which consistent LSN point Postgres would choose as the recovery point. If it picked restart_lsn — 71/74005068 (Trx1) — then Trx2 data would appear twice on the target database (I did not define any PK). After the reboot, Postgres chose a different LSN instead of restart_lsn: it replicated the Trx1 data fine, while the Trx2 data was already there and was not applied again. So Postgres knows which transactions had already completed and which were still in progress when the Subscriber crashed.

Publisher Logs:

postgres@postgres:[24505]:STATEMENT:   START_REPLICATION SLOT "subscription_ec2" LOGICAL 71/A054E198  (proto_version '1', publication_names '"replication_ec2"')
postgres@postgres:[24505]:LOG: logical decoding found consistent point at 71/74005068
postgres@postgres:[24505]:DETAIL: There are no running transactions.
postgres@postgres:[24505]:STATEMENT: START_REPLICATION SLOT "subscription_ec2" LOGICAL 71/A054E198 (proto_version '1', publication_names '"replication_ec2"')

Case 3: Manually advancing the LSN value for a logical replication slot

As we can disable/enable a subscription and also advance its LSN, I ran a few tests advancing the LSN value while a long-running transaction was in progress.

alter subscription subscription_ec2 disable;
SELECT pg_replication_origin_advance('pg_16545', '61/380000D8');
alter subscription subscription_ec2 enable;

### Single Transaction
disable subscription.
Trx1: Starts at 10.00 on test1
Take LSN value at 10.15:
select pg_current_wal_insert_lsn();
Trx1: Ends at 10.30
Now advance the lsn value to the above noted lsn.
enable subscription.

### Multiple Transactions
disable subscription.
Trx1: Starts at 10.00 on test1
Trx2: Starts at 10.01 on test2
Trx2: Ends at 10.03
Take LSN value at 10.15
Trx1: Ends at 10.30
Now advance the lsn value to the above noted lsn.
enable subscription.
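A note on the 'pg_16545' argument above: the replication origin created for a subscription is named "pg_" followed by the subscription's OID, so on the Subscriber you can look it up rather than guess it (a sketch, assuming PG 10+ catalogs):

```sql
-- Subscriber: map each subscription to its replication origin name
select subname, 'pg_' || oid as origin_name
from pg_subscription;
```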

In my tests, I was able to successfully replicate transaction Trx1 by providing any LSN value that fell within the transaction’s start and end time. This indicates that Postgres effectively identifies in-progress transactions associated with a given LSN value and ensures their complete replication. Essentially, Postgres doesn’t just blindly apply logs from the specified LSN; it tracks and manages transaction boundaries to maintain data consistency.

Closing Notes: Now the million-dollar question is whether we can safely set up a Subscriber based on the snapshot procedure described in the Instacart post. The above tests confirm that an in-progress transaction replicates fine when we advance the LSN, but I still believe there are too many internals that one needs to understand first.

Do share your thoughts on this.
