Commit fc8e54f

Fix Hot-Standby initialization of clog and subtrans.
These bugs can cause data loss on standbys started with hot_standby=on at the moment they start to accept read-only queries, by marking committed transactions as uncommitted. The likelihood of such corruptions is small unless the primary has a high transaction rate.

5a031a5 fixed bugs in HS's startup logic by maintaining less state until at least STANDBY_SNAPSHOT_PENDING state was reached, missing the fact that both clog and subtrans are written to before that. This only failed to fail in common cases because the usage of ExtendCLOG in procarray.c was superfluous since clog extensions are actually WAL logged.

f44eedc/I then tried to fix the missing extensions of pg_subtrans due to the former commit's changes - which are not WAL logged - by performing the extensions when switching to a state > STANDBY_INITIALIZED and not performing xid assignments before that - again missing the fact that ExtendCLOG is unnecessary - but screwed up twice: once because latestObservedXid wasn't updated anymore in that state due to the earlier commit, and once by having an off-by-one error in the loop performing extensions.

This means that whenever a CLOG_XACTS_PER_PAGE (32768 with default settings) boundary was crossed between the start of the checkpoint recovery started from and the first xl_running_xact record, old transactions' commit bits in pg_clog could be overwritten if they started and committed in that window.

Fix this mess by not performing ExtendCLOG() in HS at all anymore, since it's unneeded and evidently dangerous, and by performing subtrans extensions even before reaching STANDBY_SNAPSHOT_PENDING.

Analysis and patch by Andres Freund. Reported by Christophe Pettus. Backpatch down to 9.0, like the previous commit that caused this.
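The page-boundary hazard described in the message can be illustrated with a small toy model in C. This is only a sketch under stated assumptions, not PostgreSQL code: `toy_page`, `toy_extend`, and the other `toy_`/`xid_` helpers are hypothetical names, the single page stores one byte per xid instead of two status bits, and the model assumes that an extension zeroes a page exactly when the xid is the first one on that page. The 32768 figure is the one quoted above for default settings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Default-build values: 8 kB blocks, 4 transaction statuses per byte. */
#define BLCKSZ              8192
#define CLOG_XACTS_PER_BYTE 4
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)  /* 32768 */

typedef uint32_t TransactionId;

/* Hypothetical stand-in for ONE clog page: one byte per xid slot. */
static unsigned char toy_page[CLOG_XACTS_PER_PAGE];

/* Which page an xid's status lives on. */
static int xid_to_page(TransactionId xid)
{
    return (int) (xid / CLOG_XACTS_PER_PAGE);
}

/* True when xid is the first slot of its page. */
static bool xid_starts_page(TransactionId xid)
{
    return xid % CLOG_XACTS_PER_PAGE == 0;
}

static void toy_set_committed(TransactionId xid)
{
    toy_page[xid % CLOG_XACTS_PER_PAGE] = 1;
}

static bool toy_is_committed(TransactionId xid)
{
    return toy_page[xid % CLOG_XACTS_PER_PAGE] == 1;
}

/* Assumed model of an extension: zero the page when xid starts it. */
static void toy_extend(TransactionId xid)
{
    if (xid_starts_page(xid))
        memset(toy_page, 0, sizeof(toy_page));
}

/* Replays the failure mode: a redundant, late extension wipes a commit. */
static void toy_demo(void)
{
    toy_extend(32768);              /* page 1 is created and zeroed */
    toy_set_committed(32770);       /* a transaction on page 1 commits */
    assert(toy_is_committed(32770));

    toy_extend(32768);              /* the same boundary xid is extended again */
    assert(!toy_is_committed(32770));   /* the commit bit is gone */
}
```

Calling `toy_demo()` walks through the sequence: the second extension of the boundary xid re-zeroes the page, so a transaction that committed in between reads back as not committed. In the toy model, as in the message, nothing goes wrong unless a page boundary falls inside the window.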
1 parent e3a02a3 commit fc8e54f

2 files changed: +41 -29 lines changed


src/backend/access/transam/clog.c

Lines changed: 1 addition & 1 deletion
@@ -593,7 +593,7 @@ ExtendCLOG(TransactionId newestXact)
     LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
 
     /* Zero the page and make an XLOG entry about it */
-    ZeroCLOGPage(pageno, !InRecovery);
+    ZeroCLOGPage(pageno, true);
 
     LWLockRelease(CLogControlLock);
 }

src/backend/storage/ipc/procarray.c

Lines changed: 40 additions & 28 deletions
@@ -439,7 +439,7 @@ ProcArrayClearTransaction(PGPROC *proc)
  * ProcArrayInitRecovery -- initialize recovery xid mgmt environment
  *
  * Remember up to where the startup process initialized the CLOG and subtrans
- * so we can ensure its initialized gaplessly up to the point where necessary
+ * so we can ensure it's initialized gaplessly up to the point where necessary
  * while in recovery.
  */
 void
@@ -449,9 +449,10 @@ ProcArrayInitRecovery(TransactionId initializedUptoXID)
     Assert(TransactionIdIsNormal(initializedUptoXID));
 
     /*
-     * we set latestObservedXid to the xid SUBTRANS has been initialized upto
-     * so we can extend it from that point onwards when we reach a consistent
-     * state in ProcArrayApplyRecoveryInfo().
+     * we set latestObservedXid to the xid SUBTRANS has been initialized upto,
+     * so we can extend it from that point onwards in
+     * RecordKnownAssignedTransactionIds, and when we get consistent in
+     * ProcArrayApplyRecoveryInfo().
      */
     latestObservedXid = initializedUptoXID;
     TransactionIdRetreat(latestObservedXid);
@@ -620,17 +621,23 @@ ProcArrayApplyRecoveryInfo(RunningTransactions running)
     pfree(xids);
 
     /*
-     * latestObservedXid is set to the the point where SUBTRANS was started up
-     * to, initialize subtrans from thereon, up to nextXid - 1.
+     * latestObservedXid is at least set to the the point where SUBTRANS was
+     * started up to (c.f. ProcArrayInitRecovery()) or to the biggest xid
+     * RecordKnownAssignedTransactionIds() was called for. Initialize
+     * subtrans from thereon, up to nextXid - 1.
+     *
+     * We need to duplicate parts of RecordKnownAssignedTransactionId() here,
+     * because we've just added xids to the known assigned xids machinery that
+     * haven't gone through RecordKnownAssignedTransactionId().
      */
     Assert(TransactionIdIsNormal(latestObservedXid));
+    TransactionIdAdvance(latestObservedXid);
     while (TransactionIdPrecedes(latestObservedXid, running->nextXid))
     {
-        ExtendCLOG(latestObservedXid);
         ExtendSUBTRANS(latestObservedXid);
-
         TransactionIdAdvance(latestObservedXid);
     }
+    TransactionIdRetreat(latestObservedXid); /* = running->nextXid - 1 */
 
     /* ----------
      * Now we've got the running xids we need to set the global values that
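The advance / loop / retreat shape in the hunk above can be checked in isolation. The following is a minimal sketch in C, not the server code: `xid_precedes`, `xid_advance`, and `xid_retreat` are simplified re-implementations written for this example (modelled on the wraparound-aware xid helpers), and `ExtendLog`, `extend_patched`, and `extend_prepatch` are hypothetical names that only record which xids would be passed to an extension routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Xids below this value are reserved and skipped when stepping. */
#define FirstNormalTransactionId ((TransactionId) 3)

/* Wraparound-aware "a is older than b" for normal xids. */
static bool xid_precedes(TransactionId a, TransactionId b)
{
    if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
        return a < b;
    return (int32_t) (a - b) < 0;
}

static TransactionId xid_advance(TransactionId x)
{
    do { x++; } while (x < FirstNormalTransactionId);
    return x;
}

static TransactionId xid_retreat(TransactionId x)
{
    do { x--; } while (x < FirstNormalTransactionId);
    return x;
}

/* Records which xids were handed to the (imaginary) extend routine. */
typedef struct
{
    TransactionId first;
    TransactionId last;
    unsigned      count;
} ExtendLog;

static void log_extend(ExtendLog *log, TransactionId xid)
{
    if (log->count == 0)
        log->first = xid;
    log->last = xid;
    log->count++;
}

/* Patched shape: visits (latest, next_xid) and ends at next_xid - 1. */
static TransactionId extend_patched(TransactionId latest,
                                    TransactionId next_xid, ExtendLog *log)
{
    latest = xid_advance(latest);
    while (xid_precedes(latest, next_xid))
    {
        log_extend(log, latest);
        latest = xid_advance(latest);
    }
    return xid_retreat(latest);
}

/* Pre-patch shape: visits [latest, next_xid) and ends at next_xid. */
static TransactionId extend_prepatch(TransactionId latest,
                                     TransactionId next_xid, ExtendLog *log)
{
    while (xid_precedes(latest, next_xid))
    {
        log_extend(log, latest);
        latest = xid_advance(latest);
    }
    return latest;
}
```

With `latest = 100` and `next_xid = 105`, the patched shape touches 101 through 104 and leaves the tracker at 104, while the pre-patch shape touches 100 again and leaves the tracker one past the intended value. Revisiting an already-initialized xid is the kind of off-by-one the commit message refers to.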
@@ -704,10 +711,6 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
 
     Assert(standbyState >= STANDBY_INITIALIZED);
 
-    /* can't do anything useful unless we have more state setup */
-    if (standbyState == STANDBY_INITIALIZED)
-        return;
-
     max_xid = TransactionIdLatest(topxid, nsubxids, subxids);
 
     /*
@@ -734,6 +737,10 @@ ProcArrayApplyXidAssignment(TransactionId topxid,
     for (i = 0; i < nsubxids; i++)
         SubTransSetParent(subxids[i], topxid, false);
 
+    /* KnownAssignedXids isn't maintained yet, so we're done for now */
+    if (standbyState == STANDBY_INITIALIZED)
+        return;
+
     /*
      * Uses same locking as transaction commit
      */
@@ -2454,18 +2461,11 @@ RecordKnownAssignedTransactionIds(TransactionId xid)
 {
     Assert(standbyState >= STANDBY_INITIALIZED);
     Assert(TransactionIdIsValid(xid));
+    Assert(TransactionIdIsValid(latestObservedXid));
 
     elog(trace_recovery(DEBUG4), "record known xact %u latestObservedXid %u",
          xid, latestObservedXid);
 
-    /*
-     * If the KnownAssignedXids machinery isn't up yet, do nothing.
-     */
-    if (standbyState <= STANDBY_INITIALIZED)
-        return;
-
-    Assert(TransactionIdIsValid(latestObservedXid));
-
     /*
      * When a newly observed xid arrives, it is frequently the case that it is
      * *not* the next xid in sequence. When this occurs, we must treat the
@@ -2476,22 +2476,34 @@ RecordKnownAssignedTransactionIds(TransactionId xid)
         TransactionId next_expected_xid;
 
         /*
-         * Extend clog and subtrans like we do in GetNewTransactionId() during
-         * normal operation using individual extend steps. Typical case
-         * requires almost no activity.
+         * Extend subtrans like we do in GetNewTransactionId() during normal
+         * operation using individual extend steps. Note that we do not need
+         * to extend clog since its extensions are WAL logged.
+         *
+         * This part has to be done regardless of standbyState since we
+         * immediately start assigning subtransactions to their toplevel
+         * transactions.
          */
         next_expected_xid = latestObservedXid;
-        TransactionIdAdvance(next_expected_xid);
-        while (TransactionIdPrecedesOrEquals(next_expected_xid, xid))
+        while (TransactionIdPrecedes(next_expected_xid, xid))
         {
-            ExtendCLOG(next_expected_xid);
+            TransactionIdAdvance(next_expected_xid);
             ExtendSUBTRANS(next_expected_xid);
+        }
+        Assert(next_expected_xid == xid);
 
-            TransactionIdAdvance(next_expected_xid);
+        /*
+         * If the KnownAssignedXids machinery isn't up yet, there's nothing
+         * more to do since we don't track assigned xids yet.
+         */
+        if (standbyState <= STANDBY_INITIALIZED)
+        {
+            latestObservedXid = xid;
+            return;
         }
 
         /*
-         * Add the new xids onto the KnownAssignedXids array.
+         * Add (latestObservedXid, xid] onto the KnownAssignedXids array.
          */
         next_expected_xid = latestObservedXid;
         TransactionIdAdvance(next_expected_xid);
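The reordering in RecordKnownAssignedTransactionIds() (extend subtrans first, check standbyState second, and still move latestObservedXid forward on the early return) can be summarized with a toy state machine. This is a rough sketch under stated assumptions rather than the real function: `ToyRecovery`, `toy_record_known_xid`, and `xid_follows` are hypothetical names, the counters stand in for the real subtrans and KnownAssignedXids structures, and the skipping of reserved xids at wraparound is ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Same ordering of states as used in the comparisons in the diff. */
typedef enum
{
    STANDBY_DISABLED,
    STANDBY_INITIALIZED,
    STANDBY_SNAPSHOT_PENDING,
    STANDBY_SNAPSHOT_READY
} HotStandbyState;

/* Hypothetical stand-in for the recovery-side bookkeeping. */
typedef struct
{
    HotStandbyState state;
    TransactionId   latest_observed;
    unsigned        subtrans_extended;     /* xids passed to "extend" */
    unsigned        known_assigned_added;  /* xids added to the tracker */
} ToyRecovery;

/* Wraparound-aware "a is newer than b" for normal xids. */
static bool xid_follows(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) > 0;
}

static void toy_record_known_xid(ToyRecovery *r, TransactionId xid)
{
    unsigned gap;

    if (!xid_follows(xid, r->latest_observed))
        return;                         /* nothing new to record */

    /* Size of the half-open range (latest_observed, xid]. */
    gap = xid - r->latest_observed;

    /* Step 1: subtrans is extended regardless of the standby state. */
    r->subtrans_extended += gap;

    /* Step 2: before the tracker is up, only remember how far we got. */
    if (r->state <= STANDBY_INITIALIZED)
    {
        r->latest_observed = xid;
        return;
    }

    /* Step 3: otherwise also add the same range to the tracker. */
    r->known_assigned_added += gap;
    r->latest_observed = xid;
}
```

Starting in STANDBY_INITIALIZED at xid 100 and observing 103, the sketch extends three xids, adds none to the tracker, and still advances `latest_observed` to 103, so a later call in STANDBY_SNAPSHOT_PENDING picks up from the right place instead of re-extending from 100.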
