By digoal
Perhaps this is the last time we will discuss this problem, because the PostgreSQL community is changing the transaction ID (xid) from 32-bit to 64-bit. Future PostgreSQL versions will no longer exhaust transaction IDs when freezing cannot keep up for some reason, so the database will not need to be stopped for maintenance.
However, widening the transaction ID to 64 bits does not mean the freeze operation will no longer be needed. Truncating the commit log may still require freezing. (Otherwise, all commit logs would have to be kept in order to judge transaction status, and looking up every transaction's state in the commit log would increase physical I/O and could lead to poor performance.)
What happens before transaction IDs are exhausted? The database issues warnings, just as our body raises an alarm (such as an upset stomach) when something is physically wrong.
Related code:
src/backend/access/transam/varsup.c
/*
* Allocate the next FullTransactionId for a new transaction or
* subtransaction.
*
* The new XID is also stored into MyProc->xid/ProcGlobal->xids[] before
* returning.
*
* Note: when this is called, we are actually already inside a valid
* transaction, since XIDs are now not allocated until the transaction
* does something. So it is safe to do a database lookup if we want to
* issue a warning about XID wrap.
*/
FullTransactionId
GetNewTransactionId(bool isSubXact)
{
......
/*----------
* Check to see if it's safe to assign another XID. This protects against
* catastrophic data loss due to XID wraparound. The basic rules are:
*
* If we're past xidVacLimit, start trying to force autovacuum cycles.
* If we're past xidWarnLimit, start issuing warnings.
* If we're past xidStopLimit, refuse to execute transactions, unless
* we are running in single-user mode (which gives an escape hatch
* to the DBA who somehow got past the earlier defenses).
*
* Note that this coding also appears in GetNewMultiXactId.
*----------
*/
......
TransactionId xidWarnLimit = ShmemVariableCache->xidWarnLimit;
TransactionId xidStopLimit = ShmemVariableCache->xidStopLimit;
TransactionId xidWrapLimit = ShmemVariableCache->xidWrapLimit;
Oid oldest_datoid = ShmemVariableCache->oldestXidDB;
......
    if (IsUnderPostmaster &&
        TransactionIdFollowsOrEquals(xid, xidStopLimit))
    {
        char       *oldest_datname = get_database_name(oldest_datoid);

        /* complain even if that DB has disappeared */
        if (oldest_datname)
            ereport(ERROR,
                    (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                     errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
                            oldest_datname),
                     errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
                             "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
        else
            ereport(ERROR,
                    (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                     errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
                            oldest_datoid),
                     errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
                             "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
    }
    else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
    {
        char       *oldest_datname = get_database_name(oldest_datoid);

        /* complain even if that DB has disappeared */
        if (oldest_datname)
            ereport(WARNING,
                    (errmsg("database \"%s\" must be vacuumed within %u transactions",
                            oldest_datname,
                            xidWrapLimit - xid),
                     errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
                             "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
        else
            ereport(WARNING,
                    (errmsg("database with OID %u must be vacuumed within %u transactions",
                            oldest_datoid,
                            xidWrapLimit - xid),
                     errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
                             "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
    }
/*
* VariableCache is a data structure in shared memory that is used to track
* OID and XID assignment state. For largely historical reasons, there is
* just one struct with different fields that are protected by different
* LWLocks.
*
* Note: xidWrapLimit and oldestXidDB are not "active" values, but are
* used just to generate useful messages when xidWarnLimit or xidStopLimit
* are exceeded.
*/
typedef struct VariableCacheData
{
...
/*
* These fields are protected by XidGenLock.
*/
FullTransactionId nextXid; /* next XID to assign */
TransactionId oldestXid; /* cluster-wide minimum datfrozenxid */
TransactionId xidVacLimit; /* start forcing autovacuums here */
TransactionId xidWarnLimit; /* start complaining here */
TransactionId xidStopLimit; /* refuse to advance nextXid beyond here */
TransactionId xidWrapLimit; /* where the world ends */
Oid oldestXidDB; /* database with minimum datfrozenxid */
...
/*
* The place where we actually get into deep trouble is halfway around
* from the oldest potentially-existing XID. (This calculation is
* probably off by one or two counts, because the special XIDs reduce the
* size of the loop a little bit. But we throw in plenty of slop below,
* so it doesn't matter.)
*/
xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);
if (xidWrapLimit < FirstNormalTransactionId)
xidWrapLimit += FirstNormalTransactionId;
/*
* We'll refuse to continue assigning XIDs in interactive mode once we get
* within 3M transactions of data loss. This leaves lots of room for the
* DBA to fool around fixing things in a standalone backend, while not
* being significant compared to total XID space. (VACUUM requires an XID
* if it truncates at wal_level!=minimal. "VACUUM (ANALYZE)", which a DBA
* might do by reflex, assigns an XID. Hence, we had better be sure
* there's lots of XIDs left...) Also, at default BLCKSZ, this leaves two
* completely-idle segments. In the event of edge-case bugs involving
* page or segment arithmetic, idle segments render the bugs unreachable
* outside of single-user mode.
*/
xidStopLimit = xidWrapLimit - 3000000;
if (xidStopLimit < FirstNormalTransactionId)
xidStopLimit -= FirstNormalTransactionId;
/*
* We'll start complaining loudly when we get within 40M transactions of
* data loss. This is kind of arbitrary, but if you let your gas gauge
* get down to 2% of full, would you be looking for the next gas station?
* We need to be fairly liberal about this number because there are lots
* of scenarios where most transactions are done by automatic clients that
* won't pay attention to warnings. (No, we're not gonna make this
* configurable. If you know enough to configure it, you know enough to
* not get in this kind of trouble in the first place.)
*/
xidWarnLimit = xidWrapLimit - 40000000;
if (xidWarnLimit < FirstNormalTransactionId)
xidWarnLimit -= FirstNormalTransactionId;
......
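The limit arithmetic above can be turned into a quick health check. The query below is a rough sketch that mirrors the constants from the code (wrap limit ≈ oldest frozen XID + 2^31, stop limit = wrap limit − 3,000,000), so the computed headroom is approximate, not exact:

```sql
-- Approximate headroom before xidStopLimit, per database.
-- MaxTransactionId >> 1 = 2147483647; the stop limit sits 3,000,000 before wrap.
SELECT datname,
       age(datfrozenxid)                        AS xid_age,
       2147483647 - 3000000 - age(datfrozenxid) AS approx_xids_until_stop
FROM pg_database
ORDER BY xid_age DESC;
```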
1. Where does the transaction ID exist?
2. What can you do with the help of a transaction ID?
3. Why do we need to freeze the transaction ID?
4. The Cause of Transaction ID Exhaustion
4.1 The transaction ID is not frozen in time.
High concurrency with many small transactions consumes transaction IDs quickly. In very extreme cases, the freezing speed cannot keep up with the write speed. (For example, the hardware performance is poor, but asynchronous commit is enabled for write transactions, which makes writes very fast while freezing stays slow.)
A single table is too large because of unreasonable design. Since a single table (for a partitioned table, this refers to the smallest-granularity partition) is too large and freeze does not yet support parallel processing within a single table, the processing speed can be relatively slow.
Parameters are set improperly: too much sleeping during garbage collection (vacuum cost delay) or too few autovacuum worker processes results in slow freezing.
Automatic vacuum is not enabled. (Note: autovacuum can be configured globally or at the table level.)
Hardware performance is poor (high I/O latency and low throughput), so recycling is slow.
The memory configuration is unreasonable, so large-table indexes are scanned many times. When a single table contains a large amount of garbage, the row pointers of the dead records can exceed autovacuum_work_mem, forcing each index to be scanned multiple times and slowing down processing.
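To spot these situations early, compare table ages with autovacuum_freeze_max_age. A minimal monitoring sketch, run in the database you want to inspect:

```sql
-- Tables closest to a forced anti-wraparound vacuum.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::bigint AS freeze_max_age
FROM pg_class c
WHERE c.relkind IN ('r', 't', 'm')   -- tables, TOAST tables, materialized views
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;
```

When xid_age approaches freeze_max_age, autovacuum will launch anti-wraparound vacuums on those tables whether or not autovacuum is otherwise enabled.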
4.2 There are transaction IDs that cannot be frozen.
On a primary database, or when a read-only standby has hot_standby_feedback enabled, the following conditions can prevent the freeze operation from erasing certain transaction IDs: long-running transactions, unfinished prepared (two-phase) transactions, and stale replication slots.
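Each of these blockers can be located from the system catalogs. A minimal diagnostic sketch:

```sql
-- Long-running transactions holding back the freeze horizon
SELECT pid, backend_xid, backend_xmin, now() - xact_start AS duration
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC NULLS LAST;

-- Unfinished prepared (two-phase) transactions
SELECT gid, prepared, owner, database, age(transaction) AS xid_age
FROM pg_prepared_xacts;

-- Replication slots pinning old xmin values
SELECT slot_name, active, xmin, catalog_xmin
FROM pg_replication_slots;
```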
Suppose your database no longer allows transactions to be executed and reports an error like this:

ERROR:  database is not accepting commands to avoid wraparound data loss in database "%s"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
You must switch the database to single-user mode to repair it.
If you want to restore database service as quickly as possible, you do not have to freeze the entire cluster.
Switch to single-user mode, find the oldest objects (tables and materialized views, including system tables), and freeze them one by one. Once the cluster's age has decreased enough, the database can be restarted, and a full freeze can then be executed in normal mode.
Sorting method:
First, find the database with the largest age:
age(pg_database.datfrozenxid), mxid_age(pg_database.datminmxid)
Then connect to the corresponding database and find the tables or materialized views with the largest age:
age(pg_class.relfrozenxid), mxid_age(pg_class.relminmxid)
Then, perform VACUUM FREEZE on the corresponding tables in turn.
Finally, check whether the number of remaining assignable transaction IDs in the database is greater than three million (the xidStopLimit headroom seen in the code above).
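The steps above can be sketched in SQL (the table name in the final VACUUM is hypothetical):

```sql
-- 1. Find the database with the largest age
SELECT datname, age(datfrozenxid) AS xid_age, mxid_age(datminmxid) AS mxid_age
FROM pg_database
ORDER BY xid_age DESC;

-- 2. Connect to that database, then find its oldest tables / materialized views
SELECT c.oid::regclass AS rel, age(c.relfrozenxid) AS xid_age,
       mxid_age(c.relminmxid) AS mxid_age
FROM pg_class c
WHERE c.relkind IN ('r', 't', 'm')
ORDER BY xid_age DESC
LIMIT 20;

-- 3. Freeze the oldest tables in turn
VACUUM (FREEZE, VERBOSE) t_oldest;
```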
How to enter single-user mode:
https://fluca1978.github.io/2021/01/03/PostgreSQLSingleUserModeP.html
https://github.com/digoal/blog/blob/master/202101/20210104_01.md
If you have tried all the methods above and the age of the database still cannot be decreased, analyze whether the cause is an unfinished prepared transaction or a stale (zombie) replication slot in the cluster. If so, the prepared transaction must be committed or rolled back, or the slot must be dropped (or its position advanced). Then repeat the operations above.
Slot processing:
pg_drop_replication_slot(name): drop a replication slot.
pg_replication_slot_advance(slot_name name, upto_lsn pg_lsn): advance a slot's position.
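A short sketch of using these functions (the slot names are hypothetical):

```sql
-- Inspect slots first; stale slots typically show active = false and an old xmin
SELECT slot_name, active, xmin, catalog_xmin, restart_lsn
FROM pg_replication_slots;

-- Drop a slot that is no longer needed
SELECT pg_drop_replication_slot('stale_slot');

-- Or advance a slot that is still in use
SELECT pg_replication_slot_advance('lagging_slot', pg_current_wal_lsn());
```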
2pc transaction processing:
ROLLBACK PREPARED transaction_id;
COMMIT PREPARED transaction_id;
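To resolve stuck two-phase transactions, list them first and then commit or roll back by gid (the gid below is hypothetical):

```sql
-- Unfinished prepared transactions, oldest first
SELECT gid, prepared, owner, database, age(transaction) AS xid_age
FROM pg_prepared_xacts
ORDER BY xid_age DESC;

-- Resolve by gid
ROLLBACK PREPARED 'my_tx_1';
-- or, if the transaction should take effect:
-- COMMIT PREPARED 'my_tx_1';
```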
Improve hardware performance, and tune autovacuum_vacuum_cost_delay, autovacuum_vacuum_cost_limit, and other related parameters to speed up freezing.
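A sketch of more aggressive autovacuum settings; the values below are illustrative, not universal recommendations, and should be sized to your hardware:

```sql
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;    -- no throttling sleep
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 10000;
ALTER SYSTEM SET autovacuum_max_workers = 8;          -- requires a restart
ALTER SYSTEM SET autovacuum_work_mem = '1GB';         -- fewer index scan passes
SELECT pg_reload_conf();
```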