By digoal
Perhaps this is the last time we will discuss this problem, because the PostgreSQL community is changing the transaction ID (xid) from 32-bit to 64-bit. Future PostgreSQL versions will no longer exhaust transaction IDs when freezing cannot keep up for some reason, so the database will not need to be stopped for maintenance.
However, widening the transaction ID to 64 bits does not mean the freeze operation will no longer be needed. Truncating the commit log may still require freezing. (Otherwise, all commit logs would have to be kept in order to judge transaction status, and looking up every transaction's state in the commit log would increase physical I/O and could lead to poor performance.)
What happens before transaction IDs are exhausted? The database issues warnings, just as our body raises an alarm (such as an upset stomach) when something is physically wrong.
Related code:
src/backend/access/transam/varsup.c
/*
* Allocate the next FullTransactionId for a new transaction or
* subtransaction.
*
* The new XID is also stored into MyProc->xid/ProcGlobal->xids[] before
* returning.
*
* Note: when this is called, we are actually already inside a valid
* transaction, since XIDs are now not allocated until the transaction
* does something. So it is safe to do a database lookup if we want to
* issue a warning about XID wrap.
*/
FullTransactionId
GetNewTransactionId(bool isSubXact)
{
......
/*----------
* Check to see if it's safe to assign another XID. This protects against
* catastrophic data loss due to XID wraparound. The basic rules are:
*
* If we're past xidVacLimit, start trying to force autovacuum cycles.
* If we're past xidWarnLimit, start issuing warnings.
* If we're past xidStopLimit, refuse to execute transactions, unless
* we are running in single-user mode (which gives an escape hatch
* to the DBA who somehow got past the earlier defenses).
*
* Note that this coding also appears in GetNewMultiXactId.
*----------
*/
......
TransactionId xidWarnLimit = ShmemVariableCache->xidWarnLimit;
TransactionId xidStopLimit = ShmemVariableCache->xidStopLimit;
TransactionId xidWrapLimit = ShmemVariableCache->xidWrapLimit;
Oid oldest_datoid = ShmemVariableCache->oldestXidDB;
......
    if (IsUnderPostmaster &&
        TransactionIdFollowsOrEquals(xid, xidStopLimit))
    {
        char       *oldest_datname = get_database_name(oldest_datoid);

        /* complain even if that DB has disappeared */
        if (oldest_datname)
            ereport(ERROR,
                    (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                     errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"",
                            oldest_datname),
                     errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
                             "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
        else
            ereport(ERROR,
                    (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                     errmsg("database is not accepting commands to avoid wraparound data loss in database with OID %u",
                            oldest_datoid),
                     errhint("Stop the postmaster and vacuum that database in single-user mode.\n"
                             "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
    }
    else if (TransactionIdFollowsOrEquals(xid, xidWarnLimit))
    {
        char       *oldest_datname = get_database_name(oldest_datoid);

        /* complain even if that DB has disappeared */
        if (oldest_datname)
            ereport(WARNING,
                    (errmsg("database \"%s\" must be vacuumed within %u transactions",
                            oldest_datname,
                            xidWrapLimit - xid),
                     errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
                             "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
        else
            ereport(WARNING,
                    (errmsg("database with OID %u must be vacuumed within %u transactions",
                            oldest_datoid,
                            xidWrapLimit - xid),
                     errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n"
                             "You might also need to commit or roll back old prepared transactions, or drop stale replication slots.")));
    }
/*
* VariableCache is a data structure in shared memory that is used to track
* OID and XID assignment state. For largely historical reasons, there is
* just one struct with different fields that are protected by different
* LWLocks.
*
* Note: xidWrapLimit and oldestXidDB are not "active" values, but are
* used just to generate useful messages when xidWarnLimit or xidStopLimit
* are exceeded.
*/
typedef struct VariableCacheData
{
...
/*
* These fields are protected by XidGenLock.
*/
FullTransactionId nextXid; /* next XID to assign */
TransactionId oldestXid; /* cluster-wide minimum datfrozenxid */
TransactionId xidVacLimit; /* start forcing autovacuums here */
TransactionId xidWarnLimit; /* start complaining here */
TransactionId xidStopLimit; /* refuse to advance nextXid beyond here */
TransactionId xidWrapLimit; /* where the world ends */
Oid oldestXidDB; /* database with minimum datfrozenxid */
...
/*
* The place where we actually get into deep trouble is halfway around
* from the oldest potentially-existing XID. (This calculation is
* probably off by one or two counts, because the special XIDs reduce the
* size of the loop a little bit. But we throw in plenty of slop below,
* so it doesn't matter.)
*/
xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1);
if (xidWrapLimit < FirstNormalTransactionId)
xidWrapLimit += FirstNormalTransactionId;
/*
* We'll refuse to continue assigning XIDs in interactive mode once we get
* within 3M transactions of data loss. This leaves lots of room for the
* DBA to fool around fixing things in a standalone backend, while not
* being significant compared to total XID space. (VACUUM requires an XID
* if it truncates at wal_level!=minimal. "VACUUM (ANALYZE)", which a DBA
* might do by reflex, assigns an XID. Hence, we had better be sure
* there's lots of XIDs left...) Also, at default BLCKSZ, this leaves two
* completely-idle segments. In the event of edge-case bugs involving
* page or segment arithmetic, idle segments render the bugs unreachable
* outside of single-user mode.
*/
xidStopLimit = xidWrapLimit - 3000000;
if (xidStopLimit < FirstNormalTransactionId)
xidStopLimit -= FirstNormalTransactionId;
/*
* We'll start complaining loudly when we get within 40M transactions of
* data loss. This is kind of arbitrary, but if you let your gas gauge
* get down to 2% of full, would you be looking for the next gas station?
* We need to be fairly liberal about this number because there are lots
* of scenarios where most transactions are done by automatic clients that
* won't pay attention to warnings. (No, we're not gonna make this
* configurable. If you know enough to configure it, you know enough to
* not get in this kind of trouble in the first place.)
*/
xidWarnLimit = xidWrapLimit - 40000000;
if (xidWarnLimit < FirstNormalTransactionId)
xidWarnLimit -= FirstNormalTransactionId;
......
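The limit arithmetic above can be turned into a quick health check. The query below is a rough sketch that mirrors the constants from the code (wrap limit ≈ oldest frozen XID + 2^31, stop limit = wrap limit − 3,000,000), so the computed headroom is approximate, not exact:

```sql
-- Approximate headroom before xidStopLimit, per database.
-- MaxTransactionId >> 1 = 2147483647; the stop limit sits 3,000,000 before wrap.
SELECT datname,
       age(datfrozenxid)                        AS xid_age,
       2147483647 - 3000000 - age(datfrozenxid) AS approx_xids_until_stop
FROM pg_database
ORDER BY xid_age DESC;
```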
1. Where does the transaction ID exist?
2. What can you do with the help of a transaction ID?
3. Why do we need to freeze the transaction ID?
4. The Cause of Transaction ID Exhaustion
4.1 The transaction ID is not frozen in time.
High concurrency with many small transactions consumes transaction IDs quickly. In very extreme cases, the freezing speed cannot keep up with the write speed. (For example, the hardware performance is poor, but asynchronous commit is enabled for write transactions, which makes writes very fast while freezing stays slow.)
A single table is too large because of unreasonable design. Since a single table (for a partitioned table, this refers to the smallest-granularity partition) is too large and freeze does not yet support parallel processing within a single table, the processing speed can be relatively slow.
Parameters are set improperly: too much sleeping during garbage collection (vacuum cost delay) or too few autovacuum worker processes results in slow freezing.
Automatic vacuum is not enabled. (Note: autovacuum can be configured globally or at the table level.)
Hardware performance is poor (high I/O latency and low throughput), so recycling is slow.
The memory configuration is unreasonable, so large-table indexes are scanned many times. When a single table contains a large amount of garbage, the row pointers of the dead records can exceed autovacuum_work_mem, forcing each index to be scanned multiple times and slowing down processing.
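To spot these situations early, compare table ages with autovacuum_freeze_max_age. A minimal monitoring sketch, run in the database you want to inspect:

```sql
-- Tables closest to a forced anti-wraparound vacuum.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::bigint AS freeze_max_age
FROM pg_class c
WHERE c.relkind IN ('r', 't', 'm')   -- tables, TOAST tables, materialized views
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;
```

When xid_age approaches freeze_max_age, autovacuum will launch anti-wraparound vacuums on those tables whether or not autovacuum is otherwise enabled.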
4.2 There are transaction IDs that cannot be frozen.
On a primary database, or when a read-only standby has hot_standby_feedback enabled, the following conditions can prevent the freeze operation from erasing certain transaction IDs: long-running transactions, unfinished prepared (two-phase) transactions, and stale replication slots.
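Each of these blockers can be located from the system catalogs. A minimal diagnostic sketch:

```sql
-- Long-running transactions holding back the freeze horizon
SELECT pid, backend_xid, backend_xmin, now() - xact_start AS duration
FROM pg_stat_activity
WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC NULLS LAST;

-- Unfinished prepared (two-phase) transactions
SELECT gid, prepared, owner, database, age(transaction) AS xid_age
FROM pg_prepared_xacts;

-- Replication slots pinning old xmin values
SELECT slot_name, active, xmin, catalog_xmin
FROM pg_replication_slots;
```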
Suppose your database no longer allows transactions to be executed and reports an error like this:

ERROR:  database is not accepting commands to avoid wraparound data loss in database "%s"
HINT:  Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions, or drop stale replication slots.
You must switch the database to single-user mode to repair it.
If you want to restore database service as quickly as possible, you do not have to freeze the entire cluster.
Switch to single-user mode, find the oldest objects (tables and materialized views, including system tables), and freeze them one by one. Once the cluster's age has decreased enough, the database can be restarted, and a full freeze can then be executed in normal mode.
Sorting method:
First, find the database with the largest age:
age(pg_database.datfrozenxid), mxid_age(pg_database.datminmxid)
Then connect to the corresponding database and find the tables or materialized views with the largest age:
age(pg_class.relfrozenxid), mxid_age(pg_class.relminmxid)
Then, perform VACUUM FREEZE on the corresponding tables in turn.
Finally, check whether the number of remaining assignable transaction IDs in the database is greater than three million (the xidStopLimit headroom seen in the code above).
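The steps above can be sketched in SQL (the table name in the final VACUUM is hypothetical):

```sql
-- 1. Find the database with the largest age
SELECT datname, age(datfrozenxid) AS xid_age, mxid_age(datminmxid) AS mxid_age
FROM pg_database
ORDER BY xid_age DESC;

-- 2. Connect to that database, then find its oldest tables / materialized views
SELECT c.oid::regclass AS rel, age(c.relfrozenxid) AS xid_age,
       mxid_age(c.relminmxid) AS mxid_age
FROM pg_class c
WHERE c.relkind IN ('r', 't', 'm')
ORDER BY xid_age DESC
LIMIT 20;

-- 3. Freeze the oldest tables in turn
VACUUM (FREEZE, VERBOSE) t_oldest;
```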
How to enter single-user mode:
https://fluca1978.github.io/2021/01/03/PostgreSQLSingleUserModeP.html
https://github.com/digoal/blog/blob/master/202101/20210104_01.md
If you have tried all the methods above and the age of the database still cannot be decreased, analyze whether the cause is an unfinished prepared transaction or a stale (zombie) replication slot in the cluster. If so, the prepared transaction must be committed or rolled back, or the slot must be dropped (or its position advanced). Then repeat the operations above.
Slot processing:
pg_drop_replication_slot(name): drop a replication slot.
pg_replication_slot_advance(slot_name name, upto_lsn pg_lsn): advance a slot's position.
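A short sketch of using these functions (the slot names are hypothetical):

```sql
-- Inspect slots first; stale slots typically show active = false and an old xmin
SELECT slot_name, active, xmin, catalog_xmin, restart_lsn
FROM pg_replication_slots;

-- Drop a slot that is no longer needed
SELECT pg_drop_replication_slot('stale_slot');

-- Or advance a slot that is still in use
SELECT pg_replication_slot_advance('lagging_slot', pg_current_wal_lsn());
```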
2pc transaction processing:
ROLLBACK PREPARED transaction_id;
COMMIT PREPARED transaction_id;
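To resolve stuck two-phase transactions, list them first and then commit or roll back by gid (the gid below is hypothetical):

```sql
-- Unfinished prepared transactions, oldest first
SELECT gid, prepared, owner, database, age(transaction) AS xid_age
FROM pg_prepared_xacts
ORDER BY xid_age DESC;

-- Resolve by gid
ROLLBACK PREPARED 'my_tx_1';
-- or, if the transaction should take effect:
-- COMMIT PREPARED 'my_tx_1';
```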
Improve hardware performance, and tune autovacuum_vacuum_cost_delay, autovacuum_vacuum_cost_limit, and other related parameters to speed up freezing.
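A sketch of more aggressive autovacuum settings; the values below are illustrative, not universal recommendations, and should be sized to your hardware:

```sql
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;    -- no throttling sleep
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 10000;
ALTER SYSTEM SET autovacuum_max_workers = 8;          -- requires a restart
ALTER SYSTEM SET autovacuum_work_mem = '1GB';         -- fewer index scan passes
SELECT pg_reload_conf();
```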