This topic describes how to clear the tablespaces of an ApsaraDB RDS for PostgreSQL instance by using the pg_repack extension. If you perform a large number of operations, such as UPDATE, on an entire table, a table bloat may occur. In this case, you can use this extension to mitigate the table bloat. When you use this extension to handle a table bloat issue, this extension does not need to hold an exclusive lock on the bloated table. This extension is more lightweight than the CLUSTER and VACUUM FULL statements.
Prerequisites
The RDS instance runs PostgreSQL 10.0 or a later version.
The RDS instance runs a minor engine version of 20240229 or later. For more information about how to update the minor engine version, see Update the minor engine version.
NoteIf the RDS instance runs a minor engine version earlier than 20240229, the error message
ERROR: must be superuser to use xxx function
is displayed when you use the extension.
Usage notes
The pg_repack extension requires additional storage. A full-table repack requires the available storage to be at least twice the size of the table that you want to repack.
The pg_repack extension cannot remove bloat from temporary tables.
The pg_repack extension cannot remove bloat from GiST indexes.
When a table is being repacked by the pg_repack extension, you cannot execute DDL statements on the table. The pg_repack extension holds an ACCESS SHARE lock on the bloated table to prohibit the execution of DDL statements on the bloated table.
The pg_repack extension consumes a large number of disk I/O resources to create tables and indexes. Before you use the pg_repack extension, you must evaluate whether the repack operation can interrupt your workloads. For example, if your RDS instance uses enhanced SSDs (ESSDs) of performance level 1 (PL1) and you want to repack a 100-GB table, the IO throughput of the RDS instance can reach the maximum IO throughput of 250 MB/s.
Feature description
The pg_repack extension supports full-table repack and index repack.
The following procedure shows how the extension repacks an entire table:
Creates a log table. The log table is used to record the changes that are made to the original table during the repack process.
Creates triggers on the original table. The triggers are used to record the INSERT, UPDATE, and DELETE statements that are executed on the original table and insert the records into the log table during the repack process.
Creates a table. The new table has the same rows and columns as the original table.
Creates indexes in the new table.
Applies the data changes in the log table to the new table.
Switches the original and new tables in the system catalog.
Deletes the original table.
NoteThe pg_repack extension holds the ACCESS EXCLUSIVE lock to prohibit operations on the original table in Steps a, b, f, and g. In other steps, the extension holds the ACCESS SHARE lock. The lock does not prohibit the execution of INSERT, UPDATE, and DELETE statements on the original table.
The following procedure shows how the extension repacks the indexes on a table:
Concurrently creates indexes.
Switches the original and new indexes in the system catalog.
Deletes the original indexes.
Enable or disable the pg_repack extension
You must use the privileged account to execute the statements in this section.
Enable the pg_repack extension.
CREATE EXTENSION pg_repack;
Disable the pg_repack extension.
DROP EXTENSION pg_repack;
Install the client utility of the pg_repack extension
You must install the client utility of the pg_repack extension. If your Elastic Compute Service (ECS) instance runs Alibaba Cloud Linux 3.2104 or later, perform the following steps to install the client utility:
Install environment dependencies.
sudo yum install postgresql* redhat-rpm-config libpq* openssl-devel readline-devel -y
Add environment variables.
export PATH=$PATH:/usr/lib64/pgsql/postgresql-12/bin
NoteThe path and PostgreSQL version in the preceding command are for reference only. Replace them with actual values in your business.
Download the client utility, compile the client utility, and then install the client utility.
wget https://github.com/reorg/pg_repack/archive/refs/tags/ver_1.4.6.tar.gz tar zxvf ver_1.4.6.tar.gz cd pg_repack-ver_1.4.6 make && make install
Examples
-- Check but do not perform the repack operation: --dry-run
pg_repack --dry-run --no-superuser-check --echo --no-order -h Endpoint -p Port -d database1 -U user --table schema1.table1
Check and perform the repack operation:
pg_repack --no-superuser-check --echo --no-order -h Endpoint -p Port -d database1 -U user --table schema1.table1
FAQ
What do I do if the "ERROR: pg_repack failed with error: You must be a superuser to use pg_repack" message is returned?
Use the -k or --no-superuser-check option to skip the superuser permission check. This way, you can prevent this type of permission error.
When you repack a full table, you cannot use this method to obtain permissions.
References
For more information about the pg_repack extension, see Reorganize tables in PostgreSQL databases with minimal locks.