PolarDB for PostgreSQL (Compatible with Oracle) allows you to use the pg_repack extension to defragment tablespaces. This solves the issue of table bloating caused by frequent update and delete operations on tables. The pg_repack extension holds an exclusive lock on a bloated table for a short period of time and does not block read or write operations most of the time. The pg_repack extension is more lightweight than the CLUSTER and VACUUM FULL statements.
Prerequisites
The PolarDB for PostgreSQL (Compatible with Oracle) cluster is of one of the following versions:
The database engine is PolarDB for Oracle 2.0 and the revision version is 2.0.14.22.0 or later.
You can execute the following statement to query the revision version of a PolarDB for PostgreSQL (Compatible with Oracle) cluster:
SHOW polar_version;
Usage notes
You cannot use the
-a
or--all
option to repack all databases.You cannot use the pg_repack extension to repack a database. You must use the
--table
,--parent-table
, or--index
option to specify the table or index that you want to repack.You cannot use the
-c
or-schema
option to repack a schema.PolarDB for PostgreSQL (Compatible with Oracle) does not allow you to create tablespaces. Instead, you can use the default tablespaces. Therefore, you cannot use the
-s
or--tablespace
option to repack a specific tablespace.The pg_repack extension requires additional temporary storage space to hold new tables and log tables generated during the repack process. Therefore, the available storage space in the cluster must be at least twice the size of the tables to be repacked.
The pg_repack extension occupies a large amount of disk I/O resources. Before you use the extension, you must evaluate whether the repack operation may interrupt your business. For example, if your cluster uses PL1 Enterprise SSDs (ESSDs) and you want to repack a 100-GB table, the I/O throughput of the cluster can reach the maximum I/O throughput of 250 MB/s.
Usage
The pg_repack extension installed on a PolarDB for PostgreSQL (Compatible with Oracle) cluster acts as a server, and provides a dedicated client. You must use the server together with the client.
Install the extension
CREATE EXTENSION pg_repack;
Check the extension version.
SELECT extversion FROM pg_extension WHERE extname = 'pg_repack';
If you have installed an earlier version of the extension, you can update the extension to the latest version by uninstalling the original version and then re-creating the extension.
DROP EXTENSION pg_repack;
CREATE EXTENSION pg_repack;
If the pg_repack extension is not installed on your cluster, the error ERROR:pg_repack failed with error: pg_repack x.x.x is not installed in the database
is reported when you run pg_repack commands on the client.
Install the client of the pg_repack extension
The client of the pg_repack extension is released together with the PolarDBTools toolkit. After you download and install PolarDBTools toolkit, you can use the client of the extension in the toolkit. For more information about the download URLs and installation method of PolarDBTools, see PolarDBTools.
The client version of the pg_repack extension must match the extension version. Otherwise, the client returns one of the following errors:
ERROR:pg_repack failed with error: extension 'pg_repack x.x.x' required, found 'pg_repack y.y.y'; please drop and re-create the extension
orERROR:pg_repack failed with error: program 'pg_repack x.x.x' does not match database library 'pg_repack y.y.y'
.Download the PolarDBTools toolkit that matches the version of your PolarDB cluster by using the preceding URL.
Repack standard tables and table partitions
The pg_repack extension allows you to repack a standard table or a partition of a partitioned table, achieving a similar outcome to the CLUSTER
or VACUUM FULL
statement. The repack process reclaims the free space in tables and rebuilds their indexes. This makes the pg_repack extension suitable for bloated tables.
The table that you repack must have a primary key or a unique index.
Temporary tables cannot be repacked.
Partitions that have
global indexes
cannot be repacked.
Syntax description 1
You can use the
--table
option to specify the table to repack. The following command is equivalent to theCLUSTER
statement. During the repack process, data in the specified table is ordered by the column on which aCLUSTER
operation is performed.pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --table schema1.table1
If you want to order data by the specified column, you can use the
--order-by
option to specify the column name.pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --table schema1.table1 --order-by <Column name>
If you do not want to order data, you can use the
--no-order
option. The following command is equivalent to theVACUUM FULL
statement.pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --table schema1.table1 --no-order
If the CPU and I/O resources of the cluster are sufficient, you can use the
--jobs
option to accelerate the repack process. In this case, the extension starts multiple processes to concurrently rebuild indexes. This is suitable for tables that have multiple indexes.pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --table schema1.table1 --jobs <Concurrency>
Repack partitioned tables and inherited tables
The pg_repack extension supports operations on partitioned tables created by using declarative partitioning or table inheritance. The extension automatically identifies all partitions in the parent table and repacks each partition in turn. This is suitable for partitioned tables in which all partitions are bloated.
Syntax description 2
You can use the
--parent-table
option to specify the name of the partitioned table to repack.pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --parent-table schema1.table1
NoteExcept for the
--parent-table
option, the usage of the other options on partitioned tables is basically the same as that on standard tables.If only one partition is bloated, you do not need to repack the entire partitioned table. You can use the
--table
option to repack only the bloated partition as described in the Syntax description 1 section of this topic.Partitioned tables that have
global indexes
cannot be repacked.
Repack indexes
The pg_repack extension allows you to repack only indexes, including rebuilding indexes and reclaiming the free space in indexes. This is suitable for bloated indexes.
If an index has excessive free space, we recommend that you use the
REINDEX CONCURRENTLY
statement to rebuild the index online. The pg_repack extension is not necessary for this purpose. The pg_repack extension is developed to provide index rebuilding capabilities for earlier PostgreSQL versions that do not support theREINDEX CONCURRENTLY
statement.The pg_repack extension does not allow you to repack indexes on declaratively partitioned tables. As an alternative, you can use the
REINDEX CONCURRENTLY
statement.
Syntax description 3
You can use the
--index
option to specify the name of the index to repack.pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --index schema1.table1
You can use the
--only-indexes
option to repack all indexes in a table.pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --table schema1.table1 --only-indexes
More features
You can run the pg_repack --help
command to view more features of the pg_repack extension.
FAQ
Dry run
Before you use the pg_repack extension to repack a table, we recommend that you use the --dry-run
option to perform a dry run. A dry run does not change data in the table, but only checks whether the command to be run is valid and whether the repack process can run as expected. After the command is verified, you can delete this option and run the extension.
pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --table schema1.table1 --dry-run
Permissions
You must run the pg_repack extension by using a privileged account instead of a standard account. Otherwise, the following error is reported:
must be polar_superuser or superuser to use xx function
.If the
You must be a superuser to use pg_repack
error is reported, add the--no-superuser-check
option to the used pg_repack command to bypass the superuser check.
Timeout
If the size of the table or index to be repacked is excessively large, the repack operation lasts for an extended period of time. In this case, one of the following errors is reported: FATAL: terminating connection due to idle-in-transaction timeout
or FATAL: terminating connection due to idle-session timeout
. This indicates that the connection between the client of the pg_repack extension and the PolarDB for PostgreSQL (Compatible with Oracle) cluster timed out and the repack operation fails.
To resolve this issue, add the PGOPTIONS
option before the pg_repack command to specify timeout-related parameters to disable the timeout feature. This prevents disconnections due to timeout.
PGOPTIONS="-c idle_session_timeout=0 -c idle_in_transaction_session_timeout=0 -c statement_timeout=0" pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --table schema1.table1
Residual object cleanup
If the pg_repack extension abnormally exits, the repack operation fails. However, objects that are created during the repack process may remain in the table that is repacked. In this case, the objects need to be cleared at the earliest opportunity. Otherwise, the use of the table may be affected.
The
repack_trigger
trigger may remain on the table that is repacked. You must execute theDROP TRIGGER
statement to delete the trigger.The
index_<oid>
temporary index may remain on the table that is repacked. You must execute theDROP INDEX CONCURRENTLY
statement to delete the index.The
repack_<oid>
temporary table and thelog_<oid>
log table are created during the repack process. You must execute theDROP TABLE
statement to delete the tables.The new type
pk_<oid>
is created during the repack process. You must execute theDROP TYPE
statement to delete the type.
How the extension works
You can use the pg_repack extension to repack full tables and indexes.
The following procedure shows how the extension repacks a full table:
Create a log table. The log table is used to record the changes to the original table during the repack process.
Create triggers on the original table. The triggers are used to record the
INSERT
,UPDATE
, andDELETE
statements that are executed on the original table and insert the records into the log table during the repack process.Create a table that has the same schema as that of the original table and import data from the original table into the new table.
Create the indexes of the original table in the new table.
Apply the incremental data changes recorded in the log table to the new table.
Switch the original and new tables in the system catalog.
Delete the original table.
NoteThe pg_repack extension holds an exclusive lock on the original table for a short period of time to prohibit read and write operations on the original table in Steps 1, 2, 6, and 7. The extension holds an ACCESS SHARE lock on the original table in the other steps. The ACCESS SHARE lock does not block the execution of the
INSERT
,UPDATE
, andDELETE
statements on the original table but blocksDDL
statements.
The following procedure shows how the extension repacks the indexes in a table:
Create indexes by executing the
CREATE INDEX CONCURRENTLY
statement.Switch the original and new indexes in the system catalog.
Delete the original indexes by executing the
DROP INDEX CONCURRENTLY
statement.
NoteWhile the pg_repack extension achieves the same outcome as the
REINDEX CONCURRENTLY
statement, the extension requires a more complex process. TheREINDEX CONCURRENTLY
statement implements the repack process in a single step. The pg_repack extension is developed to provide index rebuilding capabilities for earlier PostgreSQL versions that do not support theREINDEX CONCURRENTLY
statement.
References
For more information about the pg_repack extension, see Reorganize tables in PostgreSQL databases with minimal locks.