All Products
Search
Document Center

PolarDB:pg_repack

Last Updated:Oct 25, 2024

PolarDB for PostgreSQL allows you to use the pg_repack extension to defragment tablespaces. This solves the issue of table bloating caused by frequent update and delete operations on tables. The pg_repack extension holds an exclusive lock on a bloated table for a short period of time and does not block read or write operations most of the time. The pg_repack extension is more lightweight than the CLUSTER and VACUUM FULL statements.

Prerequisites

The PolarDB for PostgreSQL cluster is of one of the following versions:

  • PolarDB for PostgreSQL 15 whose revision version is 15.7.2.0 or later

  • PolarDB for PostgreSQL 14 whose revision version is 14.10.16.0 or later

  • PolarDB for PostgreSQL 11 whose revision version is 1.1.36 or later

Note

You can execute one of the following statements to view the revision version of a PolarDB for PostgreSQL cluster:

  • PolarDB for PostgreSQL 15 and PostgreSQL 14

    select version();
  • PolarDB for PostgreSQL 11

    show polar_version;

Usage notes

  • You cannot use the -a or --all option to repack all databases.

  • You cannot use the pg_repack extension to repack a database. You must use the --table, --parent-table, or --index option to specify the table or index that you want to repack.

  • You cannot use the -c or -schema option to repack a schema.

  • PolarDB for PostgreSQL does not allow you to create tablespaces. Instead, you can use the default tablespaces. Therefore, you cannot use the -s or --tablespace option to repack a specific tablespace.

  • The pg_repack extension requires additional temporary storage space to hold new tables and log tables generated during the repack process. Therefore, the available storage space in the cluster must be at least twice the size of the tables to be repacked.

  • The pg_repack extension occupies a large amount of disk I/O resources. Before you use the extension, you must evaluate whether the repack operation may interrupt your business. For example, if your cluster uses PL1 Enterprise SSDs (ESSDs) and you want to repack a 100-GB table, the I/O throughput of the cluster can reach the maximum I/O throughput of 250 MB/s.

Usage

The pg_repack extension installed on a PolarDB for PostgreSQL cluster acts as a server, and provides a dedicated client. You must use the server together with the client.

Install the extension

CREATE EXTENSION pg_repack;

Check the extension version.

SELECT extversion FROM pg_extension WHERE extname = 'pg_repack';

If you have installed an earlier version of the extension, you can update the extension to the latest version by uninstalling the original version and then re-creating the extension.

DROP EXTENSION pg_repack;
CREATE EXTENSION pg_repack;
Note

If the pg_repack extension is not installed on your cluster, the error ERROR:pg_repack failed with error: pg_repack x.x.x is not installed in the database is reported when you run pg_repack commands on the client.

Install the client of the pg_repack extension

The client of the pg_repack extension is released together with the PolarDBTools toolkit. After you download and install PolarDBTools toolkit, you can use the client of the extension in the toolkit. For more information about the download URLs and installation method of PolarDBTools, see PolarDBTools.

Note
  • The client version of the pg_repack extension must match the extension version. Otherwise, the client returns one of the following errors: ERROR:pg_repack failed with error: extension 'pg_repack x.x.x' required, found 'pg_repack y.y.y'; please drop and re-create the extension or ERROR:pg_repack failed with error: program 'pg_repack x.x.x' does not match database library 'pg_repack y.y.y'.

  • Download the PolarDBTools toolkit that matches the version of your PolarDB cluster by using the preceding URL.

Repack standard tables and table partitions

The pg_repack extension allows you to repack a standard table or a partition of a partitioned table, achieving a similar outcome to the CLUSTER or VACUUM FULL statement. The repack process reclaims the free space in tables and rebuilds their indexes. This makes the pg_repack extension suitable for bloated tables.

Note
  • The table that you repack must have a primary key or a unique index.

  • Temporary tables cannot be repacked.

  • Partitions that have global indexes cannot be repacked.

Syntax description 1

  • You can use the --table option to specify the table to repack. The following command is equivalent to the CLUSTER statement. During the repack process, data in the specified table is ordered by the column on which a CLUSTER operation is performed.

    pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --table schema1.table1
  • If you want to order data by the specified column, you can use the --order-by option to specify the column name.

    pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --table schema1.table1 --order-by <Column name>
  • If you do not want to order data, you can use the --no-order option. The following command is equivalent to the VACUUM FULL statement.

    pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --table schema1.table1 --no-order
  • If the CPU and I/O resources of the cluster are sufficient, you can use the --jobs option to accelerate the repack process. In this case, the extension starts multiple processes to concurrently rebuild indexes. This is suitable for tables that have multiple indexes.

    pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --table schema1.table1 --jobs <Concurrency>

Repack partitioned tables and inherited tables

The pg_repack extension supports operations on partitioned tables created by using declarative partitioning or table inheritance. The extension automatically identifies all partitions in the parent table and repacks each partition in turn. This is suitable for partitioned tables in which all partitions are bloated.

Syntax description 2

  • You can use the --parent-table option to specify the name of the partitioned table to repack.

    pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --parent-table schema1.table1
    Note
    • Except for the --parent-table option, the usage of the other options on partitioned tables is basically the same as that on standard tables.

    • If only one partition is bloated, you do not need to repack the entire partitioned table. You can use the --table option to repack only the bloated partition as described in the Syntax description 1 section of this topic.

    • Partitioned tables that have global indexes cannot be repacked.

Repack indexes

The pg_repack extension allows you to repack only indexes, including rebuilding indexes and reclaiming the free space in indexes. This is suitable for bloated indexes.

Note
  • If an index has excessive free space, we recommend that you use the REINDEX CONCURRENTLY statement to rebuild the index online. The pg_repack extension is not necessary for this purpose. The pg_repack extension is developed to provide index rebuilding capabilities for earlier PostgreSQL versions that do not support the REINDEX CONCURRENTLY statement.

  • The pg_repack extension does not allow you to repack indexes on declaratively partitioned tables. As an alternative, you can use the REINDEX CONCURRENTLY statement.

Syntax description 3

  • You can use the --index option to specify the name of the index to repack.

    pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --index schema1.table1
  • You can use the --only-indexes option to repack all indexes in a table.

    pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --table schema1.table1 --only-indexes

More features

You can run the pg_repack --help command to view more features of the pg_repack extension.

FAQ

Dry run

Before you use the pg_repack extension to repack a table, we recommend that you use the --dry-run option to perform a dry run. A dry run does not change data in the table, but only checks whether the command to be run is valid and whether the repack process can run as expected. After the command is verified, you can delete this option and run the extension.

pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --echo --table schema1.table1 --dry-run

Permissions

  • You must run the pg_repack extension by using a privileged account instead of a standard account. Otherwise, the following error is reported: must be polar_superuser or superuser to use xx function.

  • If the You must be a superuser to use pg_repack error is reported, add the --no-superuser-check option to the used pg_repack command to bypass the superuser check.

Timeout

If the size of the table or index to be repacked is excessively large, the repack operation lasts for an extended period of time. In this case, one of the following errors is reported: FATAL: terminating connection due to idle-in-transaction timeout or FATAL: terminating connection due to idle-session timeout. This indicates that the connection between the client of the pg_repack extension and the PolarDB for PostgreSQL cluster timed out and the repack operation fails.

To resolve this issue, add the PGOPTIONS option before the pg_repack command to specify timeout-related parameters to disable the timeout feature. This prevents disconnections due to timeout.

# PolarDB for PostgreSQL 14:
PGOPTIONS="-c idle_session_timeout=0 -c idle_in_transaction_session_timeout=0 -c statement_timeout=0" pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --table schema1.table1

# PolarDB for PostgreSQL 11:
PGOPTIONS="-c polar_idle_session_timeout=0 -c idle_in_transaction_session_timeout=0 -c statement_timeout=0" pg_repack -h <host> -p <port> -d <db> -U <user> --no-superuser-check --table schema1.table1

Residual object cleanup

If the pg_repack extension abnormally exits, the repack operation fails. However, objects that are created during the repack process may remain in the table that is repacked. In this case, the objects need to be cleared at the earliest opportunity. Otherwise, the use of the table may be affected.

  • The repack_trigger trigger may remain on the table that is repacked. You must execute the DROP TRIGGER statement to delete the trigger.

  • The index_<oid> temporary index may remain on the table that is repacked. You must execute the DROP INDEX CONCURRENTLY statement to delete the index.

  • The repack_<oid> temporary table and the log_<oid> log table are created during the repack process. You must execute the DROP TABLE statement to delete the tables.

  • The new type pk_<oid> is created during the repack process. You must execute the DROP TYPE statement to delete the type.

How the extension works

You can use the pg_repack extension to repack full tables and indexes.

  • The following procedure shows how the extension repacks a full table:

    1. Create a log table. The log table is used to record the changes to the original table during the repack process.

    2. Create triggers on the original table. The triggers are used to record the INSERT, UPDATE, and DELETE statements that are executed on the original table and insert the records into the log table during the repack process.

    3. Create a table that has the same schema as that of the original table and import data from the original table into the new table.

    4. Create the indexes of the original table in the new table.

    5. Apply the incremental data changes recorded in the log table to the new table.

    6. Switch the original and new tables in the system catalog.

    7. Delete the original table.

    Note
    • The pg_repack extension holds an exclusive lock on the original table for a short period of time to prohibit read and write operations on the original table in Steps 1, 2, 6, and 7. The extension holds an ACCESS SHARE lock on the original table in the other steps. The ACCESS SHARE lock does not block the execution of the INSERT, UPDATE, and DELETE statements on the original table but blocks DDL statements.

  • The following procedure shows how the extension repacks the indexes in a table:

    1. Create indexes by executing the CREATE INDEX CONCURRENTLY statement.

    2. Switch the original and new indexes in the system catalog.

    3. Delete the original indexes by executing the DROP INDEX CONCURRENTLY statement.

    Note

    While the pg_repack extension achieves the same outcome as the REINDEX CONCURRENTLY statement, the extension requires a more complex process. The REINDEX CONCURRENTLY statement implements the repack process in a single step. The pg_repack extension is developed to provide index rebuilding capabilities for earlier PostgreSQL versions that do not support the REINDEX CONCURRENTLY statement.

References

For more information about the pg_repack extension, see Reorganize tables in PostgreSQL databases with minimal locks.