Reduce the storage occupied by the tables of an ApsaraDB RDS for PostgreSQL instance

As an ApsaraDB RDS for PostgreSQL instance runs, a large number of fragments are generated within tables. The autovacuum feature of ApsaraDB RDS for PostgreSQL can reclaim space only within each page. It cannot effectively reduce the number of pages, so disk space is not released. In this case, you can use VACUUM FULL, pg_repack, or pg_squeeze to reduce the storage occupied by the tables of an RDS instance.

Overview

Cleanup effects of autovacuum compared with VACUUM FULL, pg_repack, and pg_squeeze

Comparison between VACUUM FULL, pg_repack, and pg_squeeze in terms of storage capacity reduction

Storage capacity reduction method	Benefit	Limit	Scenario
VACUUM FULL	Works on any table. No additional extension is required.	VACUUM FULL holds an exclusive lock on the table while it runs. As a result, the table cannot be accessed during the execution.	The table can be unavailable for a period of time.
pg_repack	You can add, delete, modify, and query table data while the table is being recreated.	The table must have a primary key or a unique key. The pg_repack extension is required. You must also install the pg_repack client on your computer and use the client to connect to the required database and perform repack operations.	The table is frequently modified.
pg_squeeze	You can add, delete, modify, and query table data while the table is being recreated.	The table must have a primary key or a unique key. The pg_squeeze extension is required, but you do not need to install a client on your computer. If a large amount of incremental data is continuously written, pg_squeeze may fail to complete the operation.	The table is not frequently modified.

VACUUM FULL

VACUUM FULL is a command provided by the PostgreSQL community to recreate tables. The command writes a new copy of the original table and then deletes the original table. VACUUM FULL imposes no limits on the tables that you want to clean up. However, VACUUM FULL holds an exclusive lock on the table while it runs. As a result, the table cannot be accessed, and your workloads are significantly affected.
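
The difference between freeing space inside pages and rewriting the table can be illustrated with a toy model. The following Python sketch is not how PostgreSQL stores data. It assumes a hypothetical page size of four rows (`PAGE_SLOTS`) and represents a dead row as `None`. It only shows why deleting rows leaves the page count unchanged, while a full rewrite packs the live rows into fewer pages.

```python
# Toy model only: a "table" is a list of fixed-size pages, each a list of slots.
# A slot holds a row value, or None once the row is dead and its space is free.
PAGE_SLOTS = 4  # hypothetical page capacity, chosen for illustration


def build_table(rows):
    """Pack rows into pages of PAGE_SLOTS slots each."""
    return [rows[i:i + PAGE_SLOTS] for i in range(0, len(rows), PAGE_SLOTS)]


def delete_rows(table, predicate):
    """Mark matching rows as dead; the page layout does not change."""
    return [[None if (r is not None and predicate(r)) else r for r in page]
            for page in table]


def rewrite_table(table):
    """Copy live rows into a fresh, densely packed table, as a rewrite does."""
    live = [r for page in table for r in page if r is not None]
    return build_table(live)


table = build_table(list(range(16)))               # 4 full pages
table = delete_rows(table, lambda r: r % 4 != 0)   # one live row left per page

pages_before = len(table)                          # still 4 pages on "disk"
free_slots = sum(slot is None for page in table for slot in page)
compacted = rewrite_table(table)                   # live rows fit in 1 page

print(pages_before, free_slots, len(compacted))
```

In this model, the free slots can be reused by later inserts, which corresponds to what autovacuum makes possible, but only the rewrite reduces the number of pages.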

pg_repack

pg_repack allows you to add, delete, modify, and query table data while the table is being recreated. This resolves the blocking issue caused by VACUUM FULL. However, pg_repack has limits. For example, the table must have a primary key or a unique key. pg_repack is also relatively complex to use. You must install the pg_repack client on your computer and use the client to connect to the required database and perform repack operations, which can be inconvenient in a production environment.

To keep the table available during the recreation, pg_repack processes existing data and incremental data separately:

1. Processes existing data: generates a new table that is a snapshot copy of the original table.
2. Processes incremental data: uses a trigger to continuously store newly generated data in a temporary table.
3. Ensures that the incremental data is completely processed: after the new table is generated, applies the incremental data in the temporary table to the new table until the temporary table is empty. This ensures that all incremental data is transferred.
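
The three steps above can be sketched with a small simulation. This is a simplified illustration, not pg_repack's implementation: Python dictionaries keyed by primary key stand in for tables, the hypothetical `write` function plays the role of the trigger, and `change_log` plays the role of the temporary table.

```python
# Toy model of the three steps; dicts stand in for tables, keyed by primary key.
original = {1: "a", 2: "b", 3: "c"}
change_log = []  # stands in for the temporary table filled by the trigger


def write(op, key, value=None):
    """Apply a write to the original table and record it, as a trigger would."""
    if op == "delete":
        original.pop(key, None)
    else:  # "upsert" covers both insert and update in this sketch
        original[key] = value
    change_log.append((op, key, value))


# Step 1: copy a snapshot of the existing data into the new table.
new_table = dict(original)

# Step 2: writes that arrive during the rebuild are captured in the log.
write("upsert", 4, "d")
write("upsert", 2, "B")
write("delete", 1)

# Step 3: replay the log in order until it is empty.
while change_log:
    op, key, value = change_log.pop(0)
    if op == "delete":
        new_table.pop(key, None)
    else:
        new_table[key] = value

print(new_table == original, new_table)
```

In this sketch, each logged change is matched to a row in the new table by its key. This suggests why a primary key or a unique key is required for the table.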

For more information about how to use pg_repack, see Use the pg_repack extension to clear tablespaces.

pg_squeeze

Similar to pg_repack, pg_squeeze can clean up a table without blocking add, delete, modify, and query operations. If you use pg_squeeze, the table must have a primary key or a unique key. pg_squeeze is easier to use than pg_repack. You only need to install the pg_squeeze extension in the required database. pg_squeeze also supports automatic detection and can trigger cleanup operations.

pg_squeeze works in a manner similar to pg_repack but stores incremental data differently. pg_squeeze uses logical replication to store incremental data in the internal data structure of PostgreSQL, whereas pg_repack uses triggers to store incremental data in a temporary table.

For more information about how to use pg_squeeze, see Use the pg_squeeze extension to shrink bloated tables and indexes.

Performance comparison between pg_repack and pg_squeeze

The two extensions provide the same features and have similar limits. In a production environment, you can select an extension based on performance.

If no incremental data is written during a cleanup operation, the extensions deliver similar performance. If a fixed amount of incremental data is written during a cleanup operation, pg_squeeze delivers better performance than pg_repack. If a large amount of incremental data is continuously written during a cleanup operation, pg_squeeze may fail to complete the operation.

pg_squeeze reduces the write performance of a table to 50% of the original write performance, and pg_repack reduces it to 10%. pg_repack uses triggers and temporary tables to store incremental data. As a result, writes slow down significantly while pg_repack runs, but all incremental data can be consumed. If you use pg_squeeze, incremental data may be generated faster than it can be applied. In that case, the incremental data cannot be completely consumed.
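
Whether a cleanup operation can finish depends on whether incremental data is applied faster than it is generated. The following sketch models only that relationship. The function name `backlog_after` and all rates are hypothetical numbers chosen for illustration, not measurements of either extension.

```python
def backlog_after(seconds, write_rate, apply_rate, initial_backlog=0):
    """Return the pending-change count after `seconds` of concurrent writing.

    write_rate and apply_rate are rows per second; all numbers are hypothetical.
    """
    backlog = initial_backlog
    for _ in range(seconds):
        backlog += write_rate                # new changes arrive
        backlog -= min(backlog, apply_rate)  # the rebuild applies what it can
    return backlog


# Writes slower than the apply rate: the backlog drains to zero.
drained = backlog_after(60, write_rate=100, apply_rate=500, initial_backlog=10_000)

# Writes faster than the apply rate: the backlog keeps growing.
growing = backlog_after(60, write_rate=500, apply_rate=400, initial_backlog=10_000)

print(drained, growing)
```

The first case corresponds to a rebuild that throttles writes enough to catch up. The second corresponds to a rebuild that leaves writes fast enough to outrun the apply step, so it never completes.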

We recommend that you use pg_repack for tables that are frequently modified and use pg_squeeze in other scenarios.
