This topic provides an overview of the features and benefits of partitioned tables in a PolarDB for PostgreSQL cluster.
Introduction
In a PolarDB for PostgreSQL cluster, you can physically divide a table or an index into smaller pieces, called partitions, for easier management. Each partition is an object that has its own name and storage properties. For database administrators, the partitions can be managed collectively or separately, which gives them greater flexibility in their work. To applications, a partitioned table does not have difference than a non-partitioned table. You do not need to modify SQL statements or DML commands when you query a partitioned table.
Partitions of a partitioned table must have the same logical properties such as column names, data types, and constraints, but can have different physical properties, such as compression status, physical storage settings, and tablespace.
Partitioning offers benefits to many types of applications, especially those that involve massive data volumes. For online transaction processing (OLTP) databases, partitioning improves manageability and availability. For online analytical processing (OLAP) databases, the major benefits lie in performance and manageability.
Scenarios
Partition a table when it is excessively large, such as when the table space is larger than the physical memory of the database server. This can improve database performance. For example, you may want to partition a table if it is larger than 2 GB in size.
Partition a large table when it stores historical data, and new data needs to be written into the latest partition. For example, you can do this when a large table is used to store the data for a whole year. You can store the data of the most recent month in a separate and updatable partition, and the data of the previous months in other read-only partitions.
Benefits
Higher performance
Partitioning can significantly improve query performance, especially when the most frequently accessed rows can be limited to one or a few partitions. Partitioning replaces the upper levels of an index tree. When a query involves one or a few specific partitions, the database system can perform a sequential scan only on those partitions instead of relying on indexes. This can improve performance and manageability because the system processes contiguous data chunks rather than records scattered throughout the entire table.
Easier management
As standalone objects, partitions can be managed individually or collectively, and DDL operations can be performed on partitions, instead of the entire table or index. Therefore, you can break down resource-intensive tasks, such as rebuilding indexes or tables, You can remove a single partition at a time. When an issue occurs, you need to move only the relevant partition, instead of the entire table. You can also perform batch operations on data records with partitions as units. For example, when you want to remove data from a table, you only need to use DROP TABLE to delete the relevant partition or use ALTER TABLE DETACH PARTITION to remove the partition from the parent table. VACUUM operations are not required to reclaim storage space after these operations, which is another advantage over batch deletion.
Reduced resource contention
In some OLTP databases, partitioning can reduce the contention for resources, such as when a DML statement is executed on multiple partitions.
Higher availability
When a partition becomes unavailable, the rest of the partitioned table can still be accessed. The query optimizer automatically removes unavailable partitions from query plans to avoid impact on queries.
Lower storage costs
Infrequently accessed partitions can be dumped to slower, cheaper storage media to reduce costs.
Take note that the preceding benefits are valid only when the partitioned table is large in size. We recommend that you partition a table when its size reaches the size of the physical memory of your database server.
How it works
Partitioning introduces complexity to the structure, but this is transparent to users. This section describes the characteristics and mechanism of partitioning. Understanding this information can help you effectively use partitioned tables.
Example 1
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
...
CREATE TABLE measurement_y2007m11 PARTITION OF measurement
FOR VALUES FROM ('2007-11-01') TO ('2007-12-01');
CREATE TABLE measurement_y2007m12 PARTITION OF measurement
FOR VALUES FROM ('2007-12-01') TO ('2008-01-01')
TABLESPACE fasttablespace;
CREATE TABLE measurement_y2008m01 PARTITION OF measurement
FOR VALUES FROM ('2008-01-01') TO ('2008-02-01')
WITH (parallel_workers = 4)
TABLESPACE fasttablespace;
Partition keys
A partition key is a single column or a combination of multiple columns that determines the partition to which each row is distributed. PolarDB for PostgreSQL automatically routes operations such as insertion, update, or deletion, to the relevant partitions based on the partition key.
In Example 1, logdate
is the partition key of the partitioned table named measurement
. The bounds of partitions of the table measurement
are determined by logdate
values.
Partitioning methods
PolarDB for PostgreSQL supports multiple partitioning methods.
Range partitioning
Range partitioning partitions a table based on ranges of values of the partition key, such as date ranges or identifier ranges of your business. The boundaries of each range are inclusive of the lower end and exclusive of the upper end. For example, if the range of values of the first partition is from 1 to 10, and the range of values of the second partition is from 10 to 20, 10 belongs to the second partition, not the first. The
measurement
table in Example 1 is a range-partitioned table.Interval range partitioning is a partitioning method extended from range partitioning. For more information, see Interval range partitioning.
List partitioning
Example 2
CREATE TABLE department(deptno INT4 Primary Key,dname VARCHAR(50), location VARCHAR(100)) PARTITION BY LIST (deptno); CREATE TABLE department_p1 partition of department for values in (10, 20); CREATE TABLE department_p1 partition of department for values in (30, 40);
List partitioning partitions a table based on a list of discrete values. The
department
table in Example 2 is a list-partitioned table. Two partitions of the table are explicitly defined to contain only specific values of the partition key. For example, thedepartment_p1
partition stores only the rows in which thedeptno
value is10
or20
, and thedepartment_p2
partitions stores only the rows in which thedeptno
value is30
or40
.Hash partitioning
In hash partitioning, you use a hash function to compute a hash value for the partition key. Then, you divide this hash value by a given modulus and assign the row to a partition based on the remainder of this division.
Example 3
create table idxpart (i int) partition by hash (i); create table idxpart0 partition of idxpart for values with (modulus 2, remainder 0); create table idxpart1 partition of idxpart for values with (modulus 2, remainder 1);
In this example, the
idxpart
table is hash-partitioned. Theidxpart0
partition stores only rows in which the hash value ofi
divided by 2 has a remainder of 0. Theidxpart1
partition stores only rows in which the hash value ofi
divided by 2 has a remainder of 1.
Multi-level partitioning
Multi-level partitioning refers to the practice of partitioning an already partitioned table.
PolarDB for PostgreSQL supports an unlimited number of partitioning levels, but we recommend that you do not create more than three levels. Excessive partitioning levels may undermine the manageability of partitions and have a negative impact on query performance.
You can use different partitioning methods on different partitioning levels. For example, you can use range partitioning for the first level, hash partitioning for the second, and list partitioning for the third.
Example 4
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01') PARTITION BY Hash (city_id);
CREATE TABLE measurement_y2006m03_hash1 PARTITION OF measurement_y2006m03
for values with (modulus 2, remainder 0) PARTITION BY List (peaktemp);
CREATE TABLE measurement_y2006m03_hash1_l1 PARTITION OF measurement_y2006m03_hash1 for values in (10, 20);
Syntax
For information about the statements for managing partitions, such as creating, adding, merging, splitting, and dropping partitions, see Command list for partitioned tables.