Introduction to the Core Features of the Partitioned Table in PolarDB for PostgreSQL

This article introduces the advantages of partitioned table strategies and discusses query optimization features and index options for PolarDB for PostgreSQL.

By Shenzhui

Overview

By partitioning a table in PolarDB for PostgreSQL, you can physically separate a large table or index into smaller, more manageable pieces called partitions. Each partition is an independent object with its own name and storage properties. From a database administrator's perspective, partitions can be managed collectively or separately, providing greater flexibility in managing the table. However, from an application's perspective, a partitioned table is identical to a non-partitioned table, and existing query statements and DML commands do not need to be modified.

Benefits

• Improved performance

Partitioning can significantly improve query performance, especially when the most frequently accessed rows can be limited in one or several partitions. Partitioning replaces the upper levels of an index tree. When a query involves one or a few specific partitions, the database system can perform a sequential scan on just those partitions instead of relying on indexes. This can improve performance and manageability because the system processes contiguous data chunks rather than records scattered throughout the entire table.

• Easier management

As standalone objects, partitions can be managed individually or collectively. Because DDL operations can be performed on partitions, in addition to the entire table or an index, you can use partitioning to break down resource-intensive tasks, such as rebuilding indexes or tables. A single partition can be moved at a time.

Therefore, when an issue occurs, only the relevant partition needs to be moved, instead of the entire table. What's more, you can perform batch operations on data records in the unit of partitions.

For example, when you want to remove data from a table, you only need to use DROP TABLE to delete the relevant partition or use ALTER TABLE DETACH PARTITION to remove the partition from the parent table. VACUUM operations are not required to reclaim storage space after these operations, which is another advantage over batch deletion.

• Reduced resource contention

In some OLTP systems, partitioning can reduce the contention of resources, such as when a DML statement is executed on multiple partitions.

• Higher availability

If a partition becomes unavailable, the rest of the partitioned table can still be accessed. The query optimizer automatically removes unavailable partitions from query plans to avoid impacting queries.

• Lower storage costs

Infrequently accessed partitions can be moved to a slower, cheaper storage medium, reducing storage costs.

Note that partitioning is only beneficial for extremely large tables. We recommend partitioning a table when its size reaches the size of the physical memory of your database server.

Partitioning Strategy

PolarDB for PostgreSQL supports multiple partitioning strategies.

• Range partitioning

The data is distributed based on a range of values of the partition key.

• List partitioning

When list-partitioning a table, you specify the list of key values for each partition.

• Hash partitioning

Hash partitioning is performed based on a specified modulus and remainder. Each partition stores the records where the hash value of the partition key divided by the specified modulus will produce the specified remainder.

• Multi-level partitioning

Multi-level partitioning refers to the practice of partitioning an already partitioned table, creating subpartitions that can also be further partitioned.

PolarDB for PostgreSQL supports an unlimited number of partitioning levels, but we recommend that you do not create levels beyond three. Excessive levels of partitioning may undermine the manageability of partitions and even deteriorate query performance.

Optimize Queries on Partitioned Tables

Partition Pruning

PolarDB for PostgreSQL provides the partition pruning feature. If partition pruning is enabled, the planner checks the definition of each partition to determine whether the scanning of a partition can be skipped because the partition does not contain rows that meet the condition specified by the WHERE clause. If a partition does not contain rows that meet the condition, the partition is excluded (pruned) from an execution plan. Partition pruning dramatically reduces the amount of data retrieved from the disk and shortens processing time, thus improving query performance and resource utilization.

PolarDB for PostgreSQL supports static and dynamic pruning. Static or dynamic pruning is performed based on the SQL statements that you execute.

• Static pruning occurs during compilation. In this case, the partitions to be accessed are known before the execution. For example, if an SQL statement contains a WHERE clause that specifies a condition based on a constant value of the partition key, static pruning can be performed.

• Dynamic pruning happens at run-time, meaning that the exact partitions to be accessed by a statement are not known beforehand. For example, if an SQL statement contains a WHERE clause that uses an operator or function, dynamic pruning is performed.

Partition pruning affects the statistics of the objects where pruning occurs and also affects the execution plan of a statement.

The system will limit the data scan only to the partitions that could contain the desired records. Both static pruning and dynamic pruning improve the query performance by excluding partitions from an execution plan.

PolarDB for PostgreSQL classifies conditional expressions into three types: immutable, stable, and volatile. The three types of conditional expressions correspond to three types of pruning.

• If a conditional expression is immutable, such as an expression based on a constant value, partition pruning occurs during the optimization.

• If a conditional expression is stable, such as an expression that contains now(), partition pruning occurs when the executor is initialized.

• If a conditional expression is volatile, such as an expression that contains random(), partition pruning occurs when the executor is run.

Partition-wise Join

The partition-wise join feature divides a large join into smaller joins between a pair of partitions from the two joined tables. It is used to optimize the joins between two partitions. If you want to use partition keys to join partitions, you can use the partition-wise join feature to reduce invalid joins between the partitions and improve the performance of join queries.

Parallel Append

PolarDB for PostgreSQL provides the parallel append feature for partitioned tables. This feature enables partitioned tables to provide better performance than common tables. Parallel querying is an essential capability for modern databases. PolarDB for PostgreSQL provides better parallel querying performance on partitioned tables than common tables.

Parallel append can be inter-partition parallel append, intra-partition parallel append, and hybrid parallel append.

In inter-partition parallel append, each worker queries one partition and multiple workers can query the entire partitioned table in parallel.

In intra-partition parallel append, parallel querying is performed within each partition, but sequential querying is performed between partitions.

In hybrid parallel append, parallel querying is performed both within each partition and between partitions. This delivers the highest degree of parallelism.

Improve the Performance of Query Planning

In PolarDB for PostgreSQL, no limits are imposed on the number of partitions in a partitioned table. If a table has two or more partition levels, the number of partitions increases exponentially.

For example, a table is hash-partitioned into 100 partitions, and then each partition is hash-partitioned into 100 subpartitions. There would be 10,000 partitions in total for this partitioned table. Use the EXPLAIN command to obtain the query execution plan:

explain analyze select * from part_hash;
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.00..344500.00 rows=16300000 width=22)
   ->  Seq Scan on part_hash_sys0102  (cost=0.00..26.30 rows=1630 width=22)
   ->  Seq Scan on part_hash_sys0103  (cost=0.00..26.30 rows=1630 width=22)
   ->  Seq Scan on part_hash_sys0104  (cost=0.00..26.30 rows=1630 width=22)
  ...
  ...
  ...
   ->  Seq Scan on part_hash_sys10198  (cost=0.00..26.30 rows=1630 width=22)
   ->  Seq Scan on part_hash_sys10199  (cost=0.00..26.30 rows=1630 width=22)
   ->  Seq Scan on part_hash_sys10200  (cost=0.00..26.30 rows=1630 width=22)
 Planning Time: 3183.644 ms
 Execution Time: 633.779 ms
(10003 rows)
Total Memory: 216852KB

As is shown in the results, the query planning takes a relatively long time, which is related to how the optimizer generates an execution plan for a partitioned table: it picks an optimal plan for each partition and appends these plans into a single plan for the entire partitioned table. If the partitioned table only has a small number of partitions, the process would be fast and deliver a satisfactory user experience. However, as the number of partitions grows, the process becomes slower, and you may find partitioned tables perform worse than non-partitioned tables.

In the previous query, the table part_hash has 10,000 partitions. Its planning time is 3 seconds. In contrast, if the table were not partitioned, the planning time for this query would be around 0.1 millisecond only, which makes for a hundred-time difference. In addition to the long planning time, the query for the partitioned table also causes high memory usage, which could lead to an OOM issue. The performance is even worse when joins are included in a query.

To solve this performance issue, PolarDB for PostgreSQL provides the PartitionedTableScan operator for querying partitioned tables. It is more efficient than the Append operator, significantly reducing planning time and avoiding OOM because less memory is consumed.

If PartitionedTableScan is used, the two SQL statements would be executed with much less planning time and memory consumption.

explain analyze select * from part_hash;
                                 QUERY PLAN                                  
----------------------------------------------------------------------------------------------------------------------------------------------------
PartitionedTableScan on part_hash  (cost=0.00..1.00 rows=1 width=22) (actual time=134.348..134.352 rows=0 loops=1)(Iteration partition number 10000)
   Scan Partitions: part_hash_sys0102, part_hash_sys0103, ...part_hash_sys10198, part_hash_sys10199, part_hash_sys10200
   ->  Seq Scan on part_hash  (cost=0.00..1.00 rows=1 width=22)
 Planning Time: 293.778 ms
 Execution Time: 384.202 ms
(5 rows)
Total Memory: 40276KB

PartitionedTableScan delivers higher efficiency. This performance test shows the performance improvement made by PartitionedTableScan.

1. Planning Time of a single SQL statement

Number of partitions	Append Planning Time	PartitionedTableScan Planning Time
16	0.266ms	0.067ms
32	1.820ms	0.258ms
64	3.654ms	0.402ms
128	7.010ms	0.664ms
256	14.095ms	1.247ms
512	27.697ms	2.328ms
1024	73.176ms	4.165ms

2. Memory usage of a single SQL statement

Number of partitions	Append Memory	PartitionedTableScan Memory
16	1,170 KB	1,044 KB
32	1,240 KB	1,044 KB
64	2,120 KB	1,624 KB
128	2,244 KB	1,524 KB
256	2,888 KB	2,072 KB
512	4,720 KB	3,012 KB
1024	8,236 KB	5,280 KB

Indexes in Partitioned Tables

Local Indexes

In a partitioned table, local indexes correspond to partitions and have the same number and range as partitions. Each indexed partition is associated with a partition in the parent table. Therefore, all keys in an indexed partition only reference rows stored in a single partition. Therefore, indexed partitions are automatically synchronized with their associated partitions and are independent of each other.

You can create a local index by specifying LOCAL attributes. For a local index, partitions or subpartitions that are in the same number as those in the parent table are created. These partitions or subpartitions cover the same range as those in the parent table.

PolarDB for PostgreSQL automatically maintains indexed partitions when partitions in the parent table are added, deleted, merged, or split, or when hash partitions or hash sub-partitions are added or merged.

If a partitioned key column is a subset of an index column, you can create a unique local index to ensure that rows with the same index key always are mapped to the same partition.

Global Indexes

A global index is a B- tree index that can also be partitioned independently of the parent table.

A globally indexed partition can point to all partitions in the parent table. A global index can also be partitioned. Its partition key must be the prefix of the index key.

PolarDB for PostgreSQL provides the global index feature. Global indexes are created on a partition table. Unlike local indexes that are created on each partition by default, global indexes are for the data of the entire partitioned table (each index corresponding to multiple partitions). This can provide globally unique constraints on non-partition keys and greatly improve the query performance of non-partition keys.

• Point query performance on non-partition keys

Item	TPS
Prepared Statement	Not used			Used
Concurrency	1	32	64	1	32	64
Common table	27,732	494,433	430,848	53,935	985,880	886,882
Partitioned table and local index	367	4,155	3,688	856	8,742	6,790
Partitioned table and global index	19,006	308,128	262,941	45,090	820,924	731,557

• TPC-B performance on non-partition keys

Note: Both point queries and DML are included.

Item	TPS
Prepared Statement	Not used			Used
Concurrency	1	32	64	1	32	64
Common table	1,115	51,025	60,409	4,822	90,312	100,802
Partitioned table and local index	271	2,903	2,524	550	5,276	4,237
Partitioned table and global index	3,453	36,320	39,941	4,334	69,040	75,232

Summary

This article begins with a brief introduction to partitioned table strategies and their advantages. It then covers the query optimization features supported by PolarDB for PostgreSQL on partitioned tables. Lastly, it explores local and global indexes on partitioned tables, aiming to provide a comprehensive understanding of PolarDB for PostgreSQL partitioned tables. For more information, see PolarDB for PostgreSQL documentation: partitioned table.

Discover what and how you can use our products to build >>

Community

Introduction to the Core Features of the Partitioned Table in PolarDB for PostgreSQL

Overview

Benefits

Partitioning Strategy

Optimize Queries on Partitioned Tables

Partition Pruning

Partition-wise Join

Parallel Append

Improve the Performance of Query Planning

Indexes in Partitioned Tables

Local Indexes

Global Indexes

Summary

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

PolarDB for PostgreSQL

PolarDB for Xscale

PolarDB for MySQL

AnalyticDB for PostgreSQL