ApsaraDB RDS for PostgreSQL 12: Improved High Concurrency Performance for Large Partitioned Tables

By Lingce

On October 3, 2019, Alibaba Cloud released ApsaraDB RDS for PostgreSQL 12, the world's most powerful open-source database system. ApsaraDB RDS for PostgreSQL 12 offers significant improvements in features and performance. For example, it has improved the high-concurrency performance for large partitioned tables by a factor of 100, optimized B-tree index space and performance, implemented the standard JSON-formatted features of SQL Server 2016, and provides support for multi-column statistics on most common values (MCVs), inline common table expressions (CTEs), and pluggable table storage APIs. This article describes some features of ApsaraDB RDS for PostgreSQL 12.

Partitioned Table Performance

ApsaraDB RDS for PostgreSQL has supported partitioned tables for a long time. Before ApsaraDB RDS for PostgreSQL 10.0, users needed to manually create partitioned tables by inheriting data. Support for declarative partitions was introduced in V10.0. Users create partitioned tables by running SQL statements, which facilitates the use of partitioned tables. ApsaraDB RDS for PostgreSQL 11 supports hash partitions and enhances partition pruning policies in the planning and execution phases to improve the query performance of partitioned tables. ApsaraDB RDS for PostgreSQL 12 further enhances the query and data import performance of partitioned tables. Query performance significantly improves when there are a large number of partitions.

Create an ApsaraDB RDS for PostgreSQL 11 instance and an ApsaraDB RDS for PostgreSQL 12 instance with the same specifications (4-core and 8 GB), and test the performance of copying 100 million lines of data with different partition quantities. As shown in the following figure, the import performance of ApsaraDB RDS for PostgreSQL 12 is always better than that of ApsaraDB RDS for PostgreSQL 11 as the number of partitions increases. The improved performance of data copy is due to the support for batch insertion of partitioned tables in ApsaraDB RDS for PostgreSQL 12. In earlier versions, only one line of data could be inserted each time.

In ApsaraDB RDS for PostgreSQL 10, each partitioned table was checked in sequence for required data. The processing of each partitioned table was similar to that of a common table. ApsaraDB RDS for PostgreSQL 11 introduced the partition pruning feature, allowing users to quickly locate the partitions to be accessed. ApsaraDB RDS for PostgreSQL 12 further optimizes the partition pruning feature in order to avoid loading metadata for each partition and generating internal structures, which decouples the planned query time consumption from irrelevant partitions. This optimization is related to partition-based filtering in query criteria. Better partition-based filtering leads to fewer partitions to process and better optimization results.

The following figure compares the query performance on the partition key (also the primary key) with different partition quantities. As shown in the figure, the more partitions, the greater the improvement in the performance of ApsaraDB RDS for PostgreSQL 12. The performance can improve by up to 150 times. As the number of partitions increases, the performance of ApsaraDB RDS for PostgreSQL 12 remains relatively stable.

Although the performance of partitioned tables is greatly improved, there is still a gap in performance compared with that of a single table. When you design the table structure, you still need to determine whether to perform partitioning and specify the number of partitions based on the scenario requirements.

Index Enhancement

B-tree indexes are widely used in database systems which effectively reduce the amount of query data to be accessed and improve query performance. Indexes are query optimization policies that exchange space for time. They consume some additional storage space, and their performance is crucial for queries. ApsaraDB RDS for PostgreSQL 12 improves the overall performance of standard B-tree indexes and reduces their disk space usage. For composite indexes, the space usage can be reduced by up to 40%, effectively saving disk space. The performance of B-tree indexes with duplicate entries is also improved. In addition, the REINDEX CONCURRENTLY command is introduced to recreate indexes without affecting your business.

Let's test the space usage of B-tree indexes. Create the following tables and indexes in ApsaraDB RDS for PostgreSQL 11 and ApsaraDB RDS for PostgreSQL 12 respectively, insert 20 million lines of data in each table, and run the VACUUM statement to update the statistics.

CREATE TABLE foo (
    aid bigint NOT NULL,
    bid bigint NOT NULL
);
ALTER TABLE foo
    ADD CONSTRAINT foo_pkey PRIMARY KEY (aid, bid);
CREATE INDEX foo_bid_idx ON foo(bid);
INSERT INTO foo (aid, bid)
    SELECT i, i / 10000
    FROM generate_series(1, 20000000) AS i; 
VACUUM (ANALYZE) foo;

Check the size of the foo_bid_idx index in V11 and V12.

  #  PostgreSQL 11
postgres=> \di+ foo_bid_idx
                                    List of relations
 Schema |    Name     | Type  |    Owner    | Table | Persistence |  Size  | Description
--------+-------------+-------+-------------+-------+-------------+--------+-------------
 public | foo_bid_idx | index |   postgres  | foo   | permanent   | 544 MB |
(1 row)
2 . # PostgreSQL 12
postgres=> \di+ foo_bid_idx
                                    List of relations
 Schema |    Name     | Type  |    Owner    | Table | Persistence |  Size  | Description
--------+-------------+-------+-------------+-------+-------------+--------+-------------
 public | foo_bid_idx | index |   postgres  | foo   | permanent   | 408 MB |
(1 row)

Note that the index size in V11 is 33% greater than that of V12. When there are a large number of indexes, the new version saves a significant amount of storage space.

Non-B-tree indexes are also enhanced. For example, the overhead for generating WAL logs of GiST, GIN, and SP-GiST indexes is reduced. GiST indexes are used to create covering indexes. The distance operator of SP-GiST indexes is used for K-NN queries.

Support for SQL and JSON Path Languages

Earlier versions of ApsaraDB RDS for PostgreSQL supported JSON-formatted data types and queries of simple JSON-formatted data. However, if the JSON-formatted data was complex, such as data including a lot of nested data and arrays, it's not possible to query the values easily. Instead, use external plug-ins, such as jsQuery, that support SQL and JSON path languages.

ApsaraDB RDS for PostgreSQL 12 provides further support for unstructured data. It also supports JSON features and various path query methods introduced by the SQL Server 2016 standard, introduces the new data type jsonpath to indicate the path expression, and supports various complex JSON-format queries without relying on plug-ins. For more information about specific usage, see the relevant documentation.

Parameter-controlled Plan Selection

For repeatedly executed PREPARE statements, ApsaraDB RDS for PostgreSQL caches their execution plans. When you run the PREPARE statement, ApsaraDB RDS for PostgreSQL automatically chooses to generate a new plan (a custom plan) or use a cached plan (a generic plan). However, in certain scenarios, the plan selected by the database may not be optimal. ApsaraDB RDS for PostgreSQL 12 provides the plan_cache_mode parameter to select a plan. For example, if the query parameter is always a constant, you may explicitly set this parameter so that the optimizer always uses a generic plan. This optimizes query performance by avoiding SQL parsing and rewriting.

Run the ** PREPARE statement** and use a custom plan for the first 5 execution:
postgres=> prepare p(integer) as select aid from foo where aid=$1;
PREPARE
postgres=> EXPLAIN EXECUTE p(1);
                               QUERY PLAN------------------------------------------------------------------------
 Index Only Scan using foo_pkey on foo  (cost=0.44..1.56 rows=1 width=8)
   Index Cond: (aid = 1)
(2 rows)
1. # The results of the next four executions are omitted here.

Use a generic plan at the sixth execution:
> postgres=> EXPLAIN EXECUTE p(1);
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Only Scan using foo_pkey on foo  (cost=0.44..1.56 rows=1 width=8)
   Index Cond: (aid = $1)
(2 rows)  

Run the PREPARE statement again,
set plan_cache_mode  
 to force_generic_plan,  
and check the plan usage. You can see that a generic plan has been used at the first execution.
> postgres=> DEALLOCATE p;
DEALLOCATE
postgres=> prepare p(integer) as select aid from foo where aid=$1;
PREPARE
(1)# Set plan_cache_mode to force_generic_plan.
postgres=> set plan_cache_mode = force_generic_plan;
SET
postgres=> EXPLAIN EXECUTE p(1);
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Only Scan using foo_pkey on foo  (cost=0.44..1.56 rows=1 width=8)
   Index Cond: (aid = $1)
(2 rows)

Variable parameterization in the execution plan helps to determine whether to use a generic plan.

Pluggable Table Storage APIs

For a long time, ApsaraDB RDS for PostgreSQL only supported the heap table storage engine, and its implementation was highly coupled with other modules. Based on its index scalability, ApsaraDB RDS for PostgreSQL 12 abstracts a layer of storage engine access APIs to support multiple storage engines, such as Zheap, columnar store, KVStore, and memory engines.

The following figure shows the architecture of pluggable table storage APIs. On the basis of the original architecture, Table Access Manager is added to provide a unified table access API. Different storage engines can be connected through this API.

Currently, only the heap table storage engine is supported. More storage engines will be supported in the near future. If you are interested in storage engines, try to implement one.

postgres=> select * from pg_am;
 oid  | amname |      amhandler       | amtype
------+--------+----------------------+--------
    2 | heap   | heap_tableam_handler | t
  403 | btree  | bthandler            | i
  405 | hash   | hashhandler          | i
  783 | gist   | gisthandler          | i
 2742 | gin    | ginhandler           | i
 4000 | spgist | spghandler           | i
 3580 | brin   | brinhandler          | i
(7 rows)

Support for Multiple Plug-ins

ApsaraDB RDS for PostgreSQL 12 supports more plug-ins to meet specific business requirements in a variety of verticals and special scenarios. The following describes some interesting and frequently used plug-ins. For more plug-ins, see the list of supported plug-ins in ApsaraDB RDS for PostgreSQL.

roaringbitmap uses RoaringBitmap as a built-in data type and provides abundant functions. The RoaringBitmap algorithm greatly improves bitmap computing performance.
RDKit supports mol data types (describing molecular types) and FP data types (describing molecular fingerprints), as well as functions such as molecular computing and search.
Ganos is a spatio-temporal data engine developed by Alibaba Cloud. It can efficiently store, index, query, analyze, and compute spatiotemporal data.
PASE is a high-performance vector search index plug-in. It uses the mature, stable, and efficient approximate nearest neighbor (ANN) search algorithms IVFFlat and HNSW to implement high-speed vector queries in ApsaraDB RDS for PostgreSQL.
Zhparser is a Chinese word segmentation plug-in that helps implement Chinese full-text search.
oss_fdw is used to load data from Object Storage Service (OSS) to ApsaraDB RDS for PostgreSQL and write data in ApsaraDB RDS for PostgreSQL to OSS.

Summary

ApsaraDB RDS for PostgreSQL 12 significantly improves the query performance of partitioned tables and B-tree indexes. It uses a parameter to control the selection of the execution plan for PREPARE statements and supports built-in SQL and JSON path languages and more plug-ins. Pluggable table access APIs are very important because they will support multiple storage engines in the future. Currently, only the heap table storage engine is supported, which does not have much effect on users.

In addition to the features described in this article, V12 also provides many other features, such as multi-column MCV statistics and inline CTEs. See the relevant documentation or purchase instances from Alibaba Cloud to learn more.

Community

ApsaraDB RDS for PostgreSQL 12: Improved High Concurrency Performance for Large Partitioned Tables

Partitioned Table Performance

Index Enhancement

Support for SQL and JSON Path Languages

Parameter-controlled Plan Selection

Pluggable Table Storage APIs

Support for Multiple Plug-ins

Summary

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

PolarDB for PostgreSQL

AnalyticDB for PostgreSQL

ApsaraDB RDS for PostgreSQL

ApsaraDB for HBase