Overview
The native PostgreSQL execution engine is based on the Volcano model and uses a storage structure based on heap rows, known as the row-oriented storage engine. In this engine, operators such as Scan
and Sum
process data one row at a time. This can lead to excessive I/O and function call latency in scenarios involving a large amount of data, making it less suitable for analytical and statistical queries.
Compared with the row-oriented storage engine, the vectorized engine of PolarDB for PostgreSQL, also known as the column-oriented storage engine, enhances query performance in two key areas: In-Memory Column Index (IMCI) at the storage layer and vectorized operators at the execution engine layer. These improvements effectively overcome the limitations of the row-oriented storage engine in handling complex queries. In a TPC-H performance test that uses 100 GB test dataset and a 32 cores, 256 GB cluster, the vectorized engine of PolarDB for PostgreSQL delivers a query performance 50 times higher than the row-oriented storage engine. For more information, see Performance metrics.
The vectorized engine complements the row-oriented storage engine, allowing PolarDB for PostgreSQL to maintain high-performance transaction processing and significantly improve the performance of complex queries. It can be used in the following scenarios:
HTAP scenarios: For example, you may need to manage daily CRUD operations on a large amount of transactional data while producing real-time transaction reports for the last hour. The PolarDB for PostgreSQL vectorized engine can efficiently manage these workloads and simplify the system architecture, eliminating the need for supplementary systems to facilitate real-time OLAP queries.
Slow SQL statements: In the following scenarios involving highly concurrent transactions, slow SQL statements may occur:
Full table statistics, such as the Count
, Sum
, and Average
operations.
Column operations such as Group By
and Order By
.
Join
operations to merge multiple tables.
Queries that filter multiple unordered columns or use inflexible composite indexes that prone to invalidity.
Multimodal and spatio-temporal queries: For example, you may need to retrieve Key
and Value
from nested JSON structures or create spatio-temporal heatmaps based on the geographic grid statistics.
How it works
Vectorized engine and IMCI
The vectorized engine of PolarDB for PostgreSQL optimizes both the execution engine and storage layers to better process complex queries.
Execution engine
The vectorized engine uses CPU SIMD instructions to batch process data. A CPU instruction can process multiple rows of data in parallel. This helps reduce the time for function calls and prevent the Cache Miss issue.
The vectorized engine achieves full vectorization of query operators. For example, it vectorizes operators such as Scan
, Group By
, Order By
, Hash Join
, Filter
, Count
, and Sum
, so that it can accept batch data inputs and process them using SIMD instructions.
Storage
The column-oriented storage is used. It is more suitable for vectorized operators than the heap row-oriented storage.
The column-oriented storage is implemented by indexes, which is the IMCI. IMCIs are similar to B-tree indexes and GiST indexes but have different storage structures and applicable scenarios. IMCIs are used by the vectorized engine. B-tree indexes and GiST indexes are used by the row-oriented storage engine. A table can contain both IMCIs and other types of indexes to handle various queries. The PolarDB for PostgreSQL optimizer selects the most suitable index based on the query cost.
As shown in the following figure, you can create a B-tree index on column c2 of the t table for point queries (SELECT * FROM t WHERE c2=10
), and an IMCI on columns c4 and c5 for statistical queries (SELECT c4, SUM(c5) FROM t GROUP BY c4
). The optimizer determines the most efficient index to use based on the cost of the SQL statement.

Real-time row-column synchronization
The vectorized engine uses IMCIs to manage column-oriented data. Initially, data is written to a row-oriented table and then synchronized to the IMCI by using the indexing mechanism. This is known as row-column synchronization. The PolarDB for PostgreSQL vectorized engine offers an efficient, real-time, and automated mechanism for row-column synchronization, eliminating the need for additional pipelines or manual updates of column-oriented data.
The row-column synchronization mechanism of the vectorized engine parses WAL logs to retrieve modified data, which is then written to the IMCI asynchronously. This process has minimal impact on the performance and load of row-oriented storage engine (less than 3%). Because the PolarDB for PostgreSQL vectorized engine can coexist with the row-oriented storage engine on the same node, the parsing process of WAL logs are optimized. Despite the asynchronous nature of the row-column conversion process, it can still achieve real-time synchronization with a latency ranging from a few milliseconds to a few seconds, depending on the write load. For more information about how to optimize row-column synchronization, see Improve the real-time performance of IMCIs.
Implementation details
The vectorized engine is deployed on all nodes in a PolarDB for PostgreSQL cluster. Therefore, both the row-oriented storage engine and vectorized engine exist on each compute node. The system makes the following decisions when executing a SQL statement:
Select a compute node for execution.
Select an execution engine on the node.
Select a compute node
When the system executes a SQL statement that is related to IMCI in a PolarDB cluster that contains multiple nodes, it must select a compute node for execution.
All statements that modify data, such as DDL and DML statements, are executed on the primary node. The primary node selects the appropriate execution engine.
The primary node creates the IMCI and performs real-time synchronization for the IMCI.
You can configure the PolarProxy to route SQL statements for read operations to the appropriate node for execution. For more information, see PolarProxy.
Select an execution engine
The compute node selects an execution engine to execute the SQL statement.
For DDL statements such as CREATE TABLE
and ALTER TABLE
, the row-oriented storage engine is used. However, for the CREATE TABLE AS SELECT
statement, the system decides whether to use the vectorized engine based on the complexity of the SELECT
subquery.
For DML statements such as INSERT
, UPDATE
, and DELETE
, the system uses the row-oriented storage engine.
For DQL statements such as SELECT
, the system decides whether to use the vectorized engine based on the query cost and specific parameters. Typically, a higher query cost indicates a higher possibility of using the vectorized engine. For more information, see Use the vectorized engine. If the vectorized engine fails to execute the SELECT
statement, the system uses the row-oriented storage engine to execute the statement again.

Benefits
High performance
Compared to the row-oriented storage engine, the vectorized engine significantly increases the SQL query performance. It can execute complex queries more than 100 times faster than the row-oriented storage engine.
Low cost
To optimize queries, you can create IMCIs for related columns instead of the entire table.
IMCIs occupies less storage space than row-oriented indexes. Depending on the data type of a specific column, the IMCI occupies only 10% to 50% of the storage space that is occupied by the row-oriented index.
Ease of use
The vectorized engine is fully compatible with native PostgreSQL and can be used in the same manner.
IMCIs can be managed like native PostgreSQL indexes and support statements such as CREATE INDEX
and DROP INDEX
. No additional statements are required. For more information, see Manage IMCIs.
IMCIs are highly compatible with PostgreSQL data types and syntax. You can use IMCIs for acceleration without the need to modify statements.
You can configure parameters to specify which SQL statements can use IMCIs, such as all SQL statements, SQL statements in a session, or specific SQL statements with hints. For more information, see Use the vectorized engine.
Real-time IMCI data
Data consistency between row-oriented data and IMCIs Automatically, eliminating the need for setting up conversion or manual synchronization between row-oriented data and column-oriented data.
Data inserted into the row-oriented table is synchronized to the IMCI with a latency ranging from a few milliseconds to a few seconds. You can adjust the data synchronization performance based on business loads. For more information, see Real-time performance of IMCIs.
Data consistency
The following consistency levels of IMCIs and row-oriented data are provided to meet various business needs.
Eventual consistency (default): suitable for queries that require heavy write loads but low real-time performance.
Strong consistency: returns query results after the IMCI data is consistent with the row-oriented data. For more information, see Query consistency level.
Various usage
Supports the Prepared Statement
syntax.
Supports acceleration of SELECT statements within transaction blocks.
Note
The SELECT statement must be a read-before-write SQL within the transaction block.
Supports partitioned tables and partitioned tables managed by pg_pathman. Partition pruning is also supported. For more information, see Use IMCIs in partitioned tables.
Supports acceleration of spatio-temporal multimodal queries.
Billing
You are not charged for the vectorized engine feature of PolarDB for PostgreSQL.