IMCI

Updated at: 2025-03-04 09:12

PolarDB for PostgreSQL supports the vectorized engine feature and IMCIs, enhancing the handling of complex queries.

Overview

The native PostgreSQL execution engine is based on the Volcano model and uses a storage structure based on heap rows, known as the row-oriented storage engine. In this engine, operators such as Scan and Sum process data one row at a time. This can lead to excessive I/O and function call latency in scenarios involving a large amount of data, making it less suitable for analytical and statistical queries.

Compared with the row-oriented storage engine, the vectorized engine of PolarDB for PostgreSQL, also known as the column-oriented storage engine, enhances query performance in two key areas: In-Memory Column Index (IMCI) at the storage layer and vectorized operators at the execution engine layer. These improvements effectively overcome the limitations of the row-oriented storage engine in handling complex queries. In a TPC-H performance test that uses 100 GB test dataset and a 32 cores, 256 GB cluster, the vectorized engine of PolarDB for PostgreSQL delivers a query performance 50 times higher than the row-oriented storage engine. For more information, see Performance metrics.

The vectorized engine complements the row-oriented storage engine, allowing PolarDB for PostgreSQL to maintain high-performance transaction processing and significantly improve the performance of complex queries. It can be used in the following scenarios:

  • HTAP scenarios: For example, you may need to manage daily CRUD operations on a large amount of transactional data while producing real-time transaction reports for the last hour. The PolarDB for PostgreSQL vectorized engine can efficiently manage these workloads and simplify the system architecture, eliminating the need for supplementary systems to facilitate real-time OLAP queries.

  • Slow SQL statements: In the following scenarios involving highly concurrent transactions, slow SQL statements may occur:

    • Full table statistics, such as the Count, Sum, and Average operations.

    • Column operations such as Group By and Order By.

    • Join operations to merge multiple tables.

    • Queries that filter multiple unordered columns or use inflexible composite indexes that prone to invalidity.

  • Multimodal and spatio-temporal queries: For example, you may need to retrieve Key and Value from nested JSON structures or create spatio-temporal heatmaps based on the geographic grid statistics.

How it works

Vectorized engine and IMCI

The vectorized engine of PolarDB for PostgreSQL optimizes both the execution engine and storage layers to better process complex queries.

  • Execution engine

    • The vectorized engine uses CPU SIMD instructions to batch process data. A CPU instruction can process multiple rows of data in parallel. This helps reduce the time for function calls and prevent the Cache Miss issue.

    • The vectorized engine achieves full vectorization of query operators. For example, it vectorizes operators such as Scan, Group By, Order By, Hash Join, Filter, Count, and Sum , so that it can accept batch data inputs and process them using SIMD instructions.

  • Storage

    • The column-oriented storage is used. It is more suitable for vectorized operators than the heap row-oriented storage.

    • The column-oriented storage is implemented by indexes, which is the IMCI. IMCIs are similar to B-tree indexes and GiST indexes but have different storage structures and applicable scenarios. IMCIs are used by the vectorized engine. B-tree indexes and GiST indexes are used by the row-oriented storage engine. A table can contain both IMCIs and other types of indexes to handle various queries. The PolarDB for PostgreSQL optimizer selects the most suitable index based on the query cost.

As shown in the following figure, you can create a B-tree index on column c2 of the t table for point queries (SELECT * FROM t WHERE c2=10), and an IMCI on columns c4 and c5 for statistical queries (SELECT c4, SUM(c5) FROM t GROUP BY c4). The optimizer determines the most efficient index to use based on the cost of the SQL statement.

image

Real-time row-column synchronization

The vectorized engine uses IMCIs to manage column-oriented data. Initially, data is written to a row-oriented table and then synchronized to the IMCI by using the indexing mechanism. This is known as row-column synchronization. The PolarDB for PostgreSQL vectorized engine offers an efficient, real-time, and automated mechanism for row-column synchronization, eliminating the need for additional pipelines or manual updates of column-oriented data.

The row-column synchronization mechanism of the vectorized engine parses WAL logs to retrieve modified data, which is then written to the IMCI asynchronously. This process has minimal impact on the performance and load of row-oriented storage engine (less than 3%). Because the PolarDB for PostgreSQL vectorized engine can coexist with the row-oriented storage engine on the same node, the parsing process of WAL logs are optimized. Despite the asynchronous nature of the row-column conversion process, it can still achieve real-time synchronization with a latency ranging from a few milliseconds to a few seconds, depending on the write load. For more information about how to optimize row-column synchronization, see Improve the real-time performance of IMCIs.

Implementation details

The vectorized engine is deployed on all nodes in a PolarDB for PostgreSQL cluster. Therefore, both the row-oriented storage engine and vectorized engine exist on each compute node. The system makes the following decisions when executing a SQL statement:

  1. Select a compute node for execution.

  2. Select an execution engine on the node.

Select a compute node

When the system executes a SQL statement that is related to IMCI in a PolarDB cluster that contains multiple nodes, it must select a compute node for execution.

  • All statements that modify data, such as DDL and DML statements, are executed on the primary node. The primary node selects the appropriate execution engine.

  • The primary node creates the IMCI and performs real-time synchronization for the IMCI.

  • You can configure the PolarProxy to route SQL statements for read operations to the appropriate node for execution. For more information, see PolarProxy.

Select an execution engine

The compute node selects an execution engine to execute the SQL statement.

  • For DDL statements such as CREATE TABLE and ALTER TABLE, the row-oriented storage engine is used. However, for the CREATE TABLE AS SELECT statement, the system decides whether to use the vectorized engine based on the complexity of the SELECT subquery.

  • For DML statements such as INSERT, UPDATE, and DELETE, the system uses the row-oriented storage engine.

  • For DQL statements such as SELECT, the system decides whether to use the vectorized engine based on the query cost and specific parameters. Typically, a higher query cost indicates a higher possibility of using the vectorized engine. For more information, see Use the vectorized engine. If the vectorized engine fails to execute the SELECT statement, the system uses the row-oriented storage engine to execute the statement again.

image

Benefits

  • High performance

    Compared to the row-oriented storage engine, the vectorized engine significantly increases the SQL query performance. It can execute complex queries more than 100 times faster than the row-oriented storage engine.

  • Low cost

    • To optimize queries, you can create IMCIs for related columns instead of the entire table.

    • IMCIs occupies less storage space than row-oriented indexes. Depending on the data type of a specific column, the IMCI occupies only 10% to 50% of the storage space that is occupied by the row-oriented index.

  • Ease of use

    The vectorized engine is fully compatible with native PostgreSQL and can be used in the same manner.

    • IMCIs can be managed like native PostgreSQL indexes and support statements such as CREATE INDEX and DROP INDEX. No additional statements are required. For more information, see Manage IMCIs.

    • IMCIs are highly compatible with PostgreSQL data types and syntax. You can use IMCIs for acceleration without the need to modify statements.

    • You can configure parameters to specify which SQL statements can use IMCIs, such as all SQL statements, SQL statements in a session, or specific SQL statements with hints. For more information, see Use the vectorized engine.

  • Real-time IMCI data

    • Data consistency between row-oriented data and IMCIs Automatically, eliminating the need for setting up conversion or manual synchronization between row-oriented data and column-oriented data.

    • Data inserted into the row-oriented table is synchronized to the IMCI with a latency ranging from a few milliseconds to a few seconds. You can adjust the data synchronization performance based on business loads. For more information, see Real-time performance of IMCIs.

  • Data consistency

    The following consistency levels of IMCIs and row-oriented data are provided to meet various business needs.

    • Eventual consistency (default): suitable for queries that require heavy write loads but low real-time performance.

    • Strong consistency: returns query results after the IMCI data is consistent with the row-oriented data. For more information, see Query consistency level.

  • Various usage

    • Supports the Prepared Statement syntax.

    • Supports acceleration of SELECT statements within transaction blocks.

      Note

      The SELECT statement must be a read-before-write SQL within the transaction block.

    • Supports partitioned tables and partitioned tables managed by pg_pathman. Partition pruning is also supported. For more information, see Use IMCIs in partitioned tables.

    • Supports acceleration of spatio-temporal multimodal queries.

Billing

You are not charged for the vectorized engine feature of PolarDB for PostgreSQL.

  • On this page (1)
  • Overview
  • How it works
  • Vectorized engine and IMCI
  • Real-time row-column synchronization
  • Implementation details
  • Select a compute node
  • Select an execution engine
  • Benefits
  • Billing
Feedback
phone Contact Us

Chat now with Alibaba Cloud Customer Service to assist you in finding the right products and services to meet your needs.

alicare alicarealicarealicare