This topic describes the multi-node massively parallel processing (MPP) capability provided by the In-Memory Column Index (IMCI) feature of PolarDB.
Background information
PolarDB provides IMCIs as a hybrid transaction/analytical processing (HTAP) solution. With the increase of the data volume involved in queries, the complexity of queries, and the number of queries for Object Storage Service (OSS) tables, a single read-only column store node becomes incapable of handling queries for massive data. To address this issue, IMCIs provide the multi-node MPP and resource scaling capabilities.
Architecture
The following figure shows the architecture of IMCI-based multi-node MPP.
IMCI-based multi-node MPP is a multi-node execution group that consists of multiple read-only column store nodes and implements the multi-node parallel execution of IMCIs. When query workloads change, you can quickly increase or decrease the number of read-only column store nodes to balance query performance and computing costs.
Multi-node MPP can handle a wide range of analytical processing (AP) workloads. The IMCI optimizer can accurately determine the transactional processing (TP) method of SQL statements, select single-node execution or multi-node parallel execution for the SQL statements, and then distribute the SQL statements to desired column store nodes based on your business requirements.
Scenarios
You want to use the scaling capability provided by multi-node MPP to increase the CPU resources and IOPS used in queries to reduce latency.
You want to implement in-memory query handling by distributing queries to multiple nodes for higher throughput.
Supported versions
The cluster that you want to manage must be a PolarDB for MySQL 8.0.1 cluster of Enterprise Edition whose revision version is 8.0.1.1.38 or later.
Usage notes
You can add multiple read-only column store nodes to a cluster. For more information, see Add a read-only column store node.
For more information about how to enable and use multi-node MPP, join the DingTalk group (ID: 27520023189) for technical support.
Best practices
Partition keys
In PolarDB, a level-1 or level-2 partition uses the HASH or KEY partitioning strategies. During the execution of multi-node MPP, the partition uses the share-nothing strategy. The share-nothing strategy indicates that each partition is processed by only one node,. This results in the following advantages:
Improved data caching capability: Each node needs only to process its specific partition, which enables each node to use the local memory to cache data in a more efficient manner.
Optimized query performance: When you perform
JOIN
operations andGROUP BY
queries based on a partition key, you need only to locally process data on each node, which significantly reduces the data transmission volume among multiple nodes.
For the preceding advantages, we recommend that you configure level-1 or level-2 partitions of the HASH or KEY type for a table based on the common partition key JOIN
and GROUP BY
when you create the table. In addition, all HASH or KEY tables must contain the same number of partitions. If the number of partitions of two tables is different, JOIN
operations between the tables cannot be locally processed.
We recommend that you specify a large prime number as the number of partitions. This can reduce the possibility of uneven data distribution and further improve query performance and efficient utilization of system resources.
Sort keys
To filter massive data, you can create RANGE partitions or add sort keys to the column store. We recommend that you create RANGE partitions and add sort keys for columns that can be filtered by predicate in the WHERE clause. For example, you specify the following query condition: WHERE date > '2024-10-01' AND date < '2024-10-07' AND customer_id = 'X231'
. In this case, you can create RANGE partitions for date
columns and add IMCI sort keys for customer_id
columns. This can significantly reduce the amount of data that needs to be processed and therefore improve query performance. You can properly configure partitions and sort keys for IMCIs to improve data processing efficiency and increase the response speed of queries for massive data. For more information, see Overview and Configure sort keys for IMCIs.
Performance tests
For more information about the performance test results of multi-node MPP, see IMCI performance.
Related operations
Determine whether multi-node MPP can be used to execute the current SQL statement.
You can add the
SET_VAR(imci_plan_use_mpp=forced)
hint to an SQL statement to view the execution plan of the SQL statement. This way, you can determine whether multi-node MPP can be used to execute the SQL statement. Example:EXPLAIN SELECT /*+ SET_VAR(imci_plan_use_mpp=forced) */ COUNT(*) FROM nation;
Sample result:
+----+----------------------------+--------+---------------------------------------------------------------------------------+ | ID | Operator | Name | Extra Info | +----+----------------------------+--------+---------------------------------------------------------------------------------+ | 1 | Select Statement | | IMCI Execution Plan (max_dop = 11, max_query_mem = 37438953472) | | 2 | └─Compute Scalar | | | | 3 | └─Aggregation | | | | 4 | └─Consume | | Consume ProducerPipeId: 1 | | 5 | └─Exchange | | PipeId: 1, Consumers: 23377031, Producers: 23377031,23377032, Part Type: Gather | | 6 | └─Aggregation | | | | 7 | └─Table Scan | nation | | +----+----------------------------+--------+---------------------------------------------------------------------------------
If the execution plan contains the Exchange operator, multi-node MPP can be used to execute the SQL statement.
Determine whether an SQL statement will be executed in parallel by using multi-node MPP.
After you determine that multi-node MPP can be used to execute an SQL statement, you can view the execution plan of the SQL statement to check whether multi-node MPP will be used in the execution of the SQL statement. If the execution plan contains the Exchange operator, the SQL statement will be executed in parallel by using multi-node MPP.