All Products
Search
Document Center

PolarDB:Usage

Last Updated:Dec 19, 2024

This topic describes how to use the Elastic Parallel Query (ePQ) feature.

Usage notes

  • Make sure the version of your cluster is supported by the ePQ feature.

    • Single-node elastic parallel query:

      • Database engine: 8.0.1 whose revision version is 8.0.1.0.5 or later.

      • Database edition: Enterprise Edition.

    • Single-node elastic parallel query:

      • Database engine: 8.0.2 whose revision version is 8.0.2.1.4.1 or later.

      • Database edition: Enterprise Edition.

    • Multi-node elastic parallel query:

      • Database engine: 8.0.2 whose revision version is 8.0.2.2.6 or later.

      • Database edition: Enterprise Edition.

  • Make sure that you are aware of the limits and compatibility issues of the ePQ feature. For more information, see Limits and compatibility issues.

  • Both read-only nodes and the primary node support the ePQ feature. ePQ is disabled on the primary node by default. You can enable this feature based on your requirements. For more information, see Adaptive DOP settings.

Enable or disable ePQ

Enable ePQ

Log on to the PolarDB console. In the Database Connections section of the Basic Information page, select an endpoint and click Configure. In the Modify Endpoint Settings dialog, select On for Parallel Query and configure the value of Parallelism.

For more information, click here.

  1. Log on to the PolarDB console.

  2. In the left-side navigation pane, click Clusters.

  3. In the upper-left corner, select the region in which the cluster is deployed.

  4. Find the cluster and click its ID.

  5. In the Database Connections section of the Basic Information page, select an endpoint and click Configure. image

  6. In the Modify Endpoint Settings dialog, select On for Parallel Query and configure the value of Parallelism. For more information about other parameters, see Configure PolarProxy. imageimage

Recommended settings and description

  • Parallelism the maximum number of worker threads that can run simultaneously in a single compute node for a single query. The maximum number of worker threads that can run simultaneously in a single query is the product of the degree of parallelism (DOP) and the number of nodes.

  • Increase the value of Parallelism in small increments. We recommend that you do not use a value that exceeds a quarter of the total number of CPU cores.. For example, you can set the value of Parallelism to 2. You can check the CPU utilization the next day. If the CPU utilization is low, increase the value. If the CPU utilization is high, do not increase the value.

  • ePQ is enabled only for connections that are established after you turn on the feature.

  • When you enable ePQ, set the innodb_adaptive_hash_index parameter to OFF.

    Note
    • If the innodb_adaptive_hash_index parameter is set to ON, the performance of parallel query is degraded.

    • The default value of the thread_pool_enabled parameter is OFF. If you have not modified this parameter, you can ignore this setting.

  • DOP can be configured in the console or by using the cluster parameter:

    • You can configure DOP in the console or by using the max_parallel_degree parameter. The settings in the console prevail. We recommend that you use the console to enable ePQ.

    • If you do not enable ePQ in the console but set max_parallel_degree to a value greater than 0, ePQ is enabled.

      Note

      The max_parallel_degree parameter specifies the maximum DOP for an individual query, which is the maximum number of workers that are used to run the query in parallel. For more information, see Configuration parameters.

Disable ePQ

Log on to the PolarDB console. In the Database Connections section of the Basic Information page, select an endpoint and click Configure. In the Modify Endpoint Settings dialog, select Off for Parallel Query.

For more information, click here.

  1. Log on to the PolarDB console.

  2. In the left-side navigation pane, click Clusters.

  3. In the upper-left corner, select the region in which the cluster is deployed.

  4. Find the cluster and click its ID.

  5. In the Database Connections section of the Basic Information page, select an endpoint and click Configure. image

  6. In the Modify Endpoint Settings dialog, select Off for Parallel Query. image

Note:

  • ePQ is disabled only for connections that are established after you turn off the feature.

  • After you disable ePQ in the console, confirm that the max_parallel_degree parameter is set to 0 to ensure that ePQ is completely disabled.

    Note

    The max_parallel_degree parameter specifies the maximum DOP for an individual query, which is the maximum number of workers that are used to run the query in parallel. For more information, see Configuration parameters.

Configure ePQ

You can configure a set of parameters to manage the ePQ feature. The following table lists some key parameters. For more information, see DOP policies.

Type

Description

Adaptive DOP settings

You can set the DOP for a single query by configuring the parallel_degree_policy parameter. Valid values:

  • TYPICAL: PolarDB sets the DOP to the same value as max_parallel_degree, regardless of database loads such as CPU utilization.

    Note

    The max_parallel_degree parameter specifies the maximum DOP for an individual query, which is the maximum number of workers that are used to run the query in parallel. For more information, see Configuration parameters.

  • AUTO: PolarDB enables or disables the elastic parallel query feature based on database loads such as CPU utilization, and selects the DOP based on the query cost.

  • REPLICA_AUTO (default): Only read-only nodes determine whether to enable the elastic parallel query feature based on database loads such as CPU utilization, and select the DOP based on the query cost. The primary node does not perform elastic parallel query.

Specify whether the optimizer can enable ePQ

PolarDB provides two thresholds to specify whether the optimizer can enable the elastic parallel query feature. If an SQL statement meets one of the following conditions, the optimizer enables the elastic parallel query feature.

  • records_threshold_for_parallelism

    If the number of scanned rows exceeds the value of records_threshold_for_parallelism, the optimizer enables ePQ. The default value is 10000. If the majority of your queries are simple and involve only a small amount of data, we recommend that you set the threshold to at least 2000.

    Note

    The number of scanned rows is estimated based on the statistics of tables. This number can be different from the actual number of rows.

  • cost_threshold_for_parallelism

    If the cost of sequential queries exceeds the value of the cost_threshold_for_parallelism parameter, the optimizer enables ePQ. The default value is 50000.

Specify whether to enable multi-node ePQ

The PolarDB two thresholds to specify whether to enable multi-node elastic parallel query. If an SQL statement meets one of the following conditions, multi-node elastic parallel query can be used.

  • records_threshold_for_mpp

    If the number of scanned rows of a table involved in a query statement exceeds this threshold, the optimizer considers using multi-node elastic parallel query and schedules parallel tasks to multiple nodes for computing. The default value is N times the records_threshold_for_parallelism value. The N value is the number of nodes involved in the current cluster endpoint.

  • cost_threshold_for_mpp

    If the sequential execution cost of a query statement exceeds this threshold, the optimizer considers using multi-node elastic parallel query. The default value is N times the cost_threshold_for_parallelism value. The N value is the number of nodes involved in the current cluster endpoint.

Use hints to control ePQ

You can optimize an individual SQL statement by using hints to specify whether the optimizer enables ePQ. You can also use hints to configure DOP and the table that needs to enable ePQ. For example, if ePQ is disabled by the system, you can use hints to accelerate a slow SQL query that is frequently executed. For more information, see Parallel hints.

Can I use hints to enable the elastic parallel query feature for the PolarDB cluster endpoint in read and write mode?

Yes. We recommend that you enable ePQ in the PolarDB console. If you want to enable ePQ for a SQL statement, you can set the query DOP to n by using hint /*+ PARALLEL(n) */ or /*+ SET_VAR(max_parallel_degree=n) */. The two hints are different:

  • When the /*+ PARALLEL(n) */ hint is used, ePQ is forcibly enabled regardless of whether queries are routed to the primary node or how much data is involved.

  • When the /*+ SET_VAR(max_parallel_degree=n) */ hint is used, whether ePQ is enabled depends on whether queries are routed to the primary node, and the query cost and rows of data evaluated by the optimizer.

Are elastic parallel query parameters that I specify by using hints in a SQL statement in conflict with those that I configure in the console?

No. The parameters specified by hints are valid for the current SQL statement and have higher priority than the global parameters configured in the console.

Limits and compatibility issues

Limits

PolarDB continuously improves the parallel query feature. This feature cannot improve performance in the following situations:

  • Queries on non-Innodb tables.

  • Queries that use full-text indexes.

  • Stored procedures, which must be executed by the leader.

  • Tables that are scanned by using INDEX-MERGE joins.

  • Queries in serializable transactions.

  • INSERT ... SELECT and REPLACE ... SELECT statements in transactions when the isolation level is Repeatable Read.

Compatibility issues

  • The number of error messages may increase.

    Assume that you use serial execution for statements and error messages occur. If you use parallel execution for the same statements, more error messages may be returned.

  • Precision issues may occur.

    For serial execution, no intermediate results need to be stored. For parallel execution, you may need to store intermediate results. If the intermediate results are floating-point numbers, the precision of the floating-point numbers may be different from that of parallel execution. This may result in differences in the results of parallel execution and serial execution.

  • The size of network packets or intermediate results exceeds the upper limit that is specified by the max_allowed_packet parameter.

    For serial execution, no intermediate results are produced. For parallel execution, intermediate results may be produced. If the size of the intermediate results exceeds the upper limit that is specified by the max_allowed_packet parameter, an error message may be returned. To resolve this issue, you can increase the value of the max_allowed_packet parameter. For more information about how to modify parameters, see Specify cluster and node parameters.

  • The order of the result sets for parallel execution may be different from that for serial execution.

    If the SELECT ... LIMIT n statement that does not contain the ORDER BY keyword is executed in parallel, the order of the returned result sets may be inconsistent with the execution order. Multiple workers are executed simultaneously. Therefore, the execution speed of the workers is uncertain during each execution. After the leader retrieves enough data, the final results are returned. As a result, the order of the returned result sets may be inconsistent with the execution order.

  • The number of data records that have row locks is increased.

    If the SELECT ... FROM ... FOR SHARE statement is executed in parallel, InnoDB locks each row of data that is accessed. Therefore, the number of records that have row locks may be more than that in non-parallel execution cases. This phenomenon is normal in parallel execution cases.

FAQ

Are all queries accelerated after I enable the elastic parallel query feature?

Not all queries are executed in parallel. In the following cases, queries are not executed in parallel:

  • Parallel queries are not supported. For more information, see Limits.

  • The rows of scanned data or costs fall short of the threshold for parallel queries. The records_threshold_for_parallelism and cost_threshold_for_parallelism parameters define the record and cost thresholds for parallel queries.

  • If the cluster has too high workloads, queries are no longer executed in parallel. For more information, see DOP control based on system resource usage.

Why is the hint syntax not working after ePQ is enabled?

The issue may be caused by the following reasons::

  • ePQ is enabled or disabled only for connections that are established after you turn on or off the feature. Check whether you have reconnected to the cluster after you enable ePQ.

  • ePQ is disabled on the primary node by default. Check whether you connect to the database by using the primary endpoint or the cluster endpoint. We recommend that you use the cluster endpoint to connect to the database. For more information, see Adaptive DOP settings.