All Products
Search
Document Center

PolarDB:Execute the EXPLAIN statement to view elastic parallel query execution plans

Last Updated:Mar 25, 2024

This topic describes how to execute the EXPLAIN statement to view elastic parallel query information in execution plans.

Example table

In the following example, the pq_test table is used to test parallel queries.

  • Schema:

    SHOW CREATE TABLE pq_test\G
    *************************** 1. row ***************************
    Table: pq_test
    Create Table: CREATE TABLE `pq_test` (
      `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
      `help_topic_id` INT(10) UNSIGNED NOT NULL,
      `name` CHAR(64) NOT NULL,
      `help_category_id` SMALLINT(5) UNSIGNED NOT NULL,
      `description` TEXT NOT NULL,
      `example` TEXT NOT NULL,
      `url` TEXT NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=21495809 DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
  • Table size:

    SHOW TABLE STATUS\G
    *************************** 1. row ***************************
               Name: pq_test
             Engine: InnoDB
            Version: 10
         Row_format: Dynamic
               Rows: 20064988
     Avg_row_length: 1898
        Data_length: 38085328896
    Max_data_length: 0
       Index_length: 0
          Data_free: 4194304
     Auto_increment: 21495809
        Create_time: 2019-07-30 01:35:27
        Update_time: NULL
         Check_time: NULL
          Collation: utf8_general_ci
           Checksum: NULL
     Create_options:
            Comment:
    1 row in set (0.02 sec)
  • SQL statement:

    SELECT COUNT(*) FROM pq_test;

EXPLAIN statement

View non-parallel queries

You can use the EXPLAIN statement on non-parallel queries.

Sample statement:

SET max_parallel_degree=0; EXPLAIN SELECT COUNT(*) FROM pq_test\G

Sample result:

*************************** 1. row ***************************
           Id: 1
  Select_type: SIMPLE
        Table: pq_test
  Partitions: NULL
         Type: index
Possible_keys: NULL
          Key: PRIMARY
      Key_len: 8
          Ref: NULL
         Rows: 20064988
     Filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.03 sec)

View parallel queries

  • In PolarDB for MySQL 8.0.1, you can use the EXPLAIN statement on parallel queries.

    Sample statement:

    EXPLAIN SELECT COUNT(*) FROM pq_test\G

    Sample result:

    *************************** 1. row ***************************
               Id: 1
      Select_type: SIMPLE
            Table: <gather2>
       Partitions: NULL
             Type: ALL
    Possible_keys: NULL
              Key: NULL
          Key_len: NULL
              Ref: NULL
             Rows: 20064988
         Filtered: 100.00
            Extra: NULL
    *************************** 2. row ***************************
               Id: 2
      Select_type: SIMPLE
            Table: pq_test
       Partitions: NULL
             Type: index
    Possible_keys: NULL
              Key: PRIMARY
          Key_len: 8
              Ref: NULL
             Rows: 10032494
         Filtered: 100.00
            Extra: Parallel scan (2 workers); Using index
    2 rows in set, 1 warning (0.00 sec)
  • In PolarDB for MySQL 8.0.2, you can use the EXPLAIN FORMAT=TREE statement on parallel queries. This returns a more precise description of query handling.

    Sample statement:

    EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM pq_test\G

    Sample result:

    *************************** 1. row ***************************
    EXPLAIN: -> Count rows in gather  (cost=... rows=1)
        -> Gather (slice: 1; workers: 2, nodes: 2)
            -> Parallel index scan on pq_test using PRIMARY, with parallel partitions: 8  (cost=... rows=20064988)

The result indicates the following information:

  • The EXPLAIN output shows that the parallel plan includes a Gather operation. Gather is implemented to gather the partial results that are produced by all workers.

  • In addition, information in the Extra field shows that a parallel scan is performed on the pq_test table by using four workers.

  • The number of workers used and the number of nodes are displayed in the Gather operation. A total of 2 nodes are used and each node has 2 workers.

View parallel queries that include subqueries

  • In PolarDB for MySQL 8.0.1, you can use the EXPLAIN statement on parallel queries that include subqueries.

    Sample statement:

    EXPLAIN SELECT
        o_orderpriority,
        COUNT(*) as order_count
    FROM
        orders
    WHERE
        o_orderdate >= '1994-04-01'
        AND o_orderdate < date_add('1994-04-01', interval '3' month)
        AND exists (
            SELECT
                *
            FROM
                lineitem
            WHERE
                l_orderkey = o_orderkey
                and l_commitdate < l_receiptdate
        )
    GROUP BY
        o_orderpriority
    ORDER BY
        o_orderpriority\G

    Sample result:

    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: <gather1.1>
       partitions: NULL
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 1489068
         filtered: 100.00
            Extra: Using temporary; Using filesort
    *************************** 2. row ***************************
               id: 1
      select_type: PRIMARY
            table: orders
       partitions: NULL
             type: range
    possible_keys: i_o_orderdate
              key: i_o_orderdate
          key_len: 3
              ref: NULL
             rows: 568369
         filtered: 100.00
            Extra: Parallel scan (2 workers); Using index condition; Using where; Using temporary
    *************************** 3. row ***************************
               id: 2
      select_type: DEPENDENT SUBQUERY
            table: lineitem
       partitions: NULL
             type: ref
    possible_keys: PRIMARY,i_l_orderkey
              key: PRIMARY
          key_len: 4
              ref: tpch_10.orders.O_ORDERKEY
             rows: 4
         filtered: 33.3
            Extra: Parallel pushdown; Using where
    3 rows in set, 2 warnings (0.01 sec)

    In the preceding result, the select_type of the subquery is DEPENDENT SUBQUERY and the Extra field is set to Parallel pushdown, which indicates that the subquery is sent to workers for parallel processing.

  • In PolarDB for MySQL 8.0.2, you can use the FORMAT=TREE statement on parallel queries that include subqueries. This returns a more precise description of query handling.

    • Example 1

      Sample statement:

      EXPLAIN FORMAT=TREE SELECT
          o_orderpriority,
          COUNT(*) as order_count
      FROM
          orders
      WHERE
          o_orderdate >= '1994-04-01'
          AND o_orderdate < date_add('1994-04-01', interval '3' month)
          AND exists (
              SELECT
                  *
              FROM
                  lineitem
              WHERE
                  l_orderkey = o_orderkey
                  and l_commitdate < l_receiptdate
          )
      GROUP BY
          o_orderpriority
      ORDER BY
          o_orderpriority\G

      Sample result:

      *************************** 1. row ***************************
      EXPLAIN: -> Sort: <temporary>.o_orderpriority
          -> Table scan on <temporary>
              -> Aggregate using temporary table  (cost=1746887.76 rows=1489068)
                  -> Gather (slice: 1; workers: 2)  (cost=1597980.96 rows=1489068)
                      -> Table scan on <temporary>
                          -> Aggregate using temporary table  (cost=1486290.85 rows=744534)
                              -> Filter: exists(select #2)  (cost=772982.43 rows=568369)
                                  -> Parallel index range scan on orders using i_o_orderdate, with index condition: ((orders.O_ORDERDATE >= DATE'1994-04-01') and (orders.O_ORDERDATE < <cache>(('1994-04-01' + interval '3' month)))), with parallel partitions: 89  (cost=772982.43 rows=568369)
                                  -> Select #2 (subquery in condition; dependent)
                                      -> Limit: 1 row(s)
                                          -> Filter: (lineitem.L_COMMITDATE < lineitem.L_RECEIPTDATE)  (cost=1.14 rows=1)
                                              -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY)  (cost=1.14 rows=4)
      
      1 row in set, 1 warning (0.02 sec)
    • Example 2

      Sample statement:

      EXPLAIN FORMAT=TREE
      select
      	ps_partkey,
      	sum(ps_supplycost * ps_availqty) as value
      from
      	partsupp,
      	supplier,
      	nation
      where
      	ps_suppkey = s_suppkey
      	and s_nationkey = n_nationkey
      	and n_name = 'IRAN'
      group by
      	ps_partkey having
      		sum(ps_supplycost * ps_availqty) > (
      			select
      				sum(ps_supplycost * ps_availqty) * 0.0000010000
      			from
      				partsupp,
      				supplier,
      				nation
      			where
      				ps_suppkey = s_suppkey
      				and s_nationkey = n_nationkey
      				and n_name = 'IRAN'
      		)
      order by
      	value desc limit 1;

      Sample result:

      | -> Limit: 1 row(s)  (cost=1408498.03 rows=1)
          -> Gather (merge sort; slice: 1; workers: 256; actual workers: 32)  (cost=1408498.03 rows=256)
              -> Limit: 1 row(s)  (cost=1408404.20 rows=1)
                  -> Sort: <temporary>.value DESC, limit input to 1 row(s) per chunk  (cost=1408404.20 rows=803182)
                      -> Filter: (sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY)) > (select #2))
                          -> Table scan on <temporary>
                              -> Aggregate using temporary table  (cost=1408404.20 rows=803182)
                                  -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY)  (cost=829770.18 rows=327820)
                                      -> Parallel table scan on partsupp, with parallel partitions: 2882, partition_keys: 1  (cost=6347528.15 rows=3176912)
                                      -> Hash
                                          -> Broadcast (slice: 2; workers: 256; nodes: 16)  (cost=103382.56 rows=1029632)
                                              -> Nested loop inner join  (cost=409.36 rows=4022)
                                                  -> Filter: (nation.N_NAME = 'IRAN')  (cost=2.29 rows=3)
                                                      -> Table scan on nation  (cost=2.29 rows=25)
                                                  -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243  (cost=65.94 rows=1609)
                          -> Select #2 (subquery in condition; run only once; shared access)
                              -> Aggregate: sum(`<collector>`.tmp_field_0)  (cost=825576.85 rows=1)
                                  -> Gather (slice: 1; workers: 256; nodes: 16)  (cost=825564.05 rows=256)
                                      -> Aggregate: sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY))  (cost=825541.20 rows=1)
                                          -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY)  (cost=809150.20 rows=327820)
                                              -> Parallel table scan on partsupp, with parallel partitions: 14405  (cost=6147699.35 rows=3176912)
                                              -> Hash
                                                  -> Broadcast (slice: 2; workers: 256; nodes: 16)  (cost=103382.56 rows=1029632)
                                                      -> Nested loop inner join  (cost=409.36 rows=4022)
                                                          -> Filter: (nation.N_NAME = 'IRAN')  (cost=2.29 rows=3)
                                                              -> Table scan on nation  (cost=2.29 rows=25)
                                                          -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243  (cost=65.94 rows=1609)

      In the preceding result, the select_type of the subquery is SUBQUERY and the Select #2 field is set to Shared access. The PolarDB optimizer runs the parallel subquery in advance and the result is shared among workers. In this query plan, the query block in the outer layer of the subquery cannot execute multi-node elastic parallel query due to policy restrictions. Multiple parallel workers for the outer query can execute single-node elastic parallel query within the node where the query is delivered.

View the execution time of various computing tasks

In PolarDB for MySQL 8.0.2, you can execute the EXPLAIN ANALYZE statement to view the execution time of various computing tasks when parallel queries are used.

  • Sample statement:

    EXPLAIN ANALYZE select
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice,
        sum(l_quantity)
    from
        customer,
        orders,
        lineitem
    where
        o_orderkey in (
            select
                l_orderkey
            from
                lineitem
            group by
                l_orderkey having
                    sum(l_quantity) > 313
        )
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey
    group by
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice
    order by
        o_totalprice desc,
        o_orderdate
    LIMIT 100;
  • Sample result:

    | -> Limit: 100 row(s)  (cost=14.62 rows=1) (actual time=1.252..1.252 rows=0 loops=1)
        -> Gather (merge sort; slice: 1; workers: 1; nodes: 2)  (cost=14.62 rows=1) (actual time=1.250..1.250 rows=0 loops=1)
            -> Limit: 100 row(s)  (cost=4.52 rows=1) (actual time=0.084,0.084,0.084..0.084,0.084,0.084 rows=0,0,0 loops=1,1,1)
                -> Sort: <temporary>.O_TOTALPRICE DESC, <temporary>.O_ORDERDATE, limit input to 100 row(s) per chunk  (cost=4.52 rows=1) (actual time=0.083,0.083,0.083..0.083,0.083,0.083 rows=0,0,0 loops=1,1,1)
                    -> Table scan on <temporary> (actual time=0.070,0.070,0.070..0.070,0.070,0.070 rows=0,0,0 loops=1,1,1)
                        -> Aggregate using temporary table  (cost=4.52 rows=1) (actual time=0.001,0.001,0.001..0.001,0.001,0.001 rows=0,0,0 loops=1,1,1)
                            -> Nested loop inner join  (cost=2.86 rows=4) (actual time=0.039,0.039,0.039..0.039,0.039,0.039 rows=0,0,0 loops=1,1,1)
                                -> Nested loop inner join  (cost=1.45 rows=1) (actual time=0.037,0.037,0.037..0.037,0.037,0.037 rows=0,0,0 loops=1,1,1)
                                    -> Parallel table scan on customer, with parallel partitions: 1, partition_keys: 1  (cost=0.35 rows=1) (actual time=0.036,0.036,0.036..0.036,0.036,0.036 rows=0,0,0 loops=1,1,1)
                                    -> Filter: <in_optimizer>(orders.O_ORDERKEY,<exists>(select #2))  (cost=1.10 rows=1)
                                        -> Index lookup on orders using ORDERS_FK1 (O_CUSTKEY=customer.C_CUSTKEY)  (cost=1.10 rows=1)
                                        -> Select #2 (subquery in condition; dependent)
                                            -> Limit: 1 row(s)
                                                -> Filter: ((sum(lineitem.L_QUANTITY) > 313) and (<cache>(orders.O_ORDERKEY) = <ref_null_helper>(lineitem.L_ORDERKEY)))
                                                    -> Group aggregate: sum(lineitem.L_QUANTITY)
                                                        -> Index scan on lineitem using PRIMARY  (cost=41554048.20 rows=380071042)
                                -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY)  (cost=1.41 rows=4)

If the EXPLAIN statement contains the ANALYZE keyword, the execution time of various computing tasks when parallel queries are used can be displayed. For parallel queries, the computing time of each worker in each operator is also recorded, including the maximum, minimum, and average computing time of each operator among multiple workers.