How to execute the EXPLAIN statement to view elastic parallel query information in execution plans - PolarDB

This topic describes how to execute the EXPLAIN statement to view elastic parallel query information in execution plans.

Example table

In the following example, the pq_test table is used to test parallel queries.

Schema:

SHOW CREATE TABLE pq_test\G
*************************** 1. row ***************************
Table: pq_test
Create Table: CREATE TABLE `pq_test` (
  `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `help_topic_id` INT(10) UNSIGNED NOT NULL,
  `name` CHAR(64) NOT NULL,
  `help_category_id` SMALLINT(5) UNSIGNED NOT NULL,
  `description` TEXT NOT NULL,
  `example` TEXT NOT NULL,
  `url` TEXT NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=21495809 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Table size:

SHOW TABLE STATUS\G
*************************** 1. row ***************************
           Name: pq_test
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 20064988
 Avg_row_length: 1898
    Data_length: 38085328896
Max_data_length: 0
   Index_length: 0
      Data_free: 4194304
 Auto_increment: 21495809
    Create_time: 2019-07-30 01:35:27
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.02 sec)

SQL statement:
```
SELECT COUNT(*) FROM pq_test;
```

EXPLAIN statement

View non-parallel queries

You can use the EXPLAIN statement on non-parallel queries.

Sample statement:

SET max_parallel_degree=0; EXPLAIN SELECT COUNT(*) FROM pq_test\G

Sample result:

*************************** 1. row ***************************
           Id: 1
  Select_type: SIMPLE
        Table: pq_test
  Partitions: NULL
         Type: index
Possible_keys: NULL
          Key: PRIMARY
      Key_len: 8
          Ref: NULL
         Rows: 20064988
     Filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.03 sec)

View parallel queries

In PolarDB for MySQL 8.0.1, you can use the EXPLAIN statement on parallel queries.

Sample statement:

EXPLAIN SELECT COUNT(*) FROM pq_test\G

Sample result:

*************************** 1. row ***************************
           Id: 1
  Select_type: SIMPLE
        Table: &lt;gather2&gt;
   Partitions: NULL
         Type: ALL
Possible_keys: NULL
          Key: NULL
      Key_len: NULL
          Ref: NULL
         Rows: 20064988
     Filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           Id: 2
  Select_type: SIMPLE
        Table: pq_test
   Partitions: NULL
         Type: index
Possible_keys: NULL
          Key: PRIMARY
      Key_len: 8
          Ref: NULL
         Rows: 10032494
     Filtered: 100.00
        Extra: Parallel scan (2 workers); Using index
2 rows in set, 1 warning (0.00 sec)

In PolarDB for MySQL 8.0.2, you can use the EXPLAIN FORMAT=TREE statement on parallel queries. This returns a more precise description of query handling.

Sample statement:

EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM pq_test\G

Sample result:

*************************** 1. row ***************************
EXPLAIN: -> Count rows in gather  (cost=... rows=1)
    -> Gather (slice: 1; workers: 2, nodes: 2)
        -> Parallel index scan on pq_test using PRIMARY, with parallel partitions: 8  (cost=... rows=20064988)

The result indicates the following information:

The EXPLAIN output shows that the parallel plan includes a Gather operation. Gather is implemented to gather the partial results that are produced by all workers.
In addition, information in the Extra field shows that a parallel scan is performed on the pq_test table by using four workers.
The number of workers used and the number of nodes are displayed in the Gather operation. A total of 2 nodes are used and each node has 2 workers.

View parallel queries that include subqueries

In PolarDB for MySQL 8.0.1, you can use the EXPLAIN statement on parallel queries that include subqueries.

Sample statement:

EXPLAIN SELECT
    o_orderpriority,
    COUNT(*) as order_count
FROM
    orders
WHERE
    o_orderdate >= '1994-04-01'
    AND o_orderdate < date_add('1994-04-01', interval '3' month)
    AND exists (
        SELECT
            *
        FROM
            lineitem
        WHERE
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
GROUP BY
    o_orderpriority
ORDER BY
    o_orderpriority\G

Sample result:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <gather1.1>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1489068
     filtered: 100.00
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: orders
   partitions: NULL
         type: range
possible_keys: i_o_orderdate
          key: i_o_orderdate
      key_len: 3
          ref: NULL
         rows: 568369
     filtered: 100.00
        Extra: Parallel scan (2 workers); Using index condition; Using where; Using temporary
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: lineitem
   partitions: NULL
         type: ref
possible_keys: PRIMARY,i_l_orderkey
          key: PRIMARY
      key_len: 4
          ref: tpch_10.orders.O_ORDERKEY
         rows: 4
     filtered: 33.3
        Extra: Parallel pushdown; Using where
3 rows in set, 2 warnings (0.01 sec)

In the preceding result, the select_type of the subquery is DEPENDENT SUBQUERY and the Extra field is set to Parallel pushdown, which indicates that the subquery is sent to workers for parallel processing.

In PolarDB for MySQL 8.0.2, you can use the FORMAT=TREE statement on parallel queries that include subqueries. This returns a more precise description of query handling.

Example 1

Sample statement:

EXPLAIN FORMAT=TREE SELECT
    o_orderpriority,
    COUNT(*) as order_count
FROM
    orders
WHERE
    o_orderdate >= '1994-04-01'
    AND o_orderdate < date_add('1994-04-01', interval '3' month)
    AND exists (
        SELECT
            *
        FROM
            lineitem
        WHERE
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
GROUP BY
    o_orderpriority
ORDER BY
    o_orderpriority\G

Sample result:

*************************** 1. row ***************************
EXPLAIN: -> Sort: <temporary>.o_orderpriority
    -> Table scan on <temporary>
        -> Aggregate using temporary table  (cost=1746887.76 rows=1489068)
            -> Gather (slice: 1; workers: 2)  (cost=1597980.96 rows=1489068)
                -> Table scan on <temporary>
                    -> Aggregate using temporary table  (cost=1486290.85 rows=744534)
                        -> Filter: exists(select #2)  (cost=772982.43 rows=568369)
                            -> Parallel index range scan on orders using i_o_orderdate, with index condition: ((orders.O_ORDERDATE >= DATE'1994-04-01') and (orders.O_ORDERDATE < <cache>(('1994-04-01' + interval '3' month)))), with parallel partitions: 89  (cost=772982.43 rows=568369)
                            -> Select #2 (subquery in condition; dependent)
                                -> Limit: 1 row(s)
                                    -> Filter: (lineitem.L_COMMITDATE < lineitem.L_RECEIPTDATE)  (cost=1.14 rows=1)
                                        -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY)  (cost=1.14 rows=4)

1 row in set, 1 warning (0.02 sec)

Example 2

Sample statement:

EXPLAIN FORMAT=TREE
select
	ps_partkey,
	sum(ps_supplycost * ps_availqty) as value
from
	partsupp,
	supplier,
	nation
where
	ps_suppkey = s_suppkey
	and s_nationkey = n_nationkey
	and n_name = 'IRAN'
group by
	ps_partkey having
		sum(ps_supplycost * ps_availqty) > (
			select
				sum(ps_supplycost * ps_availqty) * 0.0000010000
			from
				partsupp,
				supplier,
				nation
			where
				ps_suppkey = s_suppkey
				and s_nationkey = n_nationkey
				and n_name = 'IRAN'
		)
order by
	value desc limit 1;

Sample result:

| -> Limit: 1 row(s)  (cost=1408498.03 rows=1)
    -> Gather (merge sort; slice: 1; workers: 256; actual workers: 32)  (cost=1408498.03 rows=256)
        -> Limit: 1 row(s)  (cost=1408404.20 rows=1)
            -> Sort: <temporary>.value DESC, limit input to 1 row(s) per chunk  (cost=1408404.20 rows=803182)
                -> Filter: (sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY)) > (select #2))
                    -> Table scan on <temporary>
                        -> Aggregate using temporary table  (cost=1408404.20 rows=803182)
                            -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY)  (cost=829770.18 rows=327820)
                                -> Parallel table scan on partsupp, with parallel partitions: 2882, partition_keys: 1  (cost=6347528.15 rows=3176912)
                                -> Hash
                                    -> Broadcast (slice: 2; workers: 256; nodes: 16)  (cost=103382.56 rows=1029632)
                                        -> Nested loop inner join  (cost=409.36 rows=4022)
                                            -> Filter: (nation.N_NAME = 'IRAN')  (cost=2.29 rows=3)
                                                -> Table scan on nation  (cost=2.29 rows=25)
                                            -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243  (cost=65.94 rows=1609)
                    -> Select #2 (subquery in condition; run only once; shared access)
                        -> Aggregate: sum(`<collector>`.tmp_field_0)  (cost=825576.85 rows=1)
                            -> Gather (slice: 1; workers: 256; nodes: 16)  (cost=825564.05 rows=256)
                                -> Aggregate: sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY))  (cost=825541.20 rows=1)
                                    -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY)  (cost=809150.20 rows=327820)
                                        -> Parallel table scan on partsupp, with parallel partitions: 14405  (cost=6147699.35 rows=3176912)
                                        -> Hash
                                            -> Broadcast (slice: 2; workers: 256; nodes: 16)  (cost=103382.56 rows=1029632)
                                                -> Nested loop inner join  (cost=409.36 rows=4022)
                                                    -> Filter: (nation.N_NAME = 'IRAN')  (cost=2.29 rows=3)
                                                        -> Table scan on nation  (cost=2.29 rows=25)
                                                    -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243  (cost=65.94 rows=1609)

In the preceding result, the select_type of the subquery is SUBQUERY and the Select #2 field is set to Shared access. The PolarDB optimizer runs the parallel subquery in advance and the result is shared among workers. In this query plan, the query block in the outer layer of the subquery cannot execute multi-node elastic parallel query due to policy restrictions. Multiple parallel workers for the outer query can execute single-node elastic parallel query within the node where the query is delivered.

View the execution time of various computing tasks

In PolarDB for MySQL 8.0.2, you can execute the EXPLAIN ANALYZE statement to view the execution time of various computing tasks when parallel queries are used.

Sample statement:

EXPLAIN ANALYZE select
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity)
from
    customer,
    orders,
    lineitem
where
    o_orderkey in (
        select
            l_orderkey
        from
            lineitem
        group by
            l_orderkey having
                sum(l_quantity) > 313
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate
LIMIT 100;

Sample result:

| -> Limit: 100 row(s)  (cost=14.62 rows=1) (actual time=1.252..1.252 rows=0 loops=1)
    -> Gather (merge sort; slice: 1; workers: 1; nodes: 2)  (cost=14.62 rows=1) (actual time=1.250..1.250 rows=0 loops=1)
        -> Limit: 100 row(s)  (cost=4.52 rows=1) (actual time=0.084,0.084,0.084..0.084,0.084,0.084 rows=0,0,0 loops=1,1,1)
            -> Sort: <temporary>.O_TOTALPRICE DESC, <temporary>.O_ORDERDATE, limit input to 100 row(s) per chunk  (cost=4.52 rows=1) (actual time=0.083,0.083,0.083..0.083,0.083,0.083 rows=0,0,0 loops=1,1,1)
                -> Table scan on <temporary> (actual time=0.070,0.070,0.070..0.070,0.070,0.070 rows=0,0,0 loops=1,1,1)
                    -> Aggregate using temporary table  (cost=4.52 rows=1) (actual time=0.001,0.001,0.001..0.001,0.001,0.001 rows=0,0,0 loops=1,1,1)
                        -> Nested loop inner join  (cost=2.86 rows=4) (actual time=0.039,0.039,0.039..0.039,0.039,0.039 rows=0,0,0 loops=1,1,1)
                            -> Nested loop inner join  (cost=1.45 rows=1) (actual time=0.037,0.037,0.037..0.037,0.037,0.037 rows=0,0,0 loops=1,1,1)
                                -> Parallel table scan on customer, with parallel partitions: 1, partition_keys: 1  (cost=0.35 rows=1) (actual time=0.036,0.036,0.036..0.036,0.036,0.036 rows=0,0,0 loops=1,1,1)
                                -> Filter: <in_optimizer>(orders.O_ORDERKEY,<exists>(select #2))  (cost=1.10 rows=1)
                                    -> Index lookup on orders using ORDERS_FK1 (O_CUSTKEY=customer.C_CUSTKEY)  (cost=1.10 rows=1)
                                    -> Select #2 (subquery in condition; dependent)
                                        -> Limit: 1 row(s)
                                            -> Filter: ((sum(lineitem.L_QUANTITY) > 313) and (<cache>(orders.O_ORDERKEY) = <ref_null_helper>(lineitem.L_ORDERKEY)))
                                                -> Group aggregate: sum(lineitem.L_QUANTITY)
                                                    -> Index scan on lineitem using PRIMARY  (cost=41554048.20 rows=380071042)
                            -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY)  (cost=1.41 rows=4)

If the EXPLAIN statement contains the ANALYZE keyword, the execution time of various computing tasks when parallel queries are used can be displayed. For parallel queries, the computing time of each worker in each operator is also recorded, including the maximum, minimum, and average computing time of each operator among multiple workers.