如何使用EXPLAIN語句查看與並行查詢相關的內容 - PolarDB

本文介紹如何使用EXPLAIN語句查看執行計畫輸出中與並行查詢相關的內容。

查詢用表

本文樣本中使用pq_test表進行並行查詢測試。

表結構如下：

SHOW CREATE TABLE pq_test\G
*************************** 1. row ***************************
Table: pq_test
Create Table: CREATE TABLE `pq_test` (
  `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
  `help_topic_id` INT(10) UNSIGNED NOT NULL,
  `name` CHAR(64) NOT NULL,
  `help_category_id` SMALLINT(5) UNSIGNED NOT NULL,
  `description` TEXT NOT NULL,
  `example` TEXT NOT NULL,
  `url` TEXT NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=21495809 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

表大小如下：

SHOW TABLE STATUS\G
*************************** 1. row ***************************
           Name: pq_test
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 20064988
 Avg_row_length: 1898
    Data_length: 38085328896
Max_data_length: 0
   Index_length: 0
      Data_free: 4194304
 Auto_increment: 21495809
    Create_time: 2019-07-30 01:35:27
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.02 sec)

查詢SQL如下：
```
SELECT COUNT(*) FROM pq_test;
```

EXPLAIN查詢語句

不使用並行查詢的執行情況

通過EXPLAIN語句查看不使用並行查詢的執行情況。

查詢語句如下：

SET max_parallel_degree=0; EXPLAIN SELECT COUNT(*) FROM pq_test\G

查詢結果如下：

*************************** 1. row ***************************
           Id: 1
  Select_type: SIMPLE
        Table: pq_test
  Partitions: NULL
         Type: index
Possible_keys: NULL
          Key: PRIMARY
      Key_len: 8
          Ref: NULL
         Rows: 20064988
     Filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.03 sec)

通過EXPLAIN語句使用並行查詢的執行情況

PolarDB MySQL版8.0.1版本中，通過EXPLAIN語句查看使用並行查詢的情況。

查詢語句如下：

EXPLAIN SELECT COUNT(*) FROM pq_test\G

查詢結果如下：

*************************** 1. row ***************************
           Id: 1
  Select_type: SIMPLE
        Table: &lt;gather2&gt;
   Partitions: NULL
         Type: ALL
Possible_keys: NULL
          Key: NULL
      Key_len: NULL
          Ref: NULL
         Rows: 20064988
     Filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           Id: 2
  Select_type: SIMPLE
        Table: pq_test
   Partitions: NULL
         Type: index
Possible_keys: NULL
          Key: PRIMARY
      Key_len: 8
          Ref: NULL
         Rows: 10032494
     Filtered: 100.00
        Extra: Parallel scan (2 workers); Using index
2 rows in set, 1 warning (0.00 sec)

PolarDB MySQL版8.0.2版本中，通過EXPLAIN語句查看使用並行查詢的情況，可以使用更為清晰的EXPLAIN FORMAT=TREE查詢。

查詢語句如下：

EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM pq_test\G

查詢結果如下：

*************************** 1. row ***************************
EXPLAIN: -> Count rows in gather  (cost=... rows=1)
    -> Gather (slice: 1; workers: 2, nodes: 2)
        -> Parallel index scan on pq_test using PRIMARY, with parallel partitions: 8  (cost=... rows=20064988)

從上述結果可以看出：

上述並行計劃中包含了Gather操作，該操作負責匯總所有Worker返回的中間結果。
從執行計畫輸出的Extra資訊中可以看到pq_test表使用了Parallel scan（並行掃描）策略，期望用4個Workers來並存執行。
Gather操作中顯示了使用的worker數量和節點數量，這裡一共使用了2個節點，每個節點2個worker。

通過帶有子查詢的EXPLAIN語句查看使用並行查詢的執行情況

PolarDB MySQL版8.0.1版本中，通過帶有子查詢的EXPLAIN語句查看使用並行查詢的情況。

查詢語句如下：

EXPLAIN SELECT
    o_orderpriority,
    COUNT(*) as order_count
FROM
    orders
WHERE
    o_orderdate >= '1994-04-01'
    AND o_orderdate < date_add('1994-04-01', interval '3' month)
    AND exists (
        SELECT
            *
        FROM
            lineitem
        WHERE
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
GROUP BY
    o_orderpriority
ORDER BY
    o_orderpriority\G

查詢結果如下：

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <gather1.1>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1489068
     filtered: 100.00
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: orders
   partitions: NULL
         type: range
possible_keys: i_o_orderdate
          key: i_o_orderdate
      key_len: 3
          ref: NULL
         rows: 568369
     filtered: 100.00
        Extra: Parallel scan (2 workers); Using index condition; Using where; Using temporary
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: lineitem
   partitions: NULL
         type: ref
possible_keys: PRIMARY,i_l_orderkey
          key: PRIMARY
      key_len: 4
          ref: tpch_10.orders.O_ORDERKEY
         rows: 4
     filtered: 33.3
        Extra: Parallel pushdown; Using where
3 rows in set, 2 warnings (0.01 sec)

從上述結果可以看出，select_type為DEPENDENT SUBQUERY的子查詢中，Extra顯示為Parallel pushdown，表示該子查詢使用了Parallel pushdown策略，即子查詢被整個下推到Worker去執行。

PolarDB MySQL版8.0.2版本中，通過帶有子查詢的EXPLAIN語句查看使用並行查詢的情況，通過FORMAT=TREE可以看到更清晰的執行計畫。

樣本1

查詢語句如下：

EXPLAIN FORMAT=TREE SELECT
    o_orderpriority,
    COUNT(*) as order_count
FROM
    orders
WHERE
    o_orderdate >= '1994-04-01'
    AND o_orderdate < date_add('1994-04-01', interval '3' month)
    AND exists (
        SELECT
            *
        FROM
            lineitem
        WHERE
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
GROUP BY
    o_orderpriority
ORDER BY
    o_orderpriority\G

查詢結果如下：

*************************** 1. row ***************************
EXPLAIN: -> Sort: <temporary>.o_orderpriority
    -> Table scan on <temporary>
        -> Aggregate using temporary table  (cost=1746887.76 rows=1489068)
            -> Gather (slice: 1; workers: 2)  (cost=1597980.96 rows=1489068)
                -> Table scan on <temporary>
                    -> Aggregate using temporary table  (cost=1486290.85 rows=744534)
                        -> Filter: exists(select #2)  (cost=772982.43 rows=568369)
                            -> Parallel index range scan on orders using i_o_orderdate, with index condition: ((orders.O_ORDERDATE >= DATE'1994-04-01') and (orders.O_ORDERDATE < <cache>(('1994-04-01' + interval '3' month)))), with parallel partitions: 89  (cost=772982.43 rows=568369)
                            -> Select #2 (subquery in condition; dependent)
                                -> Limit: 1 row(s)
                                    -> Filter: (lineitem.L_COMMITDATE < lineitem.L_RECEIPTDATE)  (cost=1.14 rows=1)
                                        -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY)  (cost=1.14 rows=4)

1 row in set, 1 warning (0.02 sec)

樣本2

查詢語句如下：

EXPLAIN FORMAT=TREE
select
	ps_partkey,
	sum(ps_supplycost * ps_availqty) as value
from
	partsupp,
	supplier,
	nation
where
	ps_suppkey = s_suppkey
	and s_nationkey = n_nationkey
	and n_name = 'IRAN'
group by
	ps_partkey having
		sum(ps_supplycost * ps_availqty) > (
			select
				sum(ps_supplycost * ps_availqty) * 0.0000010000
			from
				partsupp,
				supplier,
				nation
			where
				ps_suppkey = s_suppkey
				and s_nationkey = n_nationkey
				and n_name = 'IRAN'
		)
order by
	value desc limit 1;

查詢結果如下：

| -> Limit: 1 row(s)  (cost=1408498.03 rows=1)
    -> Gather (merge sort; slice: 1; workers: 256; actual workers: 32)  (cost=1408498.03 rows=256)
        -> Limit: 1 row(s)  (cost=1408404.20 rows=1)
            -> Sort: <temporary>.value DESC, limit input to 1 row(s) per chunk  (cost=1408404.20 rows=803182)
                -> Filter: (sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY)) > (select #2))
                    -> Table scan on <temporary>
                        -> Aggregate using temporary table  (cost=1408404.20 rows=803182)
                            -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY)  (cost=829770.18 rows=327820)
                                -> Parallel table scan on partsupp, with parallel partitions: 2882, partition_keys: 1  (cost=6347528.15 rows=3176912)
                                -> Hash
                                    -> Broadcast (slice: 2; workers: 256; nodes: 16)  (cost=103382.56 rows=1029632)
                                        -> Nested loop inner join  (cost=409.36 rows=4022)
                                            -> Filter: (nation.N_NAME = 'IRAN')  (cost=2.29 rows=3)
                                                -> Table scan on nation  (cost=2.29 rows=25)
                                            -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243  (cost=65.94 rows=1609)
                    -> Select #2 (subquery in condition; run only once; shared access)
                        -> Aggregate: sum(`<collector>`.tmp_field_0)  (cost=825576.85 rows=1)
                            -> Gather (slice: 1; workers: 256; nodes: 16)  (cost=825564.05 rows=256)
                                -> Aggregate: sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY))  (cost=825541.20 rows=1)
                                    -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY)  (cost=809150.20 rows=327820)
                                        -> Parallel table scan on partsupp, with parallel partitions: 14405  (cost=6147699.35 rows=3176912)
                                        -> Hash
                                            -> Broadcast (slice: 2; workers: 256; nodes: 16)  (cost=103382.56 rows=1029632)
                                                -> Nested loop inner join  (cost=409.36 rows=4022)
                                                    -> Filter: (nation.N_NAME = 'IRAN')  (cost=2.29 rows=3)
                                                        -> Table scan on nation  (cost=2.29 rows=25)
                                                    -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243  (cost=65.94 rows=1609)

從上述結果可以看出，select_type為SUBQUERY的子查詢中，在Select #2子查詢處顯示為Shared access，表示該子查詢使用了Shared access策略，即PolarDB最佳化器選擇提前並存執行該子查詢並將執行結果Share給外層所有Worker做共用訪問，在這種查詢計劃下，子查詢外層的查詢塊由於策略限制，將無法做跨機執行，即外層查詢的多個並行worker需要在查詢下發的節點內單機並存執行。

查看使用並行查詢時各類計算的具體執行時間

PolarDB MySQL版8.0.2版本中，可以通過EXPLAIN ANALYZE語句查看使用並行查詢時，各類計算的具體執行時間。

查詢語句如下：

EXPLAIN ANALYZE select
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity)
from
    customer,
    orders,
    lineitem
where
    o_orderkey in (
        select
            l_orderkey
        from
            lineitem
        group by
            l_orderkey having
                sum(l_quantity) > 313
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate
LIMIT 100;

查詢結果如下：

| -> Limit: 100 row(s)  (cost=14.62 rows=1) (actual time=1.252..1.252 rows=0 loops=1)
    -> Gather (merge sort; slice: 1; workers: 1; nodes: 2)  (cost=14.62 rows=1) (actual time=1.250..1.250 rows=0 loops=1)
        -> Limit: 100 row(s)  (cost=4.52 rows=1) (actual time=0.084,0.084,0.084..0.084,0.084,0.084 rows=0,0,0 loops=1,1,1)
            -> Sort: <temporary>.O_TOTALPRICE DESC, <temporary>.O_ORDERDATE, limit input to 100 row(s) per chunk  (cost=4.52 rows=1) (actual time=0.083,0.083,0.083..0.083,0.083,0.083 rows=0,0,0 loops=1,1,1)
                -> Table scan on <temporary> (actual time=0.070,0.070,0.070..0.070,0.070,0.070 rows=0,0,0 loops=1,1,1)
                    -> Aggregate using temporary table  (cost=4.52 rows=1) (actual time=0.001,0.001,0.001..0.001,0.001,0.001 rows=0,0,0 loops=1,1,1)
                        -> Nested loop inner join  (cost=2.86 rows=4) (actual time=0.039,0.039,0.039..0.039,0.039,0.039 rows=0,0,0 loops=1,1,1)
                            -> Nested loop inner join  (cost=1.45 rows=1) (actual time=0.037,0.037,0.037..0.037,0.037,0.037 rows=0,0,0 loops=1,1,1)
                                -> Parallel table scan on customer, with parallel partitions: 1, partition_keys: 1  (cost=0.35 rows=1) (actual time=0.036,0.036,0.036..0.036,0.036,0.036 rows=0,0,0 loops=1,1,1)
                                -> Filter: <in_optimizer>(orders.O_ORDERKEY,<exists>(select #2))  (cost=1.10 rows=1)
                                    -> Index lookup on orders using ORDERS_FK1 (O_CUSTKEY=customer.C_CUSTKEY)  (cost=1.10 rows=1)
                                    -> Select #2 (subquery in condition; dependent)
                                        -> Limit: 1 row(s)
                                            -> Filter: ((sum(lineitem.L_QUANTITY) > 313) and (<cache>(orders.O_ORDERKEY) = <ref_null_helper>(lineitem.L_ORDERKEY)))
                                                -> Group aggregate: sum(lineitem.L_QUANTITY)
                                                    -> Index scan on lineitem using PRIMARY  (cost=41554048.20 rows=380071042)
                            -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY)  (cost=1.41 rows=4)

查詢語句帶有ANALYZE關鍵字時，會實際執行查詢語句，並統計其中各個計算的耗時情況。對於並行查詢，也可以記錄各個worker在每個運算元中的計算耗時。包括各個運算元在多個worker上的最長、最短及平均執行耗時。