本文介紹如何使用EXPLAIN語句查看執行計畫輸出中與並行查詢相關的內容。
查詢用表
本文樣本中使用pq_test
表進行並行查詢測試。
表結構如下:
SHOW CREATE TABLE pq_test\G *************************** 1. row *************************** Table: pq_test Create Table: CREATE TABLE `pq_test` ( `id` BIGINT(20) NOT NULL AUTO_INCREMENT, `help_topic_id` INT(10) UNSIGNED NOT NULL, `name` CHAR(64) NOT NULL, `help_category_id` SMALLINT(5) UNSIGNED NOT NULL, `description` TEXT NOT NULL, `example` TEXT NOT NULL, `url` TEXT NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=21495809 DEFAULT CHARSET=utf8 1 row in set (0.00 sec)
表大小如下:
SHOW TABLE STATUS\G *************************** 1. row *************************** Name: pq_test Engine: InnoDB Version: 10 Row_format: Dynamic Rows: 20064988 Avg_row_length: 1898 Data_length: 38085328896 Max_data_length: 0 Index_length: 0 Data_free: 4194304 Auto_increment: 21495809 Create_time: 2019-07-30 01:35:27 Update_time: NULL Check_time: NULL Collation: utf8_general_ci Checksum: NULL Create_options: Comment: 1 row in set (0.02 sec)
查詢SQL如下:
SELECT COUNT(*) FROM pq_test;
EXPLAIN查詢語句
不使用並行查詢的執行情況
通過EXPLAIN語句查看不使用並行查詢的執行情況。
查詢語句如下:
SET max_parallel_degree=0; EXPLAIN SELECT COUNT(*) FROM pq_test\G
查詢結果如下:
*************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
Table: pq_test
Partitions: NULL
Type: index
Possible_keys: NULL
Key: PRIMARY
Key_len: 8
Ref: NULL
Rows: 20064988
Filtered: 100.00
Extra: Using index
1 row in set, 1 warning (0.03 sec)
通過EXPLAIN語句使用並行查詢的執行情況
PolarDB MySQL版8.0.1版本中,通過EXPLAIN語句查看使用並行查詢的情況。
查詢語句如下:
EXPLAIN SELECT COUNT(*) FROM pq_test\G
查詢結果如下:
*************************** 1. row *************************** Id: 1 Select_type: SIMPLE Table: <gather2> Partitions: NULL Type: ALL Possible_keys: NULL Key: NULL Key_len: NULL Ref: NULL Rows: 20064988 Filtered: 100.00 Extra: NULL *************************** 2. row *************************** Id: 2 Select_type: SIMPLE Table: pq_test Partitions: NULL Type: index Possible_keys: NULL Key: PRIMARY Key_len: 8 Ref: NULL Rows: 10032494 Filtered: 100.00 Extra: Parallel scan (2 workers); Using index 2 rows in set, 1 warning (0.00 sec)
PolarDB MySQL版8.0.2版本中,通過EXPLAIN語句查看使用並行查詢的情況,可以使用更為清晰的
EXPLAIN FORMAT=TREE
查詢。查詢語句如下:
EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM pq_test\G
查詢結果如下:
*************************** 1. row *************************** EXPLAIN: -> Count rows in gather (cost=... rows=1) -> Gather (slice: 1; workers: 2, nodes: 2) -> Parallel index scan on pq_test using PRIMARY, with parallel partitions: 8 (cost=... rows=20064988)
從上述結果可以看出:
上述並行計劃中包含了Gather操作,該操作負責匯總所有Worker返回的中間結果。
從執行計畫輸出的Extra資訊中可以看到
pq_test
表使用了Parallel scan(並行掃描)策略,期望用4個Workers來並存執行。Gather操作中顯示了使用的worker數量和節點數量,這裡一共使用了2個節點,每個節點2個worker。
通過帶有子查詢的EXPLAIN語句查看使用並行查詢的執行情況
PolarDB MySQL版8.0.1版本中,通過帶有子查詢的EXPLAIN語句查看使用並行查詢的情況。
查詢語句如下:
EXPLAIN SELECT o_orderpriority, COUNT(*) as order_count FROM orders WHERE o_orderdate >= '1994-04-01' AND o_orderdate < date_add('1994-04-01', interval '3' month) AND exists ( SELECT * FROM lineitem WHERE l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) GROUP BY o_orderpriority ORDER BY o_orderpriority\G
查詢結果如下:
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: <gather1.1> partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1489068 filtered: 100.00 Extra: Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: PRIMARY table: orders partitions: NULL type: range possible_keys: i_o_orderdate key: i_o_orderdate key_len: 3 ref: NULL rows: 568369 filtered: 100.00 Extra: Parallel scan (2 workers); Using index condition; Using where; Using temporary *************************** 3. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: lineitem partitions: NULL type: ref possible_keys: PRIMARY,i_l_orderkey key: PRIMARY key_len: 4 ref: tpch_10.orders.O_ORDERKEY rows: 4 filtered: 33.3 Extra: Parallel pushdown; Using where 3 rows in set, 2 warnings (0.01 sec)
從上述結果可以看出,
select_type
為DEPENDENT SUBQUERY
的子查詢中,Extra
顯示為Parallel pushdown
,表示該子查詢使用了Parallel pushdown
策略,即子查詢被整個下推到Worker去執行。PolarDB MySQL版8.0.2版本中,通過帶有子查詢的EXPLAIN語句查看使用並行查詢的情況,通過
FORMAT=TREE
可以看到更清晰的執行計畫。樣本1
查詢語句如下:
EXPLAIN FORMAT=TREE SELECT o_orderpriority, COUNT(*) as order_count FROM orders WHERE o_orderdate >= '1994-04-01' AND o_orderdate < date_add('1994-04-01', interval '3' month) AND exists ( SELECT * FROM lineitem WHERE l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) GROUP BY o_orderpriority ORDER BY o_orderpriority\G
查詢結果如下:
*************************** 1. row *************************** EXPLAIN: -> Sort: <temporary>.o_orderpriority -> Table scan on <temporary> -> Aggregate using temporary table (cost=1746887.76 rows=1489068) -> Gather (slice: 1; workers: 2) (cost=1597980.96 rows=1489068) -> Table scan on <temporary> -> Aggregate using temporary table (cost=1486290.85 rows=744534) -> Filter: exists(select #2) (cost=772982.43 rows=568369) -> Parallel index range scan on orders using i_o_orderdate, with index condition: ((orders.O_ORDERDATE >= DATE'1994-04-01') and (orders.O_ORDERDATE < <cache>(('1994-04-01' + interval '3' month)))), with parallel partitions: 89 (cost=772982.43 rows=568369) -> Select #2 (subquery in condition; dependent) -> Limit: 1 row(s) -> Filter: (lineitem.L_COMMITDATE < lineitem.L_RECEIPTDATE) (cost=1.14 rows=1) -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY) (cost=1.14 rows=4) 1 row in set, 1 warning (0.02 sec)
樣本2
查詢語句如下:
EXPLAIN FORMAT=TREE select ps_partkey, sum(ps_supplycost * ps_availqty) as value from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'IRAN' group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.0000010000 from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'IRAN' ) order by value desc limit 1;
查詢結果如下:
| -> Limit: 1 row(s) (cost=1408498.03 rows=1) -> Gather (merge sort; slice: 1; workers: 256; actual workers: 32) (cost=1408498.03 rows=256) -> Limit: 1 row(s) (cost=1408404.20 rows=1) -> Sort: <temporary>.value DESC, limit input to 1 row(s) per chunk (cost=1408404.20 rows=803182) -> Filter: (sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY)) > (select #2)) -> Table scan on <temporary> -> Aggregate using temporary table (cost=1408404.20 rows=803182) -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY) (cost=829770.18 rows=327820) -> Parallel table scan on partsupp, with parallel partitions: 2882, partition_keys: 1 (cost=6347528.15 rows=3176912) -> Hash -> Broadcast (slice: 2; workers: 256; nodes: 16) (cost=103382.56 rows=1029632) -> Nested loop inner join (cost=409.36 rows=4022) -> Filter: (nation.N_NAME = 'IRAN') (cost=2.29 rows=3) -> Table scan on nation (cost=2.29 rows=25) -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243 (cost=65.94 rows=1609) -> Select #2 (subquery in condition; run only once; shared access) -> Aggregate: sum(`<collector>`.tmp_field_0) (cost=825576.85 rows=1) -> Gather (slice: 1; workers: 256; nodes: 16) (cost=825564.05 rows=256) -> Aggregate: sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY)) (cost=825541.20 rows=1) -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY) (cost=809150.20 rows=327820) -> Parallel table scan on partsupp, with parallel partitions: 14405 (cost=6147699.35 rows=3176912) -> Hash -> Broadcast (slice: 2; workers: 256; nodes: 16) (cost=103382.56 rows=1029632) -> Nested loop inner join (cost=409.36 rows=4022) -> Filter: (nation.N_NAME = 'IRAN') (cost=2.29 rows=3) -> Table scan on nation (cost=2.29 rows=25) -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243 (cost=65.94 rows=1609)
從上述結果可以看出,
select_type
為SUBQUERY
的子查詢中,在Select #2
子查詢處顯示為Shared access
,表示該子查詢使用了Shared access
策略,即PolarDB最佳化器選擇提前並存執行該子查詢並將執行結果Share給外層所有Worker做共用訪問,在這種查詢計劃下,子查詢外層的查詢塊由於策略限制,將無法做跨機執行,即外層查詢的多個並行worker需要在查詢下發的節點內單機並存執行。
查看使用並行查詢時各類計算的具體執行時間
PolarDB MySQL版8.0.2版本中,可以通過EXPLAIN ANALYZE
語句查看使用並行查詢時,各類計算的具體執行時間。
查詢語句如下:
EXPLAIN ANALYZE select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 313 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate LIMIT 100;
查詢結果如下:
| -> Limit: 100 row(s) (cost=14.62 rows=1) (actual time=1.252..1.252 rows=0 loops=1) -> Gather (merge sort; slice: 1; workers: 1; nodes: 2) (cost=14.62 rows=1) (actual time=1.250..1.250 rows=0 loops=1) -> Limit: 100 row(s) (cost=4.52 rows=1) (actual time=0.084,0.084,0.084..0.084,0.084,0.084 rows=0,0,0 loops=1,1,1) -> Sort: <temporary>.O_TOTALPRICE DESC, <temporary>.O_ORDERDATE, limit input to 100 row(s) per chunk (cost=4.52 rows=1) (actual time=0.083,0.083,0.083..0.083,0.083,0.083 rows=0,0,0 loops=1,1,1) -> Table scan on <temporary> (actual time=0.070,0.070,0.070..0.070,0.070,0.070 rows=0,0,0 loops=1,1,1) -> Aggregate using temporary table (cost=4.52 rows=1) (actual time=0.001,0.001,0.001..0.001,0.001,0.001 rows=0,0,0 loops=1,1,1) -> Nested loop inner join (cost=2.86 rows=4) (actual time=0.039,0.039,0.039..0.039,0.039,0.039 rows=0,0,0 loops=1,1,1) -> Nested loop inner join (cost=1.45 rows=1) (actual time=0.037,0.037,0.037..0.037,0.037,0.037 rows=0,0,0 loops=1,1,1) -> Parallel table scan on customer, with parallel partitions: 1, partition_keys: 1 (cost=0.35 rows=1) (actual time=0.036,0.036,0.036..0.036,0.036,0.036 rows=0,0,0 loops=1,1,1) -> Filter: <in_optimizer>(orders.O_ORDERKEY,<exists>(select #2)) (cost=1.10 rows=1) -> Index lookup on orders using ORDERS_FK1 (O_CUSTKEY=customer.C_CUSTKEY) (cost=1.10 rows=1) -> Select #2 (subquery in condition; dependent) -> Limit: 1 row(s) -> Filter: ((sum(lineitem.L_QUANTITY) > 313) and (<cache>(orders.O_ORDERKEY) = <ref_null_helper>(lineitem.L_ORDERKEY))) -> Group aggregate: sum(lineitem.L_QUANTITY) -> Index scan on lineitem using PRIMARY (cost=41554048.20 rows=380071042) -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY) (cost=1.41 rows=4)
查詢語句帶有ANALYZE關鍵字時,會實際執行查詢語句,並統計其中各個計算的耗時情況。對於並行查詢,也可以記錄各個worker在每個運算元中的計算耗時。包括各個運算元在多個worker上的最長、最短及平均執行耗時。