本文介绍如何使用EXPLAIN语句查看执行计划输出中与并行查询相关的内容。
查询用表
本文示例中使用pq_test
表进行并行查询测试。
表结构如下:
SHOW CREATE TABLE pq_test\G *************************** 1. row *************************** Table: pq_test Create Table: CREATE TABLE `pq_test` ( `id` BIGINT(20) NOT NULL AUTO_INCREMENT, `help_topic_id` INT(10) UNSIGNED NOT NULL, `name` CHAR(64) NOT NULL, `help_category_id` SMALLINT(5) UNSIGNED NOT NULL, `description` TEXT NOT NULL, `example` TEXT NOT NULL, `url` TEXT NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=21495809 DEFAULT CHARSET=utf8 1 row in set (0.00 sec)
表大小如下:
SHOW TABLE STATUS\G *************************** 1. row *************************** Name: pq_test Engine: InnoDB Version: 10 Row_format: Dynamic Rows: 20064988 Avg_row_length: 1898 Data_length: 38085328896 Max_data_length: 0 Index_length: 0 Data_free: 4194304 Auto_increment: 21495809 Create_time: 2019-07-30 01:35:27 Update_time: NULL Check_time: NULL Collation: utf8_general_ci Checksum: NULL Create_options: Comment: 1 row in set (0.02 sec)
查询SQL如下:
SELECT COUNT(*) FROM pq_test;
EXPLAIN查询语句
不使用并行查询的执行情况
通过EXPLAIN语句查看不使用并行查询的执行情况。
查询语句如下:
SET max_parallel_degree=0; EXPLAIN SELECT COUNT(*) FROM pq_test\G
查询结果如下:
*************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
Table: pq_test
Partitions: NULL
Type: index
Possible_keys: NULL
Key: PRIMARY
Key_len: 8
Ref: NULL
Rows: 20064988
Filtered: 100.00
Extra: Using index
1 row in set, 1 warning (0.03 sec)
通过EXPLAIN语句使用并行查询的执行情况
PolarDB MySQL版8.0.1版本中,通过EXPLAIN语句查看使用并行查询的情况。
查询语句如下:
EXPLAIN SELECT COUNT(*) FROM pq_test\G
查询结果如下:
*************************** 1. row *************************** Id: 1 Select_type: SIMPLE Table: <gather2> Partitions: NULL Type: ALL Possible_keys: NULL Key: NULL Key_len: NULL Ref: NULL Rows: 20064988 Filtered: 100.00 Extra: NULL *************************** 2. row *************************** Id: 2 Select_type: SIMPLE Table: pq_test Partitions: NULL Type: index Possible_keys: NULL Key: PRIMARY Key_len: 8 Ref: NULL Rows: 10032494 Filtered: 100.00 Extra: Parallel scan (2 workers); Using index 2 rows in set, 1 warning (0.00 sec)
PolarDB MySQL版8.0.2版本中,通过EXPLAIN语句查看使用并行查询的情况,可以使用更为清晰的
EXPLAIN FORMAT=TREE
查询。查询语句如下:
EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM pq_test\G
查询结果如下:
*************************** 1. row *************************** EXPLAIN: -> Count rows in gather (cost=... rows=1) -> Gather (slice: 1; workers: 2, nodes: 2) -> Parallel index scan on pq_test using PRIMARY, with parallel partitions: 8 (cost=... rows=20064988)
从上述结果可以看出:
上述并行计划中包含了Gather操作,该操作负责汇总所有Worker返回的中间结果。
从执行计划输出的Extra信息中可以看到
pq_test
表使用了Parallel scan(并行扫描)策略,期望用4个Workers来并行执行。Gather操作中显示了使用的worker数量和节点数量,这里一共使用了2个节点,每个节点2个worker。
通过带有子查询的EXPLAIN语句查看使用并行查询的执行情况
PolarDB MySQL版8.0.1版本中,通过带有子查询的EXPLAIN语句查看使用并行查询的情况。
查询语句如下:
EXPLAIN SELECT o_orderpriority, COUNT(*) as order_count FROM orders WHERE o_orderdate >= '1994-04-01' AND o_orderdate < date_add('1994-04-01', interval '3' month) AND exists ( SELECT * FROM lineitem WHERE l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) GROUP BY o_orderpriority ORDER BY o_orderpriority\G
查询结果如下:
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: <gather1.1> partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1489068 filtered: 100.00 Extra: Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: PRIMARY table: orders partitions: NULL type: range possible_keys: i_o_orderdate key: i_o_orderdate key_len: 3 ref: NULL rows: 568369 filtered: 100.00 Extra: Parallel scan (2 workers); Using index condition; Using where; Using temporary *************************** 3. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: lineitem partitions: NULL type: ref possible_keys: PRIMARY,i_l_orderkey key: PRIMARY key_len: 4 ref: tpch_10.orders.O_ORDERKEY rows: 4 filtered: 33.3 Extra: Parallel pushdown; Using where 3 rows in set, 2 warnings (0.01 sec)
从上述结果可以看出,
select_type
为DEPENDENT SUBQUERY
的子查询中,Extra
显示为Parallel pushdown
,表示该子查询使用了Parallel pushdown
策略,即子查询被整个下推到Worker去执行。PolarDB MySQL版8.0.2版本中,通过带有子查询的EXPLAIN语句查看使用并行查询的情况,通过
FORMAT=TREE
可以看到更清晰的执行计划。示例1
查询语句如下:
EXPLAIN FORMAT=TREE SELECT o_orderpriority, COUNT(*) as order_count FROM orders WHERE o_orderdate >= '1994-04-01' AND o_orderdate < date_add('1994-04-01', interval '3' month) AND exists ( SELECT * FROM lineitem WHERE l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) GROUP BY o_orderpriority ORDER BY o_orderpriority\G
查询结果如下:
*************************** 1. row *************************** EXPLAIN: -> Sort: <temporary>.o_orderpriority -> Table scan on <temporary> -> Aggregate using temporary table (cost=1746887.76 rows=1489068) -> Gather (slice: 1; workers: 2) (cost=1597980.96 rows=1489068) -> Table scan on <temporary> -> Aggregate using temporary table (cost=1486290.85 rows=744534) -> Filter: exists(select #2) (cost=772982.43 rows=568369) -> Parallel index range scan on orders using i_o_orderdate, with index condition: ((orders.O_ORDERDATE >= DATE'1994-04-01') and (orders.O_ORDERDATE < <cache>(('1994-04-01' + interval '3' month)))), with parallel partitions: 89 (cost=772982.43 rows=568369) -> Select #2 (subquery in condition; dependent) -> Limit: 1 row(s) -> Filter: (lineitem.L_COMMITDATE < lineitem.L_RECEIPTDATE) (cost=1.14 rows=1) -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY) (cost=1.14 rows=4) 1 row in set, 1 warning (0.02 sec)
示例2
查询语句如下:
EXPLAIN FORMAT=TREE select ps_partkey, sum(ps_supplycost * ps_availqty) as value from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'IRAN' group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.0000010000 from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'IRAN' ) order by value desc limit 1;
查询结果如下:
| -> Limit: 1 row(s) (cost=1408498.03 rows=1) -> Gather (merge sort; slice: 1; workers: 256; actual workers: 32) (cost=1408498.03 rows=256) -> Limit: 1 row(s) (cost=1408404.20 rows=1) -> Sort: <temporary>.value DESC, limit input to 1 row(s) per chunk (cost=1408404.20 rows=803182) -> Filter: (sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY)) > (select #2)) -> Table scan on <temporary> -> Aggregate using temporary table (cost=1408404.20 rows=803182) -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY) (cost=829770.18 rows=327820) -> Parallel table scan on partsupp, with parallel partitions: 2882, partition_keys: 1 (cost=6347528.15 rows=3176912) -> Hash -> Broadcast (slice: 2; workers: 256; nodes: 16) (cost=103382.56 rows=1029632) -> Nested loop inner join (cost=409.36 rows=4022) -> Filter: (nation.N_NAME = 'IRAN') (cost=2.29 rows=3) -> Table scan on nation (cost=2.29 rows=25) -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243 (cost=65.94 rows=1609) -> Select #2 (subquery in condition; run only once; shared access) -> Aggregate: sum(`<collector>`.tmp_field_0) (cost=825576.85 rows=1) -> Gather (slice: 1; workers: 256; nodes: 16) (cost=825564.05 rows=256) -> Aggregate: sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY)) (cost=825541.20 rows=1) -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY) (cost=809150.20 rows=327820) -> Parallel table scan on partsupp, with parallel partitions: 14405 (cost=6147699.35 rows=3176912) -> Hash -> Broadcast (slice: 2; workers: 256; nodes: 16) (cost=103382.56 rows=1029632) -> Nested loop inner join (cost=409.36 rows=4022) -> Filter: (nation.N_NAME = 'IRAN') (cost=2.29 rows=3) -> Table scan on nation (cost=2.29 rows=25) -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243 (cost=65.94 rows=1609)
从上述结果可以看出,
select_type
为SUBQUERY
的子查询中,在Select #2
子查询处显示为Shared access
,表示该子查询使用了Shared access
策略,即PolarDB优化器选择提前并行执行该子查询并将执行结果Share给外层所有Worker做共享访问,在这种查询计划下,子查询外层的查询块由于策略限制,将无法做跨机执行,即外层查询的多个并行worker需要在查询下发的节点内单机并行执行。
查看使用并行查询时各类计算的具体执行时间
PolarDB MySQL版8.0.2版本中,可以通过EXPLAIN ANALYZE
语句查看使用并行查询时,各类计算的具体执行时间。
查询语句如下:
EXPLAIN ANALYZE select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 313 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate LIMIT 100;
查询结果如下:
| -> Limit: 100 row(s) (cost=14.62 rows=1) (actual time=1.252..1.252 rows=0 loops=1) -> Gather (merge sort; slice: 1; workers: 1; nodes: 2) (cost=14.62 rows=1) (actual time=1.250..1.250 rows=0 loops=1) -> Limit: 100 row(s) (cost=4.52 rows=1) (actual time=0.084,0.084,0.084..0.084,0.084,0.084 rows=0,0,0 loops=1,1,1) -> Sort: <temporary>.O_TOTALPRICE DESC, <temporary>.O_ORDERDATE, limit input to 100 row(s) per chunk (cost=4.52 rows=1) (actual time=0.083,0.083,0.083..0.083,0.083,0.083 rows=0,0,0 loops=1,1,1) -> Table scan on <temporary> (actual time=0.070,0.070,0.070..0.070,0.070,0.070 rows=0,0,0 loops=1,1,1) -> Aggregate using temporary table (cost=4.52 rows=1) (actual time=0.001,0.001,0.001..0.001,0.001,0.001 rows=0,0,0 loops=1,1,1) -> Nested loop inner join (cost=2.86 rows=4) (actual time=0.039,0.039,0.039..0.039,0.039,0.039 rows=0,0,0 loops=1,1,1) -> Nested loop inner join (cost=1.45 rows=1) (actual time=0.037,0.037,0.037..0.037,0.037,0.037 rows=0,0,0 loops=1,1,1) -> Parallel table scan on customer, with parallel partitions: 1, partition_keys: 1 (cost=0.35 rows=1) (actual time=0.036,0.036,0.036..0.036,0.036,0.036 rows=0,0,0 loops=1,1,1) -> Filter: <in_optimizer>(orders.O_ORDERKEY,<exists>(select #2)) (cost=1.10 rows=1) -> Index lookup on orders using ORDERS_FK1 (O_CUSTKEY=customer.C_CUSTKEY) (cost=1.10 rows=1) -> Select #2 (subquery in condition; dependent) -> Limit: 1 row(s) -> Filter: ((sum(lineitem.L_QUANTITY) > 313) and (<cache>(orders.O_ORDERKEY) = <ref_null_helper>(lineitem.L_ORDERKEY))) -> Group aggregate: sum(lineitem.L_QUANTITY) -> Index scan on lineitem using PRIMARY (cost=41554048.20 rows=380071042) -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY) (cost=1.41 rows=4)
查询语句带有ANALYZE关键字时,会实际执行查询语句,并统计其中各个计算的耗时情况。对于并行查询,也可以记录各个worker在每个算子中的计算耗时。包括各个算子在多个worker上的最长、最短及平均执行耗时。