This topic describes how to execute the EXPLAIN statement to view elastic parallel query information in execution plans.
Example table
In the following example, the pq_test
table is used to test parallel queries.
Schema:
SHOW CREATE TABLE pq_test\G *************************** 1. row *************************** Table: pq_test Create Table: CREATE TABLE `pq_test` ( `id` BIGINT(20) NOT NULL AUTO_INCREMENT, `help_topic_id` INT(10) UNSIGNED NOT NULL, `name` CHAR(64) NOT NULL, `help_category_id` SMALLINT(5) UNSIGNED NOT NULL, `description` TEXT NOT NULL, `example` TEXT NOT NULL, `url` TEXT NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=21495809 DEFAULT CHARSET=utf8 1 row in set (0.00 sec)
Table size:
SHOW TABLE STATUS\G *************************** 1. row *************************** Name: pq_test Engine: InnoDB Version: 10 Row_format: Dynamic Rows: 20064988 Avg_row_length: 1898 Data_length: 38085328896 Max_data_length: 0 Index_length: 0 Data_free: 4194304 Auto_increment: 21495809 Create_time: 2019-07-30 01:35:27 Update_time: NULL Check_time: NULL Collation: utf8_general_ci Checksum: NULL Create_options: Comment: 1 row in set (0.02 sec)
SQL statement:
SELECT COUNT(*) FROM pq_test;
EXPLAIN statement
View non-parallel queries
You can use the EXPLAIN statement on non-parallel queries.
Sample statement:
SET max_parallel_degree=0; EXPLAIN SELECT COUNT(*) FROM pq_test\G
Sample result:
*************************** 1. row ***************************
Id: 1
Select_type: SIMPLE
Table: pq_test
Partitions: NULL
Type: index
Possible_keys: NULL
Key: PRIMARY
Key_len: 8
Ref: NULL
Rows: 20064988
Filtered: 100.00
Extra: Using index
1 row in set, 1 warning (0.03 sec)
View parallel queries
In PolarDB for MySQL 8.0.1, you can use the EXPLAIN statement on parallel queries.
Sample statement:
EXPLAIN SELECT COUNT(*) FROM pq_test\G
Sample result:
*************************** 1. row *************************** Id: 1 Select_type: SIMPLE Table: <gather2> Partitions: NULL Type: ALL Possible_keys: NULL Key: NULL Key_len: NULL Ref: NULL Rows: 20064988 Filtered: 100.00 Extra: NULL *************************** 2. row *************************** Id: 2 Select_type: SIMPLE Table: pq_test Partitions: NULL Type: index Possible_keys: NULL Key: PRIMARY Key_len: 8 Ref: NULL Rows: 10032494 Filtered: 100.00 Extra: Parallel scan (2 workers); Using index 2 rows in set, 1 warning (0.00 sec)
In PolarDB for MySQL 8.0.2, you can use the
EXPLAIN FORMAT=TREE
statement on parallel queries. This returns a more precise description of query handling.Sample statement:
EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM pq_test\G
Sample result:
*************************** 1. row *************************** EXPLAIN: -> Count rows in gather (cost=... rows=1) -> Gather (slice: 1; workers: 2, nodes: 2) -> Parallel index scan on pq_test using PRIMARY, with parallel partitions: 8 (cost=... rows=20064988)
The result indicates the following information:
The EXPLAIN output shows that the parallel plan includes a Gather operation. Gather is implemented to gather the partial results that are produced by all workers.
In addition, information in the Extra field shows that a parallel scan is performed on the
pq_test
table by using four workers.The number of workers used and the number of nodes are displayed in the Gather operation. A total of 2 nodes are used and each node has 2 workers.
View parallel queries that include subqueries
In PolarDB for MySQL 8.0.1, you can use the EXPLAIN statement on parallel queries that include subqueries.
Sample statement:
EXPLAIN SELECT o_orderpriority, COUNT(*) as order_count FROM orders WHERE o_orderdate >= '1994-04-01' AND o_orderdate < date_add('1994-04-01', interval '3' month) AND exists ( SELECT * FROM lineitem WHERE l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) GROUP BY o_orderpriority ORDER BY o_orderpriority\G
Sample result:
*************************** 1. row *************************** id: 1 select_type: PRIMARY table: <gather1.1> partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1489068 filtered: 100.00 Extra: Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: PRIMARY table: orders partitions: NULL type: range possible_keys: i_o_orderdate key: i_o_orderdate key_len: 3 ref: NULL rows: 568369 filtered: 100.00 Extra: Parallel scan (2 workers); Using index condition; Using where; Using temporary *************************** 3. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: lineitem partitions: NULL type: ref possible_keys: PRIMARY,i_l_orderkey key: PRIMARY key_len: 4 ref: tpch_10.orders.O_ORDERKEY rows: 4 filtered: 33.3 Extra: Parallel pushdown; Using where 3 rows in set, 2 warnings (0.01 sec)
In the preceding result, the
select_type
of the subquery isDEPENDENT SUBQUERY
and theExtra
field is set toParallel pushdown
, which indicates that the subquery is sent to workers for parallel processing.In PolarDB for MySQL 8.0.2, you can use the
FORMAT=TREE
statement on parallel queries that include subqueries. This returns a more precise description of query handling.Example 1
Sample statement:
EXPLAIN FORMAT=TREE SELECT o_orderpriority, COUNT(*) as order_count FROM orders WHERE o_orderdate >= '1994-04-01' AND o_orderdate < date_add('1994-04-01', interval '3' month) AND exists ( SELECT * FROM lineitem WHERE l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) GROUP BY o_orderpriority ORDER BY o_orderpriority\G
Sample result:
*************************** 1. row *************************** EXPLAIN: -> Sort: <temporary>.o_orderpriority -> Table scan on <temporary> -> Aggregate using temporary table (cost=1746887.76 rows=1489068) -> Gather (slice: 1; workers: 2) (cost=1597980.96 rows=1489068) -> Table scan on <temporary> -> Aggregate using temporary table (cost=1486290.85 rows=744534) -> Filter: exists(select #2) (cost=772982.43 rows=568369) -> Parallel index range scan on orders using i_o_orderdate, with index condition: ((orders.O_ORDERDATE >= DATE'1994-04-01') and (orders.O_ORDERDATE < <cache>(('1994-04-01' + interval '3' month)))), with parallel partitions: 89 (cost=772982.43 rows=568369) -> Select #2 (subquery in condition; dependent) -> Limit: 1 row(s) -> Filter: (lineitem.L_COMMITDATE < lineitem.L_RECEIPTDATE) (cost=1.14 rows=1) -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY) (cost=1.14 rows=4) 1 row in set, 1 warning (0.02 sec)
Example 2
Sample statement:
EXPLAIN FORMAT=TREE select ps_partkey, sum(ps_supplycost * ps_availqty) as value from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'IRAN' group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.0000010000 from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'IRAN' ) order by value desc limit 1;
Sample result:
| -> Limit: 1 row(s) (cost=1408498.03 rows=1) -> Gather (merge sort; slice: 1; workers: 256; actual workers: 32) (cost=1408498.03 rows=256) -> Limit: 1 row(s) (cost=1408404.20 rows=1) -> Sort: <temporary>.value DESC, limit input to 1 row(s) per chunk (cost=1408404.20 rows=803182) -> Filter: (sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY)) > (select #2)) -> Table scan on <temporary> -> Aggregate using temporary table (cost=1408404.20 rows=803182) -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY) (cost=829770.18 rows=327820) -> Parallel table scan on partsupp, with parallel partitions: 2882, partition_keys: 1 (cost=6347528.15 rows=3176912) -> Hash -> Broadcast (slice: 2; workers: 256; nodes: 16) (cost=103382.56 rows=1029632) -> Nested loop inner join (cost=409.36 rows=4022) -> Filter: (nation.N_NAME = 'IRAN') (cost=2.29 rows=3) -> Table scan on nation (cost=2.29 rows=25) -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243 (cost=65.94 rows=1609) -> Select #2 (subquery in condition; run only once; shared access) -> Aggregate: sum(`<collector>`.tmp_field_0) (cost=825576.85 rows=1) -> Gather (slice: 1; workers: 256; nodes: 16) (cost=825564.05 rows=256) -> Aggregate: sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY)) (cost=825541.20 rows=1) -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY) (cost=809150.20 rows=327820) -> Parallel table scan on partsupp, with parallel partitions: 14405 (cost=6147699.35 rows=3176912) -> Hash -> Broadcast (slice: 2; workers: 256; nodes: 16) (cost=103382.56 rows=1029632) -> Nested loop inner join (cost=409.36 rows=4022) -> Filter: (nation.N_NAME = 'IRAN') (cost=2.29 rows=3) -> Table scan on nation (cost=2.29 rows=25) -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243 (cost=65.94 rows=1609)
In the preceding result, the
select_type
of the subquery isSUBQUERY
and theSelect #2
field is set toShared access
. The PolarDB optimizer runs the parallel subquery in advance and the result is shared among workers. In this query plan, the query block in the outer layer of the subquery cannot execute multi-node elastic parallel query due to policy restrictions. Multiple parallel workers for the outer query can execute single-node elastic parallel query within the node where the query is delivered.
View the execution time of various computing tasks
In PolarDB for MySQL 8.0.2, you can execute the EXPLAIN ANALYZE
statement to view the execution time of various computing tasks when parallel queries are used.
Sample statement:
EXPLAIN ANALYZE select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 313 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate LIMIT 100;
Sample result:
| -> Limit: 100 row(s) (cost=14.62 rows=1) (actual time=1.252..1.252 rows=0 loops=1) -> Gather (merge sort; slice: 1; workers: 1; nodes: 2) (cost=14.62 rows=1) (actual time=1.250..1.250 rows=0 loops=1) -> Limit: 100 row(s) (cost=4.52 rows=1) (actual time=0.084,0.084,0.084..0.084,0.084,0.084 rows=0,0,0 loops=1,1,1) -> Sort: <temporary>.O_TOTALPRICE DESC, <temporary>.O_ORDERDATE, limit input to 100 row(s) per chunk (cost=4.52 rows=1) (actual time=0.083,0.083,0.083..0.083,0.083,0.083 rows=0,0,0 loops=1,1,1) -> Table scan on <temporary> (actual time=0.070,0.070,0.070..0.070,0.070,0.070 rows=0,0,0 loops=1,1,1) -> Aggregate using temporary table (cost=4.52 rows=1) (actual time=0.001,0.001,0.001..0.001,0.001,0.001 rows=0,0,0 loops=1,1,1) -> Nested loop inner join (cost=2.86 rows=4) (actual time=0.039,0.039,0.039..0.039,0.039,0.039 rows=0,0,0 loops=1,1,1) -> Nested loop inner join (cost=1.45 rows=1) (actual time=0.037,0.037,0.037..0.037,0.037,0.037 rows=0,0,0 loops=1,1,1) -> Parallel table scan on customer, with parallel partitions: 1, partition_keys: 1 (cost=0.35 rows=1) (actual time=0.036,0.036,0.036..0.036,0.036,0.036 rows=0,0,0 loops=1,1,1) -> Filter: <in_optimizer>(orders.O_ORDERKEY,<exists>(select #2)) (cost=1.10 rows=1) -> Index lookup on orders using ORDERS_FK1 (O_CUSTKEY=customer.C_CUSTKEY) (cost=1.10 rows=1) -> Select #2 (subquery in condition; dependent) -> Limit: 1 row(s) -> Filter: ((sum(lineitem.L_QUANTITY) > 313) and (<cache>(orders.O_ORDERKEY) = <ref_null_helper>(lineitem.L_ORDERKEY))) -> Group aggregate: sum(lineitem.L_QUANTITY) -> Index scan on lineitem using PRIMARY (cost=41554048.20 rows=380071042) -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY) (cost=1.41 rows=4)
If the EXPLAIN statement contains the ANALYZE keyword, the execution time of various computing tasks when parallel queries are used can be displayed. For parallel queries, the computing time of each worker in each operator is also recorded, including the maximum, minimum, and average computing time of each operator among multiple workers.