This topic describes how to run the EXPLAIN
and EXPLAIN ANALYZE
commands to analyze execution plans.
Prerequisites
The version of the AnalyticDB for MySQL cluster is V3.1.3 or later.
EXPLAIN
You can run the EXPLAIN
command to evaluate the planned execution path for an SQL query statement. This evaluation
is for reference only and is not given the same weight as the actual execution results.
- Syntax
EXPLAIN (format text) <SELECT statement>;
Note If the SQL query statement is not complex, you can add the(format text)
field to theEXPLAIN
command to increase the readability of the tree hierarchy in the returned result. - Example
EXPLAIN (format text) SELECT count(*) FROM nation, region, customer WHERE c_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA';
The following result is returned:Output[count(*)] │ Outputs: [count:bigint] │ Estimates: {rows: 1 (8B)} │ count(*) := count └─ Aggregate(FINAL) │ Outputs: [count:bigint] │ Estimates: {rows: 1 (8B)} │ count := count(`count_1`) └─ LocalExchange[SINGLE] () │ Outputs: [count_0_1:bigint] │ Estimates: {rows: 1 (8B)} └─ RemoteExchange[GATHER] │ Outputs: [count_0_2:bigint] │ Estimates: {rows: 1 (8B)} └─ Aggregate(PARTIAL) │ Outputs: [count_0_4:bigint] │ Estimates: {rows: 1 (8B)} │ count_4 := count(*) └─ InnerJoin[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue_0_6] │ Outputs: [] │ Estimates: {rows: 302035 (4.61MB)} │ Distribution: REPLICATED ├─ Project[] │ │ Outputs: [c_nationkey:integer, $hashvalue:bigint] │ │ Estimates: {rows: 1500000 (5.72MB)} │ │ $hashvalue := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`c_nationkey`), 0)) │ └─ RuntimeFilter │ │ Outputs: [c_nationkey:integer] │ │ Estimates: {rows: 1500000 (5.72MB)} │ ├─ TableScan[adb:AdbTableHandle{schema=tpch, tableName=customer, partitionColumnHandles=[c_custkey]}] │ │ Outputs: [c_nationkey:integer] │ │ Estimates: {rows: 1500000 (5.72MB)} │ │ c_nationkey := AdbColumnHandle{columnName=c_nationkey, type=4, isIndexed=true} │ └─ RuntimeCollect │ │ Outputs: [n_nationkey:integer] │ │ Estimates: {rows: 5 (60B)} │ └─ LocalExchange[ROUND_ROBIN] () │ │ Outputs: [n_nationkey:integer] │ │ Estimates: {rows: 5 (60B)} │ └─ RuntimeScan │ Outputs: [n_nationkey:integer] │ Estimates: {rows: 5 (60B)} └─ LocalExchange[HASH][$hashvalue_0_6] ("n_nationkey") │ Outputs: [n_nationkey:integer, $hashvalue_0_6:bigint] │ Estimates: {rows: 5 (60B)} └─ Project[] │ Outputs: [n_nationkey:integer, $hashvalue_0_10:bigint] │ Estimates: {rows: 5 (60B)} │ $hashvalue_10 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_nationkey`), 0)) └─ RemoteExchange[REPLICATE] │ Outputs: [n_nationkey:integer] │ Estimates: {rows: 5 (60B)} └─ InnerJoin[(`n_regionkey` = `r_regionkey`)][$hashvalue_0_7, $hashvalue_0_8] │ Outputs: [n_nationkey:integer] │ Estimates: {rows: 5 (60B)} │ Distribution: REPLICATED ├─ Project[] │ │ Outputs: [n_nationkey:integer, n_regionkey:integer, $hashvalue_0_7:bigint] │ │ Estimates: {rows: 25 (200B)} │ │ $hashvalue_7 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_regionkey`), 0)) │ └─ RuntimeFilter │ │ Outputs: [n_nationkey:integer, n_regionkey:integer] │ │ Estimates: {rows: 25 (200B)} │ ├─ TableScan[adb:AdbTableHandle{schema=tpch, tableName=nation, partitionColumnHandles=[]}] │ │ Outputs: [n_nationkey:integer, n_regionkey:integer] │ │ Estimates: {rows: 25 (200B)} │ │ n_nationkey := AdbColumnHandle{columnName=n_nationkey, type=4, isIndexed=true} │ │ n_regionkey := AdbColumnHandle{columnName=n_regionkey, type=4, isIndexed=true} │ └─ RuntimeCollect │ │ Outputs: [r_regionkey:integer] │ │ Estimates: {rows: 1 (4B)} │ └─ LocalExchange[ROUND_ROBIN] () │ │ Outputs: [r_regionkey:integer] │ │ Estimates: {rows: 1 (4B)} │ └─ RuntimeScan │ Outputs: [r_regionkey:integer] │ Estimates: {rows: 1 (4B)} └─ LocalExchange[HASH][$hashvalue_0_8] ("r_regionkey") │ Outputs: [r_regionkey:integer, $hashvalue_0_8:bigint] │ Estimates: {rows: 1 (4B)} └─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=region, partitionColumnHandles=[]}] Outputs: [r_regionkey:integer, $hashvalue_0_9:bigint] Estimates: {rows: 1 (4B)}/{rows: 1 (B)} $hashvalue_9 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`r_regionkey`), 0)) r_regionkey := AdbColumnHandle{columnName=r_regionkey, type=4, isIndexed=true}
The following table describes the main parameters in the preceding result.Parameter Description Outputs: [symbol:type]
The output column and data type of each operator. Estimates: {rows: %s (%sB)}
The estimated number of rows and data volume of each operator. The data can be used to determine the join ordering and data shuffling techniques of the optimizer.
EXPLAIN ANALYZE
You can run the EXPLAIN ANALYZE
command to query the distributed execution plans and actual execution costs of queries,
including the execution durations, memory usage, and input and output data volume.
- Syntax
EXPLAIN ANALYZE <SELECT statement>;
- Example
EXPLAIN ANALYZE SELECT count(*) FROM nation, region, customer WHERE c_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA';
The following result is returned:Fragment 1 [SINGLE] Output: 1 row (9B), PeakMemory: 178KB, WallTime: 1.00ns, Input: 32 rows (288B); per task: avg.: 32.00 std.dev.: 0.00 Output layout: [count] Output partitioning: SINGLE [] Aggregate(FINAL) │ Outputs: [count:bigint] │ Estimates: {rows: 1 (8B)} │ Output: 2 rows (18B), PeakMemory: 24B (0.00%), WallTime: 70.39us (0.03%) │ count := count(`count_1`) └─ LocalExchange[SINGLE] () │ Outputs: [count1:bigint] │ Estimates: {rows: ? (?)} │ Output: 64 rows (576B), PeakMemory: 8KB (0.07%), WallTime: 238.69us (0.10%) └─ RemoteSource[2] Outputs: [count2:bigint] Estimates: Output: 32 rows (288B), PeakMemory: 32KB (0.27%), WallTime: 182.82us (0.08%) Input avg.: 4.00 rows, Input std.dev.: 264.58% Fragment 2 [adb:AdbPartitioningHandle{schema=tpch, tableName=customer, dimTable=false, shards=32, tableEngineType=Cstore, partitionColumns=c_custkey, prunedBuckets= empty}] Output: 32 rows (288B), PeakMemory: 6MB, WallTime: 164.00ns, Input: 1500015 rows (20.03MB); per task: avg.: 500005.00 std.dev.: 21941.36 Output layout: [count4] Output partitioning: SINGLE [] Aggregate(PARTIAL) │ Outputs: [count4:bigint] │ Estimates: {rows: 1 (8B)} │ Output: 64 rows (576B), PeakMemory: 336B (0.00%), WallTime: 1.01ms (0.42%) │ count_4 := count(*) └─ INNER Join[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue6] │ Outputs: [] │ Estimates: {rows: 302035 (4.61MB)} │ Output: 300285 rows (210B), PeakMemory: 641KB (5.29%), WallTime: 99.08ms (41.45%) │ Left (probe) Input avg.: 46875.00 rows, Input std.dev.: 311.24% │ Right (build) Input avg.: 0.63 rows, Input std.dev.: 264.58% │ Distribution: REPLICATED ├─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=customer, partitionColumnHandles=[c_custkey]}] │ Outputs: [c_nationkey:integer, $hashvalue:bigint] │ Estimates: {rows: 1500000 (5.72MB)}/{rows: 1500000 (5.72MB)} │ Output: 1500000 rows (20.03MB), PeakMemory: 5MB (44.38%), WallTime: 68.29ms (28.57%) │ $hashvalue := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`c_nationkey`), 0)) │ c_nationkey := AdbColumnHandle{columnName=c_nationkey, type=4, isIndexed=true} │ Input: 1500000 rows (7.15MB), Filtered: 0.00% └─ LocalExchange[HASH][$hashvalue6] ("n_nationkey") │ Outputs: [n_nationkey:integer, $hashvalue6:bigint] │ Estimates: {rows: 5 (60B)} │ Output: 30 rows (420B), PeakMemory: 394KB (3.26%), WallTime: 455.03us (0.19%) └─ Project[] │ Outputs: [n_nationkey:integer, $hashvalue10:bigint] │ Estimates: {rows: 5 (60B)} │ Output: 15 rows (210B), PeakMemory: 24KB (0.20%), WallTime: 83.61us (0.03%) │ Input avg.: 0.63 rows, Input std.dev.: 264.58% │ $hashvalue_10 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_nationkey`), 0)) └─ RemoteSource[3] Outputs: [n_nationkey:integer] Estimates: Output: 15 rows (75B), PeakMemory: 24KB (0.20%), WallTime: 45.97us (0.02%) Input avg.: 0.63 rows, Input std.dev.: 264.58% Fragment 3 [adb:AdbPartitioningHandle{schema=tpch, tableName=nation, dimTable=true, shards=32, tableEngineType=Cstore, partitionColumns=, prunedBuckets= empty}] Output: 5 rows (25B), PeakMemory: 185KB, WallTime: 1.00ns, Input: 26 rows (489B); per task: avg.: 26.00 std.dev.: 0.00 Output layout: [n_nationkey] Output partitioning: BROADCAST [] INNER Join[(`n_regionkey` = `r_regionkey`)][$hashvalue7, $hashvalue8] │ Outputs: [n_nationkey:integer] │ Estimates: {rows: 5 (60B)} │ Output: 11 rows (64B), PeakMemory: 152KB (1.26%), WallTime: 255.86us (0.11%) │ Left (probe) Input avg.: 25.00 rows, Input std.dev.: 0.00% │ Right (build) Input avg.: 0.13 rows, Input std.dev.: 264.58% │ Distribution: REPLICATED ├─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=nation, partitionColumnHandles=[]}] │ Outputs: [n_nationkey:integer, n_regionkey:integer, $hashvalue7:bigint] │ Estimates: {rows: 25 (200B)}/{rows: 25 (200B)} │ Output: 25 rows (475B), PeakMemory: 16KB (0.13%), WallTime: 178.81us (0.07%) │ $hashvalue_7 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_regionkey`), 0)) │ n_nationkey := AdbColumnHandle{columnName=n_nationkey, type=4, isIndexed=true} │ n_regionkey := AdbColumnHandle{columnName=n_regionkey, type=4, isIndexed=true} │ Input: 25 rows (250B), Filtered: 0.00% └─ LocalExchange[HASH][$hashvalue8] ("r_regionkey") │ Outputs: [r_regionkey:integer, $hashvalue8:bigint] │ Estimates: {rows: 1 (4B)} │ Output: 2 rows (28B), PeakMemory: 34KB (0.29%), WallTime: 57.41us (0.02%) └─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=region, partitionColumnHandles=[]}] Outputs: [r_regionkey:integer, $hashvalue9:bigint] Estimates: {rows: 1 (4B)}/{rows: 1 (4B)} Output: 1 row (14B), PeakMemory: 8KB (0.07%), WallTime: 308.99us (0.13%) $hashvalue_9 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`r_regionkey`), 0)) r_regionkey := AdbColumnHandle{columnName=r_regionkey, type=4, isIndexed=true} Input: 1 row (5B), Filtered: 0.00%
The following table describes the main parameters in the preceding result.Parameter Description Outputs: [symbol:type]
The output column and data type of each operator. Estimates: {rows: %s (%sB)}
The estimated number of rows and data volume of each operator. The data can be used to determine the join ordering and data shuffling techniques of the optimizer. PeakMemory: %s
The total memory usage. This parameter is used to analyze the causes for bottlenecks that occur during memory use. WallTime: %s
The total execution duration of operators. This parameter is used to analyze the causes for bottlenecks that occur during computing operations. Note This duration is not the actual execution duration due to parallel computing.Input: %s rows (%sB)
The number of input rows and the data volume. per task: avg.: %s std.dev.: %s
The average number of rows and its standard deviation. These parameters are used to analyze the data skews that exist in a stage. Output: %s row (%sB)
The number of output rows and the data volume.
Scenarios
EXPLAIN ANALYZE
command to analyze common issues associated with execution plans.
- A filter is not pushed down.SQL 1 and SQL 2 in the following section indicate two SQL queries. Compared with SQL 1, SQL 2 contains the
length(string_test)
function that cannot be pushed down. When SQL 2 is executed, full data is scanned.- SQL 1
SELECT count(*) FROM test WHERE string_test = 'a';
- SQL 2
SELECT count(*) FROM test WHERE length(string_test) = 1;
You can run theEXPLAIN ANALYZE
command to analyze the execution plans of the preceding two SQL queries. The following section compares Fragment 2 in the two execution plans:- SQL 1 uses the
TableScan
operator and theInput avg.
parameter is set to0.00 rows
. This indicates that the filter is pushed down and the number of scanned rows is zero.Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=test, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}] Output: 4 rows (36B), PeakMemory: 0B, WallTime: 6.00ns, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00 Output layout: [count_0_1] Output partitioning: SINGLE [] Aggregate(PARTIAL) │ Outputs: [count_0_1:bigint] │ Estimates: {rows: 1 (8B)} │ Output: 8 rows (72B), PeakMemory: 0B (0.00%), WallTime: 212.92us (3.99%) │ count_0_1 := count(*) └─ TableScan[adb:AdbTableHandle{schema=test4dmp, tableName=test, partitionColumnHandles=[id]}] Outputs: [] Estimates: {rows: 4 (0B)} Output: 0 rows (0B), PeakMemory: 0B (0.00%), WallTime: 4.76ms (89.12%) Input avg.: 0.00 rows, Input std.dev.: ?%
- SQL 2 uses the
ScanFilterProject
operator and theInput
parameter is set to9999 rows
. In addition, thefilterPredicate
property is set to(`test4dmp`.`length`(`string_test`) = BIGINT '1')
. This indicates that the filter is not pushed down and the number of scanned rows is 9999.Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=test, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}] Output: 4 rows (36B), PeakMemory: 0B, WallTime: 102.00ns, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00 Output layout: [count_0_1] Output partitioning: SINGLE [] Aggregate(PARTIAL) │ Outputs: [count_0_1:bigint] │ Estimates: {rows: 1 (8B)} │ Output: 8 rows (72B), PeakMemory: 0B (0.00%), WallTime: 252.23us (0.12%) │ count_0_1 := count(*) └─ ScanFilterProject[table = adb:AdbTableHandle{schema=test4dmp, tableName=test, partitionColumnHandles=[id]}, filterPredicate = (`test4dmp`.`length`(`string_test`) = BIGINT '1')] Outputs: [] Estimates: {rows: 9999 (312.47kB)}/{rows: 9999 (312.47kB)}/{rows: ? (?)} Output: 0 rows (0B), PeakMemory: 0B (0.00%), WallTime: 101.31ms (49.84%) string_test := AdbColumnHandle{columnName=string_test, type=13, isIndexed=true} Input: 9999 rows (110.32kB), Filtered: 100.00%
- SQL 1
- Bad SQL memory usage
You can view the total memory usage of each fragment to troubleshoot resource consumption issues by using the
PeakMemory
parameter. Disaster broadcasts may result in high total memory usage. Except in these scenarios, a large value for thePeakMemory
parameter is usually caused by bloats of joined data, a large volume of data in the left joined table, or a large volume of data scanned by theTableScan
operator. You must limit these values by adding conditions based on business requirements. You can also view the total memory usage of each operator to identify the operator that consumes the most resources by using thePeakMemory
parameter.