Use the EXPLAIN and EXPLAIN ANALYZE commands to analyze execution plans - AnalyticDB

Prerequisites

The version of the AnalyticDB for MySQL cluster is V3.1.3 or later.

Note For information about how to view the version of an AnalyticDB for MySQL cluster, see How can I view the version of an AnalyticDB for MySQL cluster? To upgrade the version, submit a ticket.

EXPLAIN

You can run the EXPLAIN command to evaluate the planned execution path for an SQL query statement. This evaluation is for reference only and is not given the same weight as the actual execution results.

Syntax
```
EXPLAIN (format text) <SELECT statement>;
```
Note If the SQL query statement is not complex, you can add the (format text) field to the EXPLAIN command to increase the readability of the tree hierarchy in the returned result.

Example

EXPLAIN (format text)
      SELECT count(*)
      FROM
      nation, region, customer
      WHERE
      c_nationkey = n_nationkey
      AND n_regionkey = r_regionkey
      AND r_name = 'ASIA';

The following result is returned:

Output[count(*)]
│   Outputs: [count:bigint]
│   Estimates: {rows: 1 (8B)}
│   count(*) := count
└─ Aggregate(FINAL)
   │   Outputs: [count:bigint]
   │   Estimates: {rows: 1 (8B)}
   │   count := count(`count_1`)
   └─ LocalExchange[SINGLE] ()
      │   Outputs: [count_0_1:bigint]
      │   Estimates: {rows: 1 (8B)}
      └─ RemoteExchange[GATHER]
         │   Outputs: [count_0_2:bigint]
         │   Estimates: {rows: 1 (8B)}
         └─ Aggregate(PARTIAL)
            │   Outputs: [count_0_4:bigint]
            │   Estimates: {rows: 1 (8B)}
            │   count_4 := count(*)
            └─ InnerJoin[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue_0_6]
               │   Outputs: []
               │   Estimates: {rows: 302035 (4.61MB)}
               │   Distribution: REPLICATED
               ├─ Project[]
               │  │   Outputs: [c_nationkey:integer, $hashvalue:bigint]
               │  │   Estimates: {rows: 1500000 (5.72MB)}
               │  │   $hashvalue := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`c_nationkey`), 0))
               │  └─ RuntimeFilter
               │     │   Outputs: [c_nationkey:integer]
               │     │   Estimates: {rows: 1500000 (5.72MB)}
               │     ├─ TableScan[adb:AdbTableHandle{schema=tpch, tableName=customer, partitionColumnHandles=[c_custkey]}]
               │     │      Outputs: [c_nationkey:integer]
               │     │      Estimates: {rows: 1500000 (5.72MB)}
               │     │      c_nationkey := AdbColumnHandle{columnName=c_nationkey, type=4, isIndexed=true}
               │     └─ RuntimeCollect
               │        │   Outputs: [n_nationkey:integer]
               │        │   Estimates: {rows: 5 (60B)}
               │        └─ LocalExchange[ROUND_ROBIN] ()
               │           │   Outputs: [n_nationkey:integer]
               │           │   Estimates: {rows: 5 (60B)}
               │           └─ RuntimeScan
               │                  Outputs: [n_nationkey:integer]
               │                  Estimates: {rows: 5 (60B)}
               └─ LocalExchange[HASH][$hashvalue_0_6] ("n_nationkey")
                  │   Outputs: [n_nationkey:integer, $hashvalue_0_6:bigint]
                  │   Estimates: {rows: 5 (60B)}
                  └─ Project[]
                     │   Outputs: [n_nationkey:integer, $hashvalue_0_10:bigint]
                     │   Estimates: {rows: 5 (60B)}
                     │   $hashvalue_10 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_nationkey`), 0))
                     └─ RemoteExchange[REPLICATE]
                        │   Outputs: [n_nationkey:integer]
                        │   Estimates: {rows: 5 (60B)}
                        └─ InnerJoin[(`n_regionkey` = `r_regionkey`)][$hashvalue_0_7, $hashvalue_0_8]
                           │   Outputs: [n_nationkey:integer]
                           │   Estimates: {rows: 5 (60B)}
                           │   Distribution: REPLICATED
                           ├─ Project[]
                           │  │   Outputs: [n_nationkey:integer, n_regionkey:integer, $hashvalue_0_7:bigint]
                           │  │   Estimates: {rows: 25 (200B)}
                           │  │   $hashvalue_7 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_regionkey`), 0))
                           │  └─ RuntimeFilter
                           │     │   Outputs: [n_nationkey:integer, n_regionkey:integer]
                           │     │   Estimates: {rows: 25 (200B)}
                           │     ├─ TableScan[adb:AdbTableHandle{schema=tpch, tableName=nation, partitionColumnHandles=[]}]
                           │     │      Outputs: [n_nationkey:integer, n_regionkey:integer]
                           │     │      Estimates: {rows: 25 (200B)}
                           │     │      n_nationkey := AdbColumnHandle{columnName=n_nationkey, type=4, isIndexed=true}
                           │     │      n_regionkey := AdbColumnHandle{columnName=n_regionkey, type=4, isIndexed=true}
                           │     └─ RuntimeCollect
                           │        │   Outputs: [r_regionkey:integer]
                           │        │   Estimates: {rows: 1 (4B)}
                           │        └─ LocalExchange[ROUND_ROBIN] ()
                           │           │   Outputs: [r_regionkey:integer]
                           │           │   Estimates: {rows: 1 (4B)}
                           │           └─ RuntimeScan
                           │                  Outputs: [r_regionkey:integer]
                           │                  Estimates: {rows: 1 (4B)}
                           └─ LocalExchange[HASH][$hashvalue_0_8] ("r_regionkey")
                              │   Outputs: [r_regionkey:integer, $hashvalue_0_8:bigint]
                              │   Estimates: {rows: 1 (4B)}
                              └─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=region, partitionColumnHandles=[]}]
                                     Outputs: [r_regionkey:integer, $hashvalue_0_9:bigint]
                                     Estimates: {rows: 1 (4B)}/{rows: 1 (B)}
                                     $hashvalue_9 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`r_regionkey`), 0))
                                     r_regionkey := AdbColumnHandle{columnName=r_regionkey, type=4, isIndexed=true}

The following table describes the main parameters in the preceding result.


Parameter	Description
`Outputs: [symbol:type]`	The output column and data type of each operator.
`Estimates: {rows: %s (%sB)}`	The estimated number of rows and data volume of each operator. The data can be used to determine the join ordering and data shuffling techniques of the optimizer.

EXPLAIN ANALYZE

You can run the EXPLAIN ANALYZE command to query the distributed execution plans and actual execution costs of queries, including the execution durations, memory usage, and input and output data volume.

Syntax
```
EXPLAIN ANALYZE <SELECT statement>;
```

Example

EXPLAIN ANALYZE
SELECT count(*)
      FROM
      nation, region, customer
      WHERE
      c_nationkey = n_nationkey
      AND n_regionkey = r_regionkey
      AND r_name = 'ASIA';

The following result is returned:

Fragment 1 [SINGLE]
    Output: 1 row (9B), PeakMemory: 178KB, WallTime: 1.00ns, Input: 32 rows (288B); per task: avg.: 32.00 std.dev.: 0.00
    Output layout: [count]
    Output partitioning: SINGLE []
    Aggregate(FINAL)
    │   Outputs: [count:bigint]
    │   Estimates: {rows: 1 (8B)}
    │   Output: 2 rows (18B), PeakMemory: 24B (0.00%), WallTime: 70.39us (0.03%)
    │   count := count(`count_1`)
    └─ LocalExchange[SINGLE] ()
       │   Outputs: [count1:bigint]
       │   Estimates: {rows: ? (?)}
       │   Output: 64 rows (576B), PeakMemory: 8KB (0.07%), WallTime: 238.69us (0.10%)
       └─ RemoteSource[2]
              Outputs: [count2:bigint]
              Estimates:
              Output: 32 rows (288B), PeakMemory: 32KB (0.27%), WallTime: 182.82us (0.08%)
              Input avg.: 4.00 rows, Input std.dev.: 264.58%

Fragment 2 [adb:AdbPartitioningHandle{schema=tpch, tableName=customer, dimTable=false, shards=32, tableEngineType=Cstore, partitionColumns=c_custkey, prunedBuckets= empty}]
    Output: 32 rows (288B), PeakMemory: 6MB, WallTime: 164.00ns, Input: 1500015 rows (20.03MB); per task: avg.: 500005.00 std.dev.: 21941.36
    Output layout: [count4]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count4:bigint]
    │   Estimates: {rows: 1 (8B)}
    │   Output: 64 rows (576B), PeakMemory: 336B (0.00%), WallTime: 1.01ms (0.42%)
    │   count_4 := count(*)
    └─ INNER Join[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue6]
       │   Outputs: []
       │   Estimates: {rows: 302035 (4.61MB)}
       │   Output: 300285 rows (210B), PeakMemory: 641KB (5.29%), WallTime: 99.08ms (41.45%)
       │   Left (probe) Input avg.: 46875.00 rows, Input std.dev.: 311.24%
       │   Right (build) Input avg.: 0.63 rows, Input std.dev.: 264.58%
       │   Distribution: REPLICATED
       ├─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=customer, partitionColumnHandles=[c_custkey]}]
       │      Outputs: [c_nationkey:integer, $hashvalue:bigint]
       │      Estimates: {rows: 1500000 (5.72MB)}/{rows: 1500000 (5.72MB)}
       │      Output: 1500000 rows (20.03MB), PeakMemory: 5MB (44.38%), WallTime: 68.29ms (28.57%)
       │      $hashvalue := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`c_nationkey`), 0))
       │      c_nationkey := AdbColumnHandle{columnName=c_nationkey, type=4, isIndexed=true}
       │      Input: 1500000 rows (7.15MB), Filtered: 0.00%
       └─ LocalExchange[HASH][$hashvalue6] ("n_nationkey")
          │   Outputs: [n_nationkey:integer, $hashvalue6:bigint]
          │   Estimates: {rows: 5 (60B)}
          │   Output: 30 rows (420B), PeakMemory: 394KB (3.26%), WallTime: 455.03us (0.19%)
          └─ Project[]
             │   Outputs: [n_nationkey:integer, $hashvalue10:bigint]
             │   Estimates: {rows: 5 (60B)}
             │   Output: 15 rows (210B), PeakMemory: 24KB (0.20%), WallTime: 83.61us (0.03%)
             │   Input avg.: 0.63 rows, Input std.dev.: 264.58%
             │   $hashvalue_10 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_nationkey`), 0))
             └─ RemoteSource[3]
                    Outputs: [n_nationkey:integer]
                    Estimates:
                    Output: 15 rows (75B), PeakMemory: 24KB (0.20%), WallTime: 45.97us (0.02%)
                    Input avg.: 0.63 rows, Input std.dev.: 264.58%

Fragment 3 [adb:AdbPartitioningHandle{schema=tpch, tableName=nation, dimTable=true, shards=32, tableEngineType=Cstore, partitionColumns=, prunedBuckets= empty}]
    Output: 5 rows (25B), PeakMemory: 185KB, WallTime: 1.00ns, Input: 26 rows (489B); per task: avg.: 26.00 std.dev.: 0.00
    Output layout: [n_nationkey]
    Output partitioning: BROADCAST []
    INNER Join[(`n_regionkey` = `r_regionkey`)][$hashvalue7, $hashvalue8]
    │   Outputs: [n_nationkey:integer]
    │   Estimates: {rows: 5 (60B)}
    │   Output: 11 rows (64B), PeakMemory: 152KB (1.26%), WallTime: 255.86us (0.11%)
    │   Left (probe) Input avg.: 25.00 rows, Input std.dev.: 0.00%
    │   Right (build) Input avg.: 0.13 rows, Input std.dev.: 264.58%
    │   Distribution: REPLICATED
    ├─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=nation, partitionColumnHandles=[]}]
    │      Outputs: [n_nationkey:integer, n_regionkey:integer, $hashvalue7:bigint]
    │      Estimates: {rows: 25 (200B)}/{rows: 25 (200B)}
    │      Output: 25 rows (475B), PeakMemory: 16KB (0.13%), WallTime: 178.81us (0.07%)
    │      $hashvalue_7 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_regionkey`), 0))
    │      n_nationkey := AdbColumnHandle{columnName=n_nationkey, type=4, isIndexed=true}
    │      n_regionkey := AdbColumnHandle{columnName=n_regionkey, type=4, isIndexed=true}
    │      Input: 25 rows (250B), Filtered: 0.00%
    └─ LocalExchange[HASH][$hashvalue8] ("r_regionkey")
       │   Outputs: [r_regionkey:integer, $hashvalue8:bigint]
       │   Estimates: {rows: 1 (4B)}
       │   Output: 2 rows (28B), PeakMemory: 34KB (0.29%), WallTime: 57.41us (0.02%)
       └─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=region, partitionColumnHandles=[]}]
              Outputs: [r_regionkey:integer, $hashvalue9:bigint]
              Estimates: {rows: 1 (4B)}/{rows: 1 (4B)}
              Output: 1 row (14B), PeakMemory: 8KB (0.07%), WallTime: 308.99us (0.13%)
              $hashvalue_9 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`r_regionkey`), 0))
              r_regionkey := AdbColumnHandle{columnName=r_regionkey, type=4, isIndexed=true}
              Input: 1 row (5B), Filtered: 0.00%

The following table describes the main parameters in the preceding result.


Parameter	Description
`Outputs: [symbol:type]`	The output column and data type of each operator.
`Estimates: {rows: %s (%sB)}`	The estimated number of rows and data volume of each operator. The data can be used to determine the join ordering and data shuffling techniques of the optimizer.
`PeakMemory: %s`	The total memory usage. This parameter is used to analyze the causes for bottlenecks that occur during memory use.
`WallTime: %s`	The total execution duration of operators. This parameter is used to analyze the causes for bottlenecks that occur during computing operations. Note This duration is not the actual execution duration due to parallel computing.
`Input: %s rows (%sB)`	The number of input rows and the data volume.
`per task: avg.: %s std.dev.: %s`	The average number of rows and its standard deviation. These parameters are used to analyze the data skews that exist in a stage.
`Output: %s row (%sB)`	The number of output rows and the data volume.

Scenarios

You can run the EXPLAIN ANALYZE command to analyze common issues associated with execution plans.

A filter is not pushed down.

SQL 1 and SQL 2 in the following section indicate two SQL queries. Compared with SQL 1, SQL 2 contains the length(string_test) function that cannot be pushed down. When SQL 2 is executed, full data is scanned.

SQL 1

SELECT count(*) FROM test WHERE string_test = 'a';

SQL 2

SELECT count(*) FROM test WHERE length(string_test) = 1;

You can run the EXPLAIN ANALYZE command to analyze the execution plans of the preceding two SQL queries. The following section compares Fragment 2 in the two execution plans:

SQL 1 uses the TableScan operator and the Input avg. parameter is set to 0.00 rows. This indicates that the filter is pushed down and the number of scanned rows is zero.

Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=test, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}]
    Output: 4 rows (36B), PeakMemory: 0B, WallTime: 6.00ns, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00
    Output layout: [count_0_1]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count_0_1:bigint]
    │   Estimates: {rows: 1 (8B)}
    │   Output: 8 rows (72B), PeakMemory: 0B (0.00%), WallTime: 212.92us (3.99%)
    │   count_0_1 := count(*)
    └─ TableScan[adb:AdbTableHandle{schema=test4dmp, tableName=test, partitionColumnHandles=[id]}]
           Outputs: []
           Estimates: {rows: 4 (0B)}
           Output: 0 rows (0B), PeakMemory: 0B (0.00%), WallTime: 4.76ms (89.12%)
           Input avg.: 0.00 rows, Input std.dev.: ?%

SQL 2 uses the ScanFilterProject operator and the Input parameter is set to 9999 rows. In addition, the filterPredicate property is set to (`test4dmp`.`length`(`string_test`) = BIGINT '1'). This indicates that the filter is not pushed down and the number of scanned rows is 9999.

Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=test, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}]
    Output: 4 rows (36B), PeakMemory: 0B, WallTime: 102.00ns, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00
    Output layout: [count_0_1]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count_0_1:bigint]
    │   Estimates: {rows: 1 (8B)}
    │   Output: 8 rows (72B), PeakMemory: 0B (0.00%), WallTime: 252.23us (0.12%)
    │   count_0_1 := count(*)
    └─ ScanFilterProject[table = adb:AdbTableHandle{schema=test4dmp, tableName=test, partitionColumnHandles=[id]}, filterPredicate = (`test4dmp`.`length`(`string_test`) = BIGINT '1')]
           Outputs: []
           Estimates: {rows: 9999 (312.47kB)}/{rows: 9999 (312.47kB)}/{rows: ? (?)}
           Output: 0 rows (0B), PeakMemory: 0B (0.00%), WallTime: 101.31ms (49.84%)
           string_test := AdbColumnHandle{columnName=string_test, type=13, isIndexed=true}
           Input: 9999 rows (110.32kB), Filtered: 100.00%

Bad SQL memory usage
You can view the total memory usage of each fragment to troubleshoot resource consumption issues by using the PeakMemory parameter. Disaster broadcasts may result in high total memory usage. Except in these scenarios, a large value for the PeakMemory parameter is usually caused by bloats of joined data, a large volume of data in the left joined table, or a large volume of data scanned by the TableScan operator. You must limit these values by adding conditions based on business requirements. You can also view the total memory usage of each operator to identify the operator that consumes the most resources by using the PeakMemory parameter.