全部產品
Search
文件中心

PolarDB:使用EXPLAIN查看並行計劃

更新時間:Jul 06, 2024

本文介紹如何使用EXPLAIN語句查看執行計畫輸出中與並行查詢相關的內容。

查詢用表

本文樣本中使用pq_test表進行並行查詢測試。

  • 表結構如下:

    SHOW CREATE TABLE pq_test\G
    *************************** 1. row ***************************
    Table: pq_test
    Create Table: CREATE TABLE `pq_test` (
      `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
      `help_topic_id` INT(10) UNSIGNED NOT NULL,
      `name` CHAR(64) NOT NULL,
      `help_category_id` SMALLINT(5) UNSIGNED NOT NULL,
      `description` TEXT NOT NULL,
      `example` TEXT NOT NULL,
      `url` TEXT NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=21495809 DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
  • 表大小如下:

    SHOW TABLE STATUS\G
    *************************** 1. row ***************************
               Name: pq_test
             Engine: InnoDB
            Version: 10
         Row_format: Dynamic
               Rows: 20064988
     Avg_row_length: 1898
        Data_length: 38085328896
    Max_data_length: 0
       Index_length: 0
          Data_free: 4194304
     Auto_increment: 21495809
        Create_time: 2019-07-30 01:35:27
        Update_time: NULL
         Check_time: NULL
          Collation: utf8_general_ci
           Checksum: NULL
     Create_options:
            Comment:
    1 row in set (0.02 sec)
  • 查詢SQL如下:

    SELECT COUNT(*) FROM pq_test;

EXPLAIN查詢語句

不使用並行查詢的執行情況

通過EXPLAIN語句查看不使用並行查詢的執行情況。

查詢語句如下:

SET max_parallel_degree=0; EXPLAIN SELECT COUNT(*) FROM pq_test\G

查詢結果如下:

*************************** 1. row ***************************
           Id: 1
  Select_type: SIMPLE
        Table: pq_test
  Partitions: NULL
         Type: index
Possible_keys: NULL
          Key: PRIMARY
      Key_len: 8
          Ref: NULL
         Rows: 20064988
     Filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.03 sec)

通過EXPLAIN語句使用並行查詢的執行情況

  • PolarDB MySQL版8.0.1版本中,通過EXPLAIN語句查看使用並行查詢的情況。

    查詢語句如下:

    EXPLAIN SELECT COUNT(*) FROM pq_test\G

    查詢結果如下:

    *************************** 1. row ***************************
               Id: 1
      Select_type: SIMPLE
            Table: <gather2>
       Partitions: NULL
             Type: ALL
    Possible_keys: NULL
              Key: NULL
          Key_len: NULL
              Ref: NULL
             Rows: 20064988
         Filtered: 100.00
            Extra: NULL
    *************************** 2. row ***************************
               Id: 2
      Select_type: SIMPLE
            Table: pq_test
       Partitions: NULL
             Type: index
    Possible_keys: NULL
              Key: PRIMARY
          Key_len: 8
              Ref: NULL
             Rows: 10032494
         Filtered: 100.00
            Extra: Parallel scan (2 workers); Using index
    2 rows in set, 1 warning (0.00 sec)
  • PolarDB MySQL版8.0.2版本中,通過EXPLAIN語句查看使用並行查詢的情況,可以使用更為清晰的EXPLAIN FORMAT=TREE查詢。

    查詢語句如下:

    EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM pq_test\G

    查詢結果如下:

    *************************** 1. row ***************************
    EXPLAIN: -> Count rows in gather  (cost=... rows=1)
        -> Gather (slice: 1; workers: 2, nodes: 2)
            -> Parallel index scan on pq_test using PRIMARY, with parallel partitions: 8  (cost=... rows=20064988)

從上述結果可以看出:

  • 上述並行計劃中包含了Gather操作,該操作負責匯總所有Worker返回的中間結果。

  • 從執行計畫輸出的Extra資訊中可以看到pq_test表使用了Parallel scan(並行掃描)策略,期望用4個Workers來並存執行。

  • Gather操作中顯示了使用的worker數量和節點數量,這裡一共使用了2個節點,每個節點2個worker。

通過帶有子查詢的EXPLAIN語句查看使用並行查詢的執行情況

  • PolarDB MySQL版8.0.1版本中,通過帶有子查詢的EXPLAIN語句查看使用並行查詢的情況。

    查詢語句如下:

    EXPLAIN SELECT
        o_orderpriority,
        COUNT(*) as order_count
    FROM
        orders
    WHERE
        o_orderdate >= '1994-04-01'
        AND o_orderdate < date_add('1994-04-01', interval '3' month)
        AND exists (
            SELECT
                *
            FROM
                lineitem
            WHERE
                l_orderkey = o_orderkey
                and l_commitdate < l_receiptdate
        )
    GROUP BY
        o_orderpriority
    ORDER BY
        o_orderpriority\G

    查詢結果如下:

    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: <gather1.1>
       partitions: NULL
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 1489068
         filtered: 100.00
            Extra: Using temporary; Using filesort
    *************************** 2. row ***************************
               id: 1
      select_type: PRIMARY
            table: orders
       partitions: NULL
             type: range
    possible_keys: i_o_orderdate
              key: i_o_orderdate
          key_len: 3
              ref: NULL
             rows: 568369
         filtered: 100.00
            Extra: Parallel scan (2 workers); Using index condition; Using where; Using temporary
    *************************** 3. row ***************************
               id: 2
      select_type: DEPENDENT SUBQUERY
            table: lineitem
       partitions: NULL
             type: ref
    possible_keys: PRIMARY,i_l_orderkey
              key: PRIMARY
          key_len: 4
              ref: tpch_10.orders.O_ORDERKEY
             rows: 4
         filtered: 33.3
            Extra: Parallel pushdown; Using where
    3 rows in set, 2 warnings (0.01 sec)

    從上述結果可以看出,select_typeDEPENDENT SUBQUERY的子查詢中,Extra顯示為Parallel pushdown,表示該子查詢使用了Parallel pushdown策略,即子查詢被整個下推到Worker去執行。

  • PolarDB MySQL版8.0.2版本中,通過帶有子查詢的EXPLAIN語句查看使用並行查詢的情況,通過FORMAT=TREE可以看到更清晰的執行計畫。

    • 樣本1

      查詢語句如下:

      EXPLAIN FORMAT=TREE SELECT
          o_orderpriority,
          COUNT(*) as order_count
      FROM
          orders
      WHERE
          o_orderdate >= '1994-04-01'
          AND o_orderdate < date_add('1994-04-01', interval '3' month)
          AND exists (
              SELECT
                  *
              FROM
                  lineitem
              WHERE
                  l_orderkey = o_orderkey
                  and l_commitdate < l_receiptdate
          )
      GROUP BY
          o_orderpriority
      ORDER BY
          o_orderpriority\G

      查詢結果如下:

      *************************** 1. row ***************************
      EXPLAIN: -> Sort: <temporary>.o_orderpriority
          -> Table scan on <temporary>
              -> Aggregate using temporary table  (cost=1746887.76 rows=1489068)
                  -> Gather (slice: 1; workers: 2)  (cost=1597980.96 rows=1489068)
                      -> Table scan on <temporary>
                          -> Aggregate using temporary table  (cost=1486290.85 rows=744534)
                              -> Filter: exists(select #2)  (cost=772982.43 rows=568369)
                                  -> Parallel index range scan on orders using i_o_orderdate, with index condition: ((orders.O_ORDERDATE >= DATE'1994-04-01') and (orders.O_ORDERDATE < <cache>(('1994-04-01' + interval '3' month)))), with parallel partitions: 89  (cost=772982.43 rows=568369)
                                  -> Select #2 (subquery in condition; dependent)
                                      -> Limit: 1 row(s)
                                          -> Filter: (lineitem.L_COMMITDATE < lineitem.L_RECEIPTDATE)  (cost=1.14 rows=1)
                                              -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY)  (cost=1.14 rows=4)
      
      1 row in set, 1 warning (0.02 sec)
    • 樣本2

      查詢語句如下:

      EXPLAIN FORMAT=TREE
      select
      	ps_partkey,
      	sum(ps_supplycost * ps_availqty) as value
      from
      	partsupp,
      	supplier,
      	nation
      where
      	ps_suppkey = s_suppkey
      	and s_nationkey = n_nationkey
      	and n_name = 'IRAN'
      group by
      	ps_partkey having
      		sum(ps_supplycost * ps_availqty) > (
      			select
      				sum(ps_supplycost * ps_availqty) * 0.0000010000
      			from
      				partsupp,
      				supplier,
      				nation
      			where
      				ps_suppkey = s_suppkey
      				and s_nationkey = n_nationkey
      				and n_name = 'IRAN'
      		)
      order by
      	value desc limit 1;

      查詢結果如下:

      | -> Limit: 1 row(s)  (cost=1408498.03 rows=1)
          -> Gather (merge sort; slice: 1; workers: 256; actual workers: 32)  (cost=1408498.03 rows=256)
              -> Limit: 1 row(s)  (cost=1408404.20 rows=1)
                  -> Sort: <temporary>.value DESC, limit input to 1 row(s) per chunk  (cost=1408404.20 rows=803182)
                      -> Filter: (sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY)) > (select #2))
                          -> Table scan on <temporary>
                              -> Aggregate using temporary table  (cost=1408404.20 rows=803182)
                                  -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY)  (cost=829770.18 rows=327820)
                                      -> Parallel table scan on partsupp, with parallel partitions: 2882, partition_keys: 1  (cost=6347528.15 rows=3176912)
                                      -> Hash
                                          -> Broadcast (slice: 2; workers: 256; nodes: 16)  (cost=103382.56 rows=1029632)
                                              -> Nested loop inner join  (cost=409.36 rows=4022)
                                                  -> Filter: (nation.N_NAME = 'IRAN')  (cost=2.29 rows=3)
                                                      -> Table scan on nation  (cost=2.29 rows=25)
                                                  -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243  (cost=65.94 rows=1609)
                          -> Select #2 (subquery in condition; run only once; shared access)
                              -> Aggregate: sum(`<collector>`.tmp_field_0)  (cost=825576.85 rows=1)
                                  -> Gather (slice: 1; workers: 256; nodes: 16)  (cost=825564.05 rows=256)
                                      -> Aggregate: sum((partsupp.PS_SUPPLYCOST * partsupp.PS_AVAILQTY))  (cost=825541.20 rows=1)
                                          -> Inner hash join (partsupp.PS_SUPPKEY = supplier.S_SUPPKEY)  (cost=809150.20 rows=327820)
                                              -> Parallel table scan on partsupp, with parallel partitions: 14405  (cost=6147699.35 rows=3176912)
                                              -> Hash
                                                  -> Broadcast (slice: 2; workers: 256; nodes: 16)  (cost=103382.56 rows=1029632)
                                                      -> Nested loop inner join  (cost=409.36 rows=4022)
                                                          -> Filter: (nation.N_NAME = 'IRAN')  (cost=2.29 rows=3)
                                                              -> Table scan on nation  (cost=2.29 rows=25)
                                                          -> Parallel index lookup on supplier using SUPPLIER_FK1 (S_NATIONKEY=nation.N_NATIONKEY), with parallel partitions: 9243  (cost=65.94 rows=1609)

      從上述結果可以看出,select_typeSUBQUERY的子查詢中,在Select #2子查詢處顯示為Shared access,表示該子查詢使用了Shared access策略,即PolarDB最佳化器選擇提前並存執行該子查詢並將執行結果Share給外層所有Worker做共用訪問,在這種查詢計劃下,子查詢外層的查詢塊由於策略限制,將無法做跨機執行,即外層查詢的多個並行worker需要在查詢下發的節點內單機並存執行。

查看使用並行查詢時各類計算的具體執行時間

PolarDB MySQL版8.0.2版本中,可以通過EXPLAIN ANALYZE語句查看使用並行查詢時,各類計算的具體執行時間。

  • 查詢語句如下:

    EXPLAIN ANALYZE select
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice,
        sum(l_quantity)
    from
        customer,
        orders,
        lineitem
    where
        o_orderkey in (
            select
                l_orderkey
            from
                lineitem
            group by
                l_orderkey having
                    sum(l_quantity) > 313
        )
        and c_custkey = o_custkey
        and o_orderkey = l_orderkey
    group by
        c_name,
        c_custkey,
        o_orderkey,
        o_orderdate,
        o_totalprice
    order by
        o_totalprice desc,
        o_orderdate
    LIMIT 100;
  • 查詢結果如下:

    | -> Limit: 100 row(s)  (cost=14.62 rows=1) (actual time=1.252..1.252 rows=0 loops=1)
        -> Gather (merge sort; slice: 1; workers: 1; nodes: 2)  (cost=14.62 rows=1) (actual time=1.250..1.250 rows=0 loops=1)
            -> Limit: 100 row(s)  (cost=4.52 rows=1) (actual time=0.084,0.084,0.084..0.084,0.084,0.084 rows=0,0,0 loops=1,1,1)
                -> Sort: <temporary>.O_TOTALPRICE DESC, <temporary>.O_ORDERDATE, limit input to 100 row(s) per chunk  (cost=4.52 rows=1) (actual time=0.083,0.083,0.083..0.083,0.083,0.083 rows=0,0,0 loops=1,1,1)
                    -> Table scan on <temporary> (actual time=0.070,0.070,0.070..0.070,0.070,0.070 rows=0,0,0 loops=1,1,1)
                        -> Aggregate using temporary table  (cost=4.52 rows=1) (actual time=0.001,0.001,0.001..0.001,0.001,0.001 rows=0,0,0 loops=1,1,1)
                            -> Nested loop inner join  (cost=2.86 rows=4) (actual time=0.039,0.039,0.039..0.039,0.039,0.039 rows=0,0,0 loops=1,1,1)
                                -> Nested loop inner join  (cost=1.45 rows=1) (actual time=0.037,0.037,0.037..0.037,0.037,0.037 rows=0,0,0 loops=1,1,1)
                                    -> Parallel table scan on customer, with parallel partitions: 1, partition_keys: 1  (cost=0.35 rows=1) (actual time=0.036,0.036,0.036..0.036,0.036,0.036 rows=0,0,0 loops=1,1,1)
                                    -> Filter: <in_optimizer>(orders.O_ORDERKEY,<exists>(select #2))  (cost=1.10 rows=1)
                                        -> Index lookup on orders using ORDERS_FK1 (O_CUSTKEY=customer.C_CUSTKEY)  (cost=1.10 rows=1)
                                        -> Select #2 (subquery in condition; dependent)
                                            -> Limit: 1 row(s)
                                                -> Filter: ((sum(lineitem.L_QUANTITY) > 313) and (<cache>(orders.O_ORDERKEY) = <ref_null_helper>(lineitem.L_ORDERKEY)))
                                                    -> Group aggregate: sum(lineitem.L_QUANTITY)
                                                        -> Index scan on lineitem using PRIMARY  (cost=41554048.20 rows=380071042)
                                -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY)  (cost=1.41 rows=4)

查詢語句帶有ANALYZE關鍵字時,會實際執行查詢語句,並統計其中各個計算的耗時情況。對於並行查詢,也可以記錄各個worker在每個運算元中的計算耗時。包括各個運算元在多個worker上的最長、最短及平均執行耗時。