EXPLAIN - MaxCompute - Alibaba Cloud Documentation Center

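During development, you often need to analyze query statements or table schemas to locate performance bottlenecks. MaxCompute SQL provides the EXPLAIN statement to help you analyze query statements. This topic describes the features and syntax of the EXPLAIN statement and provides usage examples.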

Description

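The EXPLAIN statement displays the structure of the execution plan of a DQL statement in MaxCompute SQL. This helps you understand how an SQL statement is executed and provides guidance for optimizing it. One query statement corresponds to one or more jobs, and one job corresponds to one or more tasks.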

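Note

If a query statement is complex and the size of the rows in the output of the EXPLAIN statement exceeds 4 MB, the threshold specified by the API of the upper-layer application is reached. As a result, the output cannot be displayed in full. To work around this issue, split the query statement into multiple subqueries and run the EXPLAIN statement on each subquery to obtain the job structure.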

Syntax

explain <dml query>;

dml query: Required. A SELECT statement. For more information, see "SELECT syntax".

Return value

The output of the EXPLAIN statement contains the following information:

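Dependencies between jobs
For example, the output contains the line job0 is root job. If the query requires only job0, only this one line is displayed.
Dependencies between tasks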
```
In Job job0:
root Tasks: M1, M2
J3_1_2_Stg1 depends on: M1, M2
```
job0 contains three tasks: M1, M2, and J3_1_2_Stg1. MaxCompute runs the J3_1_2_Stg1 task after the M1 and M2 tasks are complete.
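The dependency listing above can be read mechanically. The following is a minimal sketch, not part of MaxCompute: it assumes the plan text uses exactly the "root Tasks:" and "depends on:" line formats shown in this topic, and the helper name parse_task_dependencies is hypothetical. It builds a dependency map and derives one valid execution order with the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def parse_task_dependencies(plan_text):
    """Return {task: set of upstream tasks} from the task-dependency lines.

    Assumes the "root Tasks: ..." and "<task> depends on: ..." formats
    shown in this topic; all other lines are ignored.
    """
    deps = {}
    for raw in plan_text.splitlines():
        line = raw.strip()
        if line.startswith("root Tasks:"):
            # Root tasks have no upstream tasks.
            for name in line.split(":", 1)[1].split(","):
                if name.strip():
                    deps.setdefault(name.strip(), set())
        elif " depends on:" in line:
            task, upstream = line.split(" depends on:", 1)
            deps.setdefault(task.strip(), set()).update(
                u.strip() for u in upstream.split(",") if u.strip()
            )
    return deps


sample = """In Job job0:
root Tasks: M1, M2
J3_1_2_Stg1 depends on: M1, M2
"""

deps = parse_task_dependencies(sample)
# Upstream tasks always appear before the tasks that depend on them.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

In the resulting order, J3_1_2_Stg1 comes last because both M1 and M2 must finish first. The relative order of M1 and M2 carries no meaning, because the two tasks do not depend on each other.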
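Naming rules for tasks:
- MaxCompute has four task types: map, reduce, join, and local work. The first letter of a task name indicates the task type. For example, M2Stg1 is a map task.
- The number that follows the first letter is the task ID. This ID is unique among all tasks that correspond to a specific query.
- The numbers separated by underscores (_) indicate the direct dependencies of the task. For example, J3_1_2_Stg1 indicates that the task with ID 3 depends on the M1 and M2 tasks.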
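The naming rules above can be sketched as a small parser. This is an illustrative sketch under stated assumptions, not a MaxCompute API: the function name parse_task_name is hypothetical, the letter L for local work is an assumption (this topic only shows M, R, and J), and trailing parts such as Stg1 or U0 are kept as an opaque suffix because this topic does not define them.

```python
import re

# M, R and J follow the task names shown in this topic.
# "L" for local work is an assumption.
TASK_TYPES = {"M": "map", "R": "reduce", "J": "join", "L": "local work"}

# <letter><id>[_<dependency id>...][_<suffix> or <suffix>]
_TASK_NAME = re.compile(r"^([A-Z])(\d+)((?:_\d+)*)(?:_?([A-Za-z]\w*))?$")


def parse_task_name(name):
    """Split a task name into type, ID, direct dependencies and suffix."""
    match = _TASK_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized task name: {name!r}")
    letter, task_id, dep_part, suffix = match.groups()
    return {
        "type": TASK_TYPES.get(letter, "unknown"),
        "id": int(task_id),
        # "_1_2" becomes [1, 2]; an empty string becomes [].
        "depends_on": [int(d) for d in dep_part.split("_") if d],
        "suffix": suffix,  # for example "Stg1"; None if absent
    }


print(parse_task_name("J3_1_2_Stg1"))
# {'type': 'join', 'id': 3, 'depends_on': [1, 2], 'suffix': 'Stg1'}
```

For R4_3, the same function reports a reduce task with ID 4 that depends on task 3.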
Dependencies between all operators in a task
An operator string describes the execution semantics of a task. Structure of an operator string:
```
In Task M2:
    Data source: mf_mc_bj.sale_detail_jt/sale_date=2013/region=china # Data source describes the input of the task. 
    TS: mf_mc_bj.sale_detail_jt/sale_date=2013/region=china           # TableScanOperator
        FIL: ISNOTNULL(customer_id)                                   # FilterOperator
            RS: order: +                                              # ReduceSinkOperator
                nullDirection: *
                optimizeOrderBy: False
                valueDestLimit: 0
                dist: HASH
                keys:
                      customer_id
                values:
                      customer_id (string)
                      total_price (double)
                partitions:
                      customer_id


In Task J3_1_2:
    JOIN:                                                           # JoinOperator
         StreamLineRead1 INNERJOIN StreamLineRead2
         keys:
             0:customer_id
             1:customer_id

        AGGREGATE: group by:customer_id                            # GroupByOperator
         UDAF: SUM(total_price) (__agg_0_sum)[Complete],SUM(total_price) (__agg_1_sum)[Complete]
            RS: order: +
                nullDirection: *
                optimizeOrderBy: True
                valueDestLimit: 10
                dist: HASH
                keys:
                      customer_id
                values:
                      customer_id (string)
                      __agg_0 (double)
                      __agg_1 (double)
                partitions:


In Task R4_3:
    SEL: customer_id,__agg_0,__agg_1                               # SelectOperator
        LIM:limit 10                                               # LimitOperator
            FS: output: Screen                                     # FileSinkOperator
                schema:
                  customer_id (string) AS ashop
                  __agg_0 (double) AS ap
                  __agg_1 (double) AS bp
```
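Operator descriptions:
- TableScanOperator (TS): describes the logic of the FROM clause in a query statement. The alias of the input table is displayed in the output of the EXPLAIN statement.
- SelectOperator (SEL): describes the logic of the SELECT clause in a query statement. The columns that are passed to the next operator are displayed in the output of the EXPLAIN statement. Multiple columns are separated by commas (,).
  - If a column is specified, the value is displayed in the <alias>.<column_name> format.
  - If an expression is specified, the value is displayed as a list of functions, such as func1(arg1_1, arg1_2, func2(arg2_1, arg2_2)).
  - If a constant is specified, the constant value is displayed.
- FilterOperator (FIL): describes the logic of the WHERE clause in a query statement. The output of the EXPLAIN statement contains a WHERE expression in a format similar to that of SelectOperator.
- JoinOperator (JOIN): describes the logic of the JOIN clause in a query statement. The output of the EXPLAIN statement shows which tables are joined and how they are joined.
- GroupByOperator (AGGREGATE): describes the logic of aggregate operations. This operator is displayed if an aggregate function is used in a query statement. The content of the aggregate function is displayed in the output of the EXPLAIN statement.
- ReduceSinkOperator (RS): describes the logic of data distribution between tasks. If the result of a task is passed to another task, ReduceSinkOperator must be used to distribute data at the last stage of the task. The output of the EXPLAIN statement shows the sorting method of the result, the distribution keys, the distribution values, and the columns that are used to calculate the hash value.
- FileSinkOperator (FS): describes the logic of storage operations on the final data records. If a query statement contains an INSERT clause, the name of the table into which data is inserted is displayed in the output of the EXPLAIN statement.
- LimitOperator (LIM): describes the logic of the LIMIT clause in a query statement. The number of rows to return, as specified in the LIMIT clause, is displayed in the output of the EXPLAIN statement.
- MapjoinOperator (HASHJOIN): describes JOIN operations on large tables. This operator is similar to JoinOperator.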
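When a plan is long, it can help to list only the operators of each task. The following is a minimal sketch that assumes the "In Task <name>:" headers and the "<ABBREVIATION>:" line prefixes shown in this topic. The function name operators_by_task is hypothetical, and labels that are not in the operator list above (for example keys, values, or UDAF) are ignored.

```python
import re

# Abbreviations and operator names as listed in this topic.
OPERATORS = {
    "TS": "TableScanOperator",
    "SEL": "SelectOperator",
    "FIL": "FilterOperator",
    "JOIN": "JoinOperator",
    "AGGREGATE": "GroupByOperator",
    "RS": "ReduceSinkOperator",
    "FS": "FileSinkOperator",
    "LIM": "LimitOperator",
    "HASHJOIN": "MapjoinOperator",
}

_TASK_HEADER = re.compile(r"^In Task (\S+):$")
_LABEL = re.compile(r"^([A-Za-z]+):")


def operators_by_task(plan_text):
    """Return {task name: [operator abbreviations in order of appearance]}."""
    result = {}
    current = None
    for raw in plan_text.splitlines():
        line = raw.strip()
        header = _TASK_HEADER.match(line)
        if header:
            current = header.group(1)
            result[current] = []
            continue
        label = _LABEL.match(line)
        # Keep only labels that are known operator abbreviations.
        if current is not None and label and label.group(1) in OPERATORS:
            result[current].append(label.group(1))
    return result


sample = """In Task R4_3:
    SEL: customer_id,__agg_0,__agg_1
        LIM:limit 10
            FS: output: Screen
                schema:
                  customer_id (string) AS ashop
"""

for task, ops in operators_by_task(sample).items():
    print(task, [OPERATORS[op] for op in ops])
# R4_3 ['SelectOperator', 'LimitOperator', 'FileSinkOperator']
```

The order of the list follows the indentation of the plan from top to bottom, which is the order in which data flows through the operators of the task.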

Sample data

Sample source data is provided to help you better understand the examples in this topic. The following statements create the sale_detail and sale_detail_jt tables and insert data into them.

-- Create two partitioned tables named sale_detail and sale_detail_jt. 
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);
create table if not exists sale_detail_jt
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

-- Add partitions to the two tables. 
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');
alter table sale_detail_jt add partition (sale_date='2013', region='china');

-- Insert data into the tables. 
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
insert into sale_detail_jt partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s5','c2',100.2);

Query data from the sale_detail and sale_detail_jt tables. Sample statements:
select * from sale_detail;
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
| null       | c5          | NULL        | 2014       | shanghai   |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
+------------+-------------+-------------+------------+------------+
select * from sale_detail_jt;
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s5         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

-- Create a table for the JOIN operation. 
SET odps.sql.allow.fullscan=true;
create table shop as select shop_name, customer_id, total_price from sale_detail;

Examples

Run the following statements on the sample data.

-- Execute the query statement. 
select a.customer_id as ashop, sum(a.total_price) as ap,count(b.total_price) as bp 
from (select * from sale_detail_jt where sale_date='2013' and region='china') a 
inner join (select * from sale_detail where sale_date='2013' and region='china') b 
on a.customer_id=b.customer_id 
group by a.customer_id 
order by a.customer_id 
limit 10;
-- Obtain the execution plan of the query statement. 
explain 
select a.customer_id as ashop, sum(a.total_price) as ap,count(b.total_price) as bp 
from (select * from sale_detail_jt where sale_date='2013' and region='china') a 
inner join (select * from sale_detail where sale_date='2013' and region='china') b 
on a.customer_id=b.customer_id 
group by a.customer_id 
order by a.customer_id 
limit 10;

The following result is returned:

job0 is root job

In Job job0:
root Tasks: M1

In Task M1_U0:
    TS: doc_test_dev.sale_detail_jt/sale_date=2013/region=china
        FIL: ISNOTNULL(customer_id)
            HASHJOIN:
                     Filter1 INNERJOIN Filter2
                     keys:
                         0:customer_id
                         1:customer_id
                     non-equals:
                         0:
                         1:
                     bigTable: Filter1

                LocalSortBy: order: +
                             nullDirection: *
                             keys:customer_id
                    AGGREGATE: group by:customer_id
                     UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
                        LIM:limit 10
                            FS: output: Screen
                                schema:
                                  customer_id (string) AS ashop
                                  __agg_0 (double) AS ap
                                  __agg_1 (bigint) AS bp


In Task M1_U1:
    TS: doc_test_dev.sale_detail/sale_date=2013/region=china
        FIL: ISNOTNULL(customer_id)
            HASHJOIN:
                     Filter1 INNERJOIN Filter2
                     keys:
                         0:customer_id
                         1:customer_id
                     non-equals:
                         0:
                         1:
                     bigTable: Filter1

                LocalSortBy: order: +
                             nullDirection: *
                             keys:customer_id
                    AGGREGATE: group by:customer_id
                     UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
                        LIM:limit 10
                            FS: output: Screen
                                schema:
                                  customer_id (string) AS ashop
                                  __agg_0 (double) AS ap
                                  __agg_1 (bigint) AS bp

Run the following statements on the sample data. This query uses a MAPJOIN hint and a non-equi join condition.

-- Execute the query statement. 
select /*+ mapjoin(a) */
       a.customer_id as ashop, sum(a.total_price) as ap,count(b.total_price) as bp 
 from (select * from sale_detail_jt 
where sale_date='2013' and region='china') a 
inner join (select * from sale_detail where sale_date='2013' and region='china') b 
on a.total_price<b.total_price 
group by a.customer_id 
order by a.customer_id 
limit 10;
-- Obtain the execution plan of the query statement. 
explain 
select /*+ mapjoin(a) */
       a.customer_id as ashop, sum(a.total_price) as ap,count(b.total_price) as bp 
 from (select * from sale_detail_jt 
where sale_date='2013' and region='china') a 
inner join (select * from sale_detail where sale_date='2013' and region='china') b 
on a.total_price<b.total_price 
group by a.customer_id 
order by a.customer_id 
limit 10;

The following result is returned:

job0 is root job

In Job job0:
root Tasks: M1

In Task M1_U0:
    TS: doc_test_dev.sale_detail_jt/sale_date=2013/region=china
        HASHJOIN:
                 TableScan1 INNERJOIN TableScan2
                 keys:
                     0:
                     1:
                 non-equals:
                     0:
                     1:
                 bigTable: TableScan2

            FIL: LT(total_price,total_price)
                LocalSortBy: order: +
                             nullDirection: *
                             keys:customer_id
                    AGGREGATE: group by:customer_id
                     UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
                        LIM:limit 10
                            FS: output: Screen
                                schema:
                                  customer_id (string) AS ashop
                                  __agg_0 (double) AS ap
                                  __agg_1 (bigint) AS bp


In Task M1_U1:
    TS: doc_test_dev.sale_detail/sale_date=2013/region=china
        HASHJOIN:
                 TableScan1 INNERJOIN TableScan2
                 keys:
                     0:
                     1:
                 non-equals:
                     0:
                     1:
                 bigTable: TableScan2

            FIL: LT(total_price,total_price)
                LocalSortBy: order: +
                             nullDirection: *
                             keys:customer_id
                    AGGREGATE: group by:customer_id
                     UDAF: SUM(total_price) (__agg_0_sum)[Complete],COUNT(total_price) (__agg_1_count)[Complete]
                        LIM:limit 10
                            FS: output: Screen
                                schema:
                                  customer_id (string) AS ashop
                                  __agg_0 (double) AS ap
                                  __agg_1 (bigint) AS bp