Tree Depth - Platform For AI - Alibaba Cloud Documentation Center

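The depth of a tree is the length of the path from the root vertex to the deepest vertex. The Tree Depth component outputs the depth of each vertex and the ID of the tree that the vertex belongs to (the ID of the root vertex).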
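The definition above can be illustrated with a minimal Python sketch. It is not the component's implementation, which this topic does not describe. It assumes that a root is any vertex that never appears as the end vertex of an edge, and that a vertex reachable along more than one path gets the shortest distance from its root. The function name `tree_depths` and the sample edges are hypothetical.

```python
from collections import defaultdict, deque


def tree_depths(edges):
    """Return {vertex: (root, depth)} for a forest given as (start, end) edges.

    Assumption: a root is a vertex that never appears as an end vertex.
    """
    children = defaultdict(list)
    has_parent = set()
    vertices = []  # first-seen order, so the output is deterministic
    seen = set()
    for start, end in edges:
        children[start].append(end)
        has_parent.add(end)
        for v in (start, end):
            if v not in seen:
                seen.add(v)
                vertices.append(v)

    result = {}
    for root in (v for v in vertices if v not in has_parent):
        result[root] = (root, 0)
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for child in children[v]:
                # Breadth-first order: the first visit is the shortest distance.
                if child not in result:
                    result[child] = (root, result[v][1] + 1)
                    queue.append(child)
    return result


# Hypothetical three-edge tree rooted at "r".
depths = tree_depths([("r", "x"), ("r", "y"), ("y", "z")])
print(depths)
```

In this sketch the deepest vertex is `z` at depth 2, so the depth of the tree rooted at `r` is 2.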

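Configure the component

Method 1: Configure the component on the pipeline page

You can add the Tree Depth component on the pipeline page of Machine Learning Designer in the Platform for AI (PAI) console. The following table describes the parameters.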

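Tab	Parameter	Description
Fields Setting	Edge Table: Start Vertex Column	The start vertex column in the edge table.
Fields Setting	Edge Table: End Vertex Column	The end vertex column in the edge table.
Tuning	Workers	The number of workers that run the job in parallel. Both the degree of parallelism and the framework communication cost increase with the value of this parameter.
Tuning	Memory Size per Worker	The maximum amount of memory that a single job can use. Unit: MB. Default value: 4096. If the memory in use exceeds the value of this parameter, an `OutOfMemory` error is reported.
Tuning	Data Split Size (MB)	The data split size. Unit: MB. Default value: 64.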

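Method 2: Configure the component by using PAI commands

You can configure the Tree Depth component by using PAI commands. PAI commands can be run from the SQL Script component. For more information, see "Scenario 4: Execute PAI commands within the SQL script component" in the "SQL Script" topic.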

PAI -name TreeDepth
    -project algo_public
    -DinputEdgeTableName=TreeDepth_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=TreeDepth_func_test_result;

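Parameter	Required	Default value	Description
inputEdgeTableName	Yes	No default value	The name of the input edge table.
inputEdgeTablePartitions	No	Full table	The partitions of the input edge table.
fromVertexCol	Yes	No default value	The start vertex column in the input edge table.
toVertexCol	Yes	No default value	The end vertex column in the input edge table.
outputTableName	Yes	No default value	The name of the output table.
outputTablePartitions	No	No default value	The partitions of the output table.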
lifecycle	No	No default value	The lifecycle of the output table.
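workerNum	No	Not specified	The number of workers that run the job in parallel. Both the degree of parallelism and the framework communication cost increase with the value of this parameter.
workerMem	No	4096	The maximum amount of memory that a single job can use. Unit: MB. If the memory in use exceeds the value of this parameter, an `OutOfMemory` error is reported.
splitSize	No	64	The data split size. Unit: MB.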
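When the command is generated by a script, the parameters in the table above can be assembled programmatically. The following Python sketch is only an illustration: `build_tree_depth_command` is a hypothetical helper, not part of any PAI SDK, and it produces only the command text. It accepts the optional parameters whose identifiers appear in the table and rejects any other name.

```python
# Optional parameters taken from the parameter table in this topic.
OPTIONAL_PARAMS = {
    "inputEdgeTablePartitions",
    "outputTablePartitions",
    "workerNum",
    "workerMem",
    "splitSize",
}


def build_tree_depth_command(input_edge_table, from_col, to_col,
                             output_table, **optional):
    """Return the text of a PAI TreeDepth command (hypothetical helper)."""
    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")

    # Required parameters first, in the order used by the sample command.
    params = {
        "inputEdgeTableName": input_edge_table,
        "fromVertexCol": from_col,
        "toVertexCol": to_col,
        "outputTableName": output_table,
    }
    params.update(optional)

    lines = ["PAI -name TreeDepth", "    -project algo_public"]
    lines += [f"    -D{name}={value}" for name, value in params.items()]
    return "\n".join(lines) + ";"


command = build_tree_depth_command(
    "TreeDepth_func_test_edge",
    "flow_out_id",
    "flow_in_id",
    "TreeDepth_func_test_result",
    workerMem=8192,  # example override of the 4096 MB default
)
print(command)
```

The resulting string can be pasted into an SQL Script component. Raising an error for unknown names is a design choice of this sketch to catch typos early.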

Example

Add an SQL Script component to the canvas as a node and run the following SQL statements to generate the sample input data.

drop table if exists TreeDepth_func_test_edge;
create table TreeDepth_func_test_edge as
select * from
(
    select '0' as flow_out_id, '1' as flow_in_id
    union all
    select '0' as flow_out_id, '2' as flow_in_id
    union all
    select '1' as flow_out_id, '3' as flow_in_id
    union all
    select '1' as flow_out_id, '4' as flow_in_id
    union all
    select '2' as flow_out_id, '4' as flow_in_id
    union all
    select '2' as flow_out_id, '5' as flow_in_id
    union all
    select '4' as flow_out_id, '6' as flow_in_id
    union all
    select 'a' as flow_out_id, 'b' as flow_in_id
    union all
    select 'a' as flow_out_id, 'c' as flow_in_id
    union all
    select 'c' as flow_out_id, 'd' as flow_in_id
    union all
    select 'c' as flow_out_id, 'e' as flow_in_id
)tmp;
drop table if exists TreeDepth_func_test_result;
create table TreeDepth_func_test_result
(
  node string,
  root string,
  depth bigint
);

Data structure

[Figure: graph structure of the sample data]

Add an SQL Script component to the canvas as a node and run the following PAI command to compute the depth of each vertex.

drop table if exists ${o1};
PAI -name TreeDepth
    -project algo_public
    -DinputEdgeTableName=TreeDepth_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=${o1};

Right-click the SQL Script component and choose [View Data] > [SQL Script Output] to view the result.

| node | root | depth |
| ---- | ---- | ----- |
| a    | a    | 0     |
| b    | a    | 1     |
| c    | a    | 1     |
| d    | a    | 2     |
| e    | a    | 2     |
| 0    | 0    | 0     |
| 1    | 0    | 1     |
| 2    | 0    | 1     |
| 3    | 0    | 2     |
| 4    | 0    | 2     |
| 5    | 0    | 2     |
| 6    | 0    | 3     |
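The component reports one depth per vertex. The depth of each tree, as defined at the top of this topic, is the largest vertex depth under a given root. The following Python sketch derives it from the rows of the result table above. The rows are copied from the table, and the variable names are illustrative.

```python
# (node, root, depth) rows copied from the result table above.
rows = [
    ("a", "a", 0), ("b", "a", 1), ("c", "a", 1), ("d", "a", 2), ("e", "a", 2),
    ("0", "0", 0), ("1", "0", 1), ("2", "0", 1), ("3", "0", 2),
    ("4", "0", 2), ("5", "0", 2), ("6", "0", 3),
]

# Tree depth = maximum vertex depth among the vertices that share a root.
tree_depth = {}
for node, root, depth in rows:
    tree_depth[root] = max(tree_depth.get(root, 0), depth)

print(tree_depth)  # {'a': 2, '0': 3}
```

The tree rooted at `a` has depth 2, and the tree rooted at `0` has depth 3 because of the path 0, 1 (or 2), 4, 6.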