Single-source Shortest Path - Platform For AI - Alibaba Cloud Documentation Center

The single-source shortest path (SSSP) problem is to find the shortest paths from a start vertex to all other vertices in a graph. The Single-source Shortest Path component solves this problem by using Dijkstra's algorithm. For each vertex that is reachable from the start vertex, the component outputs the length of the shortest path and the number of shortest paths of that length.
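To make the idea concrete, here is a minimal Python sketch of Dijkstra's algorithm extended to count shortest paths. It illustrates the technique under stated assumptions and is not the component's implementation. The function name `sssp_with_counts` is hypothetical, edges are assumed to be `(from, to, weight)` tuples, and every weight is assumed to be strictly positive (Dijkstra's algorithm needs non-negative weights, and zero-weight edges would make tie counting unreliable). The sketch counts the empty path to the start vertex as one path so that counts can propagate; the example result later in this topic reports `distance_cnt` 0 for the start vertex instead.

```python
import heapq
import itertools
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]


def sssp_with_counts(edges: Iterable[Edge], start: Hashable) -> Dict[Hashable, Tuple[float, int]]:
    """Hypothetical helper: Dijkstra's algorithm extended with shortest-path counting.

    Returns {vertex: (distance, number_of_shortest_paths)} for every vertex
    reachable from `start`. Unreachable vertices are omitted.
    """
    graph: Dict[Hashable, List[Tuple[Hashable, float]]] = defaultdict(list)
    for u, v, w in edges:
        if w <= 0:
            # Assumption of this sketch: counting ties is only reliable with positive weights.
            raise ValueError(f"edge {u}->{v} has non-positive weight {w}")
        graph[u].append((v, w))

    dist: Dict[Hashable, float] = {start: 0.0}
    count: Dict[Hashable, int] = {start: 1}  # the empty path reaches the start vertex
    settled = set()
    tie = itertools.count()  # tie-breaker so the heap never compares vertex IDs
    heap = [(0.0, next(tie), start)]

    while heap:
        d, _, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale entry left behind by an earlier, longer distance
        settled.add(u)
        for v, w in graph[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                # Strictly shorter path: reset the count to the paths through u.
                dist[v] = nd
                count[v] = count[u]
                heapq.heappush(heap, (nd, next(tie), v))
            elif nd == dist[v]:
                # Another path of equal length: add the paths through u.
                count[v] += count[u]

    return {v: (dist[v], count[v]) for v in dist}


if __name__ == "__main__":
    # Same edges as the SSSP_func_test_edge table in the example.
    edges = [
        ("a", "b", 1.0), ("b", "c", 2.0), ("c", "d", 1.0), ("b", "e", 2.0),
        ("e", "d", 1.0), ("c", "e", 1.0), ("f", "g", 3.0), ("a", "d", 4.0),
    ]
    for vertex, (distance, cnt) in sorted(sssp_with_counts(edges, "a").items()):
        print(vertex, distance, cnt)
```

Run on the example edges, the script prints `a 0.0 1`, `b 1.0 1`, `c 3.0 1`, `d 4.0 3`, and `e 3.0 1`, which agrees with the example result table apart from the start-vertex count. Comparing floating-point sums with `==` is exact for these weights; weights that are not exactly representable may need a tolerance.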

Configure the component

Method 1: Configure the component on the pipeline page

You can add the Single-source Shortest Path component on the pipeline page of Machine Learning Designer in the Platform for AI (PAI) console. The following table describes the parameters.

| Tab | Parameter | Description |
| --- | --------- | ----------- |
| Fields Setting | Source Vertex Column | The column in the edge table that contains the start vertex of each edge. |
| Fields Setting | Target Vertex Column | The column in the edge table that contains the end vertex of each edge. |
| Fields Setting | Edge Weight Column | The column in the edge table that contains the edge weights. |
| Parameters Setting | Initial Node ID | The ID of the start vertex from which the shortest paths are calculated. |
| Tuning | Number of Workers | The number of workers that run the job in parallel. A larger value increases the degree of parallelism but also increases the framework's communication overhead. |
| Tuning | Worker Memory (MB) | The maximum memory that a single worker can use. Unit: MB. Default value: 4096. If a worker uses more memory than this value, an `OutOfMemory` error is reported. |

Method 2: Configure the component by using PAI commands

You can also configure the Single-source Shortest Path component by running PAI commands in the SQL Script component. For more information, see Scenario 4: Execute PAI commands within the SQL script component in the "SQL Script" topic.

```
PAI -name SSSP
    -project algo_public
    -DinputEdgeTableName=SSSP_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=SSSP_func_test_result
    -DhasEdgeWeight=true
    -DedgeWeightCol=edge_weight
    -DstartVertex=a;
```

| Parameter | Required | Default value | Description |
| --------- | -------- | ------------- | ----------- |
| inputEdgeTableName | Yes | No default value | The name of the input edge table. |
| inputEdgeTablePartitions | No | Full table | The partitions of the input edge table to read. |
| fromVertexCol | Yes | No default value | The column in the input edge table that contains the start vertex of each edge. |
| toVertexCol | Yes | No default value | The column in the input edge table that contains the end vertex of each edge. |
| outputTableName | Yes | No default value | The name of the output table. |
| outputTablePartitions | No | No default value | The partitions of the output table. |
| lifecycle | No | No default value | The lifecycle of the output table. |
| workerNum | No | No default value | The number of workers that run the job in parallel. A larger value increases the degree of parallelism but also increases the framework's communication overhead. |
| workerMem | No | 4096 | The maximum memory that a single worker can use. Unit: MB. If a worker uses more memory than this value, an `OutOfMemory` error is reported. |
| splitSize | No | 64 | The size of each data split. Unit: MB. |
| startVertex | Yes | No default value | The ID of the start vertex. |
| hasEdgeWeight | No | false | Specifies whether the edges in the input edge table have weights. |
| edgeWeightCol | No | No default value | The column in the input edge table that contains the edge weights. |
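Each parameter in the table is passed to the command as a `-D<name>=<value>` flag. The following Python sketch shows that mapping and checks that the parameters marked as required are present. The helper `build_sssp_command` is hypothetical and not part of PAI; it only assembles text, writes Boolean values in lowercase as in the `-DhasEdgeWeight=true` example, and does not quote or escape values.

```python
from typing import Dict, Union

Value = Union[str, int, bool]

# Parameters marked "Yes" in the Required column of the parameter table.
REQUIRED_PARAMS = (
    "inputEdgeTableName",
    "fromVertexCol",
    "toVertexCol",
    "outputTableName",
    "startVertex",
)


def build_sssp_command(params: Dict[str, Value]) -> str:
    """Hypothetical helper: render a PAI -name SSSP command from a parameter dict."""
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))

    lines = ["PAI -name SSSP", "    -project algo_public"]
    for name, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # Booleans are written in lowercase
        lines.append(f"    -D{name}={value}")
    return "\n".join(lines) + ";"  # the command ends with a semicolon


if __name__ == "__main__":
    print(build_sssp_command({
        "inputEdgeTableName": "SSSP_func_test_edge",
        "fromVertexCol": "flow_out_id",
        "toVertexCol": "flow_in_id",
        "outputTableName": "SSSP_func_test_result",
        "hasEdgeWeight": True,
        "edgeWeightCol": "edge_weight",
        "startVertex": "a",
    }))
```

With these inputs the script prints the same command as the PAI command example for Method 2, because Python dictionaries keep insertion order.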

Example

Add an SQL Script component to the canvas and run the following SQL statements to generate the input edge table.

```sql
drop table if exists SSSP_func_test_edge;
create table SSSP_func_test_edge as
select
    flow_out_id,flow_in_id,edge_weight
from
(
    select "a" as flow_out_id,"b" as flow_in_id,1.0 as edge_weight
    union all
    select "b" as flow_out_id,"c" as flow_in_id,2.0 as edge_weight
    union all
    select "c" as flow_out_id,"d" as flow_in_id,1.0 as edge_weight
    union all
    select "b" as flow_out_id,"e" as flow_in_id,2.0 as edge_weight
    union all
    select "e" as flow_out_id,"d" as flow_in_id,1.0 as edge_weight
    union all
    select "c" as flow_out_id,"e" as flow_in_id,1.0 as edge_weight
    union all
    select "f" as flow_out_id,"g" as flow_in_id,3.0 as edge_weight
    union all
    select "a" as flow_out_id,"d" as flow_in_id,4.0 as edge_weight
) tmp;
```

Data structure

Add an SQL Script component to the canvas and run the following PAI command to calculate the shortest paths from vertex a.

```
drop table if exists ${o1};
PAI -name SSSP
    -project algo_public
    -DinputEdgeTableName=SSSP_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=${o1}
    -DhasEdgeWeight=true
    -DedgeWeightCol=edge_weight
    -DstartVertex=a;
```

Right-click the SQL Script component and choose View Data > SQL Script Output to view the results.

| start_node | dest_node | distance | distance_cnt |
| ---------- | --------- | -------- | ------------ |
| a          | a         | 0.0      | 0            |
| a          | b         | 1.0      | 1            |
| a          | c         | 3.0      | 1            |
| a          | d         | 4.0      | 3            |
| a          | e         | 3.0      | 1            |
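The values in the result table can be checked by hand. Vertices f and g are absent because neither is reachable from a: the only edge that touches them is f to g. For vertex d, the example graph contains four paths from a, with these lengths:

```latex
\begin{aligned}
a \to d &: 4.0 \\
a \to b \to c \to d &: 1.0 + 2.0 + 1.0 = 4.0 \\
a \to b \to e \to d &: 1.0 + 2.0 + 1.0 = 4.0 \\
a \to b \to c \to e \to d &: 1.0 + 2.0 + 1.0 + 1.0 = 5.0 \\
\operatorname{dist}(a, d) &= \min(4.0,\ 4.0,\ 4.0,\ 5.0) = 4.0
\end{aligned}
```

Three of the four paths attain the minimum length 4.0, which matches `distance` = 4.0 and `distance_cnt` = 3 for d in the result table.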