Vertex Clustering Coefficient - Platform For AI - Alibaba Cloud Documentation Center

The Vertex Clustering Coefficient is a metric in network analysis used to quantify the degree of clustering among a node's neighbors. Specifically, it represents the ratio of the actual number of edges present among a node's neighbors to the total possible number of edges that could exist between them. The value of this coefficient ranges from 0 to 1, with a higher value indicating a tighter connection among the neighbors of the node, reflecting the local clustering characteristics within the network.

Algorithm description

In an undirected graph, the clustering coefficient of a node represents the density of connections around that node. A star network has a density of 0, while a fully connected network has a density of 1.

In network analysis, star networks and fully connected networks are two typical network topologies:

Star network: This structure consists of a central node and multiple peripheral nodes, with all peripheral nodes connected only to the central node. The characteristic of a star network is that the clustering coefficient of the central node is 0 because its neighbors (the peripheral nodes) do not have direct connections with each other.
Fully connected network: In this structure, every node is directly connected to all other nodes. The characteristic of a fully connected network is that all nodes have a clustering coefficient of 1 because all possible connections exist between each node and its neighbors.

These two structures represent the extreme cases in network topology, with the star network having the lowest local clustering and the fully connected network having the highest local clustering.

Configure the component

Method 1: Configure the component on the pipeline page

Add a Vertex Clustering Coefficient component on the pipeline page and configure the following parameters:

Category	Parameter	Description
Fields Setting	Start Vertex	The start vertex column in the edge table.
Fields Setting	End Vertex	The end vertex column in the edge table.
Parameters Setting	Largest Vertex Degree	If the vertex degree is larger than the value of this parameter, sampling is required. Default value: 500.
Tuning	Workers	The number of vertices for parallel job execution. The degree of parallelism and framework communication costs increase with the value of this parameter.
	Memory Size per Worker (MB)	The maximum size of memory that a single job can use. Unit: MB. Default value: 4096. If the size of used memory exceeds the value of this parameter, the `OutOfMemory` error is reported.
	Data Split Size (MB)	The data split size. Unit: MB. Default value: 64.

Method 2: Configure the component by using PAI commands

You can configure the Vertex Clustering Coefficient component by using PAI commands. You can use the SQL Script component to run PAI commands. For more information, see Scenario 4: Execute PAI commands within the SQL script component in the "SQL Script" topic.

PAI -name NodeDensity
    -project algo_public
    -DinputEdgeTableName=NodeDensity_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=NodeDensity_func_test_result
    -DmaxEdgeCnt=500;

Parameter	Required	Default value	Description
inputEdgeTableName	Yes	No default value	The name of the input edge table.
inputEdgeTablePartitions	No	Full table	The partitions in the input edge table.
fromVertexCol	Yes	No default value	The start vertex column in the input edge table.
toVertexCol	Yes	No default value	The end vertex column in the input edge table.
outputTableName	Yes	No default value	The name of the output table.
outputTablePartitions	No	No default value	The partitions in the output table.
lifecycle	No	No default value	The lifecycle of the output table.
maxEdgeCnt	No	500	If the vertex degree is greater than the value of this parameter, sampling is required.
workerNum	No	No default value	The number of vertices for parallel job execution. The degree of parallelism and framework communication costs increase with the value of this parameter.
workerMem	No	4096	The maximum size of memory that a single job can use. Unit: MB. Default value: 4096. If the size of used memory exceeds the value of this parameter, the `OutOfMemory` error is reported.
splitSize	No	64	The data split size. Unit: MB.

Example

Add a SQL Script component. Deselect Use Script Mode and Whether the system adds a create table statement. Enter the following SQL statements.

drop table if exists NodeDensity_func_test_edge;
create table NodeDensity_func_test_edge as
select * from
(
  select '1' as flow_out_id, '2' as flow_in_id
  union all
  select '1' as flow_out_id, '3' as flow_in_id
  union all
  select '1' as flow_out_id, '4' as flow_in_id
  union all
  select '1' as flow_out_id, '5' as flow_in_id
  union all
  select '1' as flow_out_id, '6' as flow_in_id
  union all
  select '2' as flow_out_id, '3' as flow_in_id
  union all
  select '3' as flow_out_id, '4' as flow_in_id
  union all
  select '4' as flow_out_id, '5' as flow_in_id
  union all
  select '5' as flow_out_id, '6' as flow_in_id
  union all
  select '5' as flow_out_id, '7' as flow_in_id
  union all
  select '6' as flow_out_id, '7' as flow_in_id
)tmp;
drop table if exists NodeDensity_func_test_result;
create table NodeDensity_func_test_result
(
  node string,
  node_cnt bigint,
  edge_cnt bigint,
  density double,
  log_density double
);

Data structure

Add a SQL Script component. Deselect Use Script Mode and Whether the system adds a create table statement. Enter the following PAI commands and connect the two SQL Script components.

drop table if exists ${o1};
PAI -name NodeDensity
    -project algo_public
    -DinputEdgeTableName=NodeDensity_func_test_edge
    -DfromVertexCol=flow_out_id
    -DtoVertexCol=flow_in_id
    -DoutputTableName=${o1}
    -DmaxEdgeCnt=500;

Click in the upper left corner to run the pipeline.

Right-click the SQL Script component created in Step 2 and choose View Data > SQL Script Output to view the training results.

| node | node_cnt | edge_cnt | density | log_density |
| ---- | -------- | -------- | ------- | ----------- |
| 1    | 5        | 4        | 0.4     | 1.45657     |
| 2    | 2        | 1        | 1.0     | 1.24696     |
| 3    | 3        | 2        | 0.66667 | 1.35204     |
| 4    | 3        | 2        | 0.66667 | 1.35204     |
| 5    | 4        | 3        | 0.5     | 1.41189     |
| 6    | 3        | 2        | 0.66667 | 1.35204     |
| 7    | 2        | 1        | 1.0     | 1.24696     |