
Platform For AI: Use graph algorithms to manage financial risks

Last Updated: Jun 04, 2024

This topic describes how to use graph algorithms to manage financial risks.

Background information

Graph algorithms are used in relationship analysis scenarios. They model data as a relationship graph in which entities are vertices and the connections between entities are edges. Machine Learning Platform for AI (PAI) provides several graph algorithm components, including K-Core, Maximum Connected Subgraph, and Label Propagation Classification.

The following figure shows an example relationship graph of a group of interlinked people. The arrows in the figure represent the relationships between these people, such as colleagues or relatives. In this graph, Enoch is a trusted customer and Evan is a fraudster. Based on these two known labels and the relationship graph, you can use graph algorithms to calculate the credit index of each person, that is, the probability that the person is a fraudster.

Figure: Relationship graphs

Datasets

The following table describes the fields in the dataset that is used in this topic.

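| Field       | Meaning                 | Type   | Description                                                                                                   |
|-------------|-------------------------|--------|---------------------------------------------------------------------------------------------------------------|
| start_point | Start vertex of an edge | STRING | The name of a person.                                                                                         |
| end_point   | End vertex of an edge   | STRING | The name of a person.                                                                                         |
| count       | Closeness               | DOUBLE | The closeness between two persons. A greater value indicates a closer relationship between the two persons. |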
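To make the edge-list layout concrete, the following minimal Python sketch loads rows with the start_point, end_point, and count fields into a weighted adjacency map. The rows and the `build_graph` helper are hypothetical illustrations, not the pipeline's sample data or a PAI API. Treating each relationship as undirected (symmetric) is also an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical toy rows in the (start_point, end_point, count) layout.
# These are NOT the sample data used by the PAI pipeline.
EDGES = [
    ("Enoch", "Alice", 3.0),
    ("Alice", "Bob", 1.0),
    ("Bob", "Evan", 2.0),
    ("Carol", "Dave", 1.0),
]


def build_graph(rows):
    """Build an adjacency map: vertex -> {neighbor: closeness}.

    Assumption: relationships are symmetric, so each row adds the edge in
    both directions. Closeness values of repeated pairs are summed.
    """
    graph = defaultdict(dict)
    for start, end, count in rows:
        graph[start][end] = graph[start].get(end, 0.0) + count
        graph[end][start] = graph[end].get(start, 0.0) + count
    return dict(graph)


graph = build_graph(EDGES)
print(sorted(graph["Bob"].items()))  # [('Alice', 1.0), ('Evan', 2.0)]
```

Summing duplicate pairs is one reasonable choice; keeping the maximum closeness would work equally well if the data can contain the same pair more than once.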

The following figure shows the sample data that is used in the pipeline.

Figure: Sample data

Procedure

  1. Go to the Machine Learning Designer page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane of the workspace page, choose Model Development and Training > Visual Modeling (Designer) to go to the Machine Learning Designer page.

  2. Create a pipeline.

    1. On the Visual Modeling (Designer) page, click the Preset Templates tab.

    2. On this tab, find the Financial Risk Management template and click Create.

    3. In the Create Pipeline dialog box, configure the parameters. You can use the default values.

      The Pipeline Data Path parameter specifies the Object Storage Service (OSS) bucket path that stores the temporary data and models generated while the pipeline runs.

    4. Click OK.

      It takes about 10 seconds to create the pipeline.

    5. On the Pipelines tab, double-click the Financial Risk Management pipeline to open the pipeline.

    6. View the components of the pipeline on the canvas, as shown in the following figure. The system automatically creates the pipeline based on the built-in template.

      Figure: Graph algorithm pipeline

      The sections of the pipeline work as follows:

      The Maximum Connected Subgraph component divides the people in the relationship graph into two groups of interlinked people and assigns an ID to each group. Then, the SQL Script and JOIN components remove the unrelated people from the relationship graph.

      The Maximum Connected Subgraph component finds the set that contains the largest number of interlinked people, which is then used to remove unrelated people, as shown in the following figure.

      Figure: Maximum connected set

      The Single-source Shortest Path component in this section computes the distance from a source vertex to the other vertices. In its output, the distance field indicates the number of people that Enoch must contact to reach each target person, as shown in the following figure.

      Figure: Single-source shortest path results

      The Data Source component imports the labeled data, in which the weight field indicates the probability of a person being a fraudster. The Label Propagation Classification component then predicts the labels of the unlabeled vertices. Finally, the SQL Script component filters the results and shows the probability of each person being a fraudster.

      Figure: Example of labeled data

      Label propagation classification is a semi-supervised classification algorithm. It takes a relationship graph and labeled data as input and predicts the labels of unlabeled vertices from the labels of the labeled ones. The algorithm does this by propagating the label of each vertex to its neighboring vertices.

  3. Run the pipeline and view the results.

    1. In the upper-left corner of the canvas, click the Run icon.

    2. After the pipeline is run, right-click SQL Script on the canvas and select View Data. On the tab that appears, view the probability of each person being a fraudster.
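The Maximum Connected Subgraph step can be approximated in plain Python to show what it computes. This minimal sketch assumes undirected edges. It uses breadth-first search to assign a group ID to each connected component, then keeps only the edges in the largest group, which roughly mirrors the filtering done by the SQL Script and JOIN components. The function names and toy edges are hypothetical, and the group IDs will not match the IDs that the PAI component outputs.

```python
from collections import defaultdict, deque

# Hypothetical toy edges (start_point, end_point, count); not the sample data.
EDGES = [
    ("Enoch", "Alice", 3.0),
    ("Alice", "Bob", 1.0),
    ("Bob", "Evan", 2.0),
    ("Carol", "Dave", 1.0),
]


def connected_groups(rows):
    """Map each vertex to a group ID; vertices in one connected component share an ID."""
    adj = defaultdict(set)
    for start, end, _ in rows:
        adj[start].add(end)
        adj[end].add(start)  # assumption: edges are treated as undirected
    group = {}
    next_id = 0
    for seed in adj:  # every neighbor is already a key, so adj does not grow here
        if seed in group:
            continue
        group[seed] = next_id
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            for nb in adj[v]:
                if nb not in group:
                    group[nb] = next_id
                    queue.append(nb)
        next_id += 1
    return group


def keep_largest_group(rows):
    """Keep only edges whose vertices belong to the largest connected group."""
    group = connected_groups(rows)
    sizes = defaultdict(int)
    for gid in group.values():
        sizes[gid] += 1
    largest = max(sizes, key=sizes.get)
    # Both endpoints of an edge share a group, so checking the start vertex suffices.
    return [row for row in rows if group[row[0]] == largest]


print(keep_largest_group(EDGES))
# [('Enoch', 'Alice', 3.0), ('Alice', 'Bob', 1.0), ('Bob', 'Evan', 2.0)]
```

In this toy data, Carol and Dave form a separate two-person group and are dropped. A union-find structure would give the same grouping and scales better to very large edge lists.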
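The meaning of the distance field in the Single-source Shortest Path output can be illustrated with a breadth-first search that starts from Enoch. This sketch assumes unweighted, undirected edges, so the distance is the number of edges on the shortest path from the source. This topic does not say whether the PAI component also uses the count field as an edge weight. The function name and the toy edges are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical toy edges (start_point, end_point, count); not the sample data.
EDGES = [
    ("Enoch", "Alice", 3.0),
    ("Alice", "Bob", 1.0),
    ("Bob", "Evan", 2.0),
    ("Carol", "Dave", 1.0),
]


def hop_distances(rows, source):
    """Return {vertex: number of edges on the shortest path from source}.

    Vertices that cannot be reached from source are omitted.
    Assumes unweighted, undirected edges (the count field is ignored).
    """
    adj = defaultdict(set)
    for start, end, _ in rows:
        adj[start].add(end)
        adj[end].add(start)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for nb in adj[v]:
            if nb not in dist:  # the first visit in BFS order is the shortest hop count
                dist[nb] = dist[v] + 1
                queue.append(nb)
    return dist


print(hop_distances(EDGES, "Enoch"))  # {'Enoch': 0, 'Alice': 1, 'Bob': 2, 'Evan': 3}
```

If closeness should act as an edge weight instead, Dijkstra's algorithm with a priority queue (for example, `heapq`) would replace the plain queue.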
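The idea behind label propagation classification can be sketched as an iterative weighted average. Labeled vertices keep their fraud probability, here assumed to be 0.0 for the trusted customer Enoch and 1.0 for the fraudster Evan. Each unlabeled vertex repeatedly takes the closeness-weighted average of its neighbors' scores until the scores stop changing. This is a generic textbook variant of label propagation and not the PAI component's implementation, parameters, or output format. The function name, the 0.5 starting score, the tolerance, the 0.5 flagging threshold that stands in for the final SQL Script filter, and the toy edges are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy edges (start_point, end_point, count); not the sample data.
EDGES = [
    ("Enoch", "Alice", 3.0),
    ("Alice", "Bob", 1.0),
    ("Bob", "Evan", 2.0),
    ("Carol", "Dave", 1.0),
]

# Assumed seed labels: probability of being a fraudster for labeled vertices.
SEEDS = {"Enoch": 0.0, "Evan": 1.0}


def propagate(rows, seeds, max_iter=100, tol=1e-6):
    """Set each unlabeled vertex to the closeness-weighted mean of its neighbors' scores."""
    adj = defaultdict(dict)
    for start, end, count in rows:  # assumption: edges are undirected
        adj[start][end] = adj[start].get(end, 0.0) + count
        adj[end][start] = adj[end].get(start, 0.0) + count
    # Unlabeled vertices start at an uninformative 0.5 (assumption).
    score = {v: seeds.get(v, 0.5) for v in adj}
    for _ in range(max_iter):
        new = dict(score)
        for v, nbrs in adj.items():
            total = sum(nbrs.values())
            if v in seeds or total == 0:
                continue  # seeds stay clamped; zero-weight vertices keep their score
            new[v] = sum(w * score[u] for u, w in nbrs.items()) / total
        delta = max(abs(new[v] - score[v]) for v in adj)
        score = new
        if delta < tol:
            break
    return score


score = propagate(EDGES, SEEDS)
# Hypothetical stand-in for the final SQL Script filter: flag unlabeled people above 0.5.
flagged = {v: round(p, 3) for v, p in score.items() if v not in SEEDS and p > 0.5}
print(flagged)  # {'Bob': 0.727}
```

Bob is closer to Evan (closeness 2.0) than to Alice (1.0), so his score converges to 8/11, and Alice, who is tied closely to Enoch, settles at 2/11. Carol and Dave are not connected to any labeled vertex and keep the neutral 0.5, which is why removing unrelated people first is useful.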