Platform For AI: Create a Lingjun cluster

Last Updated: Jun 06, 2024

A cluster is a group of resources that are required to run PAI-Lingjun AI Computing Service (Lingjun), including compute nodes and Lingjun connection instances. You can divide the nodes in a cluster into node groups. Each node group contains one or more compute nodes that have the same configurations. This topic describes how to create a Lingjun cluster.

Prerequisites

Create and configure a Lingjun cluster

  1. Log on to the Intelligent Computing Lingjun console.

  2. In the left-side navigation pane, choose Resources and Nodes > Cluster Management.

  3. On the Cluster Management page, click Create Cluster.

  4. On the Create Managed Cloud Cluster page, go to the Basic Lingjun Cluster Service section and click Click to create a cluster.

    Configure the parameters in the Clusters and Groups and Network Configurations steps by following the on-screen instructions.

Configure clusters and node groups

You can plan multiple clusters based on your business requirements and divide the compute nodes in each cluster into node groups. Careful planning of clusters and compute nodes improves the resource utilization of the compute nodes. After you complete the plan, perform the steps described in this section to configure the cluster and its node groups.

  1. Configure the cluster information.

    Specify information such as the cluster name, root password of cluster nodes, and resource group. For more information about how to create a resource group, see Create a resource group.

  2. Click Create Group to create a node group.

    1. In the Create Group dialog box, specify the group name and information about the nodes that belong to the group such as the node model and image based on your plan.

    2. Click Select Node Instances next to Node Instance to select the nodes to be added to the group.

  3. Click Save and go to the next step to proceed to the Network Configurations step.

Configure networks

A Lingjun cluster initially resides in an isolated network. You must connect the cluster to the Alibaba Cloud public cloud by using a Lingjun connection instance and a CEN instance, and specify a VPC to monitor network connectivity.

image

As shown in the preceding figure, the network topology involves the following core networks:

  • Cluster network: the CIDR block that is used by the cluster to assign IP addresses to compute nodes. The CIDR block is a private CIDR block.

  • Monitoring network: the VPC that is used to monitor network connectivity.

When you plan and configure networks, make sure that the CIDR blocks of the preceding networks do not conflict with each other. After you plan your networks, you can perform the following steps to configure networks for your cluster.
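The conflict check described above can be rehearsed offline before you enter any values in the console. The following is a minimal sketch that uses only the Python standard library `ipaddress` module. The network names and CIDR blocks are hypothetical placeholders, not values taken from the console or from a Lingjun API.

```python
from ipaddress import ip_network
from itertools import combinations


def find_cidr_conflicts(networks):
    """Return the pairs of network names whose CIDR blocks overlap."""
    parsed = {name: ip_network(cidr) for name, cidr in networks.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(parsed.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical plan: names and CIDR blocks are illustrative assumptions.
planned = {
    "cluster-network": "10.0.0.0/16",
    "monitoring-vpc": "192.168.0.0/24",
    "other-vpc": "10.0.128.0/20",
}

print(find_cidr_conflicts(planned))  # [('cluster-network', 'other-vpc')]
```

An empty list means that no two planned CIDR blocks overlap. In this sample plan, `other-vpc` falls inside the cluster network range, so one of the two blocks would have to be changed before you configure the cluster.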

Note

After you configure the networks of the cluster, you must check whether the network configurations of the CEN instance are correct. For more information about how to configure a CEN instance, see the "CEN configurations" section of the Activate and configure other Alibaba Cloud services topic.

  1. Configure a Lingjun VPD.

    • The Lingjun Virtual Private Datacenter (VPD) is used to assign IP addresses to compute nodes in the Lingjun cluster. Enter a valid private CIDR block.

    • A Lingjun subnet is a subnet of the Lingjun VPD. For more information about Lingjun VPDs and their subnets, see Manage Lingjun VPDs.

    Note
    • You must plan a Lingjun VPD in advance. The Lingjun VPD cannot conflict with the CIDR blocks of other networks to which the Lingjun cluster is to be connected, such as CIDR blocks of VPCs or data centers.

    • The number of available IP addresses in a Lingjun VPD determines the maximum number of nodes that can be deployed in the Lingjun cluster. You must prepare a CIDR block whose subnet mask length is greater than 22 bits so that the cluster can still be scaled up later.

  2. Optional. Configure the bond allocation policy of a Lingjun subnet. If you select specific node models, you must configure the bond allocation policy for the physical NICs of Lingjun nodes. Bonds are associated with Lingjun nodes. You can configure bonds by configuring the bond allocation policy, node model allocation policy, or node allocation policy.

    Configure a bond allocation policy

    The number of bonds varies based on the node model. The number of bonds in a cluster is equal to the maximum number of bonds among all node models in the cluster. The bonds of a cluster are named in the bondx format. x starts from 0.

    For example, if the number of bonds for Node A is 3 and the number of bonds for Node B is 4, the number of bonds in the cluster is 4. The bonds in the cluster are named from bond0 to bond3. Node A uses the policies of bond0, bond1, and bond2.

    Note

    You can configure only one bond allocation policy for a cluster.

    Procedure

    1. Configure a bond allocation policy for the cluster.

    2. Optional. Configure the default bond allocation policy. The bonds that are not assigned a policy use the default bond allocation policy. Select Apply to all to assign the default bond policy to all bonds.

    Configure a node model allocation policy

    You can specify a node model allocation policy for each node model in a cluster. The maximum number of node model allocation policies in a cluster is equal to the number of node groups in the cluster.

    Procedure

    1. Click Model Type. In the AddModel Type dialog box, select a node model from the Model drop-down list.

    2. Configure a node model allocation policy. The policy is applied to all nodes of the selected model.

    Configure a node allocation policy

    You can configure a node allocation policy for each node in a cluster. Different bonds of a node can be connected to different CIDR blocks or subnets of a Lingjun cluster.

    Procedure

    1. Click Node Policy. In the AddNode Policy dialog box, select a node from the Node drop-down list.

    2. Configure a node allocation policy. The policy is applied to the selected node.

  3. Configure a Lingjun connection instance.

    1. Click Authorize to authorize the Lingjun connection instance to access other Alibaba Cloud services.

      You can use the Lingjun connection instance to connect the Lingjun cluster to a CEN instance and access other Alibaba Cloud services. Therefore, you must authorize Lingjun to access other Alibaba Cloud services. For more information, see Appendix: Service-linked role for Lingjun connection instances.

    2. Select the ID of the Lingjun connection instance from the InstanceID drop-down list. The Lingjun connection instance is used by the cluster to connect to the Alibaba Cloud public cloud.

    3. Select a CEN instance from the CEN drop-down list. The cluster is connected to the CEN instance by using the Lingjun connection instance.

      Important

      You must create a transit router in the CEN instance. The region of the transit router must be the same as that of the Lingjun nodes. For more information, see Transit routers.

  4. Configure the monitoring network.

    1. Configure the CEN instance. Connect a VPC to the transit router of the CEN instance that is created in the previous step. You can create a VPC or use an existing VPC. For more information, see the "CEN configurations" section of the Activate and configure other Alibaba Cloud services topic. Make sure that the vSwitch in the VPC has at least one idle IP address. The Lingjun cluster uses this vSwitch to monitor the network connectivity of the Lingjun connection instance.

      Important
      • You can select a VPC from the drop-down list only if you connect the VPC to the selected transit router.

      • The Lingjun VPD and the CIDR block of the VPC that is used as the monitoring network cannot conflict with each other. In addition, the CIDR block of the monitoring VPC cannot conflict with the CIDR blocks of other networks to which the Lingjun cluster is to be connected, such as the CIDR blocks of other VPCs or data centers.

    2. Click the icons next to the VPC drop-down list and the Switch(VSwitch) drop-down list. Then, select the VPC and vSwitch that you created.

  5. Click Save and go to the next step to proceed to the Basic Parameters of Software Instance step.
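The bond naming rule and the default policy fallback described in the bond allocation policy step can be illustrated with a short sketch. The code below only models the rules stated in this topic: the cluster has as many bonds as the node model with the most bonds, bonds are named bondx starting from 0, and bonds without an assigned policy use the default policy. The function names and the policy values such as `subnet-a` are hypothetical and do not correspond to a Lingjun API.

```python
def cluster_bond_names(bonds_per_model):
    """Name the cluster bonds: the count is the maximum across all node models."""
    count = max(bonds_per_model.values())
    return [f"bond{i}" for i in range(count)]


def bonds_used_by(model, bonds_per_model):
    """A node model uses the first N cluster bonds, where N is its own bond count."""
    return [f"bond{i}" for i in range(bonds_per_model[model])]


def effective_policies(bond_names, assigned, default):
    """Bonds that have no assigned policy fall back to the default policy."""
    return {name: assigned.get(name, default) for name in bond_names}


# Example from this topic: Node A has 3 bonds and Node B has 4 bonds.
bonds_per_model = {"Node A": 3, "Node B": 4}

names = cluster_bond_names(bonds_per_model)
print(names)                                     # ['bond0', 'bond1', 'bond2', 'bond3']
print(bonds_used_by("Node A", bonds_per_model))  # ['bond0', 'bond1', 'bond2']

# Hypothetical policy values, used only to show the fallback behavior.
print(effective_policies(names, {"bond0": "subnet-a"}, "subnet-default"))
```

In the last call, only bond0 has an explicitly assigned policy, so bond1 through bond3 resolve to the default value.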

Confirm the cluster configurations

In the Confirm Configuration step, confirm the basic information, network information, and instance parameters of the cluster, and then click Submit Configuration to create the cluster. After the cluster is created, you are redirected to the Cluster Management page.