Platform For AI: Create a Lingjun cluster with ACK activated

Last Updated: Jun 03, 2024

Container Service for Kubernetes (ACK) allows you to create ACK Lingjun managed clusters whose control planes are highly available and fully managed by ACK. ACK Lingjun managed clusters are developed based on PAI-Lingjun AI Computing Service and contain Lingjun compute nodes that serve as worker nodes. This topic describes how to create a Lingjun cluster with ACK activated.

Prerequisites

  • Lingjun compute nodes and Lingjun connections are purchased based on your business requirements. For more information, see Activate Lingjun AI Computing Service and purchase resources.

  • Relevant cloud services, such as Cloud Enterprise Network (CEN), Application Real-Time Monitoring Service (ARMS), Virtual Private Cloud (VPC), and ACK Lingjun managed clusters, are purchased and configured based on your business requirements. For more information, see Activate and configure other Alibaba Cloud services.

  • Real-name verification is complete for your account, and the cash balance or credit balance of your account is at least CNY 100.

Background information

ACK Lingjun managed clusters provide fully managed and highly available control planes, and support efficient heterogeneous resource management and heterogeneous task scheduling. This type of cluster can be used as the cloud-native base of Machine Learning Platform for AI, and provides enhanced cloud-native capabilities that are suitable for AI scenarios and high performance computing (HPC) scenarios. For more information, see What is ACK Lingjun?

Create and configure a cluster

  1. Log on to the Intelligent Computing Lingjun console.

  2. In the left-side navigation pane, choose Resources and Nodes > Cluster Management. The Cluster Management page appears.

  3. Click Create Cluster to go to the Create Managed Cloud Cluster page.

  4. Click Lingjun Clusters (Including Machine Learning Platform for AI, ACK, and CPFS).

    In the Create Cluster wizard, complete the configurations in the Clusters and Groups, Network Configurations, Basic Parameters of Software Instance, and Mapping Relationships between Software Instances and Groups steps.

Note

You are separately charged for ACK Lingjun managed clusters. For more information, see Billing of ACK Lingjun clusters.

Configure clusters and node groups

You can plan multiple clusters based on your business requirements and divide the compute nodes in a cluster into node groups. Planning clusters and node groups in advance helps improve the resource utilization of compute nodes. After you finish planning, perform the steps described in this section to configure clusters and node groups.

  1. Configure the cluster information.

    Specify information such as the cluster name, root password of cluster nodes, and resource group. For more information about how to create a resource group, see Create a resource group.

  2. Click Create Group to create a node group.

    1. In the Create Group dialog box, specify the group name and, based on your plan, information about the nodes that belong to the group, such as the node model and image.

    2. Click Select Node Instances next to Node Instance to select the nodes to be added to the group.

  3. Click Save and Go to Next Step: Network Configurations.
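
Before you open the Create Group dialog box, it can help to write the node group plan down and sanity-check it. The following Python snippet is a minimal sketch, not part of the console or any Alibaba Cloud SDK. The `NodeGroup` class, the `duplicated_nodes` helper, and all names and IDs are hypothetical, and the snippet assumes that each node instance should belong to at most one group.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class NodeGroup:
    """Hypothetical record of one planned node group."""
    name: str
    node_model: str   # placeholder value, not a real Lingjun node model
    image: str        # placeholder value, not a real image name
    node_ids: list = field(default_factory=list)


def duplicated_nodes(groups):
    """Return the sorted IDs of nodes that appear in more than one group."""
    counts = Counter(node_id for group in groups for node_id in group.node_ids)
    return sorted(node_id for node_id, count in counts.items() if count > 1)


# Illustrative plan: node-2 is assigned to two groups by mistake.
groups = [
    NodeGroup("training", "model-a", "image-a", ["node-1", "node-2"]),
    NodeGroup("inference", "model-a", "image-a", ["node-2", "node-3"]),
]

print(duplicated_nodes(groups))  # prints ['node-2']
```

An empty result suggests that the plan divides the nodes cleanly. The actual selection is still made with Select Node Instances in the console.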

Configure the basic parameters of software instances

  1. Configure the basic parameters.

    ACK

    Configure the parameters for an ACK Lingjun managed cluster. For more information about the parameters, see Create an ACK managed cluster.

    Important

    The Service CIDR block, pod CIDR block, public CIDR block, and VPC CIDR block of the ACK Lingjun managed cluster cannot overlap with one another.

    CPFS

    Configure the parameters for a Cloud Parallel File Storage (CPFS) file system.

    Note

    After a CPFS file system is created, you can view the information about the file system in the CPFS console.

    Machine Learning Platform for AI

    Configure the parameters for Machine Learning Platform for AI.

    Note

    For more information about how to configure ApsaraDB RDS, Apsara File Storage NAS and CPFS file systems, Container Registry, and OAuth authentication, see Activate and configure other Alibaba Cloud services.

  2. Click Save and Go to Next Step: Mapping Relationships between Software Instances and Groups.
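
The non-overlap requirement for the ACK CIDR blocks in step 1 can be checked before you enter the values in the console. The following is a minimal sketch that uses the Python standard library `ipaddress` module. The `find_overlaps` helper, the dictionary keys, and the CIDR values are illustrative assumptions, not values recommended by ACK.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return the pairs of names whose CIDR blocks overlap.

    cidrs maps a label (for example "pod") to a CIDR string.
    """
    networks = {name: ipaddress.ip_network(block) for name, block in cidrs.items()}
    return [
        (first, second)
        for first, second in combinations(networks, 2)
        if networks[first].overlaps(networks[second])
    ]


# Illustrative values only: the "public" block falls inside the pod block.
planned = {
    "vpc": "192.168.0.0/16",
    "pod": "10.0.0.0/16",
    "service": "172.16.0.0/20",
    "public": "10.0.128.0/20",
}

conflicts = find_overlaps(planned)
print(conflicts)  # prints [('pod', 'public')]
```

An empty list means that no two of the planned blocks overlap. The console may apply additional validation that this sketch does not cover.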

Configure the mappings between software instances and node groups

ACK Lingjun managed clusters provide Lingjun node pools in which you can deploy Lingjun compute nodes, so that you can manage Lingjun nodes efficiently. For example, you can configure and manage nodes, schedule applications to specified nodes, and configure GPUs based on node pools. For more information about node pools, see Overview of Lingjun node pools.

  1. Click Create Node Pool.

  2. Configure the information about an ACK node pool, such as the name of the node pool and the maximum number of nodes.

  3. Click Select Associated Group. In the dialog box that appears, select the node groups with which you want to associate the node pool and click OK.

  4. Click Save and Go to Next Step: Confirm Configuration.
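
The mapping between node pools and node groups can also be drafted and checked offline before you complete the steps above. The following Python snippet is a minimal sketch under stated assumptions: the `NodePool` class, the `pools_over_capacity` helper, and all names and sizes are hypothetical, and the snippet assumes that the maximum number of nodes of a node pool should be no smaller than the total number of nodes in its associated groups.

```python
from dataclasses import dataclass


@dataclass
class NodePool:
    """Hypothetical record of one planned ACK node pool."""
    name: str
    max_nodes: int      # planned maximum number of nodes
    group_names: list   # node groups to associate with the pool


def pools_over_capacity(pools, group_sizes):
    """Return (pool name, total nodes, max nodes) for pools whose
    associated groups hold more nodes than the planned maximum."""
    over = []
    for pool in pools:
        total = sum(group_sizes[name] for name in pool.group_names)
        if total > pool.max_nodes:
            over.append((pool.name, total, pool.max_nodes))
    return over


# Illustrative plan: number of nodes in each node group.
group_sizes = {"group-a": 4, "group-b": 6}
pools = [
    NodePool("pool-1", 8, ["group-a", "group-b"]),
    NodePool("pool-2", 8, ["group-a"]),
]

print(pools_over_capacity(pools, group_sizes))  # prints [('pool-1', 10, 8)]
```

In this example, pool-1 would need either a larger maximum or fewer associated groups. Whether the console enforces such a limit is not stated in this topic.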

Confirm the configurations

  1. In the Confirm Configuration step, check the parameters for basic cluster information, network configurations, mappings between software instances and node groups, and software instances. If the configurations are correct, click Submit Configuration to create the cluster.

  2. Click Complete Authorization in the Dependency Check section to complete the authorization for ACK.