Platform for AI: Create a resource group and purchase Lingjun resources

Last Updated: Oct 31, 2024

As an essential component of the AI computing engine of Alibaba Cloud Platform for AI (PAI), Lingjun resources are designed for large-scale and high-density computing. Lingjun resources provide heterogeneous computing power tailored for high-performance AI training and computing. You can use Lingjun resources in Data Science Workshop (DSW), Deep Learning Containers (DLC), and Elastic Algorithm Service (EAS) to facilitate AI development, training, and service deployment. This topic describes how to create a resource group and purchase Lingjun resources.

Overview

Lingjun resource

Lingjun resources are the new-generation intelligent computing resources developed by Alibaba Cloud that provide the following features:

  • High-speed Remote Direct Memory Access (RDMA) network architecture

  • High-performance communication library

  • High-performance acceleration software

  • Technical solution for GPU virtualization

Lingjun resources can meet your requirements for high-performance computing.

Lingjun resource group

PAI provides fully managed Lingjun resources that you can purchase and use in resource groups in the PAI console. If you purchase Lingjun hardware resources, you can add the resources to the PAI console as semi-managed resources and use them to run training jobs.

Limits

  • Supported regions

    Lingjun resources are available only in the China (Ulanqab) and Singapore regions.

  • Supported users

    Only users on the whitelist can use Lingjun resources. To use Lingjun resources to run training jobs, submit a ticket to apply to be added to the whitelist.

  • Supported job types

    Lingjun resources support only the following training job types: TensorFlow, PyTorch, ElasticBatch, and MPIJob.
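The limits above can be turned into a quick pre-flight check before you plan a training job. The following is a minimal sketch, not part of any PAI SDK: the function name `check_lingjun_limits` is hypothetical, and the region IDs `cn-wulanchabu` and `ap-southeast-1` are assumed to correspond to the China (Ulanqab) and Singapore regions, because this topic lists only the display names.

```python
# Values taken from the Limits section of this topic.
# Assumption: the region IDs below map to the listed display names.
SUPPORTED_REGIONS = {
    "cn-wulanchabu": "China (Ulanqab)",
    "ap-southeast-1": "Singapore",
}
SUPPORTED_JOB_TYPES = {"TensorFlow", "PyTorch", "ElasticBatch", "MPIJob"}


def check_lingjun_limits(region_id, job_type, on_whitelist):
    """Return a list of problems; an empty list means no limit is violated."""
    problems = []
    if region_id not in SUPPORTED_REGIONS:
        problems.append(f"region {region_id!r} is not supported")
    if job_type not in SUPPORTED_JOB_TYPES:
        problems.append(f"job type {job_type!r} is not supported")
    if not on_whitelist:
        problems.append("account is not on the whitelist; submit a ticket to apply")
    return problems


if __name__ == "__main__":
    # Hypothetical plan: a PyTorch job in Singapore from a whitelisted account.
    for problem in check_lingjun_limits("ap-southeast-1", "PyTorch", True):
        print(problem)
```

The check only mirrors the limits as documented here. Whether an account is actually on the whitelist cannot be determined locally, so it is passed in as a flag.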

Account and permission requirements

  • Alibaba Cloud account: You can use an Alibaba Cloud account to perform all operations without additional authorization.

  • RAM user: Ask the owner of the Alibaba Cloud account to grant the RAM user the permissions to manage the resource pool, or to attach the AliyunPAIFullAccess policy to the RAM user. For more information, see the "Permissions to manage the resource pool" section in the Custom policies for RAM users topic.

    Important

    The AliyunPAIFullAccess policy provides permissions to manage all resources and features of PAI. Exercise caution when you grant these permissions.

Dependencies

Lingjun resources depend on the following Alibaba Cloud services. Before you create, purchase, and use Lingjun resources, activate these services and prepare the required resources based on your business requirements.

VPC (required)

When you allocate Lingjun resources, you must associate the resources with a virtual private cloud (VPC) in the same region and configure a vSwitch and a security group. This ensures the network connectivity between the Lingjun resources and other Alibaba Cloud services.

Internet NAT gateway and EIP (optional)

Your Lingjun resources may need to access the Internet. For example, they may need to pull custom images from the Internet. In this case, you must configure an Internet NAT gateway with SNAT enabled and associate an elastic IP address (EIP) with the Internet NAT gateway.

For more information, see Use the SNAT feature of an Internet NAT gateway to access the Internet.

OSS, NAS, and CPFS (optional)

To submit DLC training jobs to Lingjun resources, you must create datasets first. Lingjun resources support only Object Storage Service (OSS), File Storage NAS (NAS), and Cloud Parallel File Storage (CPFS) datasets. For more information, see the Prepare a dataset section of the "General process" topic.
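The dependency rules in this section can be summarized as a small checklist helper. This is an illustrative sketch only: the function `required_services` and its parameters are hypothetical and do not call any Alibaba Cloud API. It encodes what this section states: a VPC with a vSwitch and a security group is always required, an Internet NAT gateway with SNAT and an EIP are needed only for Internet access, and datasets must be stored in OSS, NAS, or CPFS.

```python
SUPPORTED_DATASET_STORAGE = ("OSS", "NAS", "CPFS")


def required_services(needs_internet, dataset_storage=None):
    """List the services to prepare before allocating Lingjun resources.

    needs_internet: True if the resources must reach the Internet,
        for example to pull custom images.
    dataset_storage: "OSS", "NAS", or "CPFS" if DLC training jobs
        will be submitted; None if no dataset is planned yet.
    """
    # Always required: a VPC in the same region, plus a vSwitch and a security group.
    services = ["VPC", "vSwitch", "security group"]
    if needs_internet:
        services += ["Internet NAT gateway (SNAT enabled)", "EIP"]
    if dataset_storage is not None:
        if dataset_storage not in SUPPORTED_DATASET_STORAGE:
            raise ValueError(f"unsupported dataset storage: {dataset_storage!r}")
        services.append(dataset_storage)
    return services


if __name__ == "__main__":
    # Hypothetical plan: pull images from the Internet and read data from OSS.
    print(required_services(needs_internet=True, dataset_storage="OSS"))
```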

Procedure

Create a Lingjun resource group

  1. Go to the Resource Pool page in the PAI console.

  2. On the Intelligent Computing Lingjun resources tab, click Create Resource Group.

  3. In the Create Resource Group dialog box, configure the parameters described in the following table and click OK.

    Parameter              Description

    Type                   Select Dedicated Resource Group.

    Resource Group Name    Enter a resource group name based on the naming rule.

Purchase Lingjun resources

To purchase Lingjun resources for a dedicated resource group, perform the following steps. For more information about the billing of Lingjun resources, see Billing of Lingjun resources (Serverless Edition).

  1. On the Intelligent Computing Lingjun resources tab, click the name of the resource group that you want to manage.

  2. In the upper-right corner of the resource group details page, click Create Order.

  3. On the buy page, configure parameters such as Node Specification, Nodes, and Duration. Then, click Buy Now.

  4. After you complete the payment, the purchased Lingjun resources are displayed on the Orders tab of the resource group details page.

References

After you create a resource group and purchase computing resources, you can perform the following operations:

  • On the resource group details page, view the basic information about the resource group and manage the purchased resources. For more information, see the Manage resources section of the "Overview" topic.

  • Allocate the purchased resources to specific training jobs by configuring resource quotas. For more information, see Lingjun resource quotas.