As an essential component of the AI computing engine of Alibaba Cloud Platform for AI (PAI), Lingjun resources are designed for large-scale and high-density computing. Lingjun resources provide heterogeneous computing power tailored for high-performance AI training and computing. You can use Lingjun resources in Data Science Workshop (DSW), Deep Learning Containers (DLC), and Elastic Algorithm Service (EAS) to facilitate AI development, training, and service deployment. This topic describes how to create a resource group and purchase Lingjun resources.
Overview
Lingjun resource
Lingjun resources are the new-generation intelligent computing resources developed by Alibaba Cloud that provide the following features:
High-speed Remote Direct Memory Access (RDMA) network architecture
High-performance communication library
High-performance acceleration software
Technical solution for GPU virtualization
Lingjun resources can meet your requirements for high-performance computing.
Lingjun resource group
PAI provides fully managed Lingjun resources that you can purchase and use in resource groups in the PAI console. If you purchase Lingjun hardware resources, you can add the resources to the PAI console as semi-managed resources and use them to run training jobs.
Limits
Supported regions
Lingjun resources are available only in the China (Ulanqab) and Singapore regions.
Supported users
Only whitelisted users can use Lingjun resources. To use Lingjun resources to run training jobs, submit a ticket to apply to be added to the whitelist.
Supported job types
Lingjun resources support only the following types of training jobs: TensorFlow, PyTorch, ElasticBatch, and MPIJob.
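The limits above can be checked before you submit a job. The following Python snippet is a minimal sketch of such a pre-flight check, not part of any PAI SDK. The function name `check_lingjun_limits` is hypothetical, and the region IDs `cn-wulanchabu` and `ap-southeast-1` are assumed to correspond to the China (Ulanqab) and Singapore regions named in this topic.

```python
from typing import List

# Assumption: these region IDs correspond to the two regions named in this topic.
SUPPORTED_REGIONS = {
    "cn-wulanchabu": "China (Ulanqab)",
    "ap-southeast-1": "Singapore",
}

# Job types listed in this topic as supported on Lingjun resources.
SUPPORTED_JOB_TYPES = ("TensorFlow", "PyTorch", "ElasticBatch", "MPIJob")


def check_lingjun_limits(region_id: str, job_type: str, whitelisted: bool) -> List[str]:
    """Return a list of violated limits; an empty list means none were found."""
    problems: List[str] = []
    if region_id not in SUPPORTED_REGIONS:
        names = ", ".join(sorted(SUPPORTED_REGIONS.values()))
        problems.append(f"Region {region_id!r} is not supported; use one of: {names}.")
    if job_type not in SUPPORTED_JOB_TYPES:
        types = ", ".join(SUPPORTED_JOB_TYPES)
        problems.append(f"Job type {job_type!r} is not supported; use one of: {types}.")
    if not whitelisted:
        problems.append("The account is not whitelisted; submit a ticket to apply.")
    return problems


if __name__ == "__main__":
    # Hypothetical inputs that violate all three limits.
    for problem in check_lingjun_limits("cn-hangzhou", "XGBoost", whitelisted=False):
        print(problem)
```

The whitelist status cannot be derived locally, so the sketch takes it as an input that you supply after your ticket is approved.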
Account and permission requirements
Alibaba Cloud account: You can use an Alibaba Cloud account to perform all operations without additional authorization.
RAM user: Ask the owner of the Alibaba Cloud account to grant the RAM user the permissions to manage the resource pool, or to attach the AliyunPAIFullAccess policy to the RAM user. For more information, see the "Permissions to manage the resource pool" section in the Custom policies for RAM users topic.
Important: The AliyunPAIFullAccess policy provides permissions to manage all resources and features of PAI. Exercise caution when you grant these permissions.
Dependencies
Lingjun resources depend on the following Alibaba Cloud services. Before you create, purchase, and use Lingjun resources, activate these services and prepare the required resources based on your business requirements.
VPC (required)
When you allocate Lingjun resources, you must associate the resources with a virtual private cloud (VPC) in the same region and configure a vSwitch and a security group. This ensures the network connectivity between the Lingjun resources and other Alibaba Cloud services.
Internet NAT gateway and EIP (optional)
Your Lingjun resources may need to access the Internet. For example, they may need to pull custom images from the Internet. In this case, you must configure an Internet NAT gateway with SNAT enabled and associate an elastic IP address (EIP) with the Internet NAT gateway.
For more information, see Use the SNAT feature of an Internet NAT gateway to access the Internet.
OSS, NAS, and CPFS (optional)
To submit DLC training jobs to Lingjun resources, you must first create datasets. Lingjun resources support only Object Storage Service (OSS), File Storage NAS (NAS), and Cloud Parallel File Storage (CPFS) datasets. For more information, see the Prepare a dataset section of the "General process" topic.
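The dependency rules in this section can be summarized as a checklist. The following Python snippet is a minimal sketch that encodes them for planning purposes. It does not call any Alibaba Cloud API, and the class `LingjunPlan`, its field names, and the function `missing_dependencies` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Dataset storage types that this topic lists as supported.
SUPPORTED_DATASET_STORAGE = ("OSS", "NAS", "CPFS")


@dataclass
class LingjunPlan:
    """Hypothetical description of the resources prepared for Lingjun."""
    region_id: str                       # region of the Lingjun resources
    vpc_region_id: str                   # region of the VPC to associate
    vswitch_id: Optional[str] = None
    security_group_id: Optional[str] = None
    needs_internet: bool = False         # for example, pulling custom images
    has_snat_nat_gateway: bool = False   # Internet NAT gateway with SNAT enabled
    has_eip_on_nat_gateway: bool = False
    dataset_storage: Optional[str] = None  # set only if you submit DLC jobs


def missing_dependencies(plan: LingjunPlan) -> List[str]:
    """Return the dependency rules from this section that the plan does not meet."""
    problems: List[str] = []
    # VPC (required): same region, plus a vSwitch and a security group.
    if plan.vpc_region_id != plan.region_id:
        problems.append("The VPC must be in the same region as the Lingjun resources.")
    if not plan.vswitch_id:
        problems.append("A vSwitch must be configured.")
    if not plan.security_group_id:
        problems.append("A security group must be configured.")
    # Internet NAT gateway and EIP (needed only for Internet access).
    if plan.needs_internet:
        if not plan.has_snat_nat_gateway:
            problems.append("Internet access requires an Internet NAT gateway with SNAT enabled.")
        if not plan.has_eip_on_nat_gateway:
            problems.append("Internet access requires an EIP associated with the NAT gateway.")
    # OSS, NAS, and CPFS (needed only for DLC training jobs).
    if plan.dataset_storage is not None and plan.dataset_storage not in SUPPORTED_DATASET_STORAGE:
        problems.append("Datasets must be stored in OSS, NAS, or CPFS.")
    return problems
```

An empty result means that the plan meets the rules stated in this section. It does not confirm that the resources exist or are configured correctly in your account.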
Procedure
Create a Lingjun resource group
Go to the Resource Pool page in the PAI console.
On the Intelligent Computing Lingjun resources tab, click Create Resource Group.
In the Create Resource Group dialog box, configure the parameters described in the following table and click OK.
Parameter | Description
Type | Select Dedicated Resource Group.
Resource Group Name | Enter a resource group name based on the naming rule.
Purchase Lingjun resources
To purchase Lingjun resources for a dedicated resource group, perform the following steps. For more information about the billing of Lingjun resources, see Billing of Lingjun resources (Serverless Edition).
On the Intelligent Computing Lingjun resources tab, click the name of the resource group that you want to manage.
In the upper-right corner of the resource group details page, click Create Order.
On the buy page, configure parameters such as Node Specification, Nodes, and Duration. Then, click Buy Now.
After you complete the payment, the purchased Lingjun resources are displayed on the Orders tab of the resource group details page.
References
After you create a resource group and purchase computing resources, you can perform the following operations:
On the resource group details page, view the basic information about the resource group and manage the purchased resources. For more information, see the Manage resources section of the "Overview" topic.
Allocate the purchased resources to specific training jobs by configuring resource quotas. For more information, see Lingjun resource quotas.