After you create resource quotas, you can manage them. For example, you can view resource quota details, scale resource quotas up or down, and add child-level resource quotas.
View resource quota details
Go to the Resource Quota page in the Platform for AI (PAI) console. You can view the existing resource quotas on the Intelligent Computing Lingjun Resources and General Computing Resources tabs.
You can filter resource quotas by using resource quota name, resource quota ID, and workspace ID.
You can sort resource quotas by the total or scheduled amount of a resource, such as CPU, memory, or GPU, to monitor resource allocation and usage.
You can click the Refresh icon to obtain the updated information about the status, scheduled amount, and total amount of resource quotas.
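The filter and sort behavior described above can be pictured with a small local model. This is only a sketch: the QuotaRow class, its field names, the sample values, and the substring match on the name are assumptions made for illustration. They are not part of the PAI console or any PAI API.

```python
from dataclasses import dataclass


@dataclass
class QuotaRow:
    """Hypothetical stand-in for one row of the resource quota list."""
    name: str
    quota_id: str
    workspace_id: str
    total_gpu: int
    scheduled_gpu: int


def filter_quotas(rows, name=None, quota_id=None, workspace_id=None):
    """Keep the rows that match every filter that was supplied."""
    result = []
    for row in rows:
        # Assumption: the name filter is a substring match.
        if name is not None and name not in row.name:
            continue
        if quota_id is not None and row.quota_id != quota_id:
            continue
        if workspace_id is not None and row.workspace_id != workspace_id:
            continue
        result.append(row)
    return result


def sort_by_scheduled_gpu(rows, descending=True):
    """Order rows by the scheduled GPU amount."""
    return sorted(rows, key=lambda r: r.scheduled_gpu, reverse=descending)


# Made-up sample data.
rows = [
    QuotaRow("train-a", "quota-001", "ws-1", total_gpu=16, scheduled_gpu=12),
    QuotaRow("train-b", "quota-002", "ws-1", total_gpu=8, scheduled_gpu=2),
    QuotaRow("infer-a", "quota-003", "ws-2", total_gpu=4, scheduled_gpu=4),
]
busiest_first = sort_by_scheduled_gpu(filter_quotas(rows, workspace_id="ws-1"))
```

Here busiest_first contains only the quotas associated with workspace ws-1, ordered from the largest scheduled GPU amount to the smallest.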
Click the resource quota name to view the resource quota details.
The details include the scheduled amounts of vCPUs, memory (TiB), and GPUs. You can also view the total amount of resources and the resources consumed by the current-level and child-level resource quotas to understand overall resource usage.
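As a rough illustration of how these figures relate, the following sketch derives the scheduled amount, the free amount, and the utilization for each resource type. It assumes that the scheduled amount equals the current-level consumption plus the child-level consumption. That relationship, the dictionary layout, and all numbers are assumptions for illustration only, not values or formulas taken from the console.

```python
def utilization(scheduled, total):
    """Return scheduled / total as a fraction, or 0.0 when total is zero."""
    return scheduled / total if total else 0.0


def summarize(quota):
    """Combine current-level and child-level consumption per resource type."""
    summary = {}
    for resource, amounts in quota.items():
        # Assumption: scheduled = current-level + child-level consumption.
        scheduled = amounts["current"] + amounts["children"]
        summary[resource] = {
            "scheduled": scheduled,
            "free": amounts["total"] - scheduled,
            "utilization": utilization(scheduled, amounts["total"]),
        }
    return summary


# Made-up figures for a single resource quota.
quota = {
    "vcpu": {"total": 128, "current": 32, "children": 64},
    "memory_tib": {"total": 2.0, "current": 0.5, "children": 0.5},
    "gpu": {"total": 16, "current": 4, "children": 8},
}
summary = summarize(quota)
```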
You can view the details of resource quotas on the Overview, Node, Job, User, Monitoring, and Topology tabs.
Overview
On the resource quota details page, you can click the Overview tab to view the basic information, resource information, network information, and resource change history of the resource quota.
You can modify the following configurations:
Update basic information
Click the icon to update the resource quota name, the workspace to which the resource quota belongs, or the tag. After you associate the resource quota with a workspace, you can use the resource quota in that workspace. In the Scheduling Center of the workspace, you can configure resource usage policies for the resource quotas that are associated with the workspace. A policy specifies the PAI modules in which the resource quota can be used, the roles that are allowed to use the resource quota, GPUs, and resource specification templates. For more information, see Workspace scheduling center.
Update resource information
Click the icon next to Scheduling Policy to modify the scheduling policy. For information about how each scheduling policy works, see Scheduling policies.
Enable or disable the Preempt Child-level Resources feature. After you enable the feature, the Deep Learning Containers (DLC) jobs that run on the current resource quota are allowed to preempt the computing resources of DLC jobs that run on the child-level resource quotas.
View resource change history
In the Resource Change History section, you can view the operation records of creating, scaling, and deleting the resource quota.
Node
On the resource quota details page, you can click the Node tab to view the node specifications of the resource quota.
Node management
You can click Stop Scheduling or Clear Node in the Actions column of the node to manage the node. After you stop scheduling, the system stops assigning jobs to the node and pauses the resource consumption.
Other common functions
You can view the total number of jobs and instances created on the node in the Number of Tasks and Number of Instances columns. Click Details in the Number of Tasks or Number of Instances column to view the details of the jobs or instances.
You can filter nodes by node status or click the icon to sort nodes.
Job
On the resource quota details page, you can click the Job tab to view the jobs that are created by using the current-level and child-level resource quotas.
You can turn on the View Current Resource Quota switch to view the jobs that are created by using the current resource quota.
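The effect of the View Current Resource Quota switch can be sketched as a simple filter over a job list. The job dictionaries, the quota IDs, and the visible_jobs function are hypothetical and only model the behavior described above.

```python
def visible_jobs(jobs, current_quota_id, only_current=False):
    """Return the jobs that the Job tab would list.

    With only_current=False, jobs of the current-level and child-level
    resource quotas are returned. With only_current=True, only jobs that
    use the current resource quota are returned.
    """
    if only_current:
        return [job for job in jobs if job["quota_id"] == current_quota_id]
    return list(jobs)


# Made-up sample data: one parent quota and one child-level quota.
jobs = [
    {"job_id": "job-1", "quota_id": "quota-parent"},
    {"job_id": "job-2", "quota_id": "quota-child"},
    {"job_id": "job-3", "quota_id": "quota-parent"},
]
current_only = visible_jobs(jobs, "quota-parent", only_current=True)
```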
User
On the resource quota details page, you can click the User tab to view the users who created the current-level and child-level resource quotas.
For each user, you can view the scheduled vCPU, memory, and GPU resources and the total number of jobs that the user submitted. Click Details in the Number of Tasks column to view the details of the jobs.
Monitoring
On the resource quota details page, you can click the Monitoring tab to view the resource usage and metric data.
You can select a monitoring dimension. Valid values: Quota and Node.
You can select a time range. Valid values:
You can configure the number of metrics displayed in each row:
You can configure alert rules and enable alert notifications for monitoring metrics. If the resource usage fluctuates, an alert notification is sent. For more information, see Resource quota monitoring and alerting.
Topology
On the resource quota details page, you can click the Topology tab to view the resource view and task view. You can monitor the resource usage of each node in real time. This allows you to adjust the resource allocation policy and improve resource utilization.
Comparison between the views:
In the resource view, you can view the allocation of vCPU, memory, and GPU resources of the current-level and child-level resource quotas in detail.
In the task view, you can view the total number of jobs created by using the current-level and child-level resource quotas and the number of jobs in each status.
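The difference between the two views can be illustrated with a short sketch that aggregates the same job list in two ways. The job records, the quota IDs, and the status names are made up for illustration, and only GPUs are tracked to keep the example short. This is not how the console computes the views.

```python
from collections import Counter, defaultdict

# Made-up jobs that run on a parent quota and two child-level quotas.
jobs = [
    {"quota_id": "parent", "status": "Running", "gpu": 4},
    {"quota_id": "child-1", "status": "Running", "gpu": 2},
    {"quota_id": "child-1", "status": "Queuing", "gpu": 0},
    {"quota_id": "child-2", "status": "Succeeded", "gpu": 0},
]


def resource_view(jobs):
    """Sum the allocated GPUs per resource quota."""
    allocated = defaultdict(int)
    for job in jobs:
        allocated[job["quota_id"]] += job["gpu"]
    return dict(allocated)


def task_view(jobs):
    """Count all jobs and the number of jobs in each status."""
    by_status = Counter(job["status"] for job in jobs)
    return {"total": len(jobs), "by_status": dict(by_status)}
```

The resource view answers where the resources are allocated, and the task view answers how many jobs exist and what state they are in.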
Scale up or down a resource quota
You can adjust the size of the resource quota based on the current job volume to implement effective cost management. On the Resource Quota page, find the resource quota that you want to manage and click Scale in the Actions column. On the page that appears, modify the Source and Specifications/Resources parameters.
Scale up: add resources to the resource quota and integrate resources of different specifications in the resource pool into the same resource quota.
Scale down: reduce the number of nodes of associated resource specifications or remove specific specifications from the resource quota.
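The scale-up and scale-down operations above can be modeled as changes to a mapping from resource specification to node count. The following sketch is an assumption-based illustration: the scale function and the specification names are hypothetical and do not correspond to a PAI API or to real specifications.

```python
def scale(quota_specs, changes):
    """Return a new specification -> node count mapping.

    changes maps a specification name to a signed node delta:
    a positive delta scales up and a negative delta scales down.
    A specification whose count reaches zero is removed from the quota.
    """
    result = dict(quota_specs)
    for spec, delta in changes.items():
        new_count = result.get(spec, 0) + delta
        if new_count < 0:
            raise ValueError(f"cannot remove more nodes than {spec} has")
        if new_count == 0:
            result.pop(spec, None)
        else:
            result[spec] = new_count
    return result


# Scale up: add nodes of a different specification to the same quota.
scaled_up = scale({"spec-a": 4}, {"spec-b": 2})

# Scale down: reduce the nodes of one specification and remove another.
scaled_down = scale(scaled_up, {"spec-a": -1, "spec-b": -2})
```

The function returns a new mapping instead of modifying the input so that the state before and after scaling can be compared.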
Add child resource quotas
You can add child resource quotas to implement fine-grained resource management, optimize allocation policies, and improve resource utilization efficiency.
On the Resource Quota page, find the resource quota that you want to manage and click New Child-level Resource Quota in the Actions column to add a child-level resource quota to the resource quota. You can associate the child-level resource quota with a workspace and then use it in that workspace.