In an ACK managed cluster Pro, you can assign scheduling labels to GPU nodes to optimize resource utilization and precisely schedule applications. These labels define properties such as exclusive access, shared use, topology awareness, and specific GPU card models.
Scheduling label overview
GPU scheduling labels identify GPU models and resource allocation policies to support fine-grained resource management and efficient scheduling.
| Scheduling mode | Label value | Scenarios |
| --- | --- | --- |
| Exclusive scheduling (Default) | ack.node.gpu.schedule: default | Performance-critical tasks that require exclusive access to an entire GPU, such as model training and high-performance computing (HPC). |
| Shared scheduling | ack.node.gpu.schedule: cgpu, core_mem, share, or mps | Improves GPU utilization. Ideal for scenarios with multiple concurrent lightweight tasks, such as multitenancy and inference. |
| | ack.node.gpu.placement: binpack or spread | Optimizes the resource allocation strategy on multi-GPU nodes when shared scheduling is enabled. |
| Topology-aware scheduling | ack.node.gpu.schedule: topology | Automatically assigns the optimal combination of GPUs to a pod based on the physical GPU topology of a single node. Suitable for tasks that are sensitive to GPU-to-GPU communication latency. |
| Card model scheduling | aliyun.accelerator/nvidia_name: <GPU card model>. Related labels set the GPU memory and total number of GPU cards for a GPU job. | Schedules tasks to nodes with a specific GPU model or avoids nodes with a specific model. |
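For example, assuming you apply labels to nodes directly with kubectl (the console-based node pool steps later in this topic are an alternative), enabling topology-aware scheduling on a node looks like the following. Replace <NODE_NAME> with the name of your node.

```
kubectl label node <NODE_NAME> ack.node.gpu.schedule=topology
```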
Enable scheduling features
A node can use only one GPU scheduling mode at a time: exclusive, shared, or topology-aware. When one mode is enabled, the extended resources for other modes are automatically set to 0.
Exclusive scheduling
If a node has no GPU scheduling labels, exclusive scheduling is enabled by default. In this mode, the node allocates GPU resources to pods in whole-card units.
If you have enabled other GPU scheduling modes, deleting the label does not restore exclusive scheduling. To restore exclusive scheduling, you must manually change the label value to ack.node.gpu.schedule: default.
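For example, the following command (the same command used to restore exclusive scheduling in the topology-aware scheduling section below) overwrites the label so the node returns to exclusive, whole-card scheduling. Replace <NODE_NAME> with the name of your node.

```
kubectl label node <NODE_NAME> ack.node.gpu.schedule=default --overwrite
```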
Shared scheduling
Shared scheduling is available only for ACK managed cluster Pro. For more information, see Limits.
Install the ack-ai-installer component.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. In the navigation pane on the left, choose Applications > Cloud-native AI Suite.
On the Cloud-native AI Suite page, click Deploy. On the Deploy Cloud-native AI Suite page, select Scheduling Policy Extension (Batch Task Scheduling, GPU Sharing, Topology-aware GPU Scheduling).
For more information about how to set the computing power scheduling policy for the cGPU service, see Install and use the cGPU service.
At the bottom of the page, click Deploy Cloud-native AI Suite.
On the Cloud-native AI Suite page, verify that the ack-ai-installer component appears in the list of installed components.
Enable shared scheduling.
On the Clusters page, click the name of your target cluster. In the navigation pane on the left, choose Nodes > Node Pools.
On the Node Pools page, click Create Node Pool, configure the node labels, and then click Confirm.
You can keep the default values for other configuration items. For more information about the function of each label, see Scheduling label overview.
Configure basic shared scheduling.
Click the icon for Node Labels, set the Key to ack.node.gpu.schedule, and select a value such as cgpu, core_mem, share, or mps (the mps value requires that the MPS Control Daemon component is installed).
Configure multi-card shared scheduling.
On multi-GPU nodes, you can add a placement policy to your basic shared scheduling configuration to optimize resource allocation.
Click the icon for Node Labels, set the Key to ack.node.gpu.placement, and select either binpack or spread as the label value.
Verify that shared scheduling is enabled.
cgpu/share/mps
Replace <NODE_NAME> with the name of a node in the target node pool and run the following command to verify that cgpu, share, or mps shared scheduling is enabled on the node.

```
kubectl get nodes <NODE_NAME> -o yaml | grep "aliyun.com/gpu-mem"
```

Expected output:

```
aliyun.com/gpu-mem: "60"
```

If the aliyun.com/gpu-mem field is not 0, cgpu, share, or mps shared scheduling is enabled.

core_mem
Replace <NODE_NAME> with the name of a node in the target node pool and run the following command to verify that core_mem shared scheduling is enabled.

```
kubectl get nodes <NODE_NAME> -o yaml | grep -E 'aliyun\.com/gpu-core\.percentage|aliyun\.com/gpu-mem'
```

Expected output:

```
aliyun.com/gpu-core.percentage: "80"
aliyun.com/gpu-mem: "6"
```

If the aliyun.com/gpu-core.percentage and aliyun.com/gpu-mem fields are both non-zero, core_mem shared scheduling is enabled.

binpack
Use the shared GPU resource query tool to check the GPU resource allocation on the node:

```
kubectl inspect cgpu
```

Expected output:

```
NAME                     IPADDRESS    GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.0.2.109  192.0.2.109  15/15                  9/15                   0/15                   0/15                   24/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster: 24/60 (40%)
```

The output shows that GPU0 is fully allocated (15/15) while GPU1 is partially allocated (9/15). This confirms that the binpack policy is active. This policy fills one GPU completely before allocating resources on the next.

spread
Use the shared GPU resource query tool to check the GPU resource allocation on the node:

```
kubectl inspect cgpu
```

Expected output:

```
NAME                     IPADDRESS    GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.0.2.109  192.0.2.109  4/15                   4/15                   0/15                   4/15                   12/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster: 12/60 (20%)
```

The output shows that the resource allocation is 4/15 on GPU0, 4/15 on GPU1, and 4/15 on GPU3. This is consistent with a policy that spreads pods across different GPUs, which confirms that the spread policy is in effect.
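Once shared scheduling is enabled, pods consume GPU memory through the aliyun.com/gpu-mem extended resource shown in the verification output above. The following is a minimal sketch of such a pod, assuming the standard pattern of requesting the extended resource in the container limits; the pod name, image, and requested amount of GPU memory (in GiB) are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-sample                            # placeholder name
spec:
  containers:
  - name: app
    image: registry.example.com/inference-app:v1    # placeholder image
    resources:
      limits:
        # Request 3 GiB of GPU memory on a shared GPU.
        aliyun.com/gpu-mem: 3
        # On core_mem nodes, compute can additionally be limited, for example
        # with the aliyun.com/gpu-core.percentage resource shown in the
        # verification output above (assumption based on that output).
        # aliyun.com/gpu-core.percentage: 30
```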
Topology-aware scheduling
Topology-aware scheduling is available only for ACK managed cluster Pro. For more information, see System component version requirements.
Enable topology-aware scheduling.
Replace <NODE_NAME> with the name of your target node and run the following command to add a label to the node and enable topology-aware GPU scheduling.
```
kubectl label node <NODE_NAME> ack.node.gpu.schedule=topology
```

After you enable topology-aware scheduling on a node, the node no longer supports GPU workloads that are not topology-aware. To restore exclusive scheduling, run the following command:

```
kubectl label node <NODE_NAME> ack.node.gpu.schedule=default --overwrite
```

Verify that topology-aware scheduling is enabled.
Replace <NODE_NAME> with the name of your target node and run the following command to verify that topology-aware scheduling is enabled.

```
kubectl get nodes <NODE_NAME> -o yaml | grep aliyun.com/gpu
```

Expected output:

```
aliyun.com/gpu: "2"
```

If the aliyun.com/gpu field is not 0, topology-aware scheduling is enabled.
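As a hedged illustration, the sketch below requests the aliyun.com/gpu extended resource reported in the verification output above, which lets the scheduler pick a topology-optimal combination of GPUs on the node. The pod name, image, and GPU count are placeholders, and real training jobs are often submitted through a higher-level workload controller.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: topology-aware-sample                       # placeholder name
spec:
  containers:
  - name: trainer
    image: registry.example.com/training-app:v1     # placeholder image
    resources:
      limits:
        # Number of GPUs to allocate; the scheduler selects a combination of
        # GPUs on the node with an optimal interconnect topology.
        aliyun.com/gpu: 2
```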
Card model scheduling
You can schedule Jobs to nodes with a specific GPU model or avoid nodes with a specific model.
View the GPU card model on the node.
Run the following command to query the GPU card model of the nodes in your cluster.
The NVIDIA_NAME field shows the GPU card model.
```
kubectl get nodes -L aliyun.accelerator/nvidia_name
```

The expected output is similar to the following:

```
NAME                        STATUS   ROLES    AGE   VERSION            NVIDIA_NAME
cn-shanghai.192.XX.XX.176   Ready    <none>   17d   v1.26.3-aliyun.1   Tesla-V100-SXM2-32GB
cn-shanghai.192.XX.XX.177   Ready    <none>   17d   v1.26.3-aliyun.1   Tesla-V100-SXM2-32GB
```

Enable card model scheduling.
On the Clusters page, find the cluster you want and click its name. In the left navigation pane, choose Workloads > Jobs.
On the Jobs page, click Create From YAML. Use the following examples to create an application and enable card model scheduling.

Specify a particular card model
Use the GPU card model scheduling label to ensure your application runs on nodes with a specific card model.
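A minimal sketch of such a Job is shown below, assuming the card model label is used as a nodeSelector; the Job name, image, and whole-GPU resource request (nvidia.com/gpu) are placeholders.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-card-model-sample                       # placeholder name
spec:
  template:
    spec:
      nodeSelector:
        # Schedule pods only to nodes with this GPU card model.
        aliyun.accelerator/nvidia_name: "Tesla-V100-SXM2-32GB"
      containers:
      - name: main
        image: registry.example.com/gpu-app:v1      # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1                       # request one whole GPU
      restartPolicy: Never
```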
In the code aliyun.accelerator/nvidia_name: "Tesla-V100-SXM2-32GB", replace Tesla-V100-SXM2-32GB with the card model of your node.
After the job is created, choose Workloads > Pods in the navigation pane on the left. The pod list shows that the example pod is scheduled to a matching node. This confirms that scheduling based on the GPU card model label is working.
Exclude a particular card model
Use the GPU card model scheduling label with node affinity and anti-affinity to prevent your application from running on certain card models.
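A minimal sketch of such a Job is shown below, assuming required node affinity with a NotIn operator on the card model label; the Job name, image, and whole-GPU resource request (nvidia.com/gpu) are placeholders.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-card-model-exclude-sample               # placeholder name
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              # Do not schedule to nodes whose GPU card model is in this list.
              - key: aliyun.accelerator/nvidia_name
                operator: NotIn
                values:
                - "Tesla-V100-SXM2-32GB"
      containers:
      - name: main
        image: registry.example.com/gpu-app:v1      # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1                       # request one whole GPU
      restartPolicy: Never
```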
In values: - "Tesla-V100-SXM2-32GB", replace Tesla-V100-SXM2-32GB with the card model of your node.
After the job is created, the application is not scheduled on nodes that have the aliyun.accelerator/nvidia_name label key with the Tesla-V100-SXM2-32GB label value. However, it can be scheduled on other GPU nodes.
