Container Service for Kubernetes: Labels for enabling GPU scheduling policies and methods for changing label values

Last Updated: Jun 19, 2024

After you install the scheduling component ack-ai-installer provided by the cloud-native AI suite, you can add a label to a GPU-accelerated node to enable a scheduling policy, such as GPU sharing or topology-aware GPU scheduling. This topic describes the labels for enabling different GPU scheduling policies and how to change the value of a label.
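Before you change anything, you can check which GPU scheduling labels are currently set on the nodes of a cluster. The following command is a minimal sketch that assumes kubectl access to the cluster; the -L flag prints each label as a separate column:

kubectl get nodes -L ack.node.gpu.schedule -L ack.node.gpu.placement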

Labels for enabling GPU scheduling policies

ack.node.gpu.schedule

Policy: Exclusive GPU scheduling

Label value: default

Whether other label values are supported: Yes. You can change the value to a label value of the GPU sharing policy (cgpu, core_mem, or share) or the topology-aware GPU scheduling policy (topology).

Description:

  • This label value enables the exclusive GPU scheduling policy. GPU resources are allocated to pods by GPU.

  • By default, GPU-accelerated nodes that are newly added to a cluster do not have the ack.node.gpu.schedule:default label but still use the exclusive GPU scheduling policy.

  • If the ack.node.gpu.schedule label is already added to a node to enable the GPU sharing policy and you want the node to use the exclusive GPU scheduling policy again, you must add the ack.node.gpu.schedule:default label to the node. You cannot switch the node to the exclusive GPU scheduling policy by deleting the ack.node.gpu.schedule label from the node.
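For example, to switch a node that uses GPU sharing back to exclusive GPU scheduling after you complete the reset operations described later in this topic, you can run a command similar to the following. The node name is a placeholder:

kubectl label nodes <your-node-name> ack.node.gpu.schedule=default --overwrite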

Policy: GPU sharing

Label value: cgpu

Whether other label values are supported: Yes. You can change the value to the label value of the exclusive GPU scheduling policy (default) or the topology-aware GPU scheduling policy (topology).

Description:

  • This label value enables the GPU sharing policy.

  • A node on which GPU sharing is enabled is installed with the GPU isolation module cGPU. For more information about cGPU, see Install and use cGPU on a Docker container.

  • By default, the computing power scheduling policy of cGPU is set to 5. For more information about computing power scheduling policies, see Install and use cGPU on a Docker container.

  • Pods can request only GPU memory. GPU memory isolation and computing power sharing are implemented for pods that share the same GPU.
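Based on the preceding description, a pod on a node labeled ack.node.gpu.schedule=cgpu requests GPU memory instead of whole GPUs. The following is a minimal sketch that assumes the extended resource name aliyun.com/gpu-mem (measured in GiB) used by ACK GPU sharing and a hypothetical application image:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-mem-demo
spec:
  containers:
  - name: app
    # Hypothetical image name; replace it with your own CUDA application image.
    image: cuda-app:latest
    resources:
      limits:
        # Request 3 GiB of GPU memory. Pods that share the GPU are isolated by cGPU.
        aliyun.com/gpu-mem: 3
EOF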

Policy: GPU sharing

Label value: core_mem

Whether other label values are supported: Yes. You can change the value to the label value of the exclusive GPU scheduling policy (default) or the topology-aware GPU scheduling policy (topology).

Description:

  • This label value enables the GPU sharing policy.

  • A node on which GPU sharing is enabled is installed with the GPU isolation module cGPU. For more information about cGPU, see Install and use cGPU on a Docker container.

  • By default, the computing power scheduling policy of cGPU is set to 3. For more information about computing power scheduling policies, see Install and use cGPU on a Docker container.

  • Pods must request both GPU memory and computing power. Pods that share the same GPU can use only the GPU memory and computing power that they request.
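With core_mem, a pod declares both dimensions. The following is a minimal sketch that assumes the extended resource names aliyun.com/gpu-mem (GiB) and aliyun.com/gpu-core.percentage (percentage of the computing power of one GPU) used by ACK GPU sharing, with a hypothetical image name:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-core-mem-demo
spec:
  containers:
  - name: app
    # Hypothetical image name; replace it with your own CUDA application image.
    image: cuda-app:latest
    resources:
      limits:
        # Request 3 GiB of GPU memory and 30% of the computing power of one GPU.
        aliyun.com/gpu-mem: 3
        aliyun.com/gpu-core.percentage: 30
EOF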

Policy: GPU sharing

Label value: share

Whether other label values are supported: Yes. You can change the value to cgpu, core_mem, the label value of the exclusive GPU scheduling policy (default), or the topology-aware GPU scheduling policy (topology).

Description:

  • This label value enables the GPU sharing policy.

  • A node on which GPU sharing is enabled with this value is not installed with the GPU isolation module cGPU. As a result, GPU memory is not isolated between pods.

  • Pods can request only GPU memory. Computing power sharing is implemented for pods that share the same GPU.

Policy: Topology-aware GPU scheduling

Label value: topology

Whether other label values are supported: Yes. You can change the value to a label value of the exclusive GPU scheduling policy (default) or the GPU sharing policy (cgpu, core_mem, or share).

Description:

  • This label value enables the topology-aware GPU scheduling policy.

  • GPU resources are allocated to pods by GPU. The GPUs that are allocated to a pod are selected based on the bandwidth of GPU-to-GPU data transfer.

Policy: Dynamic Multi-Instance GPU (MIG) partitioning

Label value: mig

Whether other label values are supported: No.

Description:

  • This label value enables the MIG feature.

  • A node reports the maximum number of MIG instances that it supports to the scheduler. Each container can request at most one MIG instance.

  • Enabling the MIG feature also modifies the hardware attributes of nodes. Therefore, if you want to disable the MIG feature for a node pool, you must delete the current node pool and then create another node pool that has MIG disabled. You cannot switch to another GPU scheduling policy by running the kubectl label command to change the label value.

ack.node.gpu.placement

Policy: GPU sharing

Label value: spread

Whether other label values are supported: Yes. You can change the value to binpack.

Description:

  • The policy takes effect only on nodes that have GPU sharing enabled.

  • On a node that has multiple GPUs, this policy spreads pods across the GPUs.

Policy: GPU sharing

Label value: binpack

Whether other label values are supported: Yes. You can change the value to spread.

Description:

  • The policy takes effect only on nodes that have GPU sharing enabled.

  • On a node that has multiple GPUs, this policy allocates all resources of one GPU to pods before it moves on to the next GPU. This helps avoid GPU fragmentation.

  • If a GPU-accelerated node that has GPU sharing enabled does not have the ack.node.gpu.placement label, the scheduler uses the binpack policy to allocate GPU resources to pods.
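To check which placement policy is in effect on a node, you can query the label directly. The following is a minimal sketch in which the node name is a placeholder; an empty result means that the label is not set and the binpack policy is used:

kubectl get node <your-node-name> -o jsonpath='{.metadata.labels.ack\.node\.gpu\.placement}'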

Change label values

Issues that may occur if you run the kubectl label nodes command or use the label management feature in the ACK console to change label values

The following issues may occur if you run the kubectl label nodes command to switch the GPU scheduling policy of a GPU-accelerated node from Policy A to Policy B, or use the Labels feature on the Nodes page of the ACK console to change label values:

  • Applications that use GPU resources may already be deployed on the node, and their pods requested GPU resources based on Policy A. After the scheduling policy is switched from Policy A to Policy B, the scheduler no longer includes these pods in the GPU resource ledger that it maintains for the node. As a result, the ledger becomes inconsistent with the actual GPU resource allocation, and these applications may compete for GPU resources with other GPU-heavy applications.

  • Some scheduling policies are enabled by modifying the configuration of the node itself. Running the kubectl label nodes command or using the label management feature in the Container Service for Kubernetes (ACK) console changes only the label value and does not reset the configuration of the node. Consequently, the node may fail to enable the specified scheduling policy.
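To see how the scheduler's ledger compares with the pods that actually run on a node, you can inspect the allocated resources of the node. The following is a minimal sketch in which the node name is a placeholder; the extended resource names that appear in the output depend on the scheduling policy in use:

kubectl describe node <your-node-name> | grep -A 10 "Allocated resources"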

To avoid the preceding issues, we recommend that you configure GPU scheduling policies for node pools.

Configure GPU scheduling policies for node pools

Assume that you want to enable GPU sharing (with GPU memory isolation only) and GPU sharing (with GPU memory isolation and computing power limits) for a cluster. In this scenario, you can create two node pools in the cluster:

  • Node Pool A: manages nodes that have GPU memory isolation enabled (ack.node.gpu.schedule=cgpu).

  • Node Pool B: manages nodes that have GPU memory isolation and computing power limits enabled (ack.node.gpu.schedule=core_mem).


To change the GPU scheduling policy of a GPU-accelerated node from the policy used by Node Pool A to the policy used by Node Pool B, remove the node from Node Pool A and add the node to Node Pool B. For more information, see Remove a node and Add existing ECS instances to an ACK cluster.

Manually change the GPU scheduling policy of a node

You can also manually change the GPU scheduling policy of a node. To do this, perform the following operations. A consolidated command sketch follows the steps.

  1. Set the node to Unschedulable: Prevent the node from accepting new pods.

  2. Drain the node: Evict all existing pods from the node.

  3. Log on to the node and reset the label settings: The reset operation varies based on the value of the label. For more information, see Reset label settings.

  4. Change the label value: After you complete the reset operation, run the kubectl label command to change the label value.

  5. Set the node to Schedulable: Allow the node to accept new pods again.
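The following is a minimal sketch of steps 1, 2, 4, and 5 for a node that currently uses GPU sharing. The node name and the new label value are placeholders, and step 3 must be performed on the node itself:

kubectl cordon <your-node-name>
kubectl drain <your-node-name> --ignore-daemonsets --delete-emptydir-data
# Step 3: log on to the node and reset the label settings (see Reset label settings).
kubectl label nodes <your-node-name> ack.node.gpu.schedule=<new-value> --overwrite
kubectl uncordon <your-node-name>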


Reset label settings

Policy: GPU sharing

Label: ack.node.gpu.schedule=cgpu or ack.node.gpu.schedule=core_mem

Operation to reset label settings before changing the label value: Run the following command on the node to uninstall cGPU:

bash /usr/local/cgpu-installer/uninstall.sh

References

For more information about the labels that are used to specify GPU models, how to schedule applications to specific GPU models, and how to avoid scheduling applications to specific GPU models, see Labels used to specify GPU models.