
Container Service for Kubernetes: Use cGPU to allocate computing power

Last Updated: Jan 14, 2025

ACK Pro clusters support the allocation of GPU video memory and computing power for applications, enabling more fine-grained resource management. This topic explains how to allocate computing power using cGPU.

Prerequisites

  • You must have created an ACK Pro cluster that runs Kubernetes 1.20 or later. For instructions, see Create an ACK managed cluster. Additionally, ensure that the scheduler version meets the requirement for your ACK cluster version, as listed in the following table. For details on the features supported by each scheduler version, see kube-scheduler.

    ACK cluster version    Scheduler version
    1.28                   v1.28.1-aliyun-5.6-998282b9 or later
    1.26                   v1.26.3-aliyun-4.1-a520c096 or later
    1.24                   v1.24.3-ack-2.0 or later
    1.22                   v1.22.15-ack-2.0 or later
    1.20                   v1.20.4-ack-8.0 or later

  • The cGPU component must be installed with a Helm chart version of 1.2.0 or later. For installation steps, see Install the cGPU scheduling component. A quick way to check the installed chart version is shown after this list.

  • Ensure cGPU version 1.0.5 or later is installed. For upgrade instructions, see Upgrade the cGPU version of nodes.
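To check which chart version of the cGPU component is currently installed, you can list its Helm release. This is an optional, hedged check; it assumes the component was installed as a Helm release named ack-ai-installer (as shown on the Helm page of the console) and that you have Helm and kubectl access to the cluster.

# List Helm releases in all namespaces and filter for the cGPU component.
helm list -A | grep ack-ai-installer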

Limits

  • cGPU scheduling currently supports two task types: tasks that request only video memory, and tasks that request both video memory and computing power. These two task types cannot coexist on the same node.

  • The following limits apply when requesting computing power for tasks:

    • Each GPU provides computing power measured as 100, representing 100% utilization. For instance, requesting 20 equates to 20% of the GPU card's computing power.

    • The requested computing power must be a multiple of 5 and at least 5. Requests that do not meet this requirement cause the task submission to fail (see the example at the end of this section).

  • Only certain regions support GPU video memory and computing power allocation. Ensure your cluster is located in a supported region as listed in the following table.

    Region                            Region ID
    China (Beijing)                   cn-beijing
    China (Shanghai)                  cn-shanghai
    China (Hangzhou)                  cn-hangzhou
    China (Zhangjiakou)               cn-zhangjiakou
    China (Shenzhen)                  cn-shenzhen
    China (Chengdu)                   cn-chengdu
    China (Heyuan)                    cn-heyuan
    Hong Kong (China)                 cn-hongkong
    Indonesia (Jakarta)               ap-southeast-5
    Singapore                         ap-southeast-1
    Thailand (Bangkok)                ap-southeast-7
    US (Virginia)                     us-east-1
    US (Silicon Valley)               us-west-1
    Japan (Tokyo)                     ap-northeast-1
    China (Shanghai) Finance Cloud    cn-shanghai-finance-1

  • The scheduler compatible with cGPU computing power allocation was released on March 1, 2022. Clusters created after this date use the updated scheduler. For clusters created before this date, the scheduler is not automatically upgraded. You must manually upgrade it by performing the following steps:

    1. Submit a ticket to request the beta version of the new cGPU scheduler.

    2. Uninstall the outdated cGPU component.

      If the outdated cGPU component (supporting only video memory sharing, Helm chart version ≤1.2.0) is installed, follow these steps:

      1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

      2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Applications > Helm.

      3. On the Helm page, click Delete in the Actions column next to ack-ai-installer. In the Delete Application dialog box, click OK.

    3. Install the latest version of the cGPU component. For more information, see Install the cGPU scheduling component.
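As noted in the limits above, the requested computing power must be a multiple of 5. For example, the following resource limits request 3 GiB of video memory and 15% of a GPU card's computing power. This is a minimal sketch of the limits section only; the resource names are the same extended resources used in the full Job manifest in Step 3, and the values are illustrative.

resources:
  limits:
    # GPU video memory, in GiB (illustrative value)
    aliyun.com/gpu-mem: 3
    # Computing power as a percentage of one GPU card; must be a multiple of 5, minimum 5
    aliyun.com/gpu-core.percentage: 15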

Step 1: Create a node pool that supports computing power allocation

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Nodes > Node Pools.

  3. On the Node Pool page, click Create Node Pool.

    The following describes some of the key parameters. For more information, see Create and Manage Node Pools.

    Node Pool Name: Enter a name for the node pool. In this topic, the name is set to gpu-core.

    Desired Number Of Nodes: Specify the initial number of nodes in the node pool. If you do not need to create nodes, enter 0.

    ECS Tags: Add labels to the Elastic Compute Service (ECS) instances.

    Node Tags: Add labels to the nodes in the node pool. The configuration in this topic is as follows. For more information about node tags, see Node Tag Description for ACK Scheduling GPU Usage.

    • Enable GPU memory and computing power isolation on nodes: Click Add Node Label, set the Key of the first node tag to ack.node.gpu.schedule, and set the Value to core_mem.

    • Use the Binpack algorithm on nodes to select GPU cards for pods: Click Add Node Label, set the Key of the second node tag to ack.node.gpu.placement, and set the Value to binpack.

    Important

    To switch existing GPU nodes in the cluster to computing power isolation mode, remove the nodes from the cluster and re-add them to a node pool that supports computing power isolation. Directly running the kubectl label nodes <NODE_NAME> ack.node.gpu.schedule=core_mem command for this purpose is not supported.
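After nodes are added to the node pool, you can list the nodes that carry the isolation label. This is an optional convenience check, assuming the node tag ack.node.gpu.schedule=core_mem from the description above has been applied:

# List only the nodes labeled for GPU memory and computing power isolation.
kubectl get nodes -l ack.node.gpu.schedule=core_mem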

Step 2: Check whether the node pool has enabled the computing power allocation feature

To check whether the computing power allocation feature is enabled for the node pool, run the following command:

kubectl get nodes <NODE_NAME> -o yaml

Expected output:

# Irrelevant fields are not shown.
status:
  # Irrelevant fields are not shown.
  allocatable:
    # The node has a total computing power of 400%, with 4 GPU cards, each providing 100% computing power.
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    # The node has a total of 60 GiB of video memory, with 4 GPU cards, each providing 15 GiB of video memory.
    aliyun.com/gpu-mem: "60"
  capacity:
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    aliyun.com/gpu-mem: "60"

The output should include the aliyun.com/gpu-core.percentage field, indicating that the computing power allocation feature is enabled.
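If you only need the allocatable computing power value, a JSONPath query can extract that single field. This is a hedged one-liner; the field name is the same extended resource shown in the output above, with the dots in the resource name escaped by backslashes.

kubectl get node <NODE_NAME> -o jsonpath='{.status.allocatable.aliyun\.com/gpu-core\.percentage}'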

Step 3: Use the computing power allocation feature

Before the computing power allocation feature is enabled, a pod can use 100% of the computing power of a GPU card and its 15 GiB of video memory. This example demonstrates a task that requests both video memory and computing power: the pod requests 2 GiB of video memory and 30% of the computing power of a GPU card.

  1. Create a job that requests GPU video memory and computing power by using the following YAML content. Save the content to a file, for example /tmp/cuda-sample.yaml, which is used in the next step.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: cuda-sample
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: cuda-sample
        spec:
          containers:
          - name: cuda-sample
            image:  registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:benchmark-tensorflow-2.2.3
            command:
            - bash
            - run.sh
            - --num_batches=500000000
            - --batch_size=8
            resources:
              limits:
                # The unit is GiB. This pod requests a total of 2 GiB of video memory.
                aliyun.com/gpu-mem: 2
                # Request 30% of the computing power of a GPU card.
                aliyun.com/gpu-core.percentage: 30
            workingDir: /root
          restartPolicy: Never
  2. Deploy the cuda-sample job by executing the command below.

    kubectl apply -f /tmp/cuda-sample.yaml

    Note

    The job's image is large and may take time to pull. Please be patient.

  3. Check the running status of the cuda-sample job by executing the following command.

    kubectl get po -l app=cuda-sample

    Expected output:

    NAME                READY   STATUS    RESTARTS   AGE
    cuda-sample-m****   1/1     Running   0          15s

    The output should show that the STATUS is Running, indicating successful deployment.

  4. To check the usage of video memory and computing power, execute the command below.

    kubectl exec -ti cuda-sample-m**** -- nvidia-smi

    Expected output:

    Thu Dec 16 02:53:22 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:08.0 Off |                    0 |
    | N/A   33C    P0    56W / 300W |    337MiB /  2154MiB |     30%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    The output should indicate the following:

    • Video memory: Prior to enabling the computing power allocation feature, a pod could use the full 15 GiB of video memory on the GPU card. After enabling, the pod is currently using 337 MiB out of an allocated 2154 MiB (approximately 2 GiB), confirming effective video memory isolation.

    • Computing power: Before the feature, a pod could use the full 100% of the GPU card. After enabling, the pod is using 30% of the computing power, confirming effective computing power isolation.

    Note

    If n jobs, each requesting 30% of the computing power, run on the same GPU card (n ≤ 3), executing the nvidia-smi command in each pod shows a computing power utilization of n × 30%. The nvidia-smi command currently displays only the per-card computing power utilization, not the share allocated to each pod. To see what the scheduler has allocated on the node, see the sketch at the end of this topic.

  5. View the pod log using the following command.

    kubectl logs cuda-sample-m**** -f

    Expected output:

    [CUDA Bandwidth Test] - Starting...
    Running on...
    
     Device 0: Tesla V100-SXM2-16GB
     Quick Mode
    
    time: 2021-12-16/02:50:59,count: 0,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:01,count: 1,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:02,count: 2,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:03,count: 3,memSize: 32000000,succeed to copy data from host to gpu

    The output shows that after enabling the computing power allocation feature, the pod log refreshes more slowly, indicating that the computing power is limited to approximately 30% of the GPU card.

  6. Optional: Delete the cuda-sample job by executing the command below.

    After verification, you may delete the job.

    kubectl delete job cuda-sample
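Because nvidia-smi reports only per-card utilization, you can also check how much computing power and video memory the scheduler has accounted for on the node while jobs are running. This is a hedged sketch; it assumes the Allocated resources section of the kubectl describe node output lists the requests for the extended resources used above.

# Show the scheduler's view of allocated resources on the node, including
# aliyun.com/gpu-core.percentage and aliyun.com/gpu-mem requests.
kubectl describe node <NODE_NAME> | grep -A 10 "Allocated resources"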