Container Service for Kubernetes: Allocate computing power by using shared GPU scheduling

Last Updated: Sep 20, 2024

You can request GPU memory and computing power for applications in Container Service for Kubernetes (ACK) Pro clusters. This feature allows you to use GPU memory and computing power in a finer-grained manner. This topic describes how to use shared GPU scheduling to allocate computing power.

Prerequisites

  • An ACK Pro cluster that runs Kubernetes 1.20 or later is created. For more information, see Create an ACK managed cluster. The kube-scheduler version must meet the requirement for your cluster version, as described in the following table. For more information about the features supported by each kube-scheduler version, see kube-scheduler.

    ACK cluster version    Scheduler version
    1.28                   1.28.1-aliyun-5.6-998282b9 or later
    1.26                   v1.26.3-aliyun-4.1-a520c096 or later
    1.24                   1.24.3-ack-2.0 or later
    1.22                   1.22.15-ack-2.0 or later
    1.20                   1.20.4-ack-8.0 or later

  • The GPU sharing component is installed and the version of the installed Helm chart is later than 1.2.0. For more information about how to install the GPU sharing component, see Configure the GPU sharing component.

  • cGPU 1.0.5 or later is installed. For more information about how to update the cGPU version, see Update the cGPU version on a node.

Limits

  • GPU sharing supports two types of jobs: jobs that request only GPU memory, and jobs that request both GPU memory and computing power. However, a node can host only one of the two types at a time. You cannot deploy jobs that request only GPU memory and jobs that request both GPU memory and computing power on the same node.

  • The following limits apply when you request computing power for jobs:

    • When you configure parameters to allocate the computing power of a GPU, the maximum value you can specify is 100, which indicates 100% of the computing power of the GPU. For example, a value of 20 indicates 20% of the computing power of the GPU.

    • The computing power value that you specify must be a multiple of 5, and the minimum value is 5. If the value is not a multiple of 5, the job cannot be submitted. See the sketch at the end of this section for an example of a valid request.

  • Only the regions listed in the following table support the allocation of GPU memory and computing power. If you want to allocate GPU memory and computing power, make sure that your cluster resides in one of the following regions.

    Region                  Region ID
    China (Beijing)         cn-beijing
    China (Shanghai)        cn-shanghai
    China (Hangzhou)        cn-hangzhou
    China (Zhangjiakou)     cn-zhangjiakou
    China (Shenzhen)        cn-shenzhen
    China (Chengdu)         cn-chengdu
    China (Heyuan)          cn-heyuan
    China (Hong Kong)       cn-hongkong
    Indonesia (Jakarta)     ap-southeast-5
    Singapore               ap-southeast-1
    Thailand (Bangkok)      ap-southeast-7
    US (Virginia)           us-east-1
    US (Silicon Valley)     us-west-1
    Japan (Tokyo)           ap-northeast-1
    China East 2 Finance    cn-shanghai-finance-1

  • The scheduler version that supports computing power allocation was released on March 1, 2022. Clusters created on or after March 1, 2022 use the latest scheduler version. For clusters created before March 1, 2022, the scheduler version is not automatically updated and you must update it manually. If your cluster was created before March 1, 2022, perform the following steps:

    1. Submit a ticket to apply to join the private preview for shared GPU scheduling of the latest version.

    2. Uninstall the GPU sharing component of the outdated version.

      If the Helm chart version of the installed GPU sharing component is 1.2.0 or earlier, the version of the GPU sharing component is outdated and supports only memory sharing. Perform the following steps to uninstall the outdated version:

      1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

      2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Applications > Helm.

      3. On the Helm page, find ack-ai-installer and click Delete in the Actions column. In the Delete dialog box, click OK.

    3. Install the GPU sharing component of the latest version. For more information, see Configure the GPU sharing component.
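The following snippet is a minimal sketch of a container resources section that satisfies the computing power limits described above. The values are for illustration only and match the example used in Step 3 of this topic.

resources:
  limits:
    # Request 2 GiB of GPU memory.
    aliyun.com/gpu-mem: 2
    # Request 30% of the computing power of one GPU. The value must be a multiple of 5, at least 5, and at most 100.
    aliyun.com/gpu-core.percentage: 30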

Step 1: Create a node pool that supports computing power allocation

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Nodes > Node Pools.

  3. In the upper-right corner of the Node Pools page, click Create Node Pool.

    The following list describes some of the parameters used to configure the node pool. For more information, see Step 6 of the "Create an ACK Pro cluster" topic.

    Node Pool Name
      The name of the node pool. In this example, gpu-core is used.

    Expected Nodes
      The initial number of nodes in the node pool. If you do not want to create nodes in the node pool, set this parameter to 0.

    ECS Tags
      The labels that you want to add to the Elastic Compute Service (ECS) instances in the node pool.

    Node Label
      The labels that you want to add to the nodes in the node pool. The following configurations are used in this topic (see the sketch after this list). For more information about node labels, see Labels for enabling GPU scheduling policies.

      • To enable GPU memory isolation and computing power isolation, click the Add Node Label icon and set the key to ack.node.gpu.schedule and the value to core_mem.

      • To use the binpack algorithm to select GPUs for pods, click the Add Node Label icon and set the key to ack.node.gpu.placement and the value to binpack.

      Important

      If you want to enable computing power isolation for existing GPU-accelerated nodes in the cluster, you must first remove the nodes from the cluster and then add them to a node pool that supports computing power isolation. You cannot directly run the kubectl label nodes <NODE_NAME> ack.node.gpu.schedule=core_mem command to enable computing power isolation for existing GPU-accelerated nodes.
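For reference, after a node from this node pool is added to the cluster, the node metadata is expected to carry the labels that you configured, as shown in the following sketch. The node name is hypothetical and other labels are omitted.

apiVersion: v1
kind: Node
metadata:
  name: cn-beijing.192.168.x.x    # Hypothetical node name.
  labels:
    # Enables GPU memory isolation and computing power isolation.
    ack.node.gpu.schedule: core_mem
    # Uses the binpack algorithm to select GPUs for pods.
    ack.node.gpu.placement: binpack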

Step 2: Check whether computing power allocation is enabled for the node pool

Run the following command to check whether computing power allocation is enabled for the nodes in the node pool:

kubectl get nodes <NODE_NAME> -o yaml

Expected output:

# Irrelevant fields are omitted. 
status:
  # Irrelevant fields are omitted. 
  allocatable:
    # The node has 4 GPUs, which provide 400% of computing power in total. Each GPU provides 100% of computing power. 
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    # The node has 4 GPUs, which provide 60 GiB of memory in total. Each GPU provides 15 GiB of memory. 
    aliyun.com/gpu-mem: "60"
  capacity:
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    aliyun.com/gpu-mem: "60"

The output contains the aliyun.com/gpu-core.percentage field, which indicates that computing power allocation is enabled.
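For comparison, on a GPU-accelerated node where only GPU memory sharing is enabled, the allocatable section is expected to list aliyun.com/gpu-mem and aliyun.com/gpu-count but not aliyun.com/gpu-core.percentage. The following sketch reuses the resource quantities from the preceding output for illustration.

status:
  allocatable:
    aliyun.com/gpu-count: "4"
    # No aliyun.com/gpu-core.percentage field is reported, which indicates that computing power allocation is not enabled.
    aliyun.com/gpu-mem: "60"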

Step 3: Use the computing power allocation feature

If you do not enable computing power allocation, a pod can use 100% of the computing power of a GPU. In this example, each GPU provides 15 GiB of memory. The following steps show how to create a job that requests both GPU memory and computing power: the job requests 2 GiB of GPU memory and 30% of the computing power of a GPU. A job that requests only GPU memory is sketched at the end of this topic.

  1. Create a file named cuda-sample.yaml and copy the following YAML template to the file. The template defines a job that requests both GPU memory and computing power:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: cuda-sample
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: cuda-sample
        spec:
          containers:
          - name: cuda-sample
            image:  registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:benchmark-tensorflow-2.2.3
            command:
            - bash
            - run.sh
            - --num_batches=500000000
            - --batch_size=8
            resources:
              limits:
                # Apply for 2 GiB of GPU memory. 
                aliyun.com/gpu-mem: 2
                # Apply for 30% of the computing power of the GPU. 
                aliyun.com/gpu-core.percentage: 30
            workingDir: /root
          restartPolicy: Never
  2. Run the following command to deploy the cuda-sample.yaml file and submit the cuda-sample job:

    kubectl apply -f /tmp/cuda-sample.yaml

    Note

    The image used by the job is large in size. Therefore, the image pulling process may be time-consuming.

  3. Run the following command to query the cuda-sample job:

    kubectl get po -l app=cuda-sample

    Expected output:

    NAME                READY   STATUS    RESTARTS   AGE
    cuda-sample-m****   1/1     Running   0          15s

    In the output, Running is displayed in the STATUS column, which indicates that the job is deployed.

  4. Run the following command to query the amount of GPU memory and computing power used by the pod that is provisioned for the job:

    kubectl exec -ti cuda-sample-m**** -- nvidia-smi

    Expected output:

    Thu Dec 16 02:53:22 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:08.0 Off |                    0 |
    | N/A   33C    P0    56W / 300W |    337MiB /  2154MiB |     30%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    The output indicates the following information:

    • GPU memory: Before you enable computing power allocation, the pod can use 100% of the memory provided by the GPU. In this example, the total amount of memory provided by the GPU is 15 GiB. You can run the nvidia-smi command on the node to query the total amount of memory provided by the GPU. After you enable computing power allocation, the amount of memory that the pod uses is 337 MiB and the total amount of memory that the pod can use is 2,154 MiB, which is about 2 GiB. This indicates that memory isolation is enabled.

    • Computing power: Before you enable computing power allocation, the pod can use 100% of the computing power of the GPU. You can set the requested amount to 100 to verify that the pod can use 100% of the computing power. After you enable computing power allocation, the pod uses 30% of the computing power of the GPU. This indicates that computing power isolation is enabled.

    Note

    For example, assume that n jobs are created, each of which requests 30% of the computing power, where n is no greater than 3, and all of the jobs are scheduled to the same GPU. If you log on to the pods of the jobs and run the nvidia-smi command, the output shows that the GPU uses n × 30% of its computing power. This is because the nvidia-smi command shows only the computing power utilization per GPU. It does not show the computing power utilization per job.

  5. Run the following command to display the pod log:

    kubectl logs cuda-sample-m**** -f

    Expected output:

    [CUDA Bandwidth Test] - Starting...
    Running on...
    
     Device 0: Tesla V100-SXM2-16GB
     Quick Mode
    
    time: 2021-12-16/02:50:59,count: 0,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:01,count: 1,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:02,count: 2,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:03,count: 3,memSize: 32000000,succeed to copy data from host to gpu

    The output shows that the pod log is generated at a lower rate after you enable computing power allocation. This is because the pod can use only about 30% of the computing power of the GPU.

  6. Optional: After you verify that computing power allocation works as expected, run the following command to delete the cuda-sample job:

    kubectl delete job cuda-sample
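As described in the Limits section, GPU sharing also supports jobs that request only GPU memory. The following YAML template is a minimal sketch of such a job. It reuses the image and settings of the preceding example and requests 2 GiB of GPU memory without requesting computing power. Keep in mind that this type of job cannot be deployed on the same node as jobs that also request computing power.

apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-sample-mem-only
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: cuda-sample-mem-only
    spec:
      containers:
      - name: cuda-sample-mem-only
        image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:benchmark-tensorflow-2.2.3
        command:
        - bash
        - run.sh
        - --num_batches=500000000
        - --batch_size=8
        resources:
          limits:
            # Request only 2 GiB of GPU memory. Do not request computing power.
            aliyun.com/gpu-mem: 2
        workingDir: /root
      restartPolicy: Never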