Container Service for Kubernetes: Allocate computing power by using shared GPU scheduling

Last Updated: Sep 20, 2024

You can request GPU memory and computing power for applications in Container Service for Kubernetes (ACK) Pro clusters. This feature allows you to use GPU memory and computing power in a finer-grained manner. This topic describes how to use shared GPU scheduling to allocate computing power.

Prerequisites

  • An ACK Pro cluster that runs Kubernetes 1.20 or later is created. For more information, see Create an ACK managed cluster. The kube-scheduler version must meet the requirement for your cluster version, as described in the following table. For more information about the features supported by each kube-scheduler version, see kube-scheduler.

    ACK cluster version    Scheduler version
    1.28                   1.28.1-aliyun-5.6-998282b9 or later
    1.26                   v1.26.3-aliyun-4.1-a520c096 or later
    1.24                   1.24.3-ack-2.0 or later
    1.22                   1.22.15-ack-2.0 or later
    1.20                   1.20.4-ack-8.0 or later

  • The GPU sharing component is installed and the version of the installed Helm chart is later than 1.2.0. For more information about how to install the GPU sharing component, see Configure the GPU sharing component.

  • cGPU 1.0.5 or later is installed. For more information about how to update the cGPU version, see Update the cGPU version on a node.

Limits

  • GPU sharing supports two types of jobs: jobs that request only GPU memory, and jobs that request both GPU memory and computing power. However, a node can host only one of the two types at a time. You cannot deploy jobs that request only GPU memory and jobs that request both GPU memory and computing power on the same node.

  • The following limits apply when you request computing power for jobs:

    • When you configure parameters to allocate the computing power of a GPU, the maximum value you can specify is 100, which indicates 100% of the computing power of the GPU. For example, a value of 20 indicates 20% of the computing power of the GPU.

    • The computing power value that you specify must be a multiple of 5, and the minimum value is 5. If the value is not a multiple of 5, the job cannot be submitted. See the sketch at the end of this section for an example of a valid request.

  • Only the regions listed in the following table support the allocation of GPU memory and computing power. If you want to allocate GPU memory and computing power, make sure that your cluster resides in one of the following regions.

    Region                  Region ID
    China (Beijing)         cn-beijing
    China (Shanghai)        cn-shanghai
    China (Hangzhou)        cn-hangzhou
    China (Zhangjiakou)     cn-zhangjiakou
    China (Shenzhen)        cn-shenzhen
    China (Chengdu)         cn-chengdu
    China (Heyuan)          cn-heyuan
    China (Hong Kong)       cn-hongkong
    Indonesia (Jakarta)     ap-southeast-5
    Singapore               ap-southeast-1
    Thailand (Bangkok)      ap-southeast-7
    US (Virginia)           us-east-1
    US (Silicon Valley)     us-west-1
    Japan (Tokyo)           ap-northeast-1
    China East 2 Finance    cn-shanghai-finance-1

  • The scheduler version that supports computing power allocation was released on March 1, 2022. Clusters created on or after March 1, 2022 use the latest scheduler version. For clusters created before March 1, 2022, the scheduler version is not automatically updated and you must update it manually. If your cluster was created before March 1, 2022, perform the following steps:

    1. Submit a ticket to apply to join the private preview for shared GPU scheduling of the latest version.

    2. Uninstall the GPU sharing component of the outdated version.

      If the Helm chart version of the installed GPU sharing component is 1.2.0 or earlier, the version of the GPU sharing component is outdated and supports only memory sharing. Perform the following steps to uninstall the outdated version:

      1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

      2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Applications > Helm.

      3. On the Helm page, find ack-ai-installer and click Delete in the Actions column. In the Delete dialog box, click OK.

    3. Install the GPU sharing component of the latest version. For more information, see Configure the GPU sharing component.
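The following snippet is a minimal sketch of a container resources section that satisfies the computing power limits described above. The values are for illustration only and match the example used in Step 3 of this topic.

resources:
  limits:
    # Request 2 GiB of GPU memory.
    aliyun.com/gpu-mem: 2
    # Request 30% of the computing power of one GPU. The value must be a multiple of 5, at least 5, and at most 100.
    aliyun.com/gpu-core.percentage: 30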

Step 1: Create a node pool that supports computing power allocation

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Nodes > Node Pools.

  3. In the upper-right corner of the Node Pools page, click Create Node Pool.

    The following list describes some of the parameters used to configure the node pool. For more information, see Step 6 of the "Create an ACK Pro cluster" topic.

    Node Pool Name
      The name of the node pool. In this example, gpu-core is used.

    Expected Nodes
      The initial number of nodes in the node pool. If you do not want to create nodes in the node pool, set this parameter to 0.

    ECS Tags
      The labels that you want to add to the Elastic Compute Service (ECS) instances in the node pool.

    Node Label
      The labels that you want to add to the nodes in the node pool. The following configurations are used in this topic (see the sketch after this list). For more information about node labels, see Labels for enabling GPU scheduling policies.

      • To enable GPU memory isolation and computing power isolation, click the Add Node Label icon and set the key to ack.node.gpu.schedule and the value to core_mem.

      • To use the binpack algorithm to select GPUs for pods, click the Add Node Label icon and set the key to ack.node.gpu.placement and the value to binpack.

      Important

      If you want to enable computing power isolation for existing GPU-accelerated nodes in the cluster, you must first remove the nodes from the cluster and then add them to a node pool that supports computing power isolation. You cannot directly run the kubectl label nodes <NODE_NAME> ack.node.gpu.schedule=core_mem command to enable computing power isolation for existing GPU-accelerated nodes.
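For reference, after a node from this node pool is added to the cluster, the node metadata is expected to carry the labels that you configured, as shown in the following sketch. The node name is hypothetical and other labels are omitted.

apiVersion: v1
kind: Node
metadata:
  name: cn-beijing.192.168.x.x    # Hypothetical node name.
  labels:
    # Enables GPU memory isolation and computing power isolation.
    ack.node.gpu.schedule: core_mem
    # Uses the binpack algorithm to select GPUs for pods.
    ack.node.gpu.placement: binpack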

Step 2: Check whether computing power allocation is enabled for the node pool

Run the following command to check whether computing power allocation is enabled for the nodes in the node pool:

kubectl get nodes <NODE_NAME> -o yaml

Expected output:

# Irrelevant fields are omitted. 
status:
  # Irrelevant fields are omitted. 
  allocatable:
    # The node has 4 GPUs, which provide 400% of computing power in total. Each GPU provides 100% of computing power. 
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    # The node has 4 GPUs, which provide 60 GiB of memory in total. Each GPU provides 15 GiB of memory. 
    aliyun.com/gpu-mem: "60"
  capacity:
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    aliyun.com/gpu-mem: "60"

The output contains the aliyun.com/gpu-core.percentage field, which indicates that computing power allocation is enabled.
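For comparison, on a GPU-accelerated node where only GPU memory sharing is enabled, the allocatable section is expected to list aliyun.com/gpu-mem and aliyun.com/gpu-count but not aliyun.com/gpu-core.percentage. The following sketch reuses the resource quantities from the preceding output for illustration.

status:
  allocatable:
    aliyun.com/gpu-count: "4"
    # No aliyun.com/gpu-core.percentage field is reported, which indicates that computing power allocation is not enabled.
    aliyun.com/gpu-mem: "60"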

Step 3: Use the computing power allocation feature

If you do not enable computing power allocation, a pod can use 100% of the computing power of a GPU. In this example, each GPU provides 15 GiB of memory. The following steps show how to create a job that requests both GPU memory and computing power: the job requests 2 GiB of GPU memory and 30% of the computing power of a GPU. A job that requests only GPU memory is sketched at the end of this topic.

  1. Create a file named cuda-sample.yaml and copy the following YAML template to the file. The template defines a job that requests both GPU memory and computing power:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: cuda-sample
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: cuda-sample
        spec:
          containers:
          - name: cuda-sample
            image:  registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:benchmark-tensorflow-2.2.3
            command:
            - bash
            - run.sh
            - --num_batches=500000000
            - --batch_size=8
            resources:
              limits:
                # Apply for 2 GiB of GPU memory. 
                aliyun.com/gpu-mem: 2
                # Apply for 30% of the computing power of the GPU. 
                aliyun.com/gpu-core.percentage: 30
            workingDir: /root
          restartPolicy: Never
  2. Run the following command to deploy the cuda-sample.yaml file and submit the cuda-sample job:

    kubectl apply -f /tmp/cuda-sample.yaml

    Note

    The image used by the job is large in size. Therefore, the image pulling process may be time-consuming.

  3. Run the following command to query the cuda-sample job:

    kubectl get po -l app=cuda-sample

    Expected output:

    NAME                READY   STATUS    RESTARTS   AGE
    cuda-sample-m****   1/1     Running   0          15s

    In the output, Running is displayed in the STATUS column, which indicates that the job is deployed.

  4. Run the following command to query the amount of GPU memory and computing power used by the pod that is provisioned for the job:

    kubectl exec -ti cuda-sample-m**** -- nvidia-smi

    Expected output:

    Thu Dec 16 02:53:22 2021
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:08.0 Off |                    0 |
    | N/A   33C    P0    56W / 300W |    337MiB /  2154MiB |     30%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    The output indicates the following information:

    • GPU memory: Before you enable computing power allocation, the pod can use 100% of the memory provided by the GPU. In this example, the total amount of memory provided by the GPU is 15 GiB. You can run the nvidia-smi command on the node to query the total amount of memory provided by the GPU. After you enable computing power allocation, the amount of memory that the pod uses is 337 MiB and the total amount of memory that the pod can use is 2,154 MiB, which is about 2 GiB. This indicates that memory isolation is enabled.

    • Computing power: Before you enable computing power allocation, the pod can use 100% of the computing power of the GPU. You can set the requested amount to 100 to verify that the pod can use 100% of the computing power. After you enable computing power allocation, the pod uses 30% of the computing power of the GPU. This indicates that computing power isolation is enabled.

    Note

    For example, assume that n jobs are created, each of which requests 30% of the computing power, where n is no greater than 3, and all of the jobs are scheduled to the same GPU. If you log on to the pods of the jobs and run the nvidia-smi command, the output shows that the GPU uses n × 30% of its computing power. This is because the nvidia-smi command shows only the computing power utilization per GPU. It does not show the computing power utilization per job.

  5. Run the following command to display the pod log:

    kubectl logs cuda-sample-m**** -f

    Expected output:

    [CUDA Bandwidth Test] - Starting...
    Running on...
    
     Device 0: Tesla V100-SXM2-16GB
     Quick Mode
    
    time: 2021-12-16/02:50:59,count: 0,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:01,count: 1,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:02,count: 2,memSize: 32000000,succeed to copy data from host to gpu
    time: 2021-12-16/02:51:03,count: 3,memSize: 32000000,succeed to copy data from host to gpu

    The output shows that the pod log is generated at a lower rate after you enable computing power allocation. This is because the pod can use only about 30% of the computing power of the GPU.

  6. Optional: After you verify that computing power allocation works as expected, run the following command to delete the cuda-sample job:

    kubectl delete job cuda-sample
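As described in the Limits section, GPU sharing also supports jobs that request only GPU memory. The following YAML template is a minimal sketch of such a job. It reuses the image and settings of the preceding example and requests 2 GiB of GPU memory without requesting computing power. Keep in mind that this type of job cannot be deployed on the same node as jobs that also request computing power.

apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-sample-mem-only
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: cuda-sample-mem-only
    spec:
      containers:
      - name: cuda-sample-mem-only
        image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:benchmark-tensorflow-2.2.3
        command:
        - bash
        - run.sh
        - --num_batches=500000000
        - --batch_size=8
        resources:
          limits:
            # Request only 2 GiB of GPU memory. Do not request computing power.
            aliyun.com/gpu-mem: 2
        workingDir: /root
      restartPolicy: Never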