You can request GPU memory and computing power for applications in Container Service for Kubernetes (ACK) Pro clusters. This allows you to allocate GPU resources at a finer granularity. This topic describes how to use shared GPU scheduling to allocate computing power.
Prerequisites
An ACK Pro cluster that runs Kubernetes 1.20 or later is created. For more information, see Create an ACK managed cluster. The kube-scheduler version must meet the requirement for your cluster version, as listed in the following table. For more information about the features supported by each kube-scheduler version, see kube-scheduler.
ACK cluster version    Scheduler version
1.28                   1.28.1-aliyun-5.6-998282b9 or later
1.26                   v1.26.3-aliyun-4.1-a520c096 or later
1.24                   1.24.3-ack-2.0 or later
1.22                   1.22.15-ack-2.0 or later
1.20                   1.20.4-ack-8.0 or later
The GPU sharing component is installed and the version of the installed Helm chart is later than 1.2.0. For more information about how to install the GPU sharing component, see Configure the GPU sharing component.
cGPU 1.0.5 or later is installed. For more information about how to update the cGPU version, see Update the cGPU version on a node.
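You can also check the installed Helm chart version from the command line. The following command is a minimal sketch that assumes the GPU sharing component was installed as the ack-ai-installer release in the kube-system namespace; adjust the release name or namespace if your installation differs.
# List the installed release. The CHART column shows the Helm chart version.
helm list -n kube-system | grep ack-ai-installer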
Limits
GPU sharing supports jobs that request only GPU memory and jobs that request both GPU memory and computing power. However, the two types of jobs cannot be deployed on the same node at the same time. On a node, you can create only jobs that request GPU memory, or only jobs that request both GPU memory and computing power.
The following limits apply when you request computing power for jobs:
Computing power is requested as a percentage of the computing power of one GPU. The maximum value that you can specify is 100, which indicates 100% of the computing power of the GPU. For example, a value of 20 indicates 20% of the computing power of the GPU.
The value that you specify must be a multiple of 5, and the minimum value is 5. If the value is not a multiple of 5, the job cannot be submitted.
Only the regions listed in the following table support the allocation of GPU memory and computing power. If you want to allocate GPU memory and computing power, make sure that your cluster resides in one of the following regions.
Region                  Region ID
China (Beijing)         cn-beijing
China (Shanghai)        cn-shanghai
China (Hangzhou)        cn-hangzhou
China (Zhangjiakou)     cn-zhangjiakou
China (Shenzhen)        cn-shenzhen
China (Chengdu)         cn-chengdu
China (Heyuan)          cn-heyuan
China (Hong Kong)       cn-hongkong
Indonesia (Jakarta)     ap-southeast-5
Singapore               ap-southeast-1
Thailand (Bangkok)      ap-southeast-7
US (Virginia)           us-east-1
US (Silicon Valley)     us-west-1
Japan (Tokyo)           ap-northeast-1
China East 2 Finance    cn-shanghai-finance-1
The scheduler version that supports computing power allocation was released on March 1, 2022. Clusters that were created on March 1, 2022 or later use the latest scheduler version. The version of the scheduler used by clusters that were created before March 1, 2022 cannot be automatically updated. You must manually update the scheduler version. If your cluster was created before March 1, 2022, perform the following steps:
Submit a ticket to apply to join the private preview for shared GPU scheduling of the latest version.
Uninstall the GPU sharing component of the outdated version.
If the Helm chart version of the installed GPU sharing component is 1.2.0 or earlier, the version of the GPU sharing component is outdated and supports only memory sharing. Perform the following steps to uninstall the outdated version:
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Applications > Helm.
On the Helm page, find ack-ai-installer and click Delete in the Actions column. In the Delete dialog box, click OK.
Install the GPU sharing component of the latest version. For more information, see Configure the GPU sharing component.
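If you prefer the CLI to the console, the following commands are a sketch of an equivalent uninstall. They assume that the outdated component was installed as a Helm release named ack-ai-installer; confirm the release name and namespace before deleting.
# Find the release and the namespace that it was installed in.
helm list -A | grep ack-ai-installer
# Uninstall the outdated release. Adjust the namespace if yours differs.
helm uninstall ack-ai-installer -n kube-system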
Step 1: Create a node pool that supports computing power allocation
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Nodes > Node Pools.
In the upper-right corner of the Node Pools page, click Create Node Pool.
The following parameters are the key settings for the node pool. For more information, see Step 6 of the "Create an ACK Pro cluster" topic.
Node Pool Name: The name of the node pool. In this example, gpu-core is used.
Expected Nodes: The initial number of nodes in the node pool. If you do not want to create nodes in the node pool, set this parameter to 0.
ECS Tags: The tags that you want to add to the Elastic Compute Service (ECS) instances in the node pool.
Node Label: The labels that you want to add to the nodes in the node pool. The following labels are used in this topic. For more information about node labels, see Labels for enabling GPU scheduling policies.
- To enable GPU memory isolation and computing power isolation, add a label with the key ack.node.gpu.schedule and the value core_mem.
- To use the binpack algorithm to select GPUs for pods, add a label with the key ack.node.gpu.placement and the value binpack.
Important: To enable computing power isolation for existing GPU-accelerated nodes in the cluster, you must first remove the nodes from the cluster and then add them to a node pool that supports computing power isolation. You cannot enable computing power isolation for existing GPU-accelerated nodes by directly running the following command:
kubectl label nodes <NODE_NAME> ack.node.gpu.schedule=core_mem
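After nodes are added to the node pool, you can confirm that the labels took effect. The following commands are a minimal check; replace <NODE_NAME> with the name of a node in the node pool.
# List the nodes on which GPU memory isolation and computing power isolation are enabled.
kubectl get nodes -l ack.node.gpu.schedule=core_mem
# Show the GPU scheduling labels on a specific node.
kubectl get node <NODE_NAME> --show-labels | grep ack.node.gpu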
Step 2: Check whether computing power allocation is enabled for the node pool
Run the following command to check whether computing power allocation is enabled for the nodes in the node pool:
kubectl get nodes <NODE_NAME> -o yaml
Expected output:
# Irrelevant fields are omitted.
status:
  # Irrelevant fields are omitted.
  allocatable:
    # The node has 4 GPUs, which provide 400% of computing power in total. Each GPU provides 100% of computing power.
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    # The node has 4 GPUs, which provide 60 GiB of GPU memory in total. Each GPU provides 15 GiB of memory.
    aliyun.com/gpu-mem: "60"
  capacity:
    aliyun.com/gpu-core.percentage: "400"
    aliyun.com/gpu-count: "4"
    aliyun.com/gpu-mem: "60"
The output contains the aliyun.com/gpu-core.percentage field, which indicates that computing power allocation is enabled.
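If you only want to check the GPU-related fields instead of reading the full node YAML, the following commands are one way to do so; replace <NODE_NAME> with the node name.
# Print the allocatable resources of the node as a JSON map.
kubectl get node <NODE_NAME> -o jsonpath='{.status.allocatable}'
# Alternatively, filter the human-readable output for GPU-related resources.
kubectl describe node <NODE_NAME> | grep -i gpu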
Step 3: Use the computing power allocation feature
If you do not enable computing power allocation, a pod can use 100% of the computing power of a GPU. In this example, the memory of a GPU is 15 GiB. The following steps show how to create a job that requests both GPU memory and computing power. The job requests 2 GiB of GPU memory and 30% of the computing power of the GPU.
Use the following YAML template to create a job that requests both GPU memory and computing power. Save the template as /tmp/cuda-sample.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: cuda-sample
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: cuda-sample
    spec:
      containers:
      - name: cuda-sample
        image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:benchmark-tensorflow-2.2.3
        command:
        - bash
        - run.sh
        - --num_batches=500000000
        - --batch_size=8
        resources:
          limits:
            # Apply for 2 GiB of GPU memory.
            aliyun.com/gpu-mem: 2
            # Apply for 30% of the computing power of the GPU.
            aliyun.com/gpu-core.percentage: 30
        workingDir: /root
      restartPolicy: Never
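Before you deploy the job, you can optionally validate the manifest against the cluster. The following command is a sketch that assumes the template is saved at /tmp/cuda-sample.yaml, as used in the next step.
# Validate the manifest on the server side without creating the job.
kubectl apply -f /tmp/cuda-sample.yaml --dry-run=server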
Run the following command to deploy the cuda-sample.yaml file and submit the cuda-sample job:
kubectl apply -f /tmp/cuda-sample.yaml
Note: The image used by the job is large. Therefore, the image pulling process may be time-consuming.
Run the following command to query the cuda-sample job:
kubectl get po -l app=cuda-sample
Expected output:
NAME                READY   STATUS    RESTARTS   AGE
cuda-sample-m****   1/1     Running   0          15s
In the output, Running is displayed in the STATUS column, which indicates that the job is deployed.
Run the following command to query the amount of GPU memory and computing power used by the pod that is provisioned for the job:
kubectl exec -ti cuda-sample-m**** -- nvidia-smi
Expected output:
Thu Dec 16 02:53:22 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:08.0 Off |                    0 |
| N/A   33C    P0    56W / 300W |    337MiB /  2154MiB |     30%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
The output indicates the following information:
GPU memory: Before you enable computing power allocation, a pod can use 100% of the memory provided by the GPU. In this example, the total amount of memory provided by the GPU is 15 GiB. You can run the nvidia-smi command on the node to query the total amount of memory provided by the GPU. After you enable computing power allocation, the amount of memory used by the pod is 337 MiB and the total amount of memory that the pod can use is 2,154 MiB, which is about 2 GiB. This indicates that memory isolation is enabled.
Computing power: Before you enable computing power allocation, a pod can use 100% of the computing power of the GPU. You can set the requested amount to 100 to verify that the pod can use 100% of the computing power. After you enable computing power allocation, the pod uses 30% of the computing power of the GPU. This indicates that computing power isolation is enabled.
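For example, to verify the 100% case described above, you could change the resource limits in the job template as follows. This is a sketch; the rest of the manifest stays the same, and a GPU with 100% of its computing power unallocated must be available for the pod to be scheduled.
resources:
  limits:
    # Apply for 2 GiB of GPU memory.
    aliyun.com/gpu-mem: 2
    # Apply for 100% of the computing power of one GPU.
    aliyun.com/gpu-core.percentage: 100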
Note: Assume that n jobs are created, each job requests 30% of the computing power of a GPU, and n is no greater than 3. If the jobs are scheduled to the same GPU and you log on to the pods of the jobs and run the nvidia-smi command, the output shows that the GPU uses n × 30% of its computing power. The nvidia-smi command shows only the computing power utilization per GPU. It does not show the computing power utilization per job.
Run the following command to display the pod log:
kubectl logs cuda-sample-m**** -f
Expected output:
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: Tesla V100-SXM2-16GB
 Quick Mode

time: 2021-12-16/02:50:59,count: 0,memSize: 32000000,succeed to copy data from host to gpu
time: 2021-12-16/02:51:01,count: 1,memSize: 32000000,succeed to copy data from host to gpu
time: 2021-12-16/02:51:02,count: 2,memSize: 32000000,succeed to copy data from host to gpu
time: 2021-12-16/02:51:03,count: 3,memSize: 32000000,succeed to copy data from host to gpu
The output shows that the pod log is generated at a lower rate after you enable computing power allocation. This is because each pod can use only about 30% of the computing power of the GPU.
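To observe GPU utilization over time instead of a single snapshot, you can loop nvidia-smi inside the pod. This is an optional check; replace the pod name with the name from your own output.
# Refresh the GPU utilization report every 5 seconds. Press Ctrl+C to stop.
kubectl exec -ti cuda-sample-m**** -- nvidia-smi -l 5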
Optional: After you verify that computing power allocation works as expected, run the following command to delete the cuda-sample job:
kubectl delete job cuda-sample
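Deleting the job also removes the pods that it created under the default cascading deletion. The following command is an optional check to confirm that no pods remain.
# Verify that the pods created by the job are gone.
kubectl get po -l app=cuda-sample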