This topic describes how to use the GPU sharing feature to schedule and isolate GPU resources on the Lingjun nodes in a Container Service for Kubernetes (ACK) Lingjun managed cluster.
Prerequisites
An ACK Lingjun managed cluster is created and the cluster contains GPU-accelerated Lingjun nodes. For more information, see Create a Lingjun cluster with ACK activated.
By default, the GPU sharing component is installed in ACK Lingjun managed clusters. You can add specific labels to GPU-accelerated nodes to enable GPU sharing. For more information, see Labels for enabling GPU scheduling policies.
Work with GPU sharing
GPU sharing is applicable in the following scenarios:
Enable GPU sharing without GPU memory isolation: In this scenario, multiple pods can share the same GPU, and the GPU memory allocated to a pod is not isolated from the GPU memory allocated to other pods. In this case, GPU memory contention is either not handled at all or must be handled by the upper-layer applications themselves.
Enable GPU sharing with GPU memory isolation: In this scenario, multiple pods can share the same GPU, but the GPU memory allocated to a pod is isolated from the GPU memory allocated to other pods. This addresses GPU memory contention among pods.
Scenario 1: Enable GPU sharing without GPU memory isolation
In some scenarios, you may require GPU sharing without GPU memory isolation. Some workloads can limit their own GPU memory usage. For example, when you launch a Java application, you can configure Java options to specify the maximum amount of GPU memory that the application can use. In this case, installing the GPU isolation module may cause resource contention with the application's own limits. To avoid this issue, you can enable GPU sharing without installing the GPU isolation module on some nodes.
Step 1: Enable GPU sharing for a node
Check whether the /etc/lingjun_metadata file exists on the node.
If the file exists, run the nvidia-smi command. If an output is returned, the node is a Lingjun node and you can proceed to the next step.
If the file does not exist, the node is not a Lingjun node and you cannot enable GPU sharing for it. To enable GPU sharing in this case, you need to create a Lingjun node. For more information, see Overview of Lingjun node pools.
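The following commands, run on the node itself, are a minimal sketch of this check:
if [ -f /etc/lingjun_metadata ]; then
  nvidia-smi   # If GPU information is printed, the node is a Lingjun node.
else
  echo "Not a Lingjun node: /etc/lingjun_metadata does not exist."
fi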
Run the following command to add the ack.node.gpu.schedule label to the node to enable GPU sharing:
kubectl label node <NODE_NAME> ack.node.gpu.schedule=share
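To confirm that the label is applied, you can display it as a column in the node list. In the following command, <NODE_NAME> is a placeholder for your node name:
kubectl get node <NODE_NAME> -L ack.node.gpu.schedule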
Step 2: Use shared GPU resources
Create a file named tensorflow.yaml and copy the following code block to the file:
apiVersion: batch/v1
kind: Job
metadata:
  name: tensorflow-mnist-share
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: tensorflow-mnist-share
    spec:
      # The YAML template specifies a job that uses a TensorFlow MNIST dataset. The job creates one pod and the pod requests 4 GiB of GPU memory.
      containers:
      - name: tensorflow-mnist-share
        image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
        command:
        - python
        - tensorflow-sample-code/tfjob/docker/mnist/main.py
        - --max_steps=100000
        - --data_dir=tensorflow-sample-code/data
        resources:
          # If you want the pod to request 4 GiB of GPU memory, specify aliyun.com/gpu-mem: 4 in the resources.limits parameter of the pod configuration.
          limits:
            aliyun.com/gpu-mem: 4 # The pod requests 4 GiB of GPU memory.
        workingDir: /root
      restartPolicy: Never
Run the following command to submit the job:
kubectl apply -f tensorflow.yaml
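Optionally, you can confirm that the job's pod has been created by filtering on the app label defined in the manifest:
kubectl get pod -l app=tensorflow-mnist-share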
Step 3: Verify GPU sharing without GPU memory isolation
Run the following commands to query the pod created by the job and to run the nvidia-smi command in the pod:
kubectl get pod | grep tensorflow
kubectl exec -ti tensorflow-mnist-share-xxxxx -- nvidia-smi
Expected output:
Wed Jun 14 06:45:56 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:09.0 Off | 0 |
| N/A 35C P0 59W / 300W | 334MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
The output indicates that the pod can use all memory provided by the GPU, which is 16,384 MiB in size. This means that GPU sharing is implemented without GPU memory isolation. If the GPU isolation module is installed, the memory size displayed in the output equals the amount of memory requested by the pod.
In this example, a V100 GPU is used and 4 GiB of GPU memory is requested by the pod. The actual configurations depend on your environment.
The pod determines the amount of GPU memory that it can use based on the following environment variables:
ALIYUN_COM_GPU_MEM_CONTAINER=4 # The amount of GPU memory that the pod can use, in GiB.
ALIYUN_COM_GPU_MEM_DEV=16 # The total memory size of each GPU, in GiB.
To calculate the ratio of the GPU memory that the pod can use to the total GPU memory, use the following formula:
percentage = ALIYUN_COM_GPU_MEM_CONTAINER / ALIYUN_COM_GPU_MEM_DEV = 4 / 16 = 0.25
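As a quick check, you can print these environment variables from inside the pod. The pod name below is a placeholder:
kubectl exec -ti tensorflow-mnist-share-xxxxx -- env | grep ALIYUN_COM_GPU_MEM
Expected output in this example:
ALIYUN_COM_GPU_MEM_CONTAINER=4
ALIYUN_COM_GPU_MEM_DEV=16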
Scenario 2: Enable GPU sharing with GPU memory isolation
To ensure container stability, the GPU resources allocated to each container must be isolated. When multiple containers run on one GPU, each container is allocated the GPU resources that it requests. However, if one container occupies excessive GPU resources, the performance of the other containers may be affected. To resolve this issue, the industry provides several solutions, such as NVIDIA vGPU, Multi-Process Service (MPS), vCUDA, and eGPU, which enable fine-grained GPU sharing. The following section describes how to use eGPU.
Step 1: Enable GPU sharing for a node
Check whether the /etc/lingjun_metadata file exists on the node.
If the file exists, run the nvidia-smi command. If an output is returned, the node is a Lingjun node and you can proceed to the next step.
If the file does not exist, the node is not a Lingjun node and you cannot enable GPU sharing for it. To enable GPU sharing in this case, you need to create a Lingjun node. For more information, see Overview of Lingjun node pools.
Run the following command to add the ack.node.gpu.schedule label to the node to enable GPU sharing:
kubectl label node <NODE_NAME> ack.node.gpu.schedule=egpu_mem
If you set the value of the label to egpu_mem, only GPU memory isolation is enabled. In the preceding example, the value of the label is set to egpu_mem.
If you set the value of the label to egpu_core_mem, both GPU memory isolation and computing power isolation are enabled.
GPU computing power cannot be requested on its own: it must be requested together with GPU memory, as shown in the sketch below. GPU memory, however, can be requested separately.
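For reference, the following resources section is a minimal sketch of a pod that requests both computing power and GPU memory on a node labeled egpu_core_mem. The aliyun.com/gpu-mem resource is the one documented in this topic; the aliyun.com/gpu-core.percentage resource name is an assumption based on other ACK GPU sharing configurations, so verify the exact name against the resources reported by your node:
resources:
  limits:
    aliyun.com/gpu-mem: 10 # GPU memory, as used elsewhere in this topic.
    aliyun.com/gpu-core.percentage: 30 # Assumed resource name for computing power; confirm it on your node.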
Step 2: Use shared GPU resources
Wait until the GPU information is reported by the node.
Run the following command to query the resources provided by the node:
kubectl get node <NODE_NAME> -oyaml
Expected output:
allocatable:
  aliyun.com/gpu-count: "1"
  aliyun.com/gpu-mem: "80"
  ...
  nvidia.com/gpu: "0"
  ...
capacity:
  aliyun.com/gpu-count: "1"
  aliyun.com/gpu-mem: "80"
  ...
  nvidia.com/gpu: "0"
  ...
The output indicates that the aliyun.com/gpu-mem resource is available and that the node provides one GPU with 80 GB of memory.
If you want to allocate an entire GPU to a pod, add the ack.gpushare.placement=require-whole-device label to the pod and specify the requested amount of GPU memory in aliyun.com/gpu-mem. A GPU that can provide the requested amount of GPU memory is then automatically allocated to the pod, as shown in the sketch below.
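The following manifest is a minimal sketch of such a pod. The label and the aliyun.com/gpu-mem resource come from this topic; the pod name, image, and memory amount are illustrative placeholders:
apiVersion: v1
kind: Pod
metadata:
  name: whole-device-pod # Illustrative name.
  labels:
    ack.gpushare.placement: require-whole-device # Request that an entire GPU be allocated.
spec:
  containers:
  - name: main
    image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:benchmark-tensorflow-2.2.3
    resources:
      limits:
        aliyun.com/gpu-mem: 80 # Illustrative value; a GPU that can provide this amount is allocated in full.
  restartPolicy: Never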
Step 3: Run a job to verify GPU sharing
Create a file named benchmark.yaml and copy the following code block to the file to define a benchmarking job:
apiVersion: batch/v1
kind: Job
metadata:
  name: benchmark-job
spec:
  parallelism: 1
  template:
    spec:
      containers:
      - name: benchmark-job
        image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:benchmark-tensorflow-2.2.3
        command:
        - bash
        - run.sh
        - --num_batches=500000000
        - --batch_size=8
        resources:
          limits:
            aliyun.com/gpu-mem: 10 # The job requests 10 GB of GPU memory.
        workingDir: /root
      restartPolicy: Never
      hostNetwork: true
      tolerations:
      - operator: Exists
Run the following command to submit the job:
kubectl apply -f benchmark.yaml
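You can check that the pod created by the job is running before you access it; the pod name is prefixed with the job name:
kubectl get pod | grep benchmark-job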
After the pod runs, run the following command to access the pod:
kubectl exec -ti benchmark-job-xxxx -- bash
Run the following command in the pod to query the GPU isolation information:
vgpu-smi
Expected output:
+------------------------------------------------------------------------------+
| VGPU_SMI 460.91.03     DRIVER_VERSION: 460.91.03     CUDA Version: 11.2       |
+-------------------------------------------+----------------------------------+
| GPU  Name                Bus-Id           | Memory-Usage        GPU-Util      |
|===========================================+==================================|
|   0  xxxxxxxx            00000000:00:07.0 | 8307MiB / 10782MiB  100% / 100%   |
+-------------------------------------------+----------------------------------+
The output indicates that 10 GB of GPU memory (10782 MiB) is allocated to the pod instead of the full GPU memory, which means that GPU memory isolation is in effect.
FAQ
How do I check whether the GPU sharing component is installed in my cluster?
Run the following command to check whether the eGPU-based GPU sharing component is installed:
kubectl get ds -nkube-system | grep gpushare
Expected output:
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
gpushare-egpu-device-plugin-ds 0 0 0 0 0 <none>
gpushare-egpucore-device-plugin-ds 0 0 0 0 0 <none>
The output indicates that the GPU sharing component is installed.
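You can also check which nodes currently have GPU sharing enabled. The -L flag prints the value of the ack.node.gpu.schedule label as an extra column:
kubectl get nodes -L ack.node.gpu.schedule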