This topic describes how to deploy a YAML file to create containers that share one GPU. After you deploy the file, you can use GPU sharing to isolate the GPU memory that is allocated to each container. This improves GPU resource utilization.
Prerequisites
Usage notes
For GPU nodes that are managed in Container Service for Kubernetes (ACK) clusters, keep the following items in mind when you request and use GPU resources for applications:
- Do not run GPU-heavy applications directly on nodes.
- Do not use tools such as Docker, Podman, or nerdctl to create containers and request GPU resources for the containers. For example, do not run the docker run --gpus all or docker run -e NVIDIA_VISIBLE_DEVICES=all command and then run GPU-heavy applications.
- Do not add the NVIDIA_VISIBLE_DEVICES=all or NVIDIA_VISIBLE_DEVICES=<GPU ID> environment variable to the env section of the pod YAML file. Do not use the NVIDIA_VISIBLE_DEVICES environment variable to request GPU resources for pods and then run GPU-heavy applications.
- Do not set NVIDIA_VISIBLE_DEVICES=all when you build container images and then run GPU-heavy applications if the NVIDIA_VISIBLE_DEVICES environment variable is not specified in the pod YAML file.
- Do not add privileged: true to the securityContext section of the pod YAML file and then run GPU-heavy applications.
Using the preceding methods to request GPU resources for your application carries the following risks:
If you request GPU resources on a node with one of the preceding methods, the allocation is not recorded in the device resource ledger of the scheduler. The actual GPU allocation on the node can therefore differ from the ledger, and the scheduler may still schedule other pods that request GPU resources to the node. As a result, your applications may compete for the same GPU, and some applications may fail to start up due to insufficient GPU resources.
The preceding methods may also cause other unknown issues, such as issues reported by the NVIDIA community.
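The supported way to consume a shared GPU is to request GPU memory through the aliyun.com/gpu-mem resource in the pod specification, as the sample Job in the following procedure does. The pod below is a minimal sketch that reuses the sample image; the pod and container names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-demo        # Placeholder name.
spec:
  containers:
  - name: main                # Placeholder name.
    image: registry.cn-hangzhou.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
    resources:
      limits:
        aliyun.com/gpu-mem: 3 # Request 3 GiB of GPU memory from a shared GPU.
  restartPolicy: Never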
Procedure
Run the following command to query information about GPU sharing in your cluster:
kubectl inspect cgpu
NAME                     IPADDRESS    GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.168.0.4  192.168.0.4  0/7                    0/7                    0/14
---------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
0/14 (0%)
Note: To query detailed information about GPU sharing, run the kubectl inspect cgpu -d command.
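You can also use plain kubectl to confirm that a node advertises the shared GPU memory resource. This assumes that the GPU sharing component registers aliyun.com/gpu-mem as an extended resource on the node; the node name is taken from the sample output above.
kubectl describe node cn-shanghai.192.168.0.4 | grep aliyun.com/gpu-mem
If GPU sharing is enabled on the node, the aliyun.com/gpu-mem resource should appear under both Capacity and Allocatable.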
Deploy a sample application that has GPU sharing enabled and request 3 GiB of GPU memory for the application.
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-share-sample
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: gpu-share-sample
    spec:
      containers:
      - name: gpu-share-sample
        image: registry.cn-hangzhou.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
        command:
        - python
        - tensorflow-sample-code/tfjob/docker/mnist/main.py
        - --max_steps=100000
        - --data_dir=tensorflow-sample-code/data
        resources:
          limits:
            # The pod requests 3 GiB of GPU memory in total.
            aliyun.com/gpu-mem: 3 # Specify the requested amount of GPU memory.
        workingDir: /root
      restartPolicy: Never
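Assuming that you save the preceding manifest to a file named gpu-share-sample.yaml (the file name is arbitrary), deploy the Job and wait for the pod to reach the Running state:
kubectl apply -f gpu-share-sample.yaml
kubectl get pods -l app=gpu-share-sample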
Run the following command to query the memory usage of the GPU:
kubectl inspect cgpu
Expected output:
NAME                      IPADDRESS      GPU0(Allocated/Total)  GPU Memory(GiB)
cn-beijing.192.168.1.105  192.168.1.105  3/14                   3/14
---------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
3/14 (21%)
The output shows that the total GPU memory of the cn-beijing.192.168.1.105 node is 14 GiB, of which 3 GiB is allocated.
Verify GPU memory isolation
You can use the following method to check whether GPU memory isolation is enabled for the node.
Log on to the control plane.
Run the following command to print the log of the deployed application to check whether GPU memory isolation is enabled:
kubectl logs gpu-share-sample --tail=1
Expected output:
2023-08-07 09:08:13.931003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2832 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:07.0, compute capability: 7.5)
The output shows that the TensorFlow process in the container can use 2,832 MiB of GPU memory, which is within the 3 GiB requested by the pod rather than the full memory of the GPU. This indicates that GPU memory isolation is in effect.
Run the following command to log on to the container and view the amount of GPU memory that is allocated to the container:
kubectl exec -it gpu-share-sample -- nvidia-smi
Expected output:
Mon Aug  7 08:52:18 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:07.0 Off |                    0 |
| N/A   41C    P0    26W /  70W |   3043MiB /  3231MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
The output shows that the container can see only 3,231 MiB of GPU memory in total, which is approximately the 3 GiB that the pod requested, and that 3,043 MiB of it is currently in use.
Run the following command on the GPU-accelerated node where the application is deployed to query the total GPU memory of the node:
nvidia-smi
Expected output:
Mon Aug  7 09:18:26 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:07.0 Off |                    0 |
| N/A   40C    P0    26W /  70W |   3053MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8796      C   python3                                    3043MiB  |
+-----------------------------------------------------------------------------+
The output shows that the total GPU memory of the node is 15,079 MiB, of which 3,053 MiB is in use by the container's process. Compared with the 3,231 MiB total that is visible inside the container, this confirms that GPU memory isolation is in effect.
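After you finish the verification, you can delete the sample Job to release the allocated GPU memory. This is a standard kubectl operation and is not specific to GPU sharing:
kubectl delete job gpu-share-sample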