This topic describes how to deploy a YAML file to create containers that share one GPU. After you deploy the file, you can use GPU sharing to isolate the GPU memory that is allocated to each container. This improves GPU resource utilization.
Prerequisites
Usage notes
For GPU nodes that are managed in Container Service for Kubernetes (ACK) clusters, keep the following items in mind when you request and use GPU resources for applications:
- Do not run GPU-heavy applications directly on nodes.
- Do not use tools such as Docker, Podman, or nerdctl to create containers and request GPU resources for the containers. For example, do not run the docker run --gpus all or docker run -e NVIDIA_VISIBLE_DEVICES=all command and then run GPU-heavy applications.
- Do not add the NVIDIA_VISIBLE_DEVICES=all or NVIDIA_VISIBLE_DEVICES=<GPU ID> environment variable to the env section of the pod YAML file. Do not use the NVIDIA_VISIBLE_DEVICES environment variable to request GPU resources for pods and then run GPU-heavy applications.
- Do not set NVIDIA_VISIBLE_DEVICES=all when you build container images and then run GPU-heavy applications if the NVIDIA_VISIBLE_DEVICES environment variable is not specified in the pod YAML file.
- Do not add privileged: true to the securityContext section of the pod YAML file and then run GPU-heavy applications.
Using the preceding methods to request GPU resources for your application carries the following risks:
If you request GPU resources on a node with one of the preceding methods, the allocation is not recorded in the device resource ledger of the scheduler. The actual GPU allocation on the node can therefore differ from the ledger, and the scheduler may still schedule other pods that request GPU resources to the node. As a result, your applications may compete for the same GPU, and some applications may fail to start up due to insufficient GPU resources.
The preceding methods may also cause other unknown issues, such as issues reported by the NVIDIA community.
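The supported way to consume a shared GPU is to request GPU memory through the aliyun.com/gpu-mem resource in the pod specification, as the sample Job in the following procedure does. The pod below is a minimal sketch that reuses the sample image; the pod and container names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-demo        # Placeholder name.
spec:
  containers:
  - name: main                # Placeholder name.
    image: registry.cn-hangzhou.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
    resources:
      limits:
        aliyun.com/gpu-mem: 3 # Request 3 GiB of GPU memory from a shared GPU.
  restartPolicy: Never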
Procedure
Run the following command to query information about GPU sharing in your cluster:
kubectl inspect cgpu
NAME                     IPADDRESS    GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.168.0.4  192.168.0.4  0/7                    0/7                    0/14
---------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
0/14 (0%)
Note: To query detailed information about GPU sharing, run the kubectl inspect cgpu -d command.
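You can also use plain kubectl to confirm that a node advertises the shared GPU memory resource. This assumes that the GPU sharing component registers aliyun.com/gpu-mem as an extended resource on the node; the node name is taken from the sample output above.
kubectl describe node cn-shanghai.192.168.0.4 | grep aliyun.com/gpu-mem
If GPU sharing is enabled on the node, the aliyun.com/gpu-mem resource should appear under both Capacity and Allocatable.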
Deploy a sample application that has GPU sharing enabled and request 3 GiB of GPU memory for the application.
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-share-sample
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: gpu-share-sample
    spec:
      containers:
      - name: gpu-share-sample
        image: registry.cn-hangzhou.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
        command:
        - python
        - tensorflow-sample-code/tfjob/docker/mnist/main.py
        - --max_steps=100000
        - --data_dir=tensorflow-sample-code/data
        resources:
          limits:
            # The pod requests 3 GiB of GPU memory in total.
            aliyun.com/gpu-mem: 3 # Specify the requested amount of GPU memory.
        workingDir: /root
      restartPolicy: Never
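Assuming that you save the preceding manifest to a file named gpu-share-sample.yaml (the file name is arbitrary), deploy the Job and wait for the pod to reach the Running state:
kubectl apply -f gpu-share-sample.yaml
kubectl get pods -l app=gpu-share-sample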
Run the following command to query the memory usage of the GPU:
kubectl inspect cgpu
Expected output:
NAME                      IPADDRESS      GPU0(Allocated/Total)  GPU Memory(GiB)
cn-beijing.192.168.1.105  192.168.1.105  3/14                   3/14
---------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
3/14 (21%)
The output shows that the total GPU memory of the cn-beijing.192.168.1.105 node is 14 GiB, of which 3 GiB is allocated.
Verify GPU memory isolation
You can use the following method to check whether GPU memory isolation is enabled for the node.
Log on to the control plane.
Run the following command to print the log of the deployed application to check whether GPU memory isolation is enabled:
kubectl logs gpu-share-sample --tail=1
Expected output:
2023-08-07 09:08:13.931003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2832 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:07.0, compute capability: 7.5)
The output shows that the TensorFlow process in the container can use 2,832 MiB of GPU memory, which is within the 3 GiB requested by the pod rather than the full memory of the GPU. This indicates that GPU memory isolation is in effect.
Run the following command to log on to the container and view the amount of GPU memory that is allocated to the container:
kubectl exec -it gpu-share-sample -- nvidia-smi
Expected output:
Mon Aug  7 08:52:18 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:07.0 Off |                    0 |
| N/A   41C    P0    26W /  70W |   3043MiB /  3231MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
The output shows that the container can see only 3,231 MiB of GPU memory in total, which is approximately the 3 GiB that the pod requested, and that 3,043 MiB of it is currently in use.
Run the following command on the GPU-accelerated node where the application is deployed to query the total GPU memory of the node:
nvidia-smi
Expected output:
Mon Aug  7 09:18:26 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:07.0 Off |                    0 |
| N/A   40C    P0    26W /  70W |   3053MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      8796      C   python3                                    3043MiB  |
+-----------------------------------------------------------------------------+
The output shows that the total GPU memory of the node is 15,079 MiB, of which 3,053 MiB is in use by the container's process. Compared with the 3,231 MiB total that is visible inside the container, this confirms that GPU memory isolation is in effect.
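After you finish the verification, you can delete the sample Job to release the allocated GPU memory. This is a standard kubectl operation and is not specific to GPU sharing:
kubectl delete job gpu-share-sample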