How to enable GPU sharing in ACK Lingjun managed clusters

This topic describes how to use the GPU sharing feature to schedule and isolate GPU resources on the Lingjun nodes in a Container Service for Kubernetes (ACK) Lingjun managed cluster.

Prerequisites

An ACK Lingjun managed cluster is created and the cluster contains GPU-accelerated Lingjun nodes. For more information, see Create a Lingjun cluster with ACK activated.

Note

By default, the GPU sharing component is installed in ACK Lingjun managed clusters. You can add specific labels to GPU-accelerated nodes to enable GPU sharing. For more information, see Labels for enabling GPU scheduling policies.

Work with GPU sharing

GPU sharing is applicable in the following scenarios:

- Enable GPU sharing without GPU memory isolation: In this scenario, multiple pods can share the same GPU, and the GPU memory allocated to a pod is not isolated from the GPU memory allocated to other pods. GPU memory contention is not handled by the platform and must be handled, if at all, by upper-layer applications.
- Enable GPU sharing with GPU memory isolation: In this scenario, multiple pods can share the same GPU, but the GPU memory allocated to a pod is isolated from the GPU memory allocated to other pods. This prevents GPU memory contention among pods.

Scenario 1: Enable GPU sharing without GPU memory isolation

In some scenarios, you may require GPU sharing without GPU memory isolation because some workloads already limit their own GPU memory usage. For example, when you launch a Java application, you can configure Java options to specify the maximum amount of GPU memory that the application can use. If the GPU isolation module is also installed on the node, the two mechanisms may conflict. To avoid this issue, you can enable GPU sharing on some nodes without installing the GPU isolation module.

Step 1: Enable GPU sharing for a node

Check whether the /etc/lingjun_metadata file exists on the node.
- If the file exists, run the nvidia-smi command. If the command returns output, the node is a GPU-accelerated Lingjun node and you can proceed to the next step.
- If the file does not exist, the node is not a Lingjun node. You cannot enable GPU sharing for the node. To enable GPU sharing in this case, you need to create a Lingjun node. For more information, see Overview of Lingjun node pools.
Run the following command to add the ack.node.gpu.schedule label to the node to enable GPU sharing:
```
kubectl label node <NODE_NAME> ack.node.gpu.schedule=share
```
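The node check in the first step can also be scripted. The following Python sketch is meant to run on the node itself and applies the same two conditions. Treating a non-zero exit code or empty output from nvidia-smi as "no output returned" is an assumption, and the function name is hypothetical.

```python
import os
import shutil
import subprocess


def is_lingjun_gpu_node(metadata_path: str = "/etc/lingjun_metadata") -> bool:
    """Return True if this host looks like a GPU-accelerated Lingjun node.

    Mirrors the manual check: the metadata file must exist, and
    nvidia-smi must run successfully and print something.
    """
    # No metadata file means this is not a Lingjun node.
    if not os.path.exists(metadata_path):
        return False
    # nvidia-smi must be installed and on PATH.
    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is None:
        return False
    try:
        result = subprocess.run(
            [nvidia_smi], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: exit code 0 with non-empty stdout means a GPU is present.
    return result.returncode == 0 and bool(result.stdout.strip())


if __name__ == "__main__":
    print("Lingjun GPU node" if is_lingjun_gpu_node() else "Not a Lingjun GPU node")
```

Only if this prints "Lingjun GPU node" is it worth labeling the node with kubectl.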

Step 2: Use shared GPU resources

Create a file named tensorflow.yaml and copy the following content to the file. The Job runs a TensorFlow MNIST training sample in one pod, and the pod requests 4 GiB of GPU memory:

```
apiVersion: batch/v1
kind: Job
metadata:
  name: tensorflow-mnist-share
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: tensorflow-mnist-share
    spec:
      containers:
      - name: tensorflow-mnist-share
        image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
        command:
        - python
        - tensorflow-sample-code/tfjob/docker/mnist/main.py
        - --max_steps=100000
        - --data_dir=tensorflow-sample-code/data
        resources:
          limits:
            aliyun.com/gpu-mem: 4  # Request 4 GiB of GPU memory through resources.limits.
        workingDir: /root
      restartPolicy: Never
```

Run the following command to submit the job:
```
kubectl apply -f tensorflow.yaml
```
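If you generate manifests programmatically, the same Job can be built as a Python dictionary and printed as JSON, which kubectl apply -f also accepts. This is a minimal sketch: the function name gpushare_job and the file names in the usage note are hypothetical, and only the fields shown in tensorflow.yaml are set.

```python
import json
from typing import Any, Dict, List


def gpushare_job(name: str, image: str, command: List[str], gpu_mem_gib: int) -> Dict[str, Any]:
    """Build a batch/v1 Job whose single container requests shared GPU memory (GiB)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "parallelism": 1,
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "command": command,
                            # Shared GPU memory is requested through this extended resource.
                            "resources": {"limits": {"aliyun.com/gpu-mem": gpu_mem_gib}},
                            "workingDir": "/root",
                        }
                    ],
                    "restartPolicy": "Never",
                },
            },
        },
    }


if __name__ == "__main__":
    job = gpushare_job(
        name="tensorflow-mnist-share",
        image="registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5",
        command=[
            "python",
            "tensorflow-sample-code/tfjob/docker/mnist/main.py",
            "--max_steps=100000",
            "--data_dir=tensorflow-sample-code/data",
        ],
        gpu_mem_gib=4,
    )
    print(json.dumps(job, indent=2))
```

For example, save the script as gen_job.py, run `python gen_job.py > tensorflow.json`, and submit the file with `kubectl apply -f tensorflow.json`.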

Step 3: Verify GPU sharing without GPU memory isolation

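Query the pod created by the job, and then run nvidia-smi in the pod. Replace tensorflow-mnist-share-xxxxx with the pod name returned by the first command:

```
kubectl get pod | grep tensorflow
kubectl exec -ti tensorflow-mnist-share-xxxxx -- nvidia-smi
```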
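The two commands above can be combined so that you do not have to copy the generated pod name by hand. This Python sketch replaces the grep with a small parser of the `kubectl get pod` table; the helper names are hypothetical, and it assumes kubectl is on PATH and points at the cluster.

```python
import subprocess
from typing import List


def pod_names_with_prefix(get_pod_output: str, prefix: str) -> List[str]:
    """Extract pod names that start with `prefix` from `kubectl get pod` output."""
    names = []
    for line in get_pod_output.splitlines()[1:]:  # skip the NAME/READY/STATUS header
        fields = line.split()
        if fields and fields[0].startswith(prefix):
            names.append(fields[0])
    return names


def nvidia_smi_in_job_pod(prefix: str = "tensorflow-mnist-share-") -> str:
    """Find the job's pod and return the nvidia-smi output from inside it."""
    listing = subprocess.run(
        ["kubectl", "get", "pod"], capture_output=True, text=True, check=True
    ).stdout
    pods = pod_names_with_prefix(listing, prefix)
    if not pods:
        raise RuntimeError(f"no pod found with prefix {prefix!r}")
    # No -ti here: the command is non-interactive, so stdout can be captured.
    return subprocess.run(
        ["kubectl", "exec", pods[0], "--", "nvidia-smi"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Because the pod template in tensorflow.yaml carries the label app=tensorflow-mnist-share, `kubectl get pod -l app=tensorflow-mnist-share` is an alternative way to narrow the listing.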

Expected output:

```
Wed Jun 14 06:45:56 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:09.0 Off |                    0 |
| N/A   35C    P0    59W / 300W |    334MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```

The output shows that the pod can use all 16,384 MiB of GPU memory, not only the 4 GiB that it requested. This means that GPU sharing is enabled without GPU memory isolation. If the GPU isolation module were installed, the total memory shown in the output would equal the amount of memory requested by the pod.
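To check this from a script instead of by eye, you can pull the used and total values out of the Memory-Usage column. This is a sketch based on the text layout shown above, not on a documented output format. The same regular expression should also match the vgpu-smi output used in Scenario 2. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the Memory-Usage column, e.g. "334MiB / 16384MiB".
_MEM_USAGE = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")


def gpu_memory_usage(smi_output: str) -> List[Tuple[int, int]]:
    """Return (used_mib, total_mib) for each GPU row found in the output."""
    return [(int(used), int(total)) for used, total in _MEM_USAGE.findall(smi_output)]
```

In this example, a total of 16384 MiB, compared with the 4096 MiB (4 GiB) that the pod requested, is what indicates that memory is not isolated.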

Important

In this example, a V100 GPU is used and 4 GiB of GPU memory is requested by the pod. The actual configurations depend on your environment.

Because GPU memory is not isolated, the application in the pod must keep its own usage within its allocation. It can determine the amount of GPU memory that it may use from the following environment variables:

```
ALIYUN_COM_GPU_MEM_CONTAINER=4 # The GPU memory available to the pod, in GiB.
ALIYUN_COM_GPU_MEM_DEV=16 # The total memory of each GPU, in GiB.
```

To calculate the fraction of the total GPU memory that the pod can use, use the following formula:

```
percentage = ALIYUN_COM_GPU_MEM_CONTAINER / ALIYUN_COM_GPU_MEM_DEV = 4 / 16 = 0.25
```
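An application can compute this fraction at startup from the two environment variables. The following Python sketch does only the arithmetic. Passing the result to a framework is up to you. For example, TensorFlow 1.x accepts a fraction through `tf.GPUOptions(per_process_gpu_memory_fraction=...)`, but whether the sample image uses it is not stated here. The function name is hypothetical.

```python
import os
from typing import Mapping, Optional


def gpu_mem_fraction(env: Optional[Mapping[str, str]] = None) -> float:
    """Fraction of one GPU's memory allocated to this container.

    ALIYUN_COM_GPU_MEM_CONTAINER: GiB allocated to the container.
    ALIYUN_COM_GPU_MEM_DEV: GiB of memory per GPU.
    """
    env = os.environ if env is None else env
    container_gib = float(env["ALIYUN_COM_GPU_MEM_CONTAINER"])
    device_gib = float(env["ALIYUN_COM_GPU_MEM_DEV"])
    if device_gib <= 0:
        raise ValueError("ALIYUN_COM_GPU_MEM_DEV must be positive")
    return container_gib / device_gib


if __name__ == "__main__":
    example = {"ALIYUN_COM_GPU_MEM_CONTAINER": "4", "ALIYUN_COM_GPU_MEM_DEV": "16"}
    print(gpu_mem_fraction(example))  # prints 0.25
```

Inside the pod, call `gpu_mem_fraction()` without arguments to read the real environment.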

Scenario 2: Enable GPU sharing with GPU memory isolation

To ensure container stability, you must isolate the GPU resources that are allocated to each container. When you run multiple containers on one GPU, GPU resources are allocated to each container as requested. However, if one container uses more GPU resources than it requested, the performance of the other containers may suffer. The industry offers several solutions to this problem. For example, NVIDIA vGPU, Multi-Process Service (MPS), vCUDA, and eGPU enable fine-grained sharing of GPUs. The following section describes how to use eGPU.

Step 1: Enable GPU sharing for a node

Check whether the /etc/lingjun_metadata file exists on the node.
- If the file exists, run the nvidia-smi command. If the command returns output, the node is a GPU-accelerated Lingjun node and you can proceed to the next step.
- If the file does not exist, the node is not a Lingjun node. You cannot enable GPU sharing for the node. To enable GPU sharing in this case, you need to create a Lingjun node. For more information, see Overview of Lingjun node pools.
Run the following command to add the ack.node.gpu.schedule label to the node to enable GPU sharing:
```
kubectl label node <NODE_NAME> ack.node.gpu.schedule=egpu_mem
```

Note

- If you set the value of the label to egpu_mem, only GPU memory isolation is enabled. The preceding command uses this value.
- If you set the value of the label to egpu_core_mem, both GPU memory isolation and computing power isolation are enabled.
- GPU computing power cannot be requested on its own; it must be requested together with GPU memory. GPU memory, however, can be requested on its own.
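The three label values used in this topic can be summarized as a lookup table, which makes it harder to label a node with a mistyped value. This Python sketch covers only the values described in this topic (the Labels for enabling GPU scheduling policies topic may list more), and the names GPU_SCHEDULE_MODES and label_command are hypothetical.

```python
from typing import List

# ack.node.gpu.schedule values described in this topic and what each isolates.
GPU_SCHEDULE_MODES = {
    "share": {"memory_isolation": False, "compute_isolation": False},
    "egpu_mem": {"memory_isolation": True, "compute_isolation": False},
    "egpu_core_mem": {"memory_isolation": True, "compute_isolation": True},
}


def label_command(node_name: str, mode: str) -> List[str]:
    """Build the kubectl argv that sets ack.node.gpu.schedule on a node."""
    if mode not in GPU_SCHEDULE_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(GPU_SCHEDULE_MODES)}")
    return ["kubectl", "label", "node", node_name, f"ack.node.gpu.schedule={mode}"]
```

Note that kubectl label refuses to change a label that already has a value unless you pass --overwrite. The sketch deliberately does not add that flag, because this topic does not describe switching an existing node between modes.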

Step 2: Use shared GPU resources

Wait until the node reports its GPU information.
Run the following command to query the resources provided by the node:

```
kubectl get node <NODE_NAME> -oyaml
```

Expected output:

```
  allocatable:
    aliyun.com/gpu-count: "1"
    aliyun.com/gpu-mem: "80"
    ...
    nvidia.com/gpu: "0"
    ...
  capacity:
    aliyun.com/gpu-count: "1"
    aliyun.com/gpu-mem: "80"
    ...
    nvidia.com/gpu: "0"
    ...
```

The output indicates that the aliyun.com/gpu-mem resource is available and that the node provides one GPU (aliyun.com/gpu-count) with 80 GB of GPU memory (aliyun.com/gpu-mem).
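The same values can be read from `kubectl get node <NODE_NAME> -o json` and used for a quick capacity estimate. This Python sketch divides allocatable GPU memory by a per-pod request. The result is only an upper bound: it ignores pods that are already running and how memory is split across individual GPUs, both of which the real scheduler takes into account. The helper names are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict


def allocatable_gpu_mem(node: Dict[str, Any]) -> int:
    """Read aliyun.com/gpu-mem from a node object (0 if the resource is absent)."""
    return int(node["status"]["allocatable"].get("aliyun.com/gpu-mem", "0"))


def pods_that_fit(node: Dict[str, Any], request: int) -> int:
    """Upper bound on pods with this aliyun.com/gpu-mem request on an empty node."""
    if request <= 0:
        raise ValueError("request must be positive")
    return allocatable_gpu_mem(node) // request


def get_node(name: str) -> Dict[str, Any]:
    """Fetch a node object with kubectl (assumes kubectl is configured)."""
    out = subprocess.run(
        ["kubectl", "get", "node", name, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

With the output above (aliyun.com/gpu-mem: "80") and the 10 GB request used by the benchmark job in Step 3, the estimate is 80 // 10 = 8 pods.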

Note

To allocate an entire GPU to a pod, add the ack.gpushare.placement=require-whole-device label to the pod and specify the requested amount of GPU memory in aliyun.com/gpu-mem. A GPU that can provide the requested amount of GPU memory is then automatically allocated to the pod.
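As an illustration of where the label and the request go, the following Python sketch builds a bare Pod manifest. The pod name, image, and function name are hypothetical placeholders. For a Job, the same label belongs under spec.template.metadata.labels, because that is where the pod's labels are defined.

```python
from typing import Any, Dict


def whole_device_pod(name: str, image: str, gpu_mem: int) -> Dict[str, Any]:
    """Pod manifest that asks for a whole GPU with at least `gpu_mem` of memory."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Pod label that requests placement on an entire GPU.
            "labels": {"ack.gpushare.placement": "require-whole-device"},
        },
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # The GPU memory request is still expressed with aliyun.com/gpu-mem.
                    "resources": {"limits": {"aliyun.com/gpu-mem": gpu_mem}},
                }
            ],
            "restartPolicy": "Never",
        },
    }
```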

Step 3: Run a job to verify GPU sharing

Create a file named benchmark.yaml and copy the following content to the file to define a benchmarking job:

```
apiVersion: batch/v1
kind: Job
metadata:
  name: benchmark-job
spec:
  parallelism: 1
  template:
    spec:
      containers:
      - name: benchmark-job
        image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:benchmark-tensorflow-2.2.3
        command:
        - bash
        - run.sh
        - --num_batches=500000000
        - --batch_size=8
        resources:
          limits:
            aliyun.com/gpu-mem: 10  # The job requests 10 GB of GPU memory.
        workingDir: /root
      restartPolicy: Never
      hostNetwork: true
      tolerations:
        - operator: Exists
```

Run the following command to submit the job:
```
kubectl apply -f benchmark.yaml
```
After the pod starts running, run the following command to open a shell in the pod:
```
kubectl exec -ti benchmark-job-xxxx -- bash
```

Run the following command in the pod to query the GPU isolation information:

```
vgpu-smi
```

Expected output:

```
+------------------------------------------------------------------------------+
|    VGPU_SMI 460.91.03     DRIVER_VERSION: 460.91.03     CUDA Version: 11.2   |
+-------------------------------------------+----------------------------------+
| GPU  Name                Bus-Id           |        Memory-Usage     GPU-Util |
|===========================================+==================================|
|   0  xxxxxxxx            00000000:00:07.0 |  8307MiB / 10782MiB   100% /  100% |
+-------------------------------------------+----------------------------------+
```

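The output indicates that the GPU memory visible in the pod is limited to about 10 GB (10782 MiB), the amount requested by the job, instead of the full memory of the GPU.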

FAQ

How do I check whether the GPU sharing component is installed in my cluster?

Run the following command to check whether the eGPU-based GPU sharing component is installed:

```
kubectl get ds -n kube-system | grep gpushare
```

Expected output:

```
NAME                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                    AGE
gpushare-egpu-device-plugin-ds       0         0         0       0            0           <none>
gpushare-egpucore-device-plugin-ds   0         0         0       0            0           <none>
```

The output indicates that the GPU sharing component is installed.
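The same check can be automated. This Python sketch runs the listing without grep and looks for the two DaemonSet names shown in the expected output. Treating the component as installed only when both are present is an assumption based on that sample output; relax it if your cluster lists different names. The helper names are hypothetical.

```python
import subprocess
from typing import List

# DaemonSet names taken from the expected output above.
EXPECTED = ("gpushare-egpu-device-plugin-ds", "gpushare-egpucore-device-plugin-ds")


def gpushare_daemonsets(get_ds_output: str) -> List[str]:
    """Return gpushare DaemonSet names from `kubectl get ds -n kube-system` output."""
    return [
        line.split()[0]
        for line in get_ds_output.splitlines()[1:]  # skip the NAME/DESIRED header
        if line.strip() and line.split()[0].startswith("gpushare")
    ]


def gpushare_installed(get_ds_output: str) -> bool:
    found = set(gpushare_daemonsets(get_ds_output))
    return all(name in found for name in EXPECTED)


def check_cluster() -> bool:
    """Query the cluster (assumes kubectl is configured) and apply the check."""
    out = subprocess.run(
        ["kubectl", "get", "ds", "-n", "kube-system"],
        capture_output=True, text=True, check=True,
    ).stdout
    return gpushare_installed(out)
```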