Container Service for Kubernetes: Work with GPU sharing

Last Updated: Dec 25, 2025

This topic describes how to use the GPU sharing feature to schedule and isolate GPU resources on the Lingjun nodes in a Container Service for Kubernetes (ACK) Lingjun managed cluster.

Prerequisites

An ACK Lingjun managed cluster is created and the cluster contains GPU-accelerated Lingjun nodes. For more information, see Create a Lingjun cluster with ACK activated.

Note

By default, the GPU sharing component is installed in ACK Lingjun managed clusters. You can add specific labels to GPU-accelerated nodes to enable GPU sharing. For more information, see Labels for enabling GPU scheduling policies.

Use shared GPU scheduling

GPU sharing is applicable in the following scenarios:

  • Enable GPU sharing without GPU memory isolation: Multiple pods share the same GPU, and the GPU memory allocated to one pod is not isolated from the GPU memory allocated to the others. GPU memory contention is either not handled at all or is handled by upper-layer applications.

  • Enable GPU sharing with GPU memory isolation: Multiple pods share the same GPU, and the GPU memory allocated to one pod is isolated from the GPU memory allocated to the others. This prevents GPU memory contention among pods.

Scenario 1: Enable GPU sharing without GPU memory isolation

In some scenarios, you do not need GPU memory isolation because the workload enforces its own memory limit. For example, when you launch a Java application, you can configure startup options to specify the maximum amount of GPU memory that the application can use. In this case, installing the GPU isolation module may conflict with the application-level limit and cause resource contention. To avoid this issue, you can enable GPU sharing on these nodes without installing the GPU isolation module.

Step 1: Enable GPU sharing for a node

  1. Check if the /etc/lingjun_metadata file exists.

    • If the file exists, run the nvidia-smi command. If the output is normal, the node is a Lingjun node and you can proceed to the next step.

    • If the file does not exist, the node is not a Lingjun node. You cannot enable GPU sharing for the node. To enable GPU sharing in this case, you need to create a Lingjun node. For more information, see Overview of Lingjun node pools.

  2. Run the following command to add the ack.node.gpu.schedule label to the node to enable GPU sharing:

    kubectl label node <NODE_NAME> ack.node.gpu.schedule=share
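The following commands are a minimal sketch of how you might script the checks in the preceding steps. <NODE_NAME> is a placeholder, and the first command must be run on the node itself.

# On the node: verify that it is a Lingjun node with a working GPU driver.
ls /etc/lingjun_metadata && nvidia-smi

# From a host with kubectl access: confirm that the GPU sharing label is applied.
kubectl get node <NODE_NAME> -L ack.node.gpu.schedule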

Step 2: Use shared GPU resources

  1. Create a tensorflow.yaml file using the following example.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist-share
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: tensorflow-mnist-share
        spec:     # The YAML template specifies a job that uses a TensorFlow MNIST dataset. The job creates one pod and the pod requests 4 GiB of GPU memory. 
          containers:
          - name: tensorflow-mnist-share
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:     # If you want the pod to request 4 GiB of GPU memory, specify aliyun.com/gpu-mem: 4 in the resources.limits parameter of the pod configurations. 
              limits:
                aliyun.com/gpu-mem: 4  # The pod requests 4 GiB of GPU memory. 
            workingDir: /root
          restartPolicy: Never
  2. Run the following command to submit the job:

    kubectl apply -f tensorflow.yaml

Step 3: Verify GPU sharing without GPU memory isolation

Run the following commands to query the pod created by the job and check its GPU memory usage:

kubectl get pod | grep tensorflow

kubectl exec -ti tensorflow-mnist-share-xxxxx -- nvidia-smi

Expected output:

Wed Jun 14 06:45:56 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:09.0 Off |                    0 |
| N/A   35C    P0    59W / 300W |    334MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

The output indicates that the pod can use all memory provided by the GPU, which is 16,384 MiB in size. This means that GPU sharing is implemented without GPU memory isolation. If the GPU isolation module is installed, the memory size displayed in the output equals the amount of memory requested by the pod.

Important

In this example, a V100 GPU is used and 4 GiB of GPU memory is requested by the pod. The actual configurations depend on your environment.

To determine how much GPU memory it can use, the application must read the following two environment variables, which are injected into the pod:

ALIYUN_COM_GPU_MEM_CONTAINER=4 # The amount of GPU memory (in GiB) available to the pod.
ALIYUN_COM_GPU_MEM_DEV=16 # The total memory (in GiB) of each GPU.

You can use these two environment variables to calculate the percentage of the total GPU memory that the application can use:

percentage = ALIYUN_COM_GPU_MEM_CONTAINER / ALIYUN_COM_GPU_MEM_DEV = 4 / 16 = 0.25
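The following shell sketch shows one way an application entrypoint could derive this fraction before starting the workload. It assumes that the two environment variables are set in the container, as in the preceding example; the awk-based calculation is only illustrative.

# Compute the fraction of GPU memory that this pod may use.
# ALIYUN_COM_GPU_MEM_CONTAINER and ALIYUN_COM_GPU_MEM_DEV are injected by the GPU sharing component.
fraction=$(awk -v c="$ALIYUN_COM_GPU_MEM_CONTAINER" -v d="$ALIYUN_COM_GPU_MEM_DEV" 'BEGIN { printf "%.2f", c / d }')
echo "This pod may use ${fraction} of the GPU memory (${ALIYUN_COM_GPU_MEM_CONTAINER} GiB of ${ALIYUN_COM_GPU_MEM_DEV} GiB)."
# Pass the fraction to your framework if it supports a memory fraction setting, for example TensorFlow's per_process_gpu_memory_fraction.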

Scenario 2: Enable GPU sharing with GPU memory isolation

To ensure container stability, the GPU resources allocated to each container must be isolated from one another. When multiple containers run on one GPU, resources are allocated to each container as requested. However, a container that consumes excessive GPU resources can still degrade the performance of the other containers. To resolve this issue, the industry provides several solutions, such as NVIDIA vGPU, Multi-Process Service (MPS), vCUDA, and eGPU, which enable fine-grained GPU sharing. The following section describes how to use eGPU.

Step 1: Enable GPU sharing for a node

  1. Check if the /etc/lingjun_metadata file exists.

    • If the file exists, run the nvidia-smi command. If the output is normal, the node is a Lingjun node and you can proceed to the next step.

    • If the file does not exist, the node is not a Lingjun node. You cannot enable GPU sharing for the node. To enable GPU sharing in this case, you need to create a Lingjun node. For more information, see Overview of Lingjun node pools.

  2. Run the following command to add the ack.node.gpu.schedule label to the node to enable GPU sharing:

    kubectl label node <NODE_NAME> ack.node.gpu.schedule=egpu_mem

Note

  • If you set the label value to egpu_mem, only GPU memory isolation is enabled. The preceding command uses this value.

  • If you set the label value to egpu_core_mem, both GPU memory isolation and computing power isolation are enabled.

  • GPU computing power cannot be requested on its own; it must be requested together with GPU memory. GPU memory can be requested without computing power.
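If you use the egpu_core_mem label, the pod must request computing power together with GPU memory. The following is a minimal sketch of such a request. It assumes that computing power is exposed as the aliyun.com/gpu-core.percentage resource; check the extended resources reported by your node (see Step 2) and adjust the resource name, image, and values to your environment.

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-core-mem-demo   # Hypothetical pod name used for illustration.
spec:
  containers:
  - name: demo
    image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
    command: ["sleep", "3600"]
    resources:
      limits:
        aliyun.com/gpu-mem: 10               # 10 GB of GPU memory.
        aliyun.com/gpu-core.percentage: 30   # Assumed resource name for 30% of the GPU's computing power.
  restartPolicy: Never
EOF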

Step 2: Use shared GPU resources

  1. Wait until the GPU information is reported by the node.

  2. Run the following command to query the resources provided by the node:

    kubectl get node <NODE_NAME> -o yaml

Expected output:

  allocatable:
    aliyun.com/gpu-count: "1"
    aliyun.com/gpu-mem: "80"
    ...
    nvidia.com/gpu: "0"
    ...
  capacity:
    aliyun.com/gpu-count: "1"
    aliyun.com/gpu-mem: "80"
    ...
    nvidia.com/gpu: "0"
    ...

The expected output shows that aliyun.com/gpu-mem exists in the node's resource list. The node has one GPU card with a total of 80 GB of GPU memory.
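To view only the GPU-related resources instead of the full node YAML, you can filter the output. For example:

kubectl describe node <NODE_NAME> | grep -E "aliyun.com/gpu|nvidia.com/gpu"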

Note

If a pod must be scheduled to and use a whole GPU, add the ack.gpushare.placement=require-whole-device label to the pod and specify the amount of GPU memory by using the aliyun.com/gpu-mem resource. The pod is then scheduled to a whole GPU that provides the specified amount of memory.
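The following is a minimal sketch of a pod that uses this label. The pod name, image, and memory amount are placeholders; the label and the aliyun.com/gpu-mem resource come from this topic.

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: whole-gpu-demo                            # Hypothetical name used for illustration.
  labels:
    ack.gpushare.placement: require-whole-device  # Schedule the pod to a whole GPU.
spec:
  containers:
  - name: demo
    image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
    command: ["sleep", "3600"]
    resources:
      limits:
        aliyun.com/gpu-mem: 80   # The pod is scheduled to a whole GPU that provides at least this amount of memory.
  restartPolicy: Never
EOF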

Step 3: Run a job to verify GPU sharing

  1. Use the following YAML file to submit a benchmarking job:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: benchmark-job
    spec:
      parallelism: 1
      template:
        spec:
          containers:
          - name: benchmark-job
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:benchmark-tensorflow-2.2.3
            command:
            - bash
            - run.sh
            - --num_batches=500000000
            - --batch_size=8
            resources:
              limits:
                aliyun.com/gpu-mem: 10  # The pod requests 10 GB of GPU memory. 
            workingDir: /root
          restartPolicy: Never
          hostNetwork: true
          tolerations:
            - operator: Exists
  2. Run the following command to submit the job:

    kubectl apply -f benchmark.yaml
  3. After the pod runs, run the following command to access the pod:

    kubectl exec -ti benchmark-job-xxxx -- bash
  4. Run the following command in the pod to query the GPU isolation information:

    vgpu-smi

    Expected output:

    +------------------------------------------------------------------------------+
    |    VGPU_SMI 460.91.03     DRIVER_VERSION: 460.91.03     CUDA Version: 11.2   |
    +-------------------------------------------+----------------------------------+
    | GPU  Name                Bus-Id           |        Memory-Usage     GPU-Util |
    |===========================================+==================================|
    |   0  xxxxxxxx            00000000:00:07.0 |  8307MiB / 10782MiB   100% /  100% |
    +-------------------------------------------+----------------------------------+

    The output indicates that 10 GB of GPU memory is allocated to the pod.

FAQ

How do I check whether the GPU sharing component is installed in my cluster?

Run the following command to check whether the eGPU-based GPU sharing component is installed:

kubectl get ds -nkube-system | grep gpushare

Expected output:

NAME                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                    AGE
gpushare-egpu-device-plugin-ds       0         0         0       0            0           <none>
gpushare-egpucore-device-plugin-ds   0         0         0       0            0           <none>

If the gpushare-egpu-device-plugin-ds and gpushare-egpucore-device-plugin-ds DaemonSets appear in the output, the GPU sharing component is installed.
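You can also check whether GPU sharing is enabled on a specific node by inspecting its label and the extended resources it reports. For example:

# List nodes together with the GPU scheduling label.
kubectl get nodes -L ack.node.gpu.schedule

# Check the GPU sharing resources reported by a node.
kubectl describe node <NODE_NAME> | grep aliyun.com/gpu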