This topic introduces the GPU sharing solution provided by Alibaba Cloud, describes the benefits of GPU Sharing Professional Edition, and compares the features and use scenarios of GPU Sharing Basic Edition and GPU Sharing Professional Edition. This helps you better understand and use GPU sharing.
Background information
GPU sharing allows you to run multiple containers on one GPU. Because Container Service for Kubernetes (ACK) has open sourced its GPU sharing solution, you can use GPU sharing in Kubernetes clusters that run on Alibaba Cloud, AWS, or Google Compute Engine (GCE), or in self-managed Kubernetes clusters. GPU sharing reduces the expenses on GPUs. However, when you run multiple containers on one GPU, the stability of the containers cannot be guaranteed.
To ensure container stability, you must isolate the GPU resources that are allocated to each container. When you run multiple containers on one GPU, GPU resources are allocated to each container as requested. However, if one container occupies excessive GPU resources, the performance of other containers may be affected. To solve this issue, the computing industry provides multiple solutions, such as NVIDIA vGPU, Multi-Process Service (MPS), and vCUDA, which enable fine-grained sharing of GPUs.
ACK provides the GPU sharing solution to meet the preceding requirements. GPU sharing enables a GPU to be shared by multiple tasks. GPU sharing also allows you to isolate the GPU memory that is allocated to each application and partition the computing capacity of the GPU.
Features and benefits
The GPU sharing solution uses a server kernel driver developed by Alibaba Cloud to make more efficient use of the underlying NVIDIA GPU drivers. GPU sharing provides the following features:
High compatibility: GPU sharing is compatible with standard open source solutions, such as Kubernetes and NVIDIA Docker.
Ease of use: GPU sharing provides excellent user experience. To replace a Compute Unified Device Architecture (CUDA) library of an AI application, you do not need to recompile the application or create a new container image.
Stability: GPU sharing provides stable underlying operations on NVIDIA GPUs. API operations on CUDA libraries and some private API operations on CUDA Deep Neural Network (cuDNN) have been thoroughly tested.
Resource isolation: GPU sharing ensures that the GPU memory and computing power allocated to different containers do not affect each other.
GPU sharing provides a cost-effective, reliable, and user-friendly solution that allows you to enable GPU scheduling and memory isolation.
| Benefit | Description |
| --- | --- |
| Supports GPU sharing, scheduling, and memory isolation. | |
| Supports flexible GPU sharing and memory isolation policies. | |
| Supports comprehensive monitoring of GPU resources. | Supports monitoring of both exclusive GPUs and shared GPUs. |
Billing
GPU sharing is a charged service. You need to activate the cloud-native AI suite before you can use GPU sharing. For more information about the billing details, see Billing of the cloud-native AI suite.
Usage notes
GPU sharing supports only ACK Pro clusters. For more information about how to install and use GPU sharing, see the following topics:
You can also use the following advanced features provided by GPU sharing:
Terms
Share mode and exclusive mode
The share mode allows multiple pods to share one GPU, as shown in the following figure.
The exclusive mode allows a pod to occupy one or more GPUs exclusively, as shown in the following figure.
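As a sketch, the two modes differ in the extended resource that a pod requests. The resource names below (`aliyun.com/gpu-mem` for shared GPU memory in GiB, and the standard `nvidia.com/gpu` for whole GPUs) follow common ACK GPU sharing usage; verify them against the documentation for your cluster version, and treat the image name as a placeholder.

```yaml
# Share mode: request 5 GiB of GPU memory on a shared GPU.
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-pod
spec:
  containers:
  - name: app
    image: your-cuda-image:latest    # placeholder image
    resources:
      limits:
        aliyun.com/gpu-mem: 5        # GiB of GPU memory (GPU sharing)
---
# Exclusive mode: request one whole GPU.
apiVersion: v1
kind: Pod
metadata:
  name: exclusive-gpu-pod
spec:
  containers:
  - name: app
    image: your-cuda-image:latest    # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1            # one exclusive GPU
```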
GPU memory isolation
GPU sharing alone ensures only that multiple pods can run on one GPU; when GPU memory isolation is disabled, it cannot prevent resource contention among the pods. The following section shows an example.
Pod 1 requests 5 GiB of GPU memory and Pod 2 requests 10 GiB of GPU memory. When GPU memory isolation is disabled, Pod 1 can use up to 10 GiB of GPU memory, including the 5 GiB of GPU memory requested by Pod 2. Consequently, Pod 2 fails to launch due to insufficient GPU memory. After GPU memory isolation is enabled, when Pod 1 attempts to use GPU memory greater than the requested value, the GPU memory isolation module forces Pod 1 to fail.
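The contention scenario above can be traced with a toy allocator. This is an illustrative sketch, not the actual isolation module: without isolation, a pod can grab memory beyond its request as long as the physical GPU has free memory; with isolation, any attempt to exceed the request is rejected.

```python
def allocate(gpu_free, pod_request, pod_used, amount, isolated):
    """Try to give `amount` GiB more GPU memory to a pod.

    gpu_free: free memory (GiB) left on the physical GPU.
    pod_request: memory the pod requested at scheduling time.
    pod_used: memory the pod currently uses.
    isolated: whether GPU memory isolation is enabled.
    Returns the new (gpu_free, pod_used), or raises MemoryError.
    """
    if isolated and pod_used + amount > pod_request:
        # The isolation module rejects allocations beyond the request.
        raise MemoryError("request limit exceeded")
    if amount > gpu_free:
        raise MemoryError("GPU out of memory")
    return gpu_free - amount, pod_used + amount

# GPU with 15 GiB; Pod 1 requested 5 GiB, Pod 2 requested 10 GiB.
free = 15
# Without isolation, Pod 1 can grow to 10 GiB, eating into Pod 2's share:
free, pod1_used = allocate(free, 5, 0, 10, isolated=False)  # succeeds
# Pod 2 can no longer obtain its full 10 GiB, so it fails to start:
try:
    free, pod2_used = allocate(free, 10, 0, 10, isolated=True)
except MemoryError:
    pass  # only 5 GiB is left on the GPU
```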
GPU scheduling policies: binpack and spread
If a node with the GPU sharing feature enabled has multiple GPUs, you can choose one of the following GPU selection policies:
Binpack: The binpack policy is used by default. The scheduler allocates all resources of one GPU to pods before it switches to another GPU. This helps prevent GPU fragmentation.
Spread: The scheduler attempts to spread pods across different GPUs on the node, which reduces the impact on your business if a GPU becomes faulty.
In this example, a node has two GPUs. Each GPU provides 15 GiB of memory. Pod 1 requests 2 GiB of memory and Pod 2 requests 3 GiB of memory. With the binpack policy, both pods are scheduled to the same GPU. With the spread policy, the two pods are scheduled to different GPUs.
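The difference between the two policies can be sketched with a small GPU picker. This is an illustrative model, not the ACK scheduler: binpack chooses the GPU with the least free memory that still fits the request, while spread chooses the one with the most free memory.

```python
def pick_gpu(free_mem, request, policy):
    """Choose a GPU index for a pod that needs `request` GiB.

    free_mem: list of free GiB per GPU on the node.
    policy: "binpack" (fill one GPU first) or "spread".
    Returns the chosen GPU index, or None if nothing fits.
    """
    candidates = [i for i, free in enumerate(free_mem) if free >= request]
    if not candidates:
        return None
    if policy == "binpack":
        # Prefer the most-used GPU that still fits, to avoid fragments.
        return min(candidates, key=lambda i: free_mem[i])
    # spread: prefer the least-used GPU, so a single GPU failure
    # affects fewer pods.
    return max(candidates, key=lambda i: free_mem[i])

# Node with two 15 GiB GPUs; Pod 1 needs 2 GiB, Pod 2 needs 3 GiB.
free = [15, 15]
gpu_for_pod1 = pick_gpu(free, 2, "binpack")
free[gpu_for_pod1] -= 2
gpu_for_pod2 = pick_gpu(free, 3, "binpack")
# Under binpack, both pods land on the same GPU; under spread,
# Pod 2 would be placed on the other GPU instead.
```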
Single GPU sharing and multiple GPU sharing
Single GPU sharing: A pod can request GPU resources that are allocated from only one GPU.
Multiple GPU sharing: A pod can request GPU resources that are allocated evenly across multiple GPUs. For example, a pod that requests 8 GiB of GPU memory from 4 GPUs is allocated 2 GiB on each GPU.
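As an illustration, a multiple-GPU-sharing request might look like the following. The resource names `aliyun.com/gpu-mem` and `aliyun.com/gpu-count` follow common ACK GPU sharing usage but are assumptions here; verify them against the documentation for your cluster version, and treat the image name as a placeholder. This pod asks for 8 GiB of GPU memory in total, split evenly across 4 GPUs (2 GiB per GPU).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-gpu-sharing-pod
spec:
  containers:
  - name: app
    image: your-cuda-image:latest    # placeholder image
    resources:
      limits:
        aliyun.com/gpu-mem: 8        # total GiB of GPU memory
        aliyun.com/gpu-count: 4      # split evenly: 2 GiB per GPU
```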