On machines with multi-core CPUs, Message Passing Interface (MPI) processes are typically distributed across CPU cores to enable parallel processing. However, if these processes use Compute Unified Device Architecture (CUDA) kernels to accelerate computation, they compete for access to the GPU, which can lead to unbalanced allocation or inefficient use of GPU resources. Multi-Process Service (MPS) can manage requests from multiple CUDA applications or MPI processes on NVIDIA GPUs, enabling efficient GPU sharing and memory isolation for AI applications. In Container Service for Kubernetes (ACK), you enable MPS mode by adding a specific label to a node pool in the ACK console.
Feature introduction
MPI-based parallelization across CPU cores balances resource allocation among CPU-intensive tasks, which allows multiple computational tasks to run concurrently and accelerates the overall computation. However, when you use CUDA kernels to accelerate MPI processes, the workload assigned to each MPI process may not fully use the GPU. Each MPI process may run faster, but the overall GPU utilization remains low, because an application that does not send enough tasks to the GPU leaves resources idle. In this case, we recommend that you use MPS, a technology designed to run multiple CUDA applications concurrently on NVIDIA GPUs.
MPS allows different applications to run concurrently on the same GPU device, which improves the utilization of cluster GPU resources. It uses a Client-Server architecture, ensures binary compatibility, and requires no significant modifications to existing CUDA applications. MPS consists of the following components (see the sketch after this list):
Control Daemon Process: manages the startup and shutdown of the MPS server and coordinates connections between clients and the MPS server to ensure that GPU resources are requested and used smoothly.
Client Runtime: embedded within the CUDA driver library. It allows developers to use MPS without significant modifications to their CUDA application code. During GPU operations, it automatically manages interactions with the MPS server to achieve secure and efficient GPU sharing among applications.
Server Process: handles requests from multiple clients and efficiently schedules them on a GPU device to achieve concurrency among clients.
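For reference, the following is a minimal sketch of how these components are typically started on a standalone GPU node, using the standard NVIDIA MPS commands. On ACK, you do not run these commands yourself: the GPU sharing component performs the equivalent setup on nodes that carry the MPS label described later in this topic.

# Set the GPUs to exclusive mode so that only the MPS daemon can create GPU contexts.
sudo nvidia-smi -c EXCLUSIVE_PROCESS
# Start the MPS control daemon, which spawns the MPS server on demand.
sudo nvidia-cuda-mps-control -d
# CUDA applications started afterwards are routed to the MPS server by the client runtime.
# To shut MPS down:
echo quit | sudo nvidia-cuda-mps-control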
Prerequisites
An ACK Pro cluster that runs a Kubernetes version later than 1.20 is created. For more information, see Create an ACK managed cluster and Manually upgrade a cluster.
The GPU sharing component version 1.9.13 or later is installed. For more information, see Configure the GPU sharing component.
Limits
When memory isolation in MPS mode is enabled on a node, the MPS Control Daemon Process runs the GPUs of the node in Exclusive mode and therefore monopolizes them for the entire node. In the Client-Server architecture of MPS, clients do not access the GPUs directly; instead, they request GPU resources from the MPS server.
GPU scheduling in MPS mode supports only the shared memory mode. The shared computing power mode is not supported.
Procedure
Create a node pool in the console and add the ack.node.gpu.schedule:mps label to enable GPU sharing and memory isolation in MPS mode. This setting applies to all nodes in the node pool.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Nodes > Node Pools.
Click Create Node Pool, configure the node label and other required parameters, and then click Confirm Order.
The following table describes the key parameters. For more information about the complete list of parameters, see Create a node pool.
Parameter: Expected Nodes
Description: The initial number of nodes in the node pool. If you do not want to create nodes in the node pool, set this parameter to 0.

Parameter: Node Label
Description: Set the Key to ack.node.gpu.schedule and the Value to mps. The example after this table shows the resulting label on a node.
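After the node pool is created, each node in the pool carries the label. The following snippet is a sketch of the relevant part of such a Node object; the node name is a hypothetical example.

apiVersion: v1
kind: Node
metadata:
  name: cn-hangzhou.192.168.0.100   # hypothetical node name
  labels:
    ack.node.gpu.schedule: mps      # enables GPU sharing and memory isolation in MPS mode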
After you complete the configuration, follow the steps in Examples of using GPU sharing to share GPUs to declare the required memory resources (aliyun.com/gpu-mem) in the pod that runs the application.
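The following is a minimal sketch of such a pod. The pod name and image are hypothetical placeholders, and the example assumes a request of 4 GiB of GPU memory (the aliyun.com/gpu-mem resource is measured in GiB).

apiVersion: v1
kind: Pod
metadata:
  name: mps-sample                                # hypothetical pod name
spec:
  containers:
  - name: cuda-app
    image: registry.example.com/cuda-app:latest   # placeholder image
    resources:
      limits:
        aliyun.com/gpu-mem: 4                     # request 4 GiB of GPU memory on a shared GPU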
References
For more information about how to configure the YAML file to create a container for which GPU sharing is enabled, see Examples of using GPU sharing to share GPUs.