
Container Service for Kubernetes: Use MPS for GPU sharing and memory isolation

Last Updated: Sep 13, 2024

On machines with multi-core CPUs, Message Passing Interface (MPI) processes are typically distributed across CPU cores to enable parallel processing. However, if these processes use Compute Unified Device Architecture (CUDA) kernels to accelerate computation, they compete for GPU access, which can lead to poor allocation or inefficient use of GPU resources. Multi-Process Service (MPS) manages requests from multiple CUDA applications or MPI processes on NVIDIA GPUs, allowing efficient GPU sharing and memory isolation for AI applications. In Container Service for Kubernetes (ACK), you enable MPS mode by adding a specific label to a node pool in the ACK console.

Feature introduction

CPU core parallelization based on MPI allows balanced resource allocation among CPU-intensive tasks, which ensures that multiple computational tasks can run concurrently and accelerates the overall computation. However, when you use CUDA kernels to accelerate MPI processes, the workload allocated to each MPI process may not fully use the GPU. Each MPI process may run faster, but overall GPU utilization remains low because the application does not submit enough work to keep the GPU busy. In this case, we recommend that you use MPS, a technology designed to run multiple CUDA applications concurrently on the same NVIDIA GPU.

MPS allows concurrent execution of different applications on the same GPU device, which improves the usage of cluster GPU resources. It operates on a Client-Server architecture, ensures binary compatibility, and requires no significant modifications to existing CUDA applications. The MPS components include:

  • Control Daemon Process: manages the startup and shutdown of the MPS server. It also coordinates connections between clients and the MPS server so that clients can request and use GPU resources.

  • Client Runtime: embedded within the CUDA driver library. It allows developers to use MPS without significant modifications to their CUDA application code. During GPU operations, it automatically manages interactions with the MPS server to achieve secure and efficient GPU sharing among applications.

  • Server Process: handles requests from multiple clients and efficiently schedules them on a GPU device to achieve concurrency among clients.

Prerequisites

Limits

  • When memory isolation in MPS mode is enabled on a node, the Control Daemon Process sets the GPUs on the node to exclusive mode and monopolizes them. Because MPS uses a client-server architecture, all clients on the node must request GPU resources from the MPS server.

  • GPU scheduling in MPS mode supports only the memory sharing mode. The computing power sharing mode is not supported.

Procedure

Create a node pool in the console and add the ack.node.gpu.schedule:mps label to it to enable GPU sharing and memory isolation in MPS mode. The setting applies to all nodes in the node pool.
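For reference, after the label is applied it is attached to every node in the node pool. The following is a minimal sketch of the relevant node metadata; the node name is a placeholder:

    apiVersion: v1
    kind: Node
    metadata:
      name: cn-hangzhou.192.168.0.100   # placeholder node name
      labels:
        ack.node.gpu.schedule: mps      # enables GPU sharing and memory isolation in MPS mode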

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Nodes > Node Pools.

  3. Click Create Node Pool, configure the node label and other required settings, and then click Confirm Order.

    The following describes the key parameters. For more information about the complete list of parameters, see Create a node pool.

    • Expected Nodes: the initial number of nodes in the node pool. If you do not want to create nodes in the node pool, set this parameter to 0.

    • Node Label: set the Key to ack.node.gpu.schedule and the Value to mps.

    After the node pool is configured, you can follow the steps in Examples of using GPU sharing to share GPUs to declare the required GPU memory resources (aliyun.com/gpu-mem) in the pod that runs your application.
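    The following is a minimal sketch of such a pod. The pod name, image, and requested amount of GPU memory (in GiB) are placeholders; for the authoritative YAML, see Examples of using GPU sharing to share GPUs.

      apiVersion: v1
      kind: Pod
      metadata:
        name: mps-sample                  # placeholder pod name
      spec:
        restartPolicy: Never
        containers:
        - name: cuda-app
          image: registry.example.com/cuda-app:latest   # placeholder image of a CUDA application
          resources:
            limits:
              aliyun.com/gpu-mem: 4       # request 4 GiB of GPU memory on a shared GPU

    Because memory isolation is enabled, the declared amount of GPU memory is isolated for this pod on the shared GPU.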

References

For more information about how to configure the YAML file to create a container for which GPU sharing is enabled, see Examples of using GPU sharing to share GPUs.