Container Service for Kubernetes (ACK) provides a GPU sharing feature that allows multiple prediction models to share one GPU, and it supports GPU memory isolation based on the NVIDIA kernel mode driver. This topic describes how to install the ack-cgpu component, which you can use to share GPUs, isolate GPU memory, and query GPU allocation information.
Prerequisites
An ACK dedicated cluster that contains GPU-accelerated nodes is created. For more information, see Create an ACK cluster with GPU-accelerated nodes.
A kubectl client is connected to the cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
Limits
Do not set the CPU policy (the kubelet CPU Manager policy) to static on nodes that have GPU sharing enabled.
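Before you enable GPU sharing, you can screen node configurations for this limit. The following is a minimal Python sketch, assuming each node's kubelet configuration has already been parsed into a dictionary that follows the Kubernetes KubeletConfiguration schema, where the field is `cpuManagerPolicy` and the default policy is `none`. The function names and node names are hypothetical.

```python
# Sketch: flag nodes whose kubelet uses the static CPU Manager policy,
# which must not be combined with GPU sharing (cGPU).
from typing import Dict, List


def cgpu_cpu_policy_ok(kubelet_config: Dict[str, object]) -> bool:
    """Return True if this kubelet config is compatible with GPU sharing.

    If cpuManagerPolicy is absent, the kubelet default ("none") applies.
    """
    policy = str(kubelet_config.get("cpuManagerPolicy", "none")).lower()
    return policy != "static"


def incompatible_nodes(configs: Dict[str, Dict[str, object]]) -> List[str]:
    """Return, sorted, the names of nodes whose kubelet sets the static policy."""
    return sorted(name for name, cfg in configs.items() if not cgpu_cpu_policy_ok(cfg))


if __name__ == "__main__":
    # Hypothetical node names and configs, for illustration only.
    sample = {
        "gpu-node-a": {"cpuManagerPolicy": "none"},
        "gpu-node-b": {"cpuManagerPolicy": "static"},
        "gpu-node-c": {},
    }
    print(incompatible_nodes(sample))
```

Nodes reported by this check should have their CPU policy changed, or be left unlabeled, before you add the cgpu label in Step 1.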
The following table describes other limits.
Item | Requirement |
Kubernetes | Kubernetes 1.12.6 or later |
Operating system | All operating systems supported by ACK except Windows |
GPU model | For more information about the GPU models supported by ACK, see GPU-accelerated ECS instance types supported by ACK |
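You can check the Kubernetes requirement before installation. The following is a minimal Python sketch, assuming you pass in the `serverVersion.gitVersion` value reported by `kubectl version -o json`; the function names and the sample version strings are hypothetical. It compares versions as integer tuples rather than as strings, because a string comparison would rank "v1.9.11" above "v1.12.6".

```python
# Sketch: compare a cluster's Kubernetes version with the 1.12.6 minimum
# listed in the table above.
import re
from typing import Tuple

MIN_CGPU_K8S_VERSION: Tuple[int, int, int] = (1, 12, 6)


def parse_k8s_version(git_version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as 'v1.22.15-aliyun.1'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", git_version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {git_version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def meets_cgpu_minimum(git_version: str) -> bool:
    """Compare numerically, so that 1.9.x correctly sorts below 1.12.6."""
    return parse_k8s_version(git_version) >= MIN_CGPU_K8S_VERSION


if __name__ == "__main__":
    for version in ("v1.9.11", "v1.12.5", "v1.12.6", "v1.22.15-aliyun.1"):
        print(version, meets_cgpu_minimum(version))
```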
Step 1: Add labels to GPU-accelerated nodes
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, open the Nodes page.
In the upper-right corner of the Nodes page, click Manage Labels and Taints.
On the Labels tab of the Manage Labels and Taints page, select the nodes that you want to manage and click Add Label.
In the Add dialog box, configure the Name and Value parameters, and then click OK.
To enable cGPU, set the Name parameter to cgpu and the Value parameter to true.
To disable cGPU, set the Name parameter to cgpu and the Value parameter to false. Deleting the cgpu label does not disable cGPU.
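If you prefer kubectl to the console, the same node label can be applied with `kubectl label`, assuming the console step simply sets a node label. The following is a hedged Python sketch that builds (and, optionally, runs) that command; the helper names and node names are hypothetical, and it assumes kubectl is already connected to the cluster as described in the prerequisites. It passes `--overwrite` so that the same call can switch an existing `cgpu=true` label to `cgpu=false`, and it never removes the label, because deleting the label does not disable cGPU.

```python
# Sketch: build the kubectl command that sets cgpu=true or cgpu=false on a node.
import shlex
import subprocess
from typing import List

CGPU_LABEL_KEY = "cgpu"  # label name required by this topic


def cgpu_label_command(node_name: str, enable: bool) -> List[str]:
    """Return the argv for labeling one node; disabling sets false, never deletes."""
    value = "true" if enable else "false"
    # --overwrite allows changing an existing cgpu label value in place.
    return ["kubectl", "label", "nodes", node_name, f"{CGPU_LABEL_KEY}={value}", "--overwrite"]


def apply_cgpu_label(node_name: str, enable: bool, dry_run: bool = True) -> str:
    """Print-safe by default: only run kubectl when dry_run is False."""
    cmd = cgpu_label_command(node_name, enable)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return shlex.join(cmd)


if __name__ == "__main__":
    print(apply_cgpu_label("gpu-node-a", enable=True))
    print(apply_cgpu_label("gpu-node-b", enable=False))
```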
Step 2: Install the ack-cgpu component on the labeled nodes
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, open the Helm page.
In the upper-left corner of the Helm page, click Deploy. In the Deploy panel, select ack-cgpu for the Chart parameter and configure other parameters based on the on-screen instructions to install the ack-cgpu component.
ack-cgpu is deployed when its status on the Helm page changes to Deployed.
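You can also confirm the deployment from the command line. The following is a minimal Python sketch, assuming the Helm 3 CLI is configured for the cluster and that the release is named `ack-cgpu` (pass a different `release_name` if you chose another name in the Deploy panel). It parses the output of `helm list --all-namespaces --output json`, in which the CLI reports the status in lowercase (`deployed`). The function names and the sample JSON are hypothetical.

```python
# Sketch: check whether the ack-cgpu Helm release reports the "deployed" status.
import json
from typing import Optional


def cgpu_release_status(helm_list_json: str, release_name: str = "ack-cgpu") -> Optional[str]:
    """Return the status of the named release, or None if it is not listed."""
    releases = json.loads(helm_list_json) or []  # tolerate "null" as well as "[]"
    for release in releases:
        if release.get("name") == release_name:
            return release.get("status")
    return None


def is_cgpu_deployed(helm_list_json: str, release_name: str = "ack-cgpu") -> bool:
    """True only when the release exists and its status is deployed."""
    status = cgpu_release_status(helm_list_json, release_name)
    return status is not None and status.lower() == "deployed"


if __name__ == "__main__":
    # Hypothetical output, for illustration only.
    sample = '[{"name": "ack-cgpu", "status": "deployed"}]'
    print(is_cgpu_deployed(sample))
```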