
Install the ack-cgpu component

Last updated: Jun 07, 2024

Container Service for Kubernetes (ACK) provides a GPU sharing feature that allows multiple prediction models to share one GPU and supports GPU memory isolation based on the NVIDIA kernel-mode driver. This topic describes how to install the ack-cgpu component, which you can use to share GPUs, isolate GPU memory, and query GPU allocation information.

Prerequisites

Limits

Do not set the CPU policy to static on nodes that have GPU sharing enabled.

The following table describes other limits.

| Item | Requirement |
| --- | --- |
| Kubernetes | Kubernetes 1.12.6 or later |
| Operating system | Any operating system supported by ACK except Windows |
| GPU model | For more information about the GPU models supported by ACK, see GPU-accelerated ECS instance types supported by ACK. |
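The limits above can be checked before a node is labeled. The following is a minimal sketch, not an ACK tool: it assumes you have already collected each node's kubelet version and operating system (Kubernetes Node objects report these as `status.nodeInfo.kubeletVersion` and `status.nodeInfo.operatingSystem`) and its kubelet `cpuManagerPolicy` setting. The `NodeInfo` class and the helper functions are hypothetical names used only for illustration, and the GPU model check is omitted because it depends on the ECS instance type list.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Minimum Kubernetes version for GPU sharing, taken from the limits table.
MIN_K8S_VERSION = (1, 12, 6)


@dataclass
class NodeInfo:
    """Hypothetical summary of one node's settings (illustrative field names)."""
    name: str
    kubelet_version: str     # for example "v1.28.3-aliyun.1"
    os: str                  # for example "linux" or "windows"
    cpu_manager_policy: str  # kubelet cpuManagerPolicy, for example "none" or "static"


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn a version string such as "v1.28.3-aliyun.1" into (1, 28, 3)."""
    core = version.lstrip("v").split("-")[0].split("+")[0]
    major, minor, patch = (int(part) for part in core.split(".")[:3])
    return (major, minor, patch)


def check_node(node: NodeInfo) -> List[str]:
    """Return human-readable problems; an empty list means no limit is violated."""
    problems = []
    if parse_version(node.kubelet_version) < MIN_K8S_VERSION:
        problems.append(f"{node.name}: Kubernetes 1.12.6 or later is required")
    if node.os.lower() == "windows":
        problems.append(f"{node.name}: Windows nodes are not supported")
    if node.cpu_manager_policy.lower() == "static":
        problems.append(f"{node.name}: the CPU policy must not be static")
    return problems


if __name__ == "__main__":
    nodes = [
        NodeInfo("gpu-node-1", "v1.28.3-aliyun.1", "linux", "none"),
        NodeInfo("gpu-node-2", "v1.12.5", "linux", "static"),
    ]
    for node in nodes:
        for problem in check_node(node):
            print(problem)
```

Tuple comparison is used for the version check so that, for example, 1.12.5 sorts below 1.12.6 without any string-comparison pitfalls.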

Step 1: Add labels to GPU-accelerated nodes

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Nodes > Nodes.

  3. In the upper-right corner of the Nodes page, click Manage Labels and Taints.

  4. On the Labels tab of the Manage Labels and Taints page, select the nodes that you want to manage and click Add Label.

  5. In the Add dialog box, configure the Name and Value parameters, and then click OK.

    To enable cGPU, set the Name parameter to cgpu and the Value parameter to true.

Important

To disable cGPU, set the Name parameter to cgpu and the Value parameter to false. Deleting the cgpu label does not disable cGPU.
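If you manage nodes with kubectl instead of the console, the same label can be set from the command line. The sketch below only builds the arguments; it assumes kubectl is configured for the cluster, and the helper names and the node name `gpu-node-1` are hypothetical. It uses the standard `kubectl label ... --overwrite` form so the same call can switch an existing label between true and false, and it intentionally never removes the label, because removal does not disable cGPU.

```python
import json
from typing import List

CGPU_LABEL = "cgpu"


def cgpu_label_command(node_name: str, enable: bool) -> List[str]:
    """Build a kubectl argv that sets cgpu=true or cgpu=false on one node.

    --overwrite lets the command change an existing label value. Removing the
    label (kubectl label node <name> cgpu-) is not offered on purpose, because
    deleting the label does not disable cGPU.
    """
    value = "true" if enable else "false"
    return ["kubectl", "label", "node", node_name, f"{CGPU_LABEL}={value}", "--overwrite"]


def cgpu_label_patch(enable: bool) -> str:
    """Build an equivalent JSON merge patch body, e.g. for kubectl patch --type merge."""
    value = "true" if enable else "false"
    return json.dumps({"metadata": {"labels": {CGPU_LABEL: value}}})


if __name__ == "__main__":
    print(" ".join(cgpu_label_command("gpu-node-1", enable=True)))
    print(cgpu_label_patch(enable=False))
```

To actually apply the label, you could pass the list to `subprocess.run(cgpu_label_command("gpu-node-1", True), check=True)`.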

Step 2: Install the ack-cgpu component on the labeled nodes

  1. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Applications > Helm.

  2. In the upper-left corner of the Helm page, click Deploy. In the Deploy panel, select ack-cgpu for the Chart parameter and configure other parameters based on the on-screen instructions to install the ack-cgpu component.

    The component is deployed when the status of ack-cgpu on the Helm page changes to Deployed.
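The same status can be checked outside the console. This is a minimal sketch that assumes you have Helm 3 CLI access to the cluster and that the release is named ack-cgpu (the console may use a different release name). It parses the JSON printed by `helm list --all-namespaces --all -o json`, where each entry carries a `name` and a lowercase `status` such as `deployed`; the helper names are hypothetical.

```python
import json
from typing import Optional


def release_status(helm_list_json: str, release: str = "ack-cgpu") -> Optional[str]:
    """Return the status of the named release, or None if it is not listed."""
    for entry in json.loads(helm_list_json):
        if entry.get("name") == release:
            return entry.get("status")
    return None


def is_deployed(helm_list_json: str, release: str = "ack-cgpu") -> bool:
    """True only when the release exists and Helm reports it as deployed."""
    status = release_status(helm_list_json, release)
    return status is not None and status.lower() == "deployed"


if __name__ == "__main__":
    import subprocess

    output = subprocess.run(
        ["helm", "list", "--all-namespaces", "--all", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    print("ack-cgpu deployed:", is_deployed(output))
```

The `--all` flag is included because, by default, `helm list` hides releases in some intermediate states, which would make a pending installation look like a missing one.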