ack-ai-installer is a collection of device plug-ins that enhance the scheduling capabilities of Container Service for Kubernetes (ACK) Pro clusters and ACK Edge Pro clusters. Together with the ACK scheduler, ack-ai-installer schedules heterogeneous computing resources by using GPU sharing and topology-aware GPU scheduling. The ACK scheduler is a scheduling system built on the Kubernetes scheduling framework that assigns different elastic resources to different workloads. This topic introduces ack-ai-installer and provides its usage notes and release notes.
Introduction
ack-ai-installer can work with the ACK scheduler to implement GPU sharing (including GPU memory isolation) and topology-aware GPU scheduling. ack-ai-installer consists of the following components.
gpushare-device-plugin and cgpu-installer
By default, the ACK scheduler used by ACK Pro clusters and ACK Edge Pro clusters schedules exclusive GPU resources. You can use ack-ai-installer (gpushare-device-plugin) with the ACK scheduler to implement GPU sharing and GPU memory isolation. GPU sharing allows multiple applications or processes to run on the same GPU, which improves resource utilization.
ack-ai-installer (cgpu-installer) works with cGPU, a GPU virtualization and sharing service of Alibaba Cloud, to implement GPU memory isolation. GPU memory isolation keeps the GPU memory of different applications or processes separate, which prevents mutual interference and improves the overall performance and efficiency of the system. ack-ai-installer (cgpu-installer) also supports GPU computing power isolation and provides several scheduling policies, including fair-share scheduling, preemptive scheduling, and weight-based preemptive scheduling, so that GPU computing power can be scheduled and used at a finer granularity.
For more information about GPU sharing and GPU memory isolation, such as the installation procedure and use scenarios, see Configure the GPU sharing component and Use cGPU to allocate computing power.
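To make GPU sharing concrete, the following minimal sketch builds a Pod manifest that requests a slice of GPU memory instead of a whole GPU. It rests on assumptions that this topic does not state: the extended resource name `aliyun.com/gpu-mem` (counted in GiB) should be verified against the GPU sharing topics referenced above, and the Pod name and image are hypothetical placeholders.

```python
import json

# Assumption: "aliyun.com/gpu-mem" is the extended resource name that the
# GPU sharing plug-in exposes for GPU memory, counted in GiB. This topic does
# not state the name; verify it in the GPU sharing topics before use.
GPU_MEM_RESOURCE = "aliyun.com/gpu-mem"


def build_shared_gpu_pod(name: str, image: str, gpu_mem_gib: int) -> dict:
    """Return a Pod manifest that requests part of a GPU's memory."""
    if gpu_mem_gib <= 0:
        raise ValueError("gpu_mem_gib must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are declared under limits.
                    "resources": {"limits": {GPU_MEM_RESOURCE: gpu_mem_gib}},
                }
            ],
        },
    }


# Hypothetical Pod name and image, used only for illustration.
manifest = build_shared_gpu_pod(
    "gpu-share-demo", "registry.example.com/demo/cuda-app:latest", 3
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f`. Several such Pods, each requesting a few GiB, could then be placed on the same GPU by the scheduler, provided the resource name matches what the installed plug-in advertises.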
gputopo-device-plugin
gputopo-device-plugin works with the ACK scheduler to implement topology-aware GPU scheduling and select the optimal combination of GPUs to accelerate training jobs. For more information about topology-aware GPU scheduling, such as the installation procedure and use scenarios, see GPU topology-aware scheduling.
Usage notes
You can install ack-ai-installer only from the AI Developer Console page of an ACK Pro cluster or ACK Edge Pro cluster that runs Kubernetes 1.18 or later. ack-ai-installer is pre-installed as a component in an ACK Lingjun cluster that runs Kubernetes 1.18 or later.
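The version requirement above can be checked before installation with a small helper. This is a sketch only: the function names are hypothetical, and the sample version strings (such as `v1.18.8-aliyun.1`) are illustrative formats, not values taken from this topic.

```python
import re
from typing import Tuple

# Minimum Kubernetes version stated in the usage notes: 1.18.
MIN_VERSION = (1, 18)


def parse_k8s_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as 'v1.18.8-aliyun.1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_ack_ai_installer(version: str) -> bool:
    """Return True if the cluster version is 1.18 or later."""
    # Tuples compare numerically, so (1, 9) is correctly older than (1, 18).
    return parse_k8s_version(version) >= MIN_VERSION


for sample in ("v1.18.8-aliyun.1", "1.16.9", "1.26.3"):
    print(sample, supports_ack_ai_installer(sample))
```

Comparing integer tuples rather than raw strings avoids the common mistake of treating 1.9 as newer than 1.18.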
Release notes
December 2023
Version | Description | Release date | Impact |
1.8.7 | | 2023-12-20 | No impact on workloads. |
August 2023
Version | Description | Release date | Impact |
1.8.2 | | 2023-08-29 | No impact on workloads. |
April 2023
Version | Description | Release date | Impact |
1.7.6 | | 2023-04-26 | No impact on workloads. |