
Container Service for Kubernetes: Introduction and release notes for the ack-ai-installer component

Last Updated: Aug 08, 2025

ack-ai-installer is a collection of device plugins that extends the scheduling capabilities of ACK Managed Cluster Pro and ACK Edge Cluster Pro. Working with ACK Scheduler, it enables advanced scheduling of heterogeneous computing resources, such as shared GPU scheduling and GPU topology-aware scheduling. ACK Scheduler is a unified scheduling system built on the extension mechanism of the Kubernetes Scheduling Framework and designed for a wide range of workloads and elastic resources. This topic describes the ack-ai-installer component, its usage notes, and its release history.

Component introduction

ack-ai-installer works with ACK Scheduler to provide scheduling features such as shared GPU scheduling with isolation and GPU topology-aware scheduling. ack-ai-installer currently includes the following components.

gpushare-device-plugin and cgpu-installer

By default, ACK Scheduler in ACK Managed Cluster Pro and ACK Edge Cluster Pro supports dedicated GPU scheduling, in which each GPU is allocated exclusively to one pod. ack-ai-installer (gpushare-device-plugin) works with ACK Scheduler to enable shared GPU scheduling, which lets multiple applications or processes share a single GPU and improves resource utilization.

Building on shared GPU scheduling, ack-ai-installer (cgpu-installer) integrates with cGPU, Alibaba Cloud's GPU container sharing technology, to isolate GPU memory. Each application or process is confined to its own portion of GPU memory, so tasks that share a GPU do not interfere with each other. cgpu-installer also supports computing power isolation with several allocation policies, such as average, preemption, and weight, which allow finer-grained scheduling and use of GPU computing power.

For more information about shared GPU scheduling and isolation, including installation methods and scenarios, see Manage the shared GPU scheduling component and Allocate computing power using shared GPU scheduling.
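As a minimal sketch of how a workload consumes shared GPU scheduling, the Python snippet below builds a pod manifest that requests a slice of GPU memory instead of a whole card. It assumes the extended resource name `aliyun.com/gpu-mem`, counted in GiB, as used in the ACK shared GPU scheduling documentation; confirm the name and unit for your component version. The function names and the image address are hypothetical, and `fits_on_card` is only a simplified illustration of why several such pods can share one GPU, not the scheduler's actual placement logic.

```python
import json

# Assumption: ACK shared GPU scheduling exposes GPU memory as the extended
# resource "aliyun.com/gpu-mem", counted in GiB. Verify for your version.
GPU_MEM_RESOURCE = "aliyun.com/gpu-mem"


def shared_gpu_pod(name: str, image: str, gpu_mem_gib: int) -> dict:
    """Return a minimal Pod manifest that requests a slice of GPU memory."""
    if gpu_mem_gib <= 0:
        raise ValueError("gpu_mem_gib must be positive")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are declared under limits. The scheduler
                    # is expected to pick a GPU with this much unallocated memory.
                    "resources": {"limits": {GPU_MEM_RESOURCE: gpu_mem_gib}},
                }
            ]
        },
    }


def fits_on_card(requests_gib: list[int], card_mem_gib: int) -> bool:
    """Hypothetical check: can these memory requests share one card?"""
    return sum(requests_gib) <= card_mem_gib


if __name__ == "__main__":
    pod = shared_gpu_pod("infer-a", "registry.example.com/infer:latest", 4)
    print(json.dumps(pod, indent=2))
    # Requests of 4, 4, and 6 GiB add up to 14 GiB, which fits a 16 GiB card.
    print(fits_on_card([4, 4, 6], 16))
```

The dict can be written out with `json.dumps` and submitted with `kubectl apply -f`, which accepts JSON as well as YAML manifests.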

gputopo-device-plugin

Working with ACK Scheduler, ack-ai-installer (gputopo-device-plugin) enables GPU topology-aware scheduling. This feature selects the GPU combination on a node that provides the optimal training speed. For more information about GPU topology-aware scheduling, such as the installation procedure and scenarios, see GPU topology-aware scheduling.
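To make "the GPU combination that provides the optimal training speed" more concrete, the sketch below scores every k-GPU subset of a node by its slowest inter-GPU link and picks the best one. This is an illustrative heuristic only, not the algorithm that gputopo-device-plugin or ACK Scheduler actually uses; the bandwidth matrix and the function name are hypothetical.

```python
from itertools import combinations

# Hypothetical pairwise bandwidth (GB/s) between 4 GPUs on one node.
# GPUs 0-2 share fast links; GPU 3 reaches the others over a slow link.
# The numbers are made up for illustration.
BANDWIDTH = [
    [0, 50, 50, 10],
    [50, 0, 50, 10],
    [50, 50, 0, 10],
    [10, 10, 10, 0],
]


def best_gpu_combination(bandwidth: list[list[float]], k: int) -> tuple[int, ...]:
    """Pick the k GPUs whose weakest pairwise link is fastest.

    Brute force over all combinations, which is cheap for the handful of
    GPUs on one node. Ties are broken by total pairwise bandwidth.
    """
    n = len(bandwidth)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of GPUs")

    def score(combo: tuple[int, ...]) -> tuple[float, float]:
        links = [bandwidth[i][j] for i, j in combinations(combo, 2)]
        # A single GPU has no links, so it has no bottleneck.
        return (min(links) if links else float("inf"), sum(links))

    return max(combinations(range(n), k), key=score)


if __name__ == "__main__":
    print(best_gpu_combination(BANDWIDTH, 3))  # (0, 1, 2)
```

Maximizing the weakest link is one reasonable proxy for training speed, because collective operations such as all-reduce are often limited by the slowest link in the group; a production scheduler may weigh additional factors.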

Usage notes

  • The ack-ai-installer component can be installed only in ACK Managed Cluster Pro and ACK Edge Cluster Pro clusters, from the Cloud-native AI Suite page in the console. The component is pre-installed in ACK Lingjun managed clusters.

  • ack-ai-installer versions earlier than 1.12.0 support cluster versions 1.18.8 and later.

  • ack-ai-installer 1.12.0 and later support only cluster versions 1.20 and later.
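The two version rules above can be encoded as a small pre-upgrade check. The thresholds (1.12.0, 1.18.8, and 1.20) come from the usage notes; the function names are hypothetical. The parser assumes dotted numeric versions, optionally prefixed with `v` or followed by a vendor suffix after `-`, such as `1.20.11-aliyun.1`.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse '1.20.11', 'v1.20', or '1.20.11-aliyun.1' into a tuple of ints."""
    core = version.strip().lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def min_cluster_version(component_version: str) -> tuple[int, ...]:
    """Minimum cluster version for an ack-ai-installer version (per the usage notes)."""
    if parse_version(component_version) >= (1, 12, 0):
        return (1, 20)
    return (1, 18, 8)


def is_supported(component_version: str, cluster_version: str) -> bool:
    """True if the cluster meets the minimum version for this component version."""
    # Tuples compare element by element, so (1, 20, 11) >= (1, 20) is True.
    return parse_version(cluster_version) >= min_cluster_version(component_version)


if __name__ == "__main__":
    print(is_supported("1.11.1", "1.18.8"))  # True
    print(is_supported("1.12.8", "1.18.8"))  # False
```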

Release notes

August 2025

Version

Changes

Release date

Impact

1.12.8

Updates in cGPU 1.5.20:

  • Fixed a rare cGPU instance ID conflict issue that occurred during concurrent pod creation.

August 4, 2025

This upgrade does not affect existing services.

July 2025

Version

Changes

Release date

Impact

1.12.7

  • Upgraded cGPU to 1.5.19.

  • gpushare-device-plugin: Fixed an issue where the plugin could not retry after an NVML call failed during startup.

July 17, 2025

This upgrade does not affect existing services.

1.12.6

Updates in cGPU 1.5.19:

  • Added support for Alibaba Cloud Linux 3 container-optimized OS images.

  • Added support for modifying computing power allocation using time slicing (policy 5).

  • Fixed an issue where multi-GPU pods failed to be created in a cgroup v2 environment.

  • Added support for computing power allocation policies 0-4 on ebmgn9t instances.

July 16, 2025

This upgrade does not affect existing services.

June 2025

Version

Changes

Release date

Impact

1.12.5

  • Upgraded cGPU to 1.5.18.

  • Fixed an issue where the first GPU pod on a cGPU node failed to start in some scenarios.

June 23, 2025

This upgrade does not affect existing services.

1.12.4

  • Upgraded cGPU to 1.5.17, which supports vLLM 0.6.6 and earlier.

  • cgpu-installer: Added support for installation on CentOS 7 and Alibaba Cloud Linux 2.

June 19, 2025

This upgrade does not affect existing services.

May 2025

Version

Changes

Release date

Impact

1.12.3

  • Upgraded cGPU to 1.5.16.

  • cgpu-installer: Added an installation retry feature.

May 14, 2025

This upgrade does not affect existing services.

March 2025

Version

Changes

Release date

Impact

1.12.2

  • Upgraded cGPU to 1.5.15.

  • cgpu-installer: Added a node affinity rule that prevents cgpu-installer from being scheduled to Lingjun nodes.
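The node affinity change in 1.12.2 relies on the standard Kubernetes `nodeAffinity` mechanism. The sketch below builds a `requiredDuringSchedulingIgnoredDuringExecution` rule with a `NotIn` expression and includes a tiny evaluator to show its effect. The label key `example.com/node-type` and the value `lingjun` are placeholders: this topic does not say which label ACK uses to identify Lingjun nodes, and the evaluator handles only `In` and `NotIn`.

```python
# Hypothetical label: the real key and value that mark Lingjun nodes may differ.
LINGJUN_LABEL_KEY = "example.com/node-type"
LINGJUN_LABEL_VALUE = "lingjun"


def exclude_lingjun_affinity() -> dict:
    """Build a pod-spec affinity block that keeps a pod off Lingjun nodes."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": LINGJUN_LABEL_KEY,
                                # NotIn also matches nodes that lack the label.
                                "operator": "NotIn",
                                "values": [LINGJUN_LABEL_VALUE],
                            }
                        ]
                    }
                ]
            }
        }
    }


def node_allowed(node_labels: dict[str, str], affinity: dict) -> bool:
    """Minimal evaluator: terms are ORed, expressions within a term are ANDed."""
    required = affinity["nodeAffinity"]["requiredDuringSchedulingIgnoredDuringExecution"]
    for term in required["nodeSelectorTerms"]:
        matched = True
        for expr in term["matchExpressions"]:
            value = node_labels.get(expr["key"])  # None if the label is missing
            if expr["operator"] == "In":
                matched = matched and value in expr["values"]
            elif expr["operator"] == "NotIn":
                matched = matched and value not in expr["values"]
            else:
                raise ValueError(f"unsupported operator: {expr['operator']}")
        if matched:
            return True
    return False
```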

March 17, 2025

This upgrade does not affect existing services.

February 2025

Version

Changes

Release date

Impact

1.12.1

  • Upgraded cGPU to 1.5.15.

  • gpushare-device-plugin: Added a node resource health check feature.

February 18, 2025

This upgrade does not affect existing services.

January 2025

Version

Changes

Release date

Impact

1.12.0

  • Released cGPU 1.5.15, which supports containerized installation of cGPU.

  • Restricted the privileged permissions of the cgpu-installer container.

  • Added a precheck before cGPU installation. If the precheck fails, a `CGPUInstallFailed` Kubernetes event is reported.

  • Starting from this version, the ack-ai-installer component supports only cluster versions 1.20 and later.
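Because version 1.12.0 reports a `CGPUInstallFailed` Kubernetes event when the precheck fails, one way to surface failures is to filter the cluster's events. The sketch below assumes that `CGPUInstallFailed` appears in the event's `reason` field, which this topic does not state explicitly; the function names are hypothetical. It parses the JSON produced by `kubectl get events --all-namespaces -o json`.

```python
import json
import subprocess


def cgpu_install_failures(events_json: str) -> list[dict]:
    """Summarize events whose reason is CGPUInstallFailed.

    Assumption: the event name from the release note is stored in the
    event's `reason` field.
    """
    items = json.loads(events_json).get("items", [])
    failures = []
    for event in items:
        if event.get("reason") != "CGPUInstallFailed":
            continue
        obj = event.get("involvedObject", {})
        failures.append(
            {
                "namespace": obj.get("namespace"),
                "kind": obj.get("kind"),
                "name": obj.get("name"),
                "message": event.get("message", ""),
            }
        )
    return failures


def fetch_events() -> str:
    """Return all events visible to the current kubectl context as JSON text."""
    result = subprocess.run(
        ["kubectl", "get", "events", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `cgpu_install_failures(fetch_events())` returns one entry per matching event; an empty list means no such events are currently retained by the API server, which keeps events only for a limited time.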

January 3, 2025

This upgrade does not affect existing services.

November 2024

Version

Changes

Release date

Impact

1.11.1

Released cGPU 1.5.13. This version fixes a rare kernel crash issue that could be caused by residual container processes.

November 19, 2024

This upgrade does not affect existing services.

1.10.1

Released cGPU 1.5.12. This version fixes an issue where GPU memory isolation failed for some CUDA APIs with newer driver versions, such as 535.

November 7, 2024

This upgrade does not affect existing services.

September 2024

Version

Changes

Release date

Impact

1.9.16

  • Upgraded cGPU to 1.5.11.

  • Moved the cGPU installation process to an init container.

September 26, 2024

This upgrade does not affect existing services.

1.9.15

Released cGPU 1.5.11. This version fixes decoding-related issues.

September 19, 2024

This upgrade does not affect existing services.

August 2024

Version

Changes

Release date

Impact

1.9.14

  • Fixed some issues related to the use of MPS Daemon.

  • Released cGPU 1.5.10. This version adds Policy 6 for proportional splitting of computing power and GPU memory.

August 21, 2024

This upgrade does not affect existing services.

1.9.14

Released cGPU 1.5.9. This version adds Policy 6 for proportional splitting of computing power and GPU memory.

August 13, 2024

This upgrade does not affect existing services.

May 2024

Version

Changes

Release date

Impact

1.9.11

Released cGPU 1.5.7. This version adds support for L-series GPUs and GPU drivers of version 550 and later.

May 14, 2024

This upgrade does not affect existing services.

1.9.10

Released cGPU 1.5.7. This version fixes an issue where the cgpu policy set operation did not take effect.

May 9, 2024

This upgrade does not affect existing services.

January 2024

Version

Changes

Release date

Impact

1.8.8

Released cGPU 1.5.6. This version introduces a new cGPU License Server policy.

January 4, 2024

This upgrade does not affect existing services.

December 2023

Version

Changes

Release date

Impact

1.8.7

  • Upgraded cGPU to 1.5.5.

  • Added support for shared GPU scheduling with MPS.

December 20, 2023

This upgrade does not affect existing services.

November 2023

Version

Changes

Release date

Impact

1.8.5

Released cGPU 1.5.5. This version fixes a kernel panic triggered by cgpu-procfs.

November 23, 2023

This upgrade does not affect existing services.

August 2023

Version

Changes

Release date

Impact

1.8.2

  • Upgraded cGPU to 1.5.3.

  • Added support for dynamic multi-instance GPU (MIG) partitioning.

  • Fixed an issue where device-plugin-recover restarted repeatedly.

August 29, 2023

This upgrade does not affect existing services.

July 2023

Version

Changes

Release date

Impact

1.7.7

  • Released cGPU 1.5.3.

  • Fixed an issue with incorrect symbolic links for nvidia-container-toolkit and nvidia-container-runtime-hook.

  • Fixed an incompatibility issue with later driver versions (470.182.03, 515.105.01, 525.105.17, and later).

July 4, 2023

This upgrade does not affect existing services.

April 2023

Version

Changes

Release date

Impact

1.7.6

  • Released cGPU 1.5.2. This version fixes an issue with incorrect systemd cgroup permissions.

  • Resolved compatibility issues for cGPU with driver versions 5xx and later.

  • Resolved support issues for cGPU with nvidia-container-runtime versions 1.10 and later.

  • Fixed support issues for cGPU 1.5.1 on containerd.

April 26, 2023

This upgrade does not affect existing services.

1.7.5

Released cGPU 1.5.2.

April 18, 2023

This upgrade does not affect existing services.