
Container Service for Kubernetes: ACK release notes 2024

Last Updated: Feb 14, 2025

This topic describes the release notes for Container Service for Kubernetes (ACK) and provides links to the relevant references.

Background information

  • For more information about the Kubernetes versions supported by Container Service for Kubernetes (ACK), see Support for Kubernetes versions.

  • The following operating systems are supported by Container Service for Kubernetes (ACK): ContainerOS, Alibaba Cloud Linux 3, Alibaba Cloud Linux 3 (ARM), Alibaba Cloud Linux 3 (UEFI), Windows, Red Hat, and Ubuntu. For more information, see OS images.

December 2024

Product

Feature

Description

Region

References

Container Service for Kubernetes

OCI artifact signing and signature verification based on Notation and Ratify

The notation-alibabacloud-secret-manager component is provided by Alibaba Cloud to allow you to sign Open Container Initiative (OCI) artifacts hosted in Container Registry by using keys managed by Key Management Service (KMS). You can install Ratify in your cluster to verify the signatures of images in your cluster. This helps you block images with invalid signatures and improve system security.

All regions

Use Notation and Ratify for OCI artifact signing and signature verification

Storage monitoring

Storage monitoring is provided based on Managed Service for Prometheus. After you enable Managed Service for Prometheus for your cluster, you can view monitoring information about the storage resources in the cluster, on nodes, and in pods. You can also view monitoring information about external storage resources that are mounted to the cluster as volumes. The storage monitoring dashboards display usage information about the storage resources used by your cluster in real time.

All regions

View storage monitoring information

Workload stability and performance analysis supported by cost insights

Workload stability and performance analysis are supported by cost insights. You can enable the cost insights feature to quickly identify risks related to the stability, performance, and costs of your workloads in ACK clusters. Cost insights sorts pods in the cluster by resource utilization and provides detailed data views for pods whose QoS classes are Burstable and BestEffort to facilitate resource configuration monitoring.

All regions

Use cost insights to identify risks for cluster workloads

Multi-dimensional cost aggregation and idle cost processing policies supported by the cost API

New parameters are supported by the cost API. You can use the parameters to filter or aggregate cost data by pod label or node name. You can also customize the processing logic for idle cluster costs, for example, by using different cost allocation policies and dimensions. This allows you to flexibly manage and optimize costs.

All regions

Call the Cost V2 API

GPU fault alerting and solutions

To resolve GPU faults in ACK clusters, ACK provides monitoring, diagnostics, alerting, and recovery mechanisms from various perspectives.

All regions

Configure GPU fault alerting and solutions

Batch task orchestration

Argo Workflows is a Kubernetes-native workflow engine. It allows you to use YAML or Python to orchestrate concurrent jobs in order to simplify the automation and management of containerized applications. It is suitable for CI/CD pipelines, data processing, and machine learning. You can install the Argo Workflows component to enable batch job orchestration. Then, you can use the Argo CLI or console to create and manage workflows.

All regions

Enable batch task orchestration

ACK One

Geo-disaster recovery based on ALB multi-cluster gateways of ACK One

Geo-disaster recovery is supported by Distributed Cloud Container Platform for Kubernetes (ACK One) based on Application Load Balancer (ALB) multi-cluster gateways. You can implement geo-disaster recovery to protect data against region-level disasters, such as floods and earthquakes. However, this may increase the response latency, resource costs, and maintenance costs of your business. The related topic describes the architecture and use scenarios of geo-disaster recovery based on ALB multi-cluster gateways of ACK One.

All regions

Use ALB multi-cluster gateways of ACK One to implement geo-disaster recovery

ACK Edge

Virtual nodes

Virtual nodes are supported. When you use an ACK cluster, you may need to launch a large number of pods within a short period of time. If you choose to create on-cloud Elastic Compute Service (ECS) instances for the pods, the creation process can be time-consuming. If you choose to reserve ECS instances, the instances are idle before pod creation and after pod termination, resulting in resource waste. By using virtual nodes, you do not need to reserve or maintain node pools. You can directly schedule pods to elastic container instances that function as virtual nodes to ensure elasticity and reduce resource costs.

All regions

Virtual node management

P2P acceleration

P2P acceleration is supported by ACK Edge clusters. You can enable P2P acceleration to accelerate image pulls and reduce the time required for deploying applications.

All regions

Install a P2P acceleration agent in an ACK cluster

Support for Kubernetes 1.30

Kubernetes 1.30 is supported by ACK Edge clusters.

All regions

Release notes for ACK Edge of Kubernetes 1.30

ACK Lingjun

Image acceleration supported

The aliyun-acr-acceleration-suite component is provided by Alibaba Cloud to enable on-demand image loading. ACK works with Container Registry to support accelerated images. After you deploy aliyun-acr-acceleration-suite in a cluster, the system automatically converts a source image into an accelerated image. When an image pull request is received, the system decompresses only the data that your business requires, without interrupting services or downloading and decompressing the full image. This accelerates application deployment and enables on-demand image loading.

All regions

aliyun-acr-acceleration-suite

November 2024

Product

Feature

Description

Region

References

Container Service for Kubernetes

eRDMA supported

ACK eRDMA Controller is provided by Alibaba Cloud to support elastic Remote Direct Memory Access (RDMA). ACK eRDMA Controller enables eRDMA interface (ERI) management for your clusters and allows you to specify eRDMA settings in pod configurations.

All regions

ACK eRDMA Controller

New releases of ack-secret-manager and secrets-store-csi-driver-provider-alibaba-cloud

New versions of ack-secret-manager and secrets-store-csi-driver-provider-alibaba-cloud are released. You can install the new versions on the Marketplace page in the ACK console.

All regions

RSS supported by Spark jobs based on Celeborn

Apache Celeborn can be used to enable Remote Shuffle Service (RSS) for Spark jobs. Apache Celeborn is used to process intermediate data, such as shuffle data and spilled data, for big data compute engines. Celeborn can efficiently improve the performance, stability, and flexibility of big data compute engines. RSS provides an efficient method to shuffle a large number of datasets. The related topic describes how to deploy Celeborn in an ACK cluster and how to use Celeborn to enable RSS for a Spark job.

All regions

Use Celeborn to enable RSS for Spark jobs
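To make the setup above concrete, here is a hedged sketch of a SparkApplication that points Spark's shuffle at a Celeborn cluster deployed in the same ACK cluster. The shuffle manager class and the spark.celeborn.master.endpoints key follow the upstream Celeborn documentation and may differ by Celeborn version; the image, namespace, and master service address are placeholders.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-celeborn
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "registry.example.com/spark:3.5.0"   # placeholder image with the Celeborn client jar baked in
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples.jar"
  sparkVersion: "3.5.0"
  sparkConf:
    # Hand shuffle data off to the Celeborn cluster instead of local executor disks.
    spark.shuffle.manager: "org.apache.spark.shuffle.celeborn.SparkShuffleManager"
    spark.celeborn.master.endpoints: "celeborn-master-0.celeborn-master-svc.celeborn:9097"  # placeholder service address
    spark.shuffle.service.enabled: "false"
  driver:
    cores: 1
    memory: "2g"
  executor:
    instances: 2
    cores: 2
    memory: "4g"
```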

Log management supported for Spark jobs

The related topic describes how to use Simple Log Service to manage the logs of Spark jobs in an ACK cluster.

All regions

Use Simple Log Service to collect the logs of Spark jobs

ossfs troubleshooting supported

ossfs troubleshooting is supported. Object Storage Service (OSS) volumes are Filesystem in Userspace (FUSE) file systems mounted by using ossfs. You can analyze the debug logs or pod logs to troubleshoot ossfs exceptions. The related topic describes common ossfs exceptions and provides examples on how to troubleshoot common ossfs exceptions based on the mode in which ossfs runs.

All regions

Troubleshoot OSSFS exceptions

ACK Serverless

Custom configurations for CoreDNS

When you deploy a containerized application to a cluster, you must access external services or interfaces in addition to the internal services of the cluster. In this case, external domain name resolution is required. You can specify a DNS server for the external domain name to improve the DNS resolution speed. For domain names that are mapped to static IP addresses, you can also add the mappings to the local hosts file.

All regions

Configure custom parameters for managed CoreDNS
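A minimal sketch of the two approaches described in the entry above, using standard Kubernetes pod fields: dnsConfig to send external lookups to a dedicated DNS server, and hostAliases for static IP mappings. The server address and host names are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-example
spec:
  containers:
    - name: app
      image: "nginx:1.25"
  # Route lookups for external domains through a specific DNS server.
  dnsConfig:
    nameservers:
      - 10.0.0.10          # placeholder: your external DNS server
    searches:
      - example.internal   # placeholder: external search domain
  # Map domain names with static IP addresses in the pod's hosts file.
  hostAliases:
    - ip: 192.168.1.20     # placeholder IP address
      hostnames:
        - api.example.internal
```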

ACK One

Network policies supported by Elastic Container Instance-based pods in registered clusters

Kubernetes network policies are supported by Elastic Container Instance-based pods in registered clusters. You can use network policies to implement policy-based access control. If you want to control network traffic to specific applications based on IP addresses or ports, you can configure network policies.

All regions

Use network policies on elastic container instances
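As an illustration of the policy-based access control described above, the following standard Kubernetes NetworkPolicy admits ingress to pods labeled app: web only from pods labeled app: frontend on TCP port 80; the labels and namespace are placeholders.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-web
  namespace: default
spec:
  # Applies to the Elastic Container Instance-based pods that carry this label.
  podSelector:
    matchLabels:
      app: web
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 80
```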

Data migration from self-managed ArgoCD to ACK One GitOps supported

Data migration from self-managed ArgoCD to ACK One GitOps is supported. Manually migrating a large number of clusters, repositories, and applications from self-managed ArgoCD one by one is time-consuming. You can use onectl to quickly migrate data from self-managed ArgoCD to ACK One.

All regions

Migrate data from self-managed ArgoCD to ACK One GitOps

Preemptible elastic container instance creation supported by registered clusters

Preemptible elastic container instances are supported. You can use preemptible elastic container instances to run short-term jobs or stateless applications that feature high scalability and fault tolerance to reduce the cost.

All regions

Create a preemptible elastic container instance
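A hedged sketch of how such a preemptible elastic container instance is typically requested through pod annotations. The k8s.aliyun.com/eci-spot-* annotation keys follow the Elastic Container Instance documentation and should be treated as assumptions to verify; the price limit is a placeholder.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: spot-worker
  template:
    metadata:
      labels:
        app: spot-worker
      annotations:
        # Request a preemptible (spot) elastic container instance.
        k8s.aliyun.com/eci-spot-strategy: "SpotWithPriceLimit"
        # Maximum hourly price you are willing to pay; placeholder value.
        k8s.aliyun.com/eci-spot-price-limit: "0.05"
    spec:
      containers:
        - name: worker
          image: "busybox:1.36"
          command: ["sh", "-c", "echo running a short-lived job && sleep 3600"]
```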

Hybrid disaster recovery based on ALB multi-cluster gateways supported by ACK One

ALB multi-cluster gateways can be used to implement hybrid disaster recovery in ACK One. To build an active zone-redundancy system that improves the availability of business deployed in Kubernetes clusters running in data centers or on third-party platforms, you can use ACK One to centrally manage traffic, applications, and clusters. ACK One can route traffic across clusters and seamlessly perform failovers.

All regions

Use MSE multi-cluster gateways to implement hybrid disaster recovery in ACK One

Zone-disaster recovery based on ALB multi-cluster gateways supported by ACK One

ALB multi-cluster gateways can be used to implement zone-disaster recovery in ACK One. The ALB multi-cluster gateways of ACK One can be used together with ACK One GitOps or the multi-cluster application distribution feature to quickly implement zone-disaster recovery. This allows you to ensure the high availability of your business and automatically switch traffic in a seamless manner when a fault occurs.

All regions

Zone-disaster recovery based on ALB multi-cluster gateways of ACK One

Cloud-native AI suite

FUSE client monitoring supported by Fluid JindoRuntime

FUSE client monitoring is supported by Fluid JindoRuntime. Fluid collects metrics of multiple JindoRuntimes (JindoCache engines) and displays the metrics in out-of-the-box JindoRuntime dashboards. The metrics collected by Fluid include caching engine server metrics and FUSE client metrics.

All regions

Enable and use the Fluid JindoRuntime FUSE client for monitoring

ACK Edge

High-performance container networks supported

High-performance container networks are supported. You can deploy Terway Edge as a CNI plug-in to create an underlay network for communication in ACK Edge clusters.

All regions

Terway Edge

October 2024

Product

Feature

Description

Region

References

Container Service for Kubernetes

Release of CCM v2.10.0

Cloud Controller Manager (CCM) v2.10.0 is released. The readiness gates feature is supported. The service.beta.kubernetes.io/alibaba-cloud-loadbalancer-additional-resource-tags annotation is supported for existing instances to modify tags.

All regions

Cloud Controller Manager
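The following Service manifest shows the annotation mentioned above in context. The annotation key is taken from the entry; the comma-separated key=value format of its value is an assumption to verify against the Cloud Controller Manager documentation, and the selector and ports are placeholders.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-clb
  annotations:
    # Tag the load balancer resources managed by CCM; the comma-separated
    # key=value format shown here is an assumption -- check the CCM
    # documentation for the exact syntax expected by your CCM version.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-additional-resource-tags: "team=frontend,env=prod"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
```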

Use elastic container instances to run Spark jobs

The related topic describes how to use elastic container instances to run Spark jobs in an ACK cluster. You can configure scheduling policies to schedule pods to elastic container instances. This way, you can create Elastic Container Instance-based pods and pay only for the resources used by the pods, which reduces idle resources and prevents unexpected costs. In addition, the cost-effectiveness and efficiency of Spark jobs are improved.

All regions

Use elastic container instances to run Spark jobs

ACK Serverless

Custom parameter configurations for managed CoreDNS

You can configure DNS settings for managed CoreDNS by defining a CustomDNSConfig custom resource (CR).

All regions

Configure custom parameters for managed CoreDNS

ACK One

Serverless computing in self-managed Kubernetes clusters

ACK Virtual Node enables you to create serverless pods in your self-managed Kubernetes clusters and access elastic compute resources in the cloud, including both CPUs and GPUs.

All regions

Use ACK Virtual Node for serverless computing in self-managed Kubernetes clusters
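A minimal sketch of scheduling a pod onto a virtual node. The type: virtual-kubelet label and the virtual-kubelet.io/provider toleration follow the common virtual-kubelet convention used by ACK virtual nodes; confirm the exact label and taint values reported by the virtual nodes in your cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: serverless-task
spec:
  containers:
    - name: task
      image: "busybox:1.36"
      command: ["sh", "-c", "echo hello from a serverless pod && sleep 300"]
  # Pin the pod to a virtual node; the label and toleration below are assumptions
  # based on the virtual-kubelet convention -- verify them in your cluster.
  nodeSelector:
    type: virtual-kubelet
  tolerations:
    - key: virtual-kubelet.io/provider
      operator: Exists
      effect: NoSchedule
```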

Support for ALB multi-cluster gateways

The Application Load Balancer (ALB) multi-cluster gateways provided by Distributed Cloud Container Platform for Kubernetes (ACK One) are the multi-cluster mode of ALB Ingress. In most cases, ALB multi-cluster gateways can be used in the same manner as the single-cluster mode of ALB Ingress, except for several differences.

All regions

Overview of ALB multi-cluster gateways

Cloud-native AI suite

Optimize model inference performance by using TensorRT

When you use TensorRT to optimize a model, the model trained with a framework such as PyTorch or TensorFlow is first compiled into the TensorRT format. The model then runs in the TensorRT inference engine. This improves the speed of running the model on NVIDIA GPUs.

All regions

None

ACK Edge

Support for RRSA

You can use the RAM Roles for Service Accounts (RRSA) feature to enforce access control on different pods that are deployed in an ACK cluster. This achieves fine-grained API permission control on pods and reduces security risks.

All regions

Configure RRSA for service accounts to isolate permissions among pods

Support for managed node pools

If you want to manage nodes in groups and simplify node O&M, you can enable the managed node pool feature of ACK for your cluster to automate node O&M tasks, such as OS Common Vulnerabilities and Exposures (CVE) patching, kubelet updates, and node restarts. Compared with regular node pools, managed node pools provide custom O&M capabilities.

All regions

Overview of managed node pools

Support for alert configurations

ACK provides the alert management feature to allow you to centrally manage alerts that are triggered in different scenarios. You can configure alert rules to get notified when a service exception occurs or one of the following metrics exceeds the threshold: key metrics of basic cluster resources, metrics of core cluster components, and application metrics.

All regions

None

Release of Kubernetes 1.28

If you want to update the Kubernetes version from 1.26 to 1.28, submit a ticket to contact the ACK technical team. Other Kubernetes versions cannot be updated.

All regions

Update an ACK Edge cluster

ACK Lingjun

Network topology-aware scheduling

In Lingjun clusters, network topology-aware scheduling can assign pods to the same Layer 1 or Layer 2 forwarding domains. This approach reduces network latency and accelerates job completion.

All regions

Work with network topology-aware scheduling

September 2024

Product

Feature

Description

Region

References

Container Service for Kubernetes

Support for Kubernetes 1.31

Kubernetes 1.31 is supported. You can create ACK clusters that run Kubernetes 1.31 or update ACK clusters from earlier Kubernetes versions to Kubernetes 1.31.

All regions

Kubernetes 1.31

Deletion protection supported by namespaces and Services

After you enable the policy governance feature, deletion protection can be enabled for namespaces or Services that involve business-critical and sensitive data to avoid incurring maintenance costs caused by accidental namespace or Service deletion.

All regions

Enable deletion protection for a namespace or a Service

Tracing supported by the NGINX Ingress controller

The trace data of the NGINX Ingress controller can be reported to Managed Service for OpenTelemetry. Managed Service for OpenTelemetry persists the trace data and aggregates and computes the trace data in real time to generate monitoring data, which includes trace details and real-time topology. You can troubleshoot and diagnose issues based on the monitoring data.

All regions

Enable tracing for the NGINX Ingress controller

Cost insights for Knative Services

The cost insights feature of ACK can be enabled for Knative Services. This feature helps the finance department analyze resource usage and allocate costs from multiple dimensions. This feature also offers suggestions on cost savings. You can enable the cost insights feature for a Knative Service. This way, you can view the estimated cost of the Knative Service in real time.

All regions

Enable the cost insights feature in Knative Service

Risk identification based on cost insights for cluster workloads

The cost insights feature can be used to identify risks in cluster workloads. You can enable this feature to quickly identify risks related to stability, performance, and cost in cluster workloads. This feature can track the utilization of cluster resources, provide detailed information about the resource configurations of Burstable pods, and identify risks in BestEffort pods.

All regions

Use cost insights to identify risks for cluster workloads

Spark Operator supported for running Spark jobs

Spark Operator can be used to run Spark jobs in ACK clusters. This helps data engineers quickly and efficiently run and manage big data processing jobs.

All regions

Use Spark Operator to run Spark jobs
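A minimal SparkApplication manifest for the Spark Operator described above; the image, main application file, and service account are placeholders.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "registry.example.com/spark:3.5.0"   # placeholder: image that contains your Spark distribution
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples.jar"
  sparkVersion: "3.5.0"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "1g"
    serviceAccount: spark            # placeholder: service account allowed to create executor pods
  executor:
    instances: 2
    cores: 2
    memory: "2g"
```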

ACK One

Argo CD alerting

Argo CD alerting is supported. The Fleet monitoring feature provided by ACK One uses Managed Service for Prometheus to collect metrics and display monitoring information about Fleet instances on a dashboard. You can customize alert rules to enable real-time monitoring based on custom metrics.

All regions

Configure ACK One Argo CD alerts

Application distribution

The application distribution feature of ACK One is supported. You can use this feature to distribute an application from a Fleet instance to multiple clusters that are associated with the Fleet instance. This feature allows you to configure distribution policies on a Fleet instance. You can use the policies to efficiently distribute eligible Kubernetes resources to clusters that match the policies. In addition, you can configure differentiated policies to meet the deployment requirements of different clusters and applications. Compared with GitOps, this distribution method does not require Git repositories.

All regions

Application distribution overview

Access to Alibaba Cloud DNS PrivateZone supported

Access to Alibaba Cloud DNS PrivateZone is supported. Alibaba Cloud DNS PrivateZone is a VPC-based resolution and management service for private domain names. After a virtual border router (VBR), an IPsec-VPN connection, or a Cloud Connect Network (CCN) instance is connected to a transit router, the on-premises networks that are connected to these network instances can use the transit router to access Alibaba Cloud DNS PrivateZone.

All regions

Manage access to Alibaba Cloud DNS PrivateZone

Statically provisioned NAS volumes supported by registered clusters

Statically provisioned NAS volumes can be mounted to registered clusters. File Storage NAS (NAS) is a distributed file system that supports shared access, elastic scaling, high reliability, and high performance. You can mount statically provisioned NAS volumes to registered clusters to persist and share data.

All regions

Mount a statically provisioned NAS volume
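A hedged sketch of a statically provisioned NAS volume. The nasplugin.csi.alibabacloud.com driver name and the server/path volume attributes follow the ACK NAS CSI documentation and should be verified against the referenced topic; the mount target address and capacity are placeholders.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nas-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: nasplugin.csi.alibabacloud.com
    volumeHandle: nas-pv                              # must be unique; commonly set to the PV name
    volumeAttributes:
      server: "xxxx.cn-hangzhou.nas.aliyuncs.com"     # placeholder NAS mount target
      path: "/share"                                  # placeholder shared directory
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nas-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""      # bind to the statically provisioned PV above
  volumeName: nas-pv
  resources:
    requests:
      storage: 20Gi
```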

Cloud-native AI suite

Auto recovery for FUSE mount targets

During the lifecycle of an application pod, the Filesystem in Userspace (FUSE) daemon may unexpectedly crash. As a result, the application pod can no longer use the FUSE file system to access data. After you enable the auto recovery feature for the mount targets of a FUSE file system, access to application data can be restored without the need to restart the application pods.

All regions

Enable the auto recovery feature for FUSE mount targets

Cross-namespace dataset sharing

Datasets can be shared across namespaces. Fluid supports data access and cache sharing across namespaces. With Fluid, you need to cache your data only once when you need to share data among multiple teams. This greatly improves data utilization efficiency and data management flexibility, and facilitates collaboration between R&D teams.

All regions

Share datasets across namespaces

ACK Edge

ENS management

The Edge Node Service (ENS) management feature is supported. ACK Edge clusters allow you to run containers on nodes. You can manage ENS instances deployed across multiple regions and Internet service providers (ISPs) in a unified and containerized manner. You can create ENS disks and Edge Load Balancer instances to provide cloud-native storage and networking capabilities.

All regions

ENS management

Service topology management supported by node pools

Service topology management is supported by node pools. The backend endpoints of Kubernetes Services are randomly distributed across nodes. Consequently, when Service requests are distributed to nodes across node groups, these requests may fail to reach the nodes or may not be answered promptly. You can configure a Service topology to expose an application on an edge node only to the current node or nodes in the same edge node pool.

All regions

Configure a Service topology
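A minimal sketch of such a Service topology. The openyurt.io/topologyKeys annotation comes from the OpenYurt project that ACK Edge is based on and should be verified against the referenced topic; the selector and ports are placeholders.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: edge-web
  annotations:
    # Limit endpoints to nodes in the same edge node pool as the client.
    # Use kubernetes.io/hostname instead to expose the application only on the current node.
    openyurt.io/topologyKeys: openyurt.io/nodepool
spec:
  selector:
    app: edge-web
  ports:
    - port: 80
      targetPort: 8080
```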

August 2024

Product

Feature

Description

Region

References

Container Service for Kubernetes

Inventory health status monitoring supported by node instant scaling

Inventory health status monitoring is supported by node instant scaling. The node instant scaling feature can dynamically select instance types and zones based on the inventory status of ECS instances. To monitor the inventory health status of the instance types configured for a node pool and obtain suggestions for the instance types, check the ConfigMap for inventory health status. This allows you to assess the inventory health status of the instance types configured for the node pool and proactively analyze and adjust instance types.

All regions

View the health status of node instant scaling

Multiple update frequencies supported by auto cluster update

The following update frequencies are supported by auto cluster update: Latest Patch Version (patch), Second-Latest Minor Version (stable), and Latest Minor Version (rapid).

All regions

Automatically update a cluster

GPU sharing and memory isolation based on MPS

GPU sharing and memory isolation are supported based on Multi-Process Service (MPS). You can use MPS to manage Compute Unified Device Architecture (CUDA) applications that run on multiple NVIDIA GPUs or Message Passing Interface (MPI) requests. This allows you to share GPU resources. You can add specific labels to node pools in the ACK console to enable GPU sharing and GPU memory isolation in MPS mode for AI applications.

All regions

Use MPS for GPU sharing and memory isolation
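A hedged sketch of how a pod might request a slice of a shared GPU once MPS mode is enabled on the node pool. Both the ack.node.gpu.schedule label value and the aliyun.com/gpu-mem resource name are assumptions drawn from ACK shared GPU scheduling conventions; verify them against the referenced topic before use.

```yaml
# Node pool label (set in the ACK console when creating or editing the node pool);
# the exact label value for MPS mode is an assumption to confirm in the ACK documentation:
#   ack.node.gpu.schedule: mps
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-inference
spec:
  containers:
    - name: inference
      image: "registry.example.com/inference:latest"   # placeholder image
      resources:
        limits:
          # Request a slice of GPU memory (in GiB) instead of a whole GPU;
          # the resource name follows the ACK shared GPU scheduling convention.
          aliyun.com/gpu-mem: 4
```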

Knative 1.12.5

Knative 1.12.5-aliyun.7 is supported. This version is compatible with Kourier 1.12 and supports Container Registry Enterprise Edition and the dashboard for preemptible ECS instances.

All regions

Knative release notes

ACK One

Multi-cluster applications

Multi-cluster applications are supported. You can use an Argo CD ApplicationSet to automatically create one or more applications from one orchestration template.

All regions

Create a multi-cluster application
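A minimal ApplicationSet sketch that generates one Argo CD application per target cluster from a single template; the namespace, cluster endpoints, and Git repository are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: demo-appset
  namespace: argocd               # placeholder: namespace where Argo CD runs
spec:
  generators:
    # One application per target cluster; names and API server URLs are placeholders.
    - list:
        elements:
          - cluster: cluster-beijing
            url: https://1.2.3.4:6443
          - cluster: cluster-hangzhou
            url: https://5.6.7.8:6443
  template:
    metadata:
      name: "demo-{{cluster}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/example/demo-app.git   # placeholder Git repository
        targetRevision: main
        path: manifests
      destination:
        server: "{{url}}"
        namespace: demo
      syncPolicy:
        automated: {}
```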

Elastic node pools using custom images supported by registered clusters

Custom images pre-installed with the required software packages can be used to greatly reduce the amount of time required by an on-cloud node to reach the Ready state and accelerate system startup.

All regions

Build an elastic node pool with a custom image

Argo Workflows SDK for Python supported for large-scale workflow creation

Large-scale workflows can be created by using Argo Workflows SDK for Python. A new topic is added to the workflow cluster best practices to describe how to use Argo Workflows SDK for Python to create large-scale workflows. Hera is an Argo Workflows SDK for Python that serves as an alternative to YAML and provides an easy way to orchestrate and test complex workflows in Python. In addition, Hera is seamlessly integrated with the Python ecosystem and simplifies workflow design.

All regions

Use Argo Workflows SDK for Python to create large-scale workflows

Event-driven CI pipelines based on EventBridge

Event-driven Continuous Integration (CI) pipelines based on EventBridge are supported. A new topic is added to the workflow cluster best practices to describe how to build event-driven automated CI pipelines. You can build efficient, fast, and cost-effective event-driven automated CI pipelines based on EventBridge and the distributed Argo workflows to simplify and accelerate application delivery.

All regions

Event-driven CI pipelines based on EventBridge

Cloud-native AI suite

Dify supported for creating AI-powered Q&A assistants

Dify can be used to create AI-powered Q&A assistants. Dify is a platform that can integrate enterprise or individual knowledge bases with large language model (LLM) applications. You can use Dify to design customized AI-assisted Q&A solutions and apply the solutions to your business. This helps facilitate business development and management.

All regions

Use Dify to create a customized AI-powered Q&A assistant for a website

Flowise installation and management

The Flowise component can be installed in ACK clusters. A new topic is added to describe how to install and manage Flowise in ACK clusters. The topic also provides answers to some frequently asked questions about Flowise. In most cases, an LLM application is optimized through multiple iterations during the development process. Flowise provides a drag-and-drop UI to enable quick iterations in a low-code manner. This accelerates the transition from the testing environment to the production environment.

All regions

None

TensorRT-LLM supported for deploying Qwen2 models as inference services

TensorRT-LLM can be used to deploy Qwen2 models as inference services. A new topic is added to describe how to use Triton and TensorRT-LLM to deploy a Qwen2 model as an inference service in ACK. The topic uses the Qwen2-1.5B-Instruct model and A10 GPUs as an example, uses Fluid Dataflow to prepare data during model deployment, and uses Fluid to accelerate model loading.

All regions

Use TensorRT-LLM to deploy a Qwen2 model as an inference service

ACK Edge

Cloud-native AI suite

The cloud-native AI suite can be deployed in ACK Edge clusters. The cloud-native AI suite provides AI Dashboard and AI Developer Console to allow you to view the status of your cluster and quickly submit training jobs.

All regions

Deploy the cloud-native AI suite

July 2024

Product

Feature

Description

Region

References

Container Service for Kubernetes

Tracing supported by the NGINX Ingress controller

NGINX Ingress controller v1.10.2-aliyun.1 is released and supports the tracing feature by using Managed Service for OpenTelemetry.

All regions

Enable tracing for the NGINX Ingress controller

Global network policies supported by Poseidon

Poseidon is a component that supports network policies for ACK clusters. Poseidon v0.5.0 introduces cluster-level global network policies, which allow you to manage network connectivity across namespaces.

All regions

Use ACK GlobalNetworkPolicy

Release of ContainerOS 3.3

ContainerOS is an operating system provided by Alibaba Cloud. It is vertically optimized for container scenarios. ContainerOS provides enhanced security, faster startup, and simplified system services and software packages. The kernel version of ContainerOS 3.3 is updated to 5.10.134-17.0.2.lifsea8. By default, cgroup v2 is used to isolate container resources. Vulnerabilities and defects are fixed.

All regions

Release notes for ContainerOS images

Custom worker RAM roles for node pools

A Container Service for Kubernetes (ACK) managed cluster automatically creates a default worker Resource Access Management (RAM) role shared by all nodes. If you authorize an application using this default worker RAM role, the permissions are shared among all nodes in the cluster, which may unintentionally grant more permissions than necessary. You can assign a custom worker RAM role to a node pool upon creation. By assigning specific roles to different node pools, you can isolate the permissions of each node pool, thereby reducing the risk of all nodes in the cluster sharing the same permissions.

All regions

Use custom worker RAM roles

New security policy added to the security policy library

The ACKBlockVolumeTypes policy is added. You can use this policy to specify the volumes that cannot be used by pods in the specified namespaces.

All regions

ACKBlockVolumeTypes

New NVIDIA GPU driver version

NVIDIA GPU driver 550.90.07 is supported.

All regions

NVIDIA driver versions supported by ACK

Best practices for using LMDeploy to deploy a Qwen model as an inference service

The Qwen1.5-4B-Chat model and A10 GPU are used to demonstrate how to use the LMDeploy framework to deploy the Qwen model as an inference service in ACK.

All regions

Use LMDeploy to deploy the Qwen model inference service

Best practices for using KServe to deploy inference services that share a GPU

In some scenarios, you may want multiple inference tasks to share the same GPU to improve GPU utilization. The Qwen1.5-0.5B-Chat model and the V100 GPU are used to describe how to use KServe to deploy inference services that share a GPU.

All regions

Deploy inference services that share a GPU
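A hedged sketch of a KServe InferenceService whose predictor requests only a portion of GPU memory so that several services can share one GPU. The aliyun.com/gpu-mem resource name is an assumption based on ACK shared GPU scheduling; the image and memory amount are placeholders.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: qwen-chat
spec:
  predictor:
    containers:
      - name: kserve-container
        image: "registry.example.com/qwen-inference:latest"   # placeholder inference image
        resources:
          limits:
            # Request a portion of GPU memory (GiB) so that several inference
            # services can share one GPU; the resource name follows the ACK
            # shared GPU scheduling convention and should be verified.
            aliyun.com/gpu-mem: 6
```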

ACK One

Best practices for event-driven CI pipelines based on EventBridge

You can build efficient, fast, and cost-effective event-driven automated CI pipelines based on EventBridge and the distributed Argo workflows to significantly simplify and accelerate application delivery.

All regions

Event-driven CI pipelines based on EventBridge

Multi-cluster application orchestration through GitOps

You can orchestrate multi-cluster applications in the GitOps console and use Git repositories as application sources to implement version management, multi-cluster distribution, and Continuous Deployment (CD) for applications that use multiple orchestration methods, such as YAML manifests, Helm charts, and Kustomize.

All regions

Use an ApplicationSet to create multiple applications

Elastic node pools using custom images in registered clusters

Custom images pre-installed with the required software packages can be used to greatly reduce the amount of time required by an on-cloud node to reach the Ready state and accelerate system startup.

All regions

Build an elastic node pool with a custom image

Cloud-native AI suite

Filesystem in Userspace (FUSE) mount target auto repair

Fluid supports polling check and periodic automatic repair for FUSE mount targets to improve the stability of access to business data.

All regions

None

ACK Edge

Support for Kubernetes 1.28

You can create ACK Edge clusters that run Kubernetes 1.28.9-aliyun.1.

All regions

Release notes for ACK Edge of Kubernetes 1.28

Support for the Container Storage Interface (CSI) plug-in

This topic describes the types of storage media supported by the volume plug-ins for ACK Edge clusters and the limits of the volume plug-ins based on node types and integration methods.

All regions

Storage overview

Support for the cloud-native AI suite

ACK Edge clusters support all features of the cloud-native AI suite in on-cloud environments, but some features are not supported in on-premises environments. The capabilities and limits of the cloud-native AI suite supported by different node types and network types are different.

All regions

Cloud-native AI suite

Best practices for using Ingresses

This topic describes the usage notes for deploying an Ingress controller in an edge node pool and how the deployment differs from deploying an Ingress controller in an on-cloud node pool.

All regions

June 2024

Product

Feature

Description

Region

References

Container Service for Kubernetes

Support for Kubernetes 1.30

Kubernetes 1.30 is supported. You can create ACK clusters that run Kubernetes 1.30 or update ACK clusters from earlier Kubernetes versions to Kubernetes 1.30.

All regions

Node pool OS parameter customization

If the default parameter settings of the node OS, such as Linux, do not meet your business requirements, you can customize the OS parameters of your node pools to improve the OS performance.

All regions

Customize the OS parameters of a node pool

Support for Ubuntu

Ubuntu 22.04 is supported. You can use Ubuntu 22.04 as the node OS of ACK clusters that run Kubernetes 1.30 or later.

All regions

OS images

Descheduling enhanced

Descheduling is a process of evicting specific pods from one node to another node. In scenarios where the resource utilization among nodes is imbalanced, nodes are overloaded, or new scheduling policies are required, you can use the descheduling feature to resolve issues or meet your requirements. The Koordinator Descheduler module of the ack-koordinator component is enhanced in terms of the following capabilities: descheduling policies, pod eviction methods, and eviction traffic control.

All regions

Network Load Balancer (NLB) instances configurable by using Services in the ACK console

Services can be created and managed in the ACK console to configure NLB instances. NLB is a Layer 4 load balancing service intended for the Internet of Everything (IoE). NLB offers ultra-high performance and can automatically scale on demand. An NLB instance supports up to 100 million concurrent connections.

All regions
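A hedged sketch of a Service that asks Cloud Controller Manager to provision an NLB instance. The alibabacloud.com/nlb load balancer class and the zone-maps annotation key are assumptions to verify against the ACK NLB documentation; the zone and vSwitch IDs are placeholders.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-nlb
  annotations:
    # Zones and vSwitches for the NLB instance; IDs are placeholders and the
    # annotation key should be confirmed against the ACK NLB Service documentation.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-zone-maps: "cn-hangzhou-k:vsw-xxxx,cn-hangzhou-j:vsw-yyyy"
spec:
  type: LoadBalancer
  # Ask CCM to provision a Network Load Balancer instead of a classic CLB.
  loadBalancerClass: alibabacloud.com/nlb
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
```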

New release of csi-provisioner

csi-provisioner allows you to automatically create volumes. A new version of csi-provisioner is released and the managed version of csi-provisioner, which does not consume node resources, is also released. NAS file systems can be mounted on Alibaba Cloud Linux 3 by using the Transport Layer Security (TLS) protocol. Ubuntu nodes are supported by csi-provisioner.

All regions

csi-provisioner

ACK One

Fleet monitoring enhanced

Fleet monitoring is supported by ACK One and global monitoring for clusters associated with Fleet instances is enhanced. A dashboard is provided to display monitoring information about the Fleet instances, including metrics of key components and the GitOps system. The global monitoring feature collects metrics from different clusters and displays global monitoring information and cost insights data about these clusters on a dashboard.

All regions

Fleet monitoring

Cloud-native AI suite

Cloud-native AI suite free of charge

The cloud-native AI suite is free of charge. You can use all features provided by the cloud-native AI suite to build customized AI production systems on ACK and implement full-stack optimizations for AI and machine learning (ML) applications and systems. The cloud-native AI suite allows you to experience the benefits of the cloud-native AI technology and helps facilitate business innovation and intelligent transformation.

All regions

[Free Component Notice] Cloud-native AI suite is free of charge

ACK Edge

Disk storage supported by on-cloud node pools

The CSI used in ACK managed clusters is supported by ACK Edge clusters. The CSI component installed in on-cloud node pools of ACK Edge clusters provides the same features as the CSI component installed in ACK managed clusters. You can mount disks to on-cloud node pools by configuring persistent volumes (PVs) and persistent volume claims (PVCs).

All regions

None

Access to workloads in data centers supported by using Express Connect circuits

Computing devices in data centers and edge devices can be connected to ACK. The API server of an ACK cluster can use Express Connect circuits to access pods or Services deployed at the edge. This feature is implemented based on the edge controller manager (ECM). The ECM is responsible for automating routing configuration for access from VPCs to pods deployed at the edge.

All regions

Network management

May 2024

Product

Feature

Description

Region

References

Container Service for Kubernetes

Reuse of NLB instances across VPCs supported by cloud-controller-manager

cloud-controller-manager v2.9.1 is released, which supports reuse of NLB instances across VPCs and NLB server group weights. It supports scenarios where an NLB instance is connected to both ECS instances and pods. This version also optimizes support for NLB IPv6.

All regions

Cloud Controller Manager

Custom routing rules visualized for ALB Ingresses

Custom routing rules can be created in a visualized manner for ALB Ingresses. You can specify routing conditions to route requests based on paths, domain names, and request headers, and specify actions to route requests to specific Services or return fixed responses.

All regions

Customize the routing rules of an ALB Ingress
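As a minimal illustration of the routing rules described above, the following ALB Ingress routes requests by domain name and path; header-based conditions and fixed responses are configured through additional ALB Ingress annotations covered in the referenced topic. The host and Service names are placeholders.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-alb
  namespace: default
spec:
  ingressClassName: alb
  rules:
    # Route by domain name and path; host name and backend Services are placeholders.
    - host: demo.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-svc
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-svc
                port:
                  number: 80
```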

NVMe disk multi-instance mounting and reservation

An NVMe disk can be mounted to multiple instances and the reservation feature is supported. You can mount an NVMe disk to at most 16 instances and further use the reservation feature that complies with the NVMe specifications. These features help ensure data consistency for applications such as databases and enable you to perform failovers much faster.

All regions

Use the multi-attach and NVMe reservation features of NVMe disks

ossfs version switching by using the feature gate

In CSI 1.30.1 and later, you can enable the corresponding feature gate to switch to ossfs 1.91 or later to improve the performance of file systems. If you require high file system performance, we recommend that you switch the ossfs version to 1.91 or later.

All regions

ACK One

CI pipelines created based on workflow clusters for Golang projects

ACK One workflow clusters are developed based on open source Argo Workflows. With the benefits of ultrahigh elasticity, auto scaling, and zero O&M costs, hosted Argo Workflows can help you quickly create CI pipelines with low costs. This best practice describes how to create CI pipelines for Golang projects based on workflow clusters.

All regions

Create CI pipelines for Golang projects in workflow clusters

Cloud-native AI suite

Dataset mount target dynamic mounting supported by Fluid

Fluid supports dynamic mounting of dataset mount targets. Fluid can automatically update and dynamically mount the dataset mount targets that correspond to the PV and PVC within the container.

All regions

N/A

April 2024

Product

Feature

Description

Region

References

Container Service for Kubernetes

Anomaly diagnostics for ACK clusters supported by ACK AI Assistant

Anomaly diagnostics for ACK clusters is supported by ACK AI Assistant. You can use ACK AI Assistant to analyze and diagnose failed tasks, error logs, and component update failures in ACK clusters. This simplifies your O&M work.

All regions

Use ACK AI Assistant to help troubleshoot issues and find answers to your questions

RRSA authentication for OSS volumes

RRSA authentication can be configured for PVs to limit the permissions to perform API operations on specific Object Storage Service (OSS) volumes. This enables you to regulate access to cloud resources in a fine-grained manner and enhance cluster security.

All regions

Use RRSA authentication to mount a statically provisioned OSS volume

EIPs with Anti-DDoS (Enhanced) enabled for pods

ACK Extend Network Controller v0.9.0 can create and manage VPC resources such as NAT gateways and elastic IP addresses (EIPs), and bind EIPs with Anti-DDoS (Enhanced Edition) enabled to pods. This version is suitable for scenarios where you want to enable Anti-DDoS protection for pods that are exposed to the Internet.

All regions

Associate an exclusive EIP with a pod

New predefined security policies added to policy governance

The following predefined security policies are added to the policy governance module: ACKServicesDeleteProtection, ACKPVSizeConstraint, and ACKPVCConstraint.

All regions

Predefined security policies of ACK

ACK Edge

Offline O&M tool for edge nodes

In most cloud-edge collaboration scenarios, edge nodes are usually offline due to network instability. When a node is offline, you cannot perform O&M operations on the node, such as business updates and configuration changes. ACK Edge clusters provide the offline O&M tool that you can use to perform O&M operations on edge nodes in emergency scenarios.

All regions

Offline O&M tool for edge nodes

ACK One

Multi-cluster gateway visualized management

Microservices Engine (MSE) cloud-native gateways can serve as multi-cluster gateways based on the MSE Ingress controller hosted in ACK One. MSE cloud-native gateways allow you to view topologies in a visualized manner and create MSE Ingresses to manage north-south traffic. You can use MSE cloud-native gateways to implement active zone-redundancy, multi-cluster load balancing, and traffic routing to specific clusters based on request headers.

All regions

Manage gateways

Access from Kubernetes clusters for distributed Argo workflows to OSS optimized

A variety of features are added to ACK One Argo Workflows, including multipart uploading for ultra-large files, artifact auto garbage collection, and artifact transmission in streaming mode. These features allow you to manage OSS objects in an efficient and secure manner.

All regions

Configure artifacts

Cloud-native AI suite

Quick MLflow deployment in ACK clusters

MLflow can be deployed in ACK clusters with a few clicks. You can use MLflow to track model training, and manage and deploy machine learning models. The cloud-native AI suite also supports lifecycle management for models in MLflow Model Registry.

All regions

March 2024

Product

Feature

Description

Region

References

Container Service for Kubernetes

Kubeconfig file deletion and kubeconfig recycle bin supported

Alibaba Cloud accounts, RAM users, or RAM roles with certain permissions can be used to view and manage the status of issued kubeconfig files. You can delete kubeconfig files or revoke permissions provided by kubeconfig files that may pose security risks. You can also use the kubeconfig recycle bin to restore kubeconfig files that are deleted within the previous 30 days.

All regions

GPU device isolation

In ACK exclusive GPU scheduling scenarios, ACK provides a mechanism that allows you to isolate a faulty device on a GPU node so that new GPU workloads are not scheduled to the faulty device.

All regions

GPU Device Plugin-related operations

Practices for collecting the metrics of the specified virtual node

In a cluster that has multiple virtual nodes, you can specify a virtual node and collect only its metrics. This reduces the amount of data collected at a time. When a large number of containers are deployed on virtual nodes, this solution can effectively reduce the load on the monitoring system.

All regions

Collect the metrics of the specified virtual node

February 2024

Product

Feature

Description

Region

References

Container Service for Kubernetes

ACK Virtual Node 2.11.0 released

ACK Virtual Node 2.11.0 supports Windows instances and its scheduling semantics support Windows nodes. This version also allows you to enable the System Operations & Maintenance (SysOM) feature for elastic container instances to monitor resources such as the kernel. In addition, certificates can be generated more efficiently during the creation of Elastic Container Instance-based pods.

All regions

ACK One

Knative supported by registered clusters

Knative is a Kubernetes-based serverless framework. The purpose of Knative is to create a cloud-native and cross-platform orchestration standard for serverless applications. Knative integrates the creation of containers, workload management, and event models to help you create an enterprise-level serverless platform to deploy and manage serverless workloads.

All regions

Knative overview

ACK One-based zone-disaster recovery in hybrid cloud environments

If your businesses run in Kubernetes clusters in data centers or on third-party public clouds and you want to use cloud computing to implement zone-disaster recovery for business high availability, you can use ACK One. ACK One allows you to centrally manage traffic, applications, and clusters, route traffic across clusters, and seamlessly perform traffic failovers.

ACK One uses the managed MSE Ingress controller to manage MSE cloud-native gateways that serve as multi-cluster gateways and uses the Ingress API to define traffic routing rules. ACK One can manage Layer 7 north-south traffic in multi-cloud, multi-cluster, and hybrid cloud scenarios. Compared with traditional DNS-based solutions, the zone-disaster recovery system developed based on ACK One multi-cluster gateways reduces the architecture complexity, usage costs, and management costs. It also supports millisecond-level seamless migration and Layer 7 routing.

All regions

Use MSE multi-cluster gateways to implement hybrid disaster recovery in ACK One

Support for AI scenarios improved and access acceleration to objects in OSS bucket by using Fluid supported

Fluid is an open source, Kubernetes-native distributed dataset orchestrator and accelerator for data-intensive applications in cloud-native scenarios, such as big data applications and AI applications. You can use Fluid to accelerate access to OSS files in registered clusters.

All regions

Use Fluid to accelerate access to OSS objects
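A minimal Fluid sketch for the scenario above: a Dataset that mounts an OSS prefix and a JindoRuntime that caches it in memory. The bucket, endpoint, and Secret are placeholders, and the option keys follow common Fluid OSS examples; verify them against the referenced topic.

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: oss-demo
spec:
  mounts:
    - mountPoint: oss://example-bucket/training-data            # placeholder bucket and prefix
      name: training-data
      options:
        fs.oss.endpoint: oss-cn-hangzhou-internal.aliyuncs.com  # placeholder OSS endpoint
      encryptOptions:
        # Read the AccessKey pair from a Secret instead of writing it in plain text.
        - name: fs.oss.accessKeyId
          valueFrom:
            secretKeyRef:
              name: oss-secret        # placeholder Secret holding the AccessKey pair
              key: fs.oss.accessKeyId
        - name: fs.oss.accessKeySecret
          valueFrom:
            secretKeyRef:
              name: oss-secret
              key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: oss-demo
spec:
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM
        quota: 2Gi
```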

DingTalk chatbots for receiving notifications about GitOps application updates

In multi-cluster GitOps continuous delivery scenarios, such as high-availability application deployment and multi-cluster distribution of system components, you can use a variety of notification services. For example, you can use a DingTalk chatbot to receive notifications about GitOps application updates.

All regions

Use a DingTalk chatbot to receive notifications about GitOps application updates

Cloud-native AI suite

Best practices for Ray clusters

You can quickly create a Ray cluster in an ACK cluster and integrate the Ray cluster with Simple Log Service, Managed Service for Prometheus, and ApsaraDB for Redis to optimize log management, observability, and availability. The Ray autoscaler can work with the ACK autoscaler to improve the efficiency of computing resource scaling and increase resource utilization.

All regions

Best practices for Ray clusters

January 2024

Product

Feature

Description

Region

References

Container Service for Kubernetes

ACK AI Assistant released

Container Service for Kubernetes (ACK) AI Assistant is developed by the ACK team based on a large language model. ACK AI Assistant draws on the ACK team's expertise and years of experience in the Kubernetes and cloud-native technology sectors, the observability of the ACK O&M system, and the diagnostic experience of ACK experts. It can help you find answers to your questions and diagnose issues related to ACK and Kubernetes based on the large language model.

All regions

Use ACK AI Assistant to help troubleshoot issues and find answers to your questions

OS kernel-level container monitoring capabilities available

Alibaba Cloud provides the Tracing Analysis service that offers developers of distributed applications various features, such as trace mapping, request statistics, and trace topology. The Tracing Analysis service helps you quickly analyze and diagnose performance bottlenecks in a distributed application architecture and improves the efficiency of development and diagnostics for microservices applications. You can install the Application Load Balancer (ALB) Ingress controller and enable the Xtrace feature in a cluster. After the Xtrace feature is enabled, you can view the tracing data.

All regions

Use AlbConfigs to enable Tracing Analysis based on Xtrace

ACK Edge

Support for Kubernetes 1.26

Kubernetes 1.26 is released for ACK Edge clusters. This version optimizes and adds features, such as edge node autonomy and edge node access.

All regions

Release notes for ACK Edge of Kubernetes 1.26

Cloud-edge communication solution updated

ACK Edge clusters that run Kubernetes 1.26 and later support network communication between the on-cloud node pools and edge node pools. Compared with the original solution, the updated solution provides high availability, auto scaling, and cloud-edge container O&M. Raven provides the proxy mode and tunnel mode for cloud-edge communication. The proxy mode allows cross-domain HTTP communication among hosts, and the tunnel mode allows cross-domain communication among containers.

ACK One

Access to the GitOps console through a custom domain name

To access the GitOps console of Distributed Cloud Container Platform for Kubernetes (ACK One) through a custom domain name, you can create a CNAME record to map the custom domain name to the default domain name of GitOps, and configure an SSL certificate. Then, you can use a CloudSSO account to access the GitOps console through https://${your-domain}.

All regions

Access the GitOps console through a custom domain name

Disaster recovery architectures and solutions based on Kubernetes clusters

This practice combines Kubernetes clusters (including Container Service for Kubernetes clusters, clusters on third-party cloud platforms, and clusters in data centers) with networking, database, middleware, and observability cloud services of Alibaba Cloud to help you design disaster recovery architectures and solutions. This allows you to build a more resilient business system.

All regions

Disaster recovery architectures and solutions based on Kubernetes container clusters

Historical releases

To view the historical release notes for ACK, see Historical release notes (before 2024).