Container Service for Kubernetes: ack-arena

Last updated: Jun 06, 2024

The ack-arena component is a collection of lifecycle management tools for AI jobs provided by the cloud-native AI suite. The component abstracts and standardizes key components throughout AI production, which reduces the complexity of underlying resource and environment management and simplifies the procedure for submitting and running AI jobs. This topic describes the basic information, usage notes, and release notes of ack-arena.

Introduction

The cloud-native AI suite provides an abstraction of data preparation and management, model development, model training, model evaluation, model inference services, and online O&M. Arena is a command-line tool that can help you manage these key components in AI DevOps. Arena simplifies the management of underlying resources and environments, job scheduling, and GPU allocation and monitoring. Arena is compatible with mainstream AI frameworks and tools, including TensorFlow, PyTorch, Horovod, Spark, JupyterLab, TF-Serving, and Triton. Arena also provides SDKs for Golang, Java, and Python.

ack-arena is optimized to simplify operations in open source Arena. You can install ack-arena in the Container Service for Kubernetes (ACK) console with a few clicks.

Usage notes

The ack-arena component can be installed only in ACK Pro clusters, ACK Serverless Pro clusters, and ACK Edge Pro clusters. The Kubernetes versions of the clusters must be 1.18 or later. For more information about how to install and use the ack-arena component, see Configure the Arena client.
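
The two requirements above (cluster type and minimum Kubernetes version) can be expressed as a small pre-install check. The following is a minimal sketch, not part of ack-arena: the function names are hypothetical, and the version string format (an optional leading "v" and a vendor suffix after a hyphen) is an assumption.

```python
from typing import Tuple

# Values taken from the usage notes above; the names are illustrative only.
SUPPORTED_CLUSTER_TYPES = {"ACK Pro", "ACK Serverless Pro", "ACK Edge Pro"}
MIN_KUBERNETES_VERSION = (1, 18)


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.28.3-aliyun.1' or '1.18'.

    The 'v' prefix and '-suffix' handling are assumptions about the input format.
    """
    core = version.lstrip("v").split("-")[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def can_install_ack_arena(cluster_type: str, kubernetes_version: str) -> bool:
    """Return True if the cluster meets both requirements stated above."""
    return (
        cluster_type in SUPPORTED_CLUSTER_TYPES
        and parse_version(kubernetes_version) >= MIN_KUBERNETES_VERSION
    )


print(can_install_ack_arena("ACK Pro", "1.18.8"))
print(can_install_ack_arena("ACK Pro", "v1.16.9-aliyun.1"))
```

Comparing versions as integer tuples avoids the string-comparison pitfall in which "1.9" would sort after "1.18".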

Release notes

April 2024

| Version | Image address | Description | Release date | Impact |
| --- | --- | --- | --- | --- |
| 0.9.14 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.14-adb43b8 | The model management feature is supported. | 2024-04-11 | No impact on workloads. |
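
The image addresses in these release notes share one shape: a registry host, a repository path, and a tag made of the version followed by a hyphen and a short suffix. The sketch below splits an address into those parts, which can help when comparing the deployed image against the Version column. The `ImageRef` type and `parse_image_address` function are hypothetical helpers, and reading the suffix as a build or commit identifier is an assumption.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str    # for example, registry.cn-hangzhou.aliyuncs.com
    repository: str  # for example, acs/arena-deploy-manager
    version: str     # tag part before the first hyphen
    suffix: str      # tag part after the first hyphen (assumed build identifier)


def parse_image_address(address: str) -> ImageRef:
    """Split '<registry>/<repository>:<version>-<suffix>' into its parts."""
    name, _, tag = address.rpartition(":")
    registry, _, repository = name.partition("/")
    version, _, suffix = tag.partition("-")
    if not (registry and repository and version and suffix):
        raise ValueError(f"unexpected image address format: {address!r}")
    return ImageRef(registry, repository, version, suffix)


ref = parse_image_address(
    "registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.14-adb43b8"
)
print(ref.version, ref.suffix)
```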

March 2024

| Version | Image address | Description | Release date | Impact |
| --- | --- | --- | --- | --- |
| 0.9.13 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.13-5ac396c | • The backend parameter is added to the Triton inference service. • The directory mounted to a KServe inference service can be updated. | 2024-03-18 | No impact on workloads. |

February 2024

| Version | Image address | Description | Release date | Impact |
| --- | --- | --- | --- | --- |
| 0.9.12 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.12-a707f81 | • The base image of the Triton Inference Server is updated. • The component is compatible with the training-operator custom resource definition (CRD). | 2024-02-04 | No impact on workloads. |

November 2023

| Version | Image address | Description | Release date | Impact |
| --- | --- | --- | --- | --- |
| 0.9.11 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.11-ce87d10 | • KServe inference services can be deployed. • The livenessProbe and readinessProbe parameters can be configured for an inference service. | 2023-11-17 | No impact on workloads. |

August 2023

| Version | Image address | Description | Release date | Impact |
| --- | --- | --- | --- | --- |
| 0.9.10 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.10-4b5c18c | • An SSH secret can be created when an elastic or DeepSpeed training job is submitted. • By default, permissions to the et-operator Secret are removed and can be manually granted. | 2023-08-02 | No impact on workloads. |

June 2023

| Version | Image address | Description | Release date | Impact |
| --- | --- | --- | --- | --- |
| 0.9.9 | registry.cn-beijing.aliyuncs.com/acs/arena-deploy-manager:0.9.9-ce4a78d | • DeepSpeed is added to support the submission of DeepSpeed distributed training jobs. • The imagePullPolicy parameter is supported. | 2023-06-29 | No impact on workloads. |

May 2023

| Version | Image address | Description | Release date | Impact |
| --- | --- | --- | --- | --- |
| 0.9.8 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.7-d51fe2e | • SDKs can be used to specify the cleanup time for jobs that are completed. • Role-Based Access Control (RBAC) permissions are limited. | 2023-05-23 | No impact on workloads. |

April 2023

| Version | Image address | Description | Release date | Impact |
| --- | --- | --- | --- | --- |
| 0.9.7 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.7-d51fe2e | The completion time of scheduled jobs can be specified. | 2023-04-11 | No impact on workloads. |
| 0.9.6 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.6-b3c2c7f | • The et-operator image is updated. • The ownerreference parameter can be configured when you submit a TensorFlow or PyTorch training job. | 2023-04-04 | No impact on workloads. |

March 2023

| Version | Image address | Description | Release date | Impact |
| --- | --- | --- | --- | --- |
| 0.9.5 | registry.cn-hangzhou.aliyuncs.com/acs/arena-deploy-manager:0.9.5-c3948e2 | • The running-timeout, starting-timeout, and ttl-after-finished parameters can be configured when you submit a TensorFlow training job by using Arena. • The running-timeout and ttl-after-finished parameters can be configured when you submit a PyTorch training job by using Arena. • jobsupervisor charts are supported. • SDK for Java is updated to 1.0.4. • The issue that the gang pod label is not standardized is fixed. • The images of tf-operator, pytorch-operator, and et-operator are updated. | 2023-03-16 | No impact on workloads. |
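
To illustrate the timeout parameters introduced in 0.9.5, the sketch below assembles an `arena submit tf` command line with the optional running-timeout, starting-timeout, and ttl-after-finished settings. It only builds and prints the command and does not run Arena. The helper name `build_tf_submit_command`, the job name, the image, and the duration values such as "2h" are hypothetical, and the flag spellings are assumed to match the parameter names listed above. Check `arena submit tf --help` for the exact syntax in your version.

```python
import shlex
from typing import List, Optional


def build_tf_submit_command(
    name: str,
    image: str,
    training_command: str,
    running_timeout: Optional[str] = None,
    starting_timeout: Optional[str] = None,
    ttl_after_finished: Optional[str] = None,
) -> List[str]:
    """Build the argument list for submitting a TensorFlow training job.

    Timeout flags are added only when a value is given.
    """
    argv = ["arena", "submit", "tf", f"--name={name}", f"--image={image}"]
    optional_flags = (
        ("running-timeout", running_timeout),        # assumed flag spelling
        ("starting-timeout", starting_timeout),      # assumed flag spelling
        ("ttl-after-finished", ttl_after_finished),  # assumed flag spelling
    )
    for flag, value in optional_flags:
        if value is not None:
            argv.append(f"--{flag}={value}")
    argv.append(training_command)
    return argv


# Hypothetical job name, image, and durations, for illustration only.
command = build_tf_submit_command(
    name="tf-demo",
    image="example.com/demo/tf-train:latest",
    training_command="python train.py",
    running_timeout="2h",
    ttl_after_finished="1h",
)
print(shlex.join(command))
```

Because the list omits starting-timeout for PyTorch jobs, a similar helper for PyTorch would accept only the running-timeout and ttl-after-finished settings.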