The cloud-native AI suite of Container Service for Kubernetes (ACK) lets you run AI jobs in your ACK clusters. With foundational capabilities such as the Arena CLI and AI workload scheduling, you can perform model training, testing, and performance analysis. With elastic dataset acceleration and heterogeneous GPU resource management, you can then deploy models as inference services. This topic describes the AI jobs that the cloud-native AI suite supports and provides links to the relevant references.
The following table describes the AI jobs that are supported by the cloud-native AI suite.
AI job | Description | References |
Model training | You can use Arena to submit various types of training jobs, such as standalone training, distributed training, and elastic training jobs. | |
Model management | You can manage training jobs and the models they generate, and associate each model with the training job that produced it. | |
Model analysis and optimization | Before deploying a model as a service, you can use Arena to submit model performance analysis and optimization jobs to ensure the model meets the business requirements. | |