The cloud-native AI suite is a Container Service for Kubernetes (ACK) solution built on cloud-native AI technologies and products. It helps you make full use of cloud-native architectures and technologies to quickly build an AI-assisted production system in ACK, and it provides full-stack optimization for AI and machine learning applications and systems. ACK Edge clusters support all AI suite features that are available in cloud environments, but some features are limited in edge environments. This topic describes the capabilities and usage limits of the AI suite in ACK Edge clusters for different node types and network types.
Usage limits
Item | Limit
AI suite components | Be aware of the usage limits of the specific components that you use, including the required cluster version and NVIDIA driver version. For more information, see Component introduction and release notes.
ACK Edge clusters | Only specific operating systems and GPU models are supported on edge nodes. For more information, see Add an edge node.
Capability overview
ACK Edge clusters differ from ACK Pro clusters in two ways:
Network connectivity: In ACK Pro clusters, all nodes must reside in the same virtual private cloud (VPC) and be able to communicate with each other. ACK Edge clusters organize nodes into node pools with different network configurations, and the AI suite capabilities that are available can vary with the network conditions of each node pool.
On-cloud node pool: uses the same network configuration as ACK Pro clusters and manages interconnected Elastic Compute Service (ECS) nodes within the same VPC.
Dedicated edge node pool: manages edge nodes that connect to the cloud over Express Connect circuits, which provide private network connections between your data centers and the cloud.
Basic edge node pool: manages edge nodes that connect to the cloud over the Internet. Network connectivity between the edge nodes in this pool is not guaranteed.
Node environment: ACK Edge clusters are mainly used to manage on-premises resources. Compared with ECS instances, these nodes run in more heterogeneous environments that vary in GPU model, GPU driver, and OS version. In addition, GPU memory isolation is not supported on edge nodes.
AI suite capability | Component | On-cloud node pool (cloud) | Dedicated edge node pool (edge) | Basic edge node pool (edge)
Elasticity | ack-alibaba-cloud-metrics-adapter | Supported | Supported | Supported
Acceleration | | Supported | Supported | Supported
Scheduling (batch task scheduling, GPU sharing, and GPU topology awareness) | | Supported | Supported except GPU memory isolation | Supported except GPU memory isolation
Scheduling (task queue) | | Supported | Supported | Supported
Interaction mode (Arena) | | Supported | Supported | Supported
Interaction mode (console) | ack-ai-dashboard and ack-mysql | Supported | Supported | Supported
Workflow | | Supported | Supported | Supported
Monitoring | ack-arena-exporter | Supported | Supported | Supported
In edge node pools, the acceleration capability of the AI suite is available only if the nodes in the node pool can communicate with each other.
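To make the support matrix and its two edge-specific exceptions concrete, the following Python sketch encodes the capability table as a lookup function. The `PoolType` enum, the capability keys (such as `"gpu-memory-isolation"`), and the `nodes_interconnected` flag are hypothetical names chosen for illustration; they are not part of any AI suite API.

```python
# Hypothetical encoding of the AI suite capability table; not an official API.
from enum import Enum


class PoolType(Enum):
    ON_CLOUD = "on-cloud"              # ECS nodes in the same VPC
    DEDICATED_EDGE = "dedicated-edge"  # edge nodes connected over Express Connect
    BASIC_EDGE = "basic-edge"          # edge nodes connected over the Internet


EDGE_POOLS = {PoolType.DEDICATED_EDGE, PoolType.BASIC_EDGE}

# Capability keys (hypothetical). "scheduling" covers batch task scheduling,
# GPU sharing, and GPU topology awareness; GPU memory isolation is tracked separately.
CAPABILITIES = {
    "elasticity",
    "acceleration",
    "scheduling",
    "gpu-memory-isolation",
    "task-queue",
    "arena",
    "console",
    "workflow",
    "monitoring",
}


def is_supported(capability: str, pool: PoolType, nodes_interconnected: bool = True) -> bool:
    """Return True if the capability is usable in a node pool of the given type.

    nodes_interconnected states whether nodes in the pool can reach each other.
    In basic edge node pools this is not guaranteed, so callers should check it.
    """
    if capability not in CAPABILITIES:
        raise ValueError(f"unknown capability: {capability}")
    if pool in EDGE_POOLS:
        # GPU memory isolation is not supported on edge nodes.
        if capability == "gpu-memory-isolation":
            return False
        # Acceleration requires node-to-node connectivity inside the edge pool.
        if capability == "acceleration" and not nodes_interconnected:
            return False
    return True
```

For example, `is_supported("acceleration", PoolType.BASIC_EDGE, nodes_interconnected=False)` returns `False`, while the same call for `"elasticity"` returns `True`.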
Usage method
Given the cloud-edge architecture of ACK Edge clusters, we recommend that you use separate node pools to manage different types of resources when you use the AI suite.
Management node pool: an on-cloud node pool used to deploy the management components of the AI suite.
This node pool does not require GPU resources.
By default, default-nodepool, the on-cloud node pool that ACK Edge automatically creates for the cluster, serves as the management node pool.
To use all features of the AI suite, scale out this node pool to at least 4 nodes so that the components have sufficient resources to run properly. For more information, see Create and scale out a node pool.
Elastic node pool: the on-cloud node pool with auto scaling enabled.
For elastic inference, use this type of node pool to scale servers dynamically based on your business requirements.
Edge node pools: used to manage the different types of nodes in your data centers.
We recommend that you group edge nodes into node pools based on their properties. For example, you can create separate AMD64 and ARM64 node pools based on CPU architecture, or separate node pools for nodes connected over Express Connect circuits and nodes connected over the Internet based on network conditions.
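The node pool recommendations above can be sketched as a small planning helper. This is a minimal illustration under stated assumptions: the `EdgeNode` record, its `network` field, and the `edge-<network>-<arch>` pool naming scheme are hypothetical. Only the 4-node minimum for the management node pool and the grouping criteria (CPU architecture and network conditions) come from this topic. The `arch` values mirror the standard Kubernetes `kubernetes.io/arch` node label.

```python
# Hypothetical planning helper: node records and pool names are illustrative only.
from collections import defaultdict
from dataclasses import dataclass

# Minimum management node pool size recommended for running all AI suite components.
MIN_MANAGEMENT_NODES = 4


@dataclass(frozen=True)
class EdgeNode:
    name: str
    arch: str     # value of the kubernetes.io/arch label, such as "amd64" or "arm64"
    network: str  # hypothetical field: "express-connect" or "internet"


def management_pool_ready(node_count: int) -> bool:
    """Check that the management node pool is large enough for all AI suite components."""
    return node_count >= MIN_MANAGEMENT_NODES


def plan_edge_pools(nodes: list[EdgeNode]) -> dict[str, list[str]]:
    """Group edge nodes into one pool per (network type, CPU architecture) pair."""
    pools: dict[str, list[str]] = defaultdict(list)
    for node in nodes:
        pools[f"edge-{node.network}-{node.arch}"].append(node.name)
    return dict(pools)
```

Grouping by both keys keeps nodes with the same network conditions together, which matters because capabilities such as acceleration depend on node-to-node connectivity within a pool.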
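For the elastic node pool, elastic inference in a typical Kubernetes setup scales in two layers: a Horizontal Pod Autoscaler (HPA) adds inference pods based on a metric, and node pool auto scaling adds nodes when those pods cannot be scheduled. The following Python sketch reproduces the standard upstream Kubernetes HPA replica calculation, assuming a per-pod metric (for example, requests per second) is made available to the HPA, such as through ack-alibaba-cloud-metrics-adapter. It illustrates the Kubernetes formula and is not the AI suite's own implementation.

```python
import math

# Default HPA tolerance in upstream Kubernetes (--horizontal-pod-autoscaler-tolerance).
DEFAULT_TOLERANCE = 0.1


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     tolerance: float = DEFAULT_TOLERANCE) -> int:
    """Kubernetes HPA rule: ceil(current_replicas * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    # If the ratio is within the tolerance band around 1.0, keep the current count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

For example, 2 replicas that each observe 90 requests per second against a target of 30 yield 6 replicas. If the elastic node pool lacks capacity for the new pods, they stay Pending until auto scaling adds nodes.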