This topic describes the prerequisites, limits, and core functions of ACS pods. The core functions include security isolation, configuration of CPU, memory, and GPU resources and specifications, image pulling, storage, networking, and log collection.
Compute classes
ACS currently provides three compute classes: two for general computing and one for heterogeneous computing. Each compute class offers different resources to suit different business scenarios.
Compute class | Label | Features |
General-purpose (default) | general-purpose | This class meets the needs of most stateless microservice applications, Java web applications, and compute tasks. |
Performance | performance | This class meets the needs of high-performance business scenarios, such as CPU-based AI/machine learning (ML) training and inference, and high-performance computing (HPC) batch processing. |
GPU | gpu | This class meets the needs of heterogeneous computing scenarios, such as AI/HPC, single-GPU and multi-GPU inference, and GPU parallel computing. |
You can specify the compute class of an instance by setting the alibabacloud.com/compute-class label on the pod. The following example shows an Nginx application that specifies the general-purpose compute class:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        alibabacloud.com/compute-class: general-purpose
    spec:
      containers:
      - name: nginx
        image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
Compute QoS
ACS currently provides two types of compute Quality of Service (QoS). Each type offers different resources to suit different business scenarios.
Compute QoS | Label | Features | Typical application scenarios |
Default | default | | |
BestEffort | best-effort | | |
You can specify the compute QoS of an instance by setting the alibabacloud.com/compute-qos label on the pod. The following example shows an Nginx application that specifies the default compute QoS:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        alibabacloud.com/compute-qos: default
    spec:
      containers:
      - name: nginx
        image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
ACS compute QoS types are defined differently from the native Kubernetes (K8s) QoS classes. Currently, the default compute QoS corresponds to the Guaranteed QoS class in native Kubernetes.
Correspondence between compute classes and compute QoS
Compute class (label) | Supported compute QoS (label) |
General-purpose (general-purpose) | Default (default), BestEffort (best-effort) |
Performance (performance) | Default (default) |
GPU (gpu) | Default (default) |
High-performance network GPU (gpu-hpn) | Default (default) |
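As an illustration of the table above, the following sketch combines the general-purpose compute class with the best-effort compute QoS, which is the only class listed as supporting BestEffort. It reuses the Nginx Deployment from the earlier examples; the Deployment name nginx-best-effort is hypothetical and chosen only for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-best-effort   # Hypothetical name, for illustration only.
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        # Both labels go on the pod template, not on the Deployment metadata.
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: best-effort
    spec:
      containers:
      - name: nginx
        image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
```

According to the table, best-effort is not a listed combination for the performance, gpu, or gpu-hpn classes, so this sketch keeps the class at general-purpose.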
K8s application limits
ACS seamlessly integrates with K8s through virtual nodes. ACS pod instances do not run on a single centralized physical node. Instead, they are distributed across Alibaba Cloud's resource pool. Because of public cloud security requirements and the limitations of virtual nodes, ACS currently does not support some K8s features, such as HostPath and DaemonSet. The following table describes the specific limits.
Limit | Description | Handling strategy when validation fails | Recommended alternative |
DaemonSet | Limitations of using DaemonSet workloads. | The pod runs but does not work properly. | Deploy the additional containers in the pod as sidecars. |
HostPath | Limitations of mounting local host files to containers. | Submission rejected. | Use emptyDir, disk, or NAS file system. |
type=NodePort Service | Limitations of mapping host ports to containers. | Submission rejected. | Use load balancer with |
HostNetwork | Limitations of mapping host ports to containers. | Rewrite as | No need to use. |
HostIPC | Limitations of communicating between container processes and host processes. | Rewrite as | No need to use. |
HostPID | Limitations of the container visibility to host PID space. | Rewrite as | No need to use. |
HostUsers | Limitations of using user namespaces. | Rewrite as | No need to use. |
Linux capabilities | Limitations of using Linux system privileges (securityContext.capabilities). Submissions with values outside the allowed set are rejected. Note: Supported privileges: | Submission rejected. | Use allowed values. |
Sysctl | Limitations of using kernel parameters (securityContext.sysctls). Note | Submission rejected. | Use allowed values. |
PrivilegeEscalation | Limitations of container privilege escalation (securityContext.allowPrivilegeEscalation). | Submission rejected. | Use default configurations. |
Privileged Container | Limitations of containers with privileged permissions. | Submission rejected. | Use Security Context to add allowed capabilities or sysctls to the pod. |
ImagePullPolicy | Limitations of the image download policy. | Rewrite as | Use allowed values. |
DNSPolicy | Limitations of using specific DNSPolicy values. Note | | Use allowed values. |
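The HostPath row recommends emptyDir as one alternative. The following is a minimal sketch, assuming the workload only needs scratch space that lives as long as the pod. The pod name, volume name, and mount path are hypothetical; emptyDir itself is a standard Kubernetes volume type.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-scratch              # Hypothetical name.
  labels:
    alibabacloud.com/compute-class: general-purpose
spec:
  containers:
  - name: nginx
    image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
    volumeMounts:
    - name: scratch                # Hypothetical volume name.
      mountPath: /var/cache/nginx  # Hypothetical mount path.
  volumes:
  - name: scratch
    emptyDir: {}                   # Used instead of a hostPath volume, which the table lists as rejected.
```

Data in an emptyDir volume is removed when the pod is deleted, so data that must outlive the pod belongs on a disk or NAS volume instead.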
Core functions
Function | Description |
Security isolation | As a secure and reliable serverless container runtime environment, each ACS pod instance is completely isolated at the underlying level through lightweight security sandbox technology to ensure that instances do not affect each other. Additionally, instances are distributed across different physical machines during scheduling to further ensure high availability. |
CPU/Memory/GPU/EphemeralStorage resource and specification configuration | |
Image | By default, an ACS pod pulls container images from remote registries through its associated VPC after it starts. To pull a public image, you must enable a NAT gateway for the VPC. We recommend that you store container images in Alibaba Cloud's image repository, Container Registry (ACR), so that images are pulled over the VPC network, which reduces image pulling time. For private images in ACR, ACS also provides the Pull images from a Container Registry instance without using Secrets feature for your convenience. |
Storage | ACS supports two types of persistent storage: disk and NAS. |
Network | By default, each ACS pod uses an independent pod IP address, which occupies an elastic network interface of the vSwitch. In an ACS cluster, ACS pods can communicate with each other in the following ways: |
Log collection | You can directly configure environment variables for pods to collect |
Resource specifications
General-Purpose and Performance compute classes
vCPU | Memory (GiB) | Supported Memory Stride (GiB) | Theoretical upper limit of network bandwidth (in+out) (Gbits/s) | Storage |
0.25 | 0.5, 1, 2 | N/A | 0.08 | 30~512 GiB. Additional storage space can be added by mounting Network Attached Storage (NAS) or other storage volumes. |
0.5 | 1 ~ 4 | N/A | 0.08 | |
1 | 1 ~ 8 | N/A | 0.1 | |
2 | 2 ~ 16 | 2 | 1 | |
4 | 4 ~ 32 | 2 | 1.5 | |
6 | 6 ~ 48 | 2 | 1.5 | |
8 | 8 ~ 64 | 2 | 2.5 | |
12 | 12 ~ 96 | 1 | 2.5 | |
16 | 16 ~ 128 | 1 | 3 | |
If no specification is specified, a single pod defaults to 0.25 vCPU and 0.5 GiB of memory.
ACS automatically standardizes specifications that are not supported. After standardization, the resources.requests of the containers do not change, but the resulting pod specification is recorded in the alibabacloud.com/pod-use-spec annotation. When a resource limit specified by a container (resources.limits) exceeds the pod specification, ACS sets the container's resource limits based on the pod specification.
ACS standardization logic: If the resources of all containers add up to 2 vCPU and 3.5 GiB of memory, ACS automatically standardizes the pod to 2 vCPU and 4 GiB of memory. The extra resources added by the adjustment are applied to the first container, and the pod is marked with alibabacloud.com/pod-use-spec=2-4Gi. If a single container in the pod specifies a resource limit of 3 vCPU and 5 GiB of memory, the limit of that container is capped at 2 vCPU and 4 GiB.
The following example shows a resource declaration:
apiVersion: apps/v1
kind: Deployment
...
  template:
    metadata:
      labels:
        app: nginx
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: default
    spec:
      containers:
      - name: nginx
        resources:
          requests:
            cpu: 2 # Declare the CPU as 2 vCPUs.
            memory: "4Gi" # Declare the memory as 4 GiB.
            ephemeral-storage: "30Gi" # Declare the storage space as 30 GiB.
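To make the standardization rule for the general-purpose and performance classes concrete, the following sketch declares two containers whose requests add up to 2 vCPU and 3.5 GiB, which is not a supported specification because 2 vCPU pods use a memory stride of 2 GiB. The pod name, the sidecar container, and its command are hypothetical. The comments describe the outcome this section states for such a pod, not verified output.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: standardization-demo   # Hypothetical name.
  labels:
    alibabacloud.com/compute-class: general-purpose
    alibabacloud.com/compute-qos: default
spec:
  containers:
  - name: nginx                # First container in the pod.
    image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
    resources:
      requests:
        cpu: 1
        memory: "2Gi"
  - name: sidecar              # Hypothetical second container.
    image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
    command: ["sh", "-c", "sleep 3600"]   # Hypothetical placeholder workload.
    resources:
      requests:
        cpu: 1
        memory: "1536Mi"       # 1.5 GiB
# Requested total: 2 vCPU and 3.5 GiB.
# Per the rule described in this section, ACS is expected to standardize the pod
# to 2 vCPU and 4 GiB and mark it with alibabacloud.com/pod-use-spec=2-4Gi,
# while the resources.requests values above stay unchanged.
```

Declaring a supported combination up front, such as 2 vCPU and 4 GiB in total, avoids relying on the automatic adjustment.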
GPU compute class
GPU | vCPU | Memory (GiB) | Supported Memory Stride (GiB) | Theoretical upper limit of network bandwidth (in+out) (Gbits/s) | Storage |
1 | 2 | 2~16 | 1 | 2 | 30~500 GiB. Additional storage space can be added by mounting NAS or other storage volumes. |
1 | 4 | 4~32 | 1 | 4 | |
1 | 6 | 6~48 | 1 | 6 | |
1 | 8 | 8~64 | 1 | 8 | |
1 | 10 | 10~80 | 1 | 10 | |
1 | 12 | 12~96 | 1 | 12 | |
1 | 14 | 14~112 | 1 | 14 | |
1 | 16 | 16~128 | 1 | 16 | |
2 | 16 | 16~128 | 1 | 16 | |
2 | 32 | 32, 64, 128, 230 | N/A | 32 | |
4 | 32 | 32, 64, 128, 256 | N/A | 32 | |
4 | 64 | 64, 128, 256, 460 | N/A | 64 | |
8 | 64 | 64, 128, 256, 512 | N/A | 64 | |
8 | 128 | 128, 256, 512, 920 | N/A | 100 | |
If no specification is specified, a GPU pod uses the smallest specification available for the GPU type. As shown in the preceding table, the smallest specification is 1 GPU, 2 vCPUs, and 2 GiB of memory.
ACS automatically standardizes specifications that are not supported. After standardization, the resources.requests of the containers do not change, but the resulting pod specification is disclosed through the annotation alibabacloud.com/pod-use-spec. When a resource limit specified by a container (resources.limits) exceeds the pod specification, ACS sets the container's resource limits based on the pod specification.
Standardization logic of CPU and memory: If the resources of all containers add up to 2 vCPU and 3.5 GiB of memory, ACS automatically standardizes the pod to 2 vCPU and 4 GiB of memory. The extra resources added by the adjustment are applied to the first container, and the pod specification is disclosed through the annotation alibabacloud.com/pod-use-spec=2-4Gi. If a single container in the pod specifies a resource limit of 3 vCPU and 5 GiB of memory, the limit of that container is capped at 2 vCPU and 4 GiB.
GPU standardization logic: If the number of GPUs requested by a pod is not listed in the table, the submission of the pod fails.
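The following sketch shows how a GPU pod might declare one of the combinations from the GPU table (1 GPU, 4 vCPUs, 8 GiB). It rests on assumptions: the extended resource name nvidia.com/gpu follows the common NVIDIA device plugin convention and is not stated in this section, ACS may require additional GPU-related labels that are not covered here, and the pod name, container name, and image are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo               # Hypothetical name.
  labels:
    alibabacloud.com/compute-class: gpu
    alibabacloud.com/compute-qos: default
spec:
  containers:
  - name: app                  # Hypothetical container; substitute a real GPU workload image.
    image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
    resources:
      requests:
        cpu: 4                 # 4 vCPUs: a listed row for 1 GPU.
        memory: "8Gi"          # Within 4~32 GiB at a 1 GiB stride.
      limits:
        nvidia.com/gpu: 1      # ASSUMPTION: resource name not confirmed by this section.
```

Because a GPU count that is not listed in the table causes the submission to fail, the sketch requests exactly 1 GPU rather than relying on standardization.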
Port usage description
The following table lists the ports used by ACS. Avoid using these ports when you deploy services.
Port | Description |
111, 10250, 10255 | Ports used by the ACS cluster for exec, logs, metrics, and other interfaces. |