
ACS pod overview

Last Updated: Oct 31, 2024

This topic describes the prerequisites, limits, and core functions of ACS pods. The core functions include security isolation, configuration of CPU, memory, and GPU resources and specifications, image pulling, storage, networking, and log collection.

Compute classes

ACS currently provides three compute classes: two for general computing and one for heterogeneous computing. Each compute class offers different resources to suit different business scenarios.

| Compute class | Label | Features |
| --- | --- | --- |
| General-purpose (default) | general-purpose | Meets the needs of most stateless microservices applications, Java web applications, and compute tasks. |
| Performance | performance | Meets the needs of high-performance business scenarios, such as CPU-based AI/machine learning (ML) training and inference, and high-performance computing (HPC) batch processing. |
| GPU | gpu | Meets the needs of heterogeneous computing scenarios such as AI/HPC, GPU single-card and multi-card inference, and GPU parallel computing. |

You can specify the compute class of an instance by adding the alibabacloud.com/compute-class label to the pod. The following example shows an Nginx application that sets the compute class to general-purpose.

apiVersion: apps/v1 
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        alibabacloud.com/compute-class: general-purpose 
    spec:
      containers:
      - name: nginx
        image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest

Compute QoS

ACS currently provides two types of compute Quality of Service (QoS). Each compute QoS offers different resource guarantees to suit different business scenarios.

| Compute QoS | Label | Features | Typical application scenarios |
| --- | --- | --- | --- |
| Default | default | Some compute disturbance. No forced instance eviction; instance faults are handled by hot migration or user-triggered eviction. | Microservices applications; web applications; compute tasks |
| BestEffort | best-effort | Some compute disturbance. Forced instance preemption and eviction, with an event notification 5 minutes before eviction. | Big data computing; audio and video transcoding; batch processing tasks |

You can specify the compute QoS of an instance by adding the alibabacloud.com/compute-qos label to the pod. The following example shows an Nginx application that sets the compute QoS to default.

apiVersion: apps/v1 
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        alibabacloud.com/compute-qos: default
    spec:
      containers:
      - name: nginx
        image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest

Note

The compute QoS types in ACS are defined differently from the native Kubernetes QoS classes. Currently, the Default compute QoS corresponds to the Guaranteed QoS class in native Kubernetes (K8s).

Correspondence between compute classes and compute QoS

| Compute class (label) | Supported compute QoS (label) |
| --- | --- |
| General-purpose (general-purpose) | Default (default), BestEffort (best-effort) |
| Performance (performance) | Default (default) |
| GPU (gpu) | Default (default) |
| High-performance network GPU (gpu-hpn) | Default (default) |
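As a sketch of how the two labels combine, the following Job requests the general-purpose compute class with the best-effort QoS, a pairing that the table above lists as supported. The Job name, container name, and command are hypothetical and chosen only for illustration; the image is the sample image used elsewhere in this topic.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-demo                # Hypothetical name, for illustration only.
spec:
  template:
    metadata:
      labels:
        app: batch-demo
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: best-effort   # Per the table, only general-purpose supports best-effort.
    spec:
      restartPolicy: OnFailure    # Lets the Job retry the container if it fails.
      containers:
      - name: worker              # Hypothetical container name.
        image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
        command: ["sh", "-c", "echo processing; sleep 30"]   # Placeholder workload.
```

Because best-effort instances can be preempted and evicted after a 5-minute notification, this combination is likely to suit workloads that tolerate interruption, such as the batch processing tasks listed above.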

K8s application limits

ACS integrates with K8s through virtual nodes. ACS pod instances do not run on a centralized real node but are distributed across Alibaba Cloud's resource pool. Because of public cloud security requirements and the limitations of virtual nodes, ACS currently does not support some K8s features, such as HostPath and DaemonSet. The specific limits are shown in the following table.

| Limit | Description | Handling strategy when validation fails | Recommended alternative |
| --- | --- | --- | --- |
| DaemonSet | Limitations of using DaemonSet workloads. | Pod runs but cannot work normally. | Deploy multiple containers in the pod in the form of a sidecar. |
| HostPath | Limitations of mounting local host files to containers. | Submission rejected. | Use emptyDir, disk, or NAS file system. |
| type=NodePort Service | Limitations of mapping host ports to containers. | Submission rejected. | Use a load balancer with type=LoadBalancer. |
| HostNetwork | Limitations of mapping host ports to containers. | Rewritten as HostNetwork=false. | No need to use. |
| HostIPC | Limitations of communicating between container processes and host processes. | Rewritten as HostIPC=false. | No need to use. |
| HostPID | Limitations of the container visibility to host PID space. | Rewritten as HostPID=false. | No need to use. |
| HostUsers | Limitations of using user namespaces. | Rewritten as HostUsers=true. | No need to use. |
| Linux capabilities | Limitations of using Linux system privileges (securityContext.capabilities). Supported privileges: CHOWN, DAC_OVERRIDE, FOWNER, FSETID, KILL, SETGID, SETUID, SETPCAP, NET_BIND_SERVICE, NET_RAW, NET_ADMIN, SYS_CHROOT, MKNOD, AUDIT_WRITE, SETFCAP. | Submissions outside the allowed values are rejected. | Use allowed values. |
| Sysctl | Limitations of using kernel parameters (securityContext.sysctls). Allowed values: kernel.shm* (kernel.shm_rmid_forced not allowed), kernel.msg*, kernel.sem*, fs.mqueue.*, net.* (net.ipv4.tcp_syncookies not allowed). | Submission rejected. | Use allowed values. |
| PrivilegeEscalation | Limitations of container privilege escalation (securityContext.allowPrivilegeEscalation). | Submission rejected. | Use default configurations. |
| Privileged Container | Limitations of containers with privileged permissions. | Submission rejected. | Use Security Context to add allowed capabilities or sysctls to the pod. |
| ImagePullPolicy | Limitations of the image download policy. | Rewritten as ImagePullPolicy=Always. | Use allowed values. |
| DNSPolicy | Limitations of using specific DNSPolicy values. Allowed values: None, Default, ClusterFirst. | ClusterFirstWithHostNet is rewritten as ClusterFirst. Other policies are rejected. | Use allowed values. |
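The recommended alternatives above can be sketched in a single manifest. The example below is a minimal illustration that assumes only standard Kubernetes fields: it uses an emptyDir volume instead of HostPath, adds a capability from the supported list, sets an allowed DNS policy, and exposes the pod through a LoadBalancer Service instead of NodePort. The resource names, volume name, and mount path are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limits-demo               # Hypothetical name, for illustration only.
  labels:
    app: limits-demo
spec:
  dnsPolicy: ClusterFirst         # One of the allowed values (None, Default, ClusterFirst).
  containers:
  - name: nginx
    image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
    imagePullPolicy: Always       # Other values are rewritten to Always, per the table.
    securityContext:
      capabilities:
        add: ["NET_BIND_SERVICE"] # Taken from the supported privileges list.
    volumeMounts:
    - name: scratch               # Hypothetical volume name.
      mountPath: /cache           # Hypothetical mount path.
  volumes:
  - name: scratch
    emptyDir: {}                  # Used in place of a hostPath volume.
---
apiVersion: v1
kind: Service
metadata:
  name: limits-demo
spec:
  type: LoadBalancer              # Used in place of type=NodePort.
  selector:
    app: limits-demo
  ports:
  - port: 80
    targetPort: 80
```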

Core functions

Function

Description

Security isolation

As a secure and reliable serverless container runtime environment, each ACS pod instance is completely isolated at the underlying level through lightweight security sandbox technology to ensure that instances do not affect each other. Additionally, instances are distributed across different physical machines during scheduling to further ensure high availability.

CPU/Memory/GPU/EphemeralStorage resources or specification configurations

  • Specify CPU, Memory, EphemeralStorage and GPU of the container: You can configure the reserved CPU, Memory, EphemeralStorage and GPU for a single container through the standard method of K8s (resources.requests). The resources of an ACS pod are the total resources required by all containers in the pod. ACS can automatically standardize the resource specifications of the pod.

  • Specify CPU, Memory, EphemeralStorage and GPU limits of the container: You can limit the CPU, Memory, EphemeralStorage and GPU of a single container through the standard method of K8s (resources.limits). If not specified, the default resource limit for a single container is the total reserved resources of all containers in the standardized pod.

Image

By default, an ACS pod uses the associated VPC to pull container images from remote locations after it starts. If the image is a public image, you need to enable the NAT Gateway of the VPC. We recommend that you store container images in Alibaba Cloud's image repository (ACR) to reduce image pulling time through the VPC network. Additionally, for private images on ACR, ACS lets you pull images from a Container Registry instance without using Secrets.

Storage

ACS supports two types of persistent storage: disk and NAS.

Network

By default, ACS pod uses an independent pod IP, occupying an elastic network interface card of the vSwitch.

In an ACS cluster environment, ACS pods can communicate with each other in the following ways:

Log collection

You can configure environment variables directly on pods to collect stdout or file logs and deliver them to Alibaba Cloud's Simple Log Service (SLS).

Resource specifications

General-Purpose and Performance compute classes

| vCPU | Memory (GiB) | Supported Memory Stride (GiB) | Theoretical upper limit of network bandwidth (in+out) (Gbit/s) | Storage (GiB) |
| --- | --- | --- | --- | --- |
| 0.25 | 0.5, 1, 2 | N/A | 0.08 | 30~512 |
| 0.5 | 1~4 | N/A | 0.08 | 30~512 |
| 1 | 1~8 | N/A | 0.1 | 30~512 |
| 2 | 2~16 | 2 | 1 | 30~512 |
| 4 | 4~32 | 2 | 1.5 | 30~512 |
| 6 | 6~48 | 2 | 1.5 | 30~512 |
| 8 | 8~64 | 2 | 2.5 | 30~512 |
| 12 | 12~96 | 1 | 2.5 | 30~512 |
| 16 | 16~128 | 1 | 3 | 30~512 |

Additional storage space can be added by mounting Network Attached Storage (NAS) and other storage volumes.

If no specification is specified, the default resource for a single pod is 0.25 vCPU and 0.5 GiB memory.

ACS automatically standardizes specifications that are not supported. After standardization, the resources.requests of the container do not change, but the pod specification is marked through alibabacloud.com/pod-use-spec. When the resource limit specified by the container (resources.limits) exceeds the pod specification, ACS sets the resource limits for containers based on the pod specifications.

Note

ACS standardization logic: If the total resources of all containers add up to 2 vCPU and 3.5 GiB memory, ACS automatically standardizes the pod to 2 vCPU and 4 GiB memory. The extra resources added by the adjustment are applied to the first container. The pod is marked with alibabacloud.com/pod-use-spec=2-4Gi. If a single container in the pod specifies a resource limit of 3 vCPU and 5 GiB memory, the container's resource limits are capped at 2 vCPU and 4 GiB.

The following example shows a resource declaration:

apiVersion: apps/v1 
kind: Deployment
...
  template:
    metadata:
      labels:
        app: nginx
        alibabacloud.com/compute-class: general-purpose
        alibabacloud.com/compute-qos: default
    spec:
      containers:
      - name: nginx
        resources:
          requests:
            cpu: 2 # Declare the CPU as 2 vCPUs.
            memory: "4Gi" # Declare the memory as 4 GiB.
            ephemeral-storage: "30Gi" # Declare the storage space as 30 GiB.
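To make the standardization note above concrete, the following sketch declares two containers whose requests add up to 2 vCPU and 3.5 GiB memory, with one container setting a limit above the pod specification. The pod name, container names, and sidecar command are hypothetical, and the comments describe the outcome the note predicts rather than verified output.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: standardize-demo          # Hypothetical name, for illustration only.
  labels:
    alibabacloud.com/compute-class: general-purpose
    alibabacloud.com/compute-qos: default
spec:
  containers:
  - name: nginx                   # First container: per the note, receives the extra 0.5 GiB.
    image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
    resources:
      requests:
        cpu: 1
        memory: "2Gi"
  - name: sidecar                 # Hypothetical second container.
    image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest
    command: ["sh", "-c", "while true; do sleep 3600; done"]   # Placeholder workload.
    resources:
      requests:
        cpu: 1
        memory: "1536Mi"          # 1.5 GiB; the pod total is 2 vCPU and 3.5 GiB.
      limits:
        cpu: 3                    # Exceeds the pod specification; expected to be capped at 2 vCPU.
        memory: "5Gi"             # Exceeds the pod specification; expected to be capped at 4 GiB.
# Expected per the note: the pod is standardized to 2 vCPU and 4 GiB and
# marked with alibabacloud.com/pod-use-spec=2-4Gi.
```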

GPU compute class

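| GPU | vCPU | Memory (GiB) | Supported Memory Stride (GiB) | Theoretical upper limit of network bandwidth (in+out) (Gbit/s) | Storage (GiB) |
| --- | --- | --- | --- | --- | --- |
| 1 | 2 | 2~16 | 1 | 2 | 30~500 |
| 1 | 4 | 4~32 | 1 | 4 | 30~500 |
| 1 | 6 | 6~48 | 1 | 6 | 30~500 |
| 1 | 8 | 8~64 | 1 | 8 | 30~500 |
| 1 | 10 | 10~80 | 1 | 10 | 30~500 |
| 1 | 12 | 12~96 | 1 | 12 | 30~500 |
| 1 | 14 | 14~112 | 1 | 14 | 30~500 |
| 1 | 16 | 16~128 | 1 | 16 | 30~500 |
| 2 | 16 | 16~128 | 1 | 16 | 30~500 |
| 2 | 32 | 32, 64, 128, 230 | N/A | 32 | 30~500 |
| 4 | 32 | 32, 64, 128, 256 | N/A | 32 | 30~500 |
| 4 | 64 | 64, 128, 256, 460 | N/A | 64 | 30~500 |
| 8 | 64 | 64, 128, 256, 512 | N/A | 64 | 30~500 |
| 8 | 128 | 128, 256, 512, 920 | N/A | 100 | 30~500 |

Additional storage space can be added by mounting NAS and other storage volumes.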

If no specification is specified, a GPU pod uses the smallest specification available for its GPU type. In the table above, the smallest specification is 2 vCPUs, 2 GiB of memory, and 1 GPU.

ACS automatically standardizes specifications that are not supported. After standardization, the resources.requests of the container do not change, but the pod specifications are disclosed through the annotation alibabacloud.com/pod-use-spec. When the resource limit specified by the container (resources.limits) exceeds the pod specification, ACS sets the resource limits for containers based on the pod specifications.

Note

  • Standardization logic of CPU and memory: If the total resources of all containers add up to 2 vCPU and 3.5 GiB memory, ACS automatically standardizes the pod to 2 vCPU and 4 GiB memory. The extra resources added by the adjustment are applied to the first container. The pod specification is disclosed through the annotation alibabacloud.com/pod-use-spec=2-4Gi. If a single container in the pod specifies a resource limit of 3 vCPU and 5 GiB memory, the container's resource limits are capped at 2 vCPU and 4 GiB.

  • GPU standardization logic: When the number of GPUs requested by a pod is not listed in the table, the pod submission fails.
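A GPU request that matches a row in the table might look like the following sketch. The gpu compute class label comes from this topic; the extended resource name nvidia.com/gpu is an assumption based on the common NVIDIA device plugin convention and is not confirmed by this topic. The pod name and image are placeholders, and a real cluster may require further settings, such as selecting a GPU model, that this topic does not describe.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo                  # Hypothetical name, for illustration only.
  labels:
    alibabacloud.com/compute-class: gpu
    alibabacloud.com/compute-qos: default   # The only QoS listed for the gpu class.
spec:
  containers:
  - name: worker                  # Hypothetical container name.
    image: registry.cn-hangzhou.aliyuncs.com/acs-sample/nginx:latest   # Placeholder; substitute a GPU workload image.
    resources:
      requests:
        cpu: 4                    # 1 GPU with 4 vCPUs is a listed specification.
        memory: "16Gi"            # Within the 4~32 GiB range for 4 vCPUs.
        nvidia.com/gpu: "1"       # Assumed resource name; see the note above.
      limits:
        cpu: 4
        memory: "16Gi"
        nvidia.com/gpu: "1"       # Extended resources require limits equal to requests.
```

Requesting a GPU count that the table does not list, such as 3, would be expected to fail at submission according to the GPU standardization logic above.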

Port usage description

The following table lists the ports used by ACS. Avoid using these ports when you deploy services.

| Port | Description |
| --- | --- |
| 111, 10250, 10255 | Ports used by the ACS cluster for exec, logs, metrics, and other interfaces. |