Select appropriate Elastic Compute Service (ECS) instance types for your cluster nodes to ensure cluster stability and reliability. This topic describes the recommended ECS instance specifications for creating an Alibaba Cloud Container Service for Kubernetes (ACK) cluster.
Cluster specification planning
Using many small ECS instances to create an ACK cluster can cause the following issues:
Network limitations: Small ECS instance types provide limited network bandwidth and connection capacity, which can bottleneck cross-node traffic.
Resource capacity: To ensure cluster stability and reliability, the system reserves some node resources, such as CPU, memory, and disk, for cluster management and infrastructure components. This reservation can significantly reduce the available resources on small ECS instances, which affects cluster performance and availability. For more information about the node resource reservation policy of ACK, see Node resource reservation policy.
Resource fragmentation: When resources on a small ECS instance are allocated to a container, the remaining resources can become fragmented and unusable. These leftover resources cannot be used to create new containers or recover failed ones, which leads to resource waste. For example, if a node can allocate CPU only in whole units and an application requires only a small amount of CPU, the remaining CPU resources on that unit may be wasted.
Using large ECS instances provides the following benefits:
Improved network performance: Large instances have high network bandwidth, which is ideal for high-bandwidth applications. In addition, more containers can communicate within a single ECS instance, which reduces cross-node network traffic.
Efficient image pulling: On a large instance, an image is pulled only once and can then be used by multiple containers. In contrast, a cluster with many small ECS instances must pull the same image onto each node. This slows scale-out when you add new instances and delays the response to traffic spikes.
For more information about how to select ECS instance types, see the following sections.
Select worker node specifications
Use node specifications with a minimum of 4 CPU cores and 8 GB of memory.
Calculate the total number of CPU cores required for daily use and determine the availability requirements for the cluster.
For example, assume a cluster requires a total of 160 CPU cores and must tolerate a 10% fault rate. In this case, you can select at least 10 ECS instances with 16 CPU cores each. The peak operating load must not exceed 144 CPU cores (160 × 90%). If the required fault tolerance is 20%, you can select at least 5 ECS instances with 32 CPU cores each. The peak operating load must not exceed 128 CPU cores (160 × 80%). This configuration ensures that if one ECS instance fails, the remaining instances can still support your services.
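The sizing logic above can be sketched as a small helper. This is an illustrative calculation, not an ACK API; the function name and signature are assumptions:

```python
import math

def plan_worker_nodes(total_cores: int, fault_tolerance: float,
                      cores_per_node: int) -> tuple[int, int]:
    """Return (node_count, peak_load_cores) so that losing one node
    still leaves enough capacity for the peak load."""
    node_count = math.ceil(total_cores / cores_per_node)
    # One node's share of total capacity must not exceed the tolerated
    # fault rate; otherwise use smaller nodes or more of them.
    assert cores_per_node / (node_count * cores_per_node) <= fault_tolerance, \
        "a single node exceeds the tolerated fault rate"
    peak_load = round(total_cores * (1 - fault_tolerance))
    return node_count, peak_load

print(plan_worker_nodes(160, 0.10, 16))  # (10, 144)
print(plan_worker_nodes(160, 0.20, 32))  # (5, 128)
```

The two calls reproduce the 10% and 20% fault-tolerance examples above.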
If your cluster's daily scale reaches about 1,000 CPU cores, you can use ECS Bare Metal Instances. For more information, see Scenarios and benefits of ECS Bare Metal Instances.
Determine the CPU-to-memory ratio, such as 1:2 or 1:4, based on the resource requirements of your pods. For memory-intensive applications, such as Java applications, consider using instance types with a 1:8 ratio.
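One way to pick a ratio is to compare the aggregate memory requests of your pods against their aggregate CPU requests. The thresholds and function name below are illustrative assumptions, not an ACK recommendation formula:

```python
def suggest_ratio(total_cpu_request: float, total_mem_request_gib: float) -> str:
    """Map the aggregate pod requests to a CPU-to-memory ratio."""
    ratio = total_mem_request_gib / total_cpu_request
    if ratio <= 2:
        return "1:2"
    if ratio <= 4:
        return "1:4"
    # Memory-intensive workloads, such as Java applications.
    return "1:8"

print(suggest_ratio(16, 120))  # memory-heavy workload -> 1:8
```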
To maintain service stability and ensure accurate resource scheduling, do not mix GPU-accelerated and non-GPU instance types in the same node pool.
Using persistent memory-optimized instances
Worker nodes that are persistent memory-optimized instances, such as re6p, use a hybrid memory architecture that includes both regular memory and persistent memory. To implement persistent storage, see Non-volatile memory volumes. For more information about persistent memory-optimized instances, see Instance families.
Select master node specifications
When you create an ACK cluster, core components such as etcd, kube-apiserver, and kube-controller-manager run on the master nodes. For production ACK dedicated clusters, you must select appropriate master node specifications to ensure cluster stability. The required specifications depend on the cluster size, and larger clusters require higher specifications.
Cluster size can be measured in several ways, such as by the number of nodes, pods, deployment frequency, or access volume. For simplicity, this topic measures cluster size by the number of nodes.
You can use small ECS instances for personal testing and learning. For production clusters, select master node specifications from the following table to maintain the master node load at a safe level.
| Number of nodes | Recommended master node specifications |
| --- | --- |
| 1 to 5 nodes | 4 CPU cores, 8 GB memory (Specifications of 2 CPU cores and 4 GB memory or lower are not recommended) |
| 6 to 20 nodes | 4 CPU cores, 16 GB memory |
| 21 to 100 nodes | 8 CPU cores, 32 GB memory |
| 101 to 200 nodes | 16 CPU cores, 64 GB memory |
| 201 to 500 nodes (Assess the blast radius risk) | 64 CPU cores, 128 GB memory |
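The table above can be expressed as a simple lookup. This sketch only mirrors the table; the function name is an assumption:

```python
def master_spec(node_count: int) -> str:
    """Return the recommended master node specification for a cluster size."""
    if node_count <= 5:
        return "4 CPU cores, 8 GB memory"
    if node_count <= 20:
        return "4 CPU cores, 16 GB memory"
    if node_count <= 100:
        return "8 CPU cores, 32 GB memory"
    if node_count <= 200:
        return "16 CPU cores, 64 GB memory"
    if node_count <= 500:
        return "64 CPU cores, 128 GB memory"
    raise ValueError("assess the blast radius; consider multiple clusters")

print(master_spec(50))  # 8 CPU cores, 32 GB memory
```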
Scenarios and benefits of ECS Bare Metal Instances
ECS Bare Metal Instance is a compute service developed by Alibaba Cloud based on virtualization 2.0 technology. It combines the elasticity of ECS virtual machines with the performance and features of physical machines, and fully supports nested virtualization.
ECS Bare Metal Instances are ideal for dedicated compute resources, encrypted computing, and building hybrid clouds. For more information about ECS Bare Metal Instances and supported instance families, see Overview of ECS Bare Metal Instances.
Typical scenarios for ECS Bare Metal Instances include the following:
Your cluster scales to approximately 1,000 CPU cores daily. A single ECS Bare Metal Instance provides at least 96 CPU cores. In large-scale scenarios, you can create a cluster using only 10 or 11 ECS Bare Metal Instances.
You need to rapidly scale the number of containers. For example, during E-commerce sales promotions, ECS Bare Metal Instances deliver better performance than physical servers with the same specifications. They can provide millions of vCPUs of compute capacity to handle traffic spikes.
Unsupported ECS instance types
General limits
For cluster stability and security reasons, ACK does not support using the instance types in the following table as worker or master nodes.
| Unsupported instance family or group | Example of unsupported instance type | Description | Notes |
| --- | --- | --- | --- |
| t5, burstable instance family | ecs.t5-lc2m1.nano | The instance performance is unstable and may cause cluster instability. | None |
| t6, burstable instance family | ecs.t6-c4m1.large | The instance performance is unstable and may cause cluster instability. | None |
| Instance types with fewer than 4 vCPU cores | ecs.g6.large | The instance specifications are too low and may cause cluster instability. | To use low-specification ECS instance types for clusters and node pools, submit a request in Quota Center. |
| c6t, security-enhanced compute-optimized instance family | ecs.c6t.large | Not supported. | None |
| g6t, security-enhanced general-purpose instance family | ecs.g6t.large | Not supported. | None |
| Super Computing Cluster (SCC) instance family | ecs.sccg7.32xlarge | Not supported. | None |
For more information about the GPU-accelerated instance families supported by ACK clusters, see GPU-accelerated instance families supported by ACK.
Limits of the Terway network plugin
If you use the Terway network plugin, the maximum number of pods that a single node can run depends on the number of Elastic Network Interfaces (ENIs) that the node's ECS instance type supports. Therefore, the supported ECS instance types vary based on the Terway mode. For more information, see Use the Terway network plugin.
Shared ENI mode or Shared ENI + Trunk ENI mode: The pod limit for a single node must be greater than 11. The formula is: (<a baseurl="t71560_v1_6_0.xdita" data-node="9548" data-root="84794" data-tag="xref" href="t9548.xdita#concept-sx4-lxv-tdb" id="7dec22dd9eofr">Number of ENIs supported by the ECS instance type</a> - 1) × Number of private IPs supported by a single ENI > 11. For example, the ecs.g6.large instance type supports 2 ENIs, and a single ENI supports 6 private IPv4 addresses. The pod limit for a single node is (2 - 1) × 6 = 6, which is not greater than 11. Therefore, this instance type cannot be used.
Exclusive ENI mode: The pod limit for a single node must be greater than 6. The formula is: <a baseurl="t71560_v1_6_0.xdita" data-node="9548" data-root="84794" data-tag="xref" href="t9548.xdita#concept-sx4-lxv-tdb" id="027def5f024gd">Number of ENIs supported by the ECS instance type</a> - 1 > 6. For example, the ecs.g6.xlarge instance type supports 3 ENIs. The pod limit for a single node is 3 - 1 = 2, which is not greater than 6. Therefore, this instance type cannot be used.
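The two eligibility checks above can be sketched as follows. ENI counts and per-ENI IP counts must come from the specifications of the ECS instance type; the function names are illustrative:

```python
def shared_eni_pod_limit(enis: int, ips_per_eni: int) -> int:
    """Pod limit in Shared ENI (or Shared ENI + Trunk ENI) mode.
    One ENI is reserved for the node itself."""
    return (enis - 1) * ips_per_eni

def exclusive_eni_pod_limit(enis: int) -> int:
    """Pod limit in Exclusive ENI mode."""
    return enis - 1

# ecs.g6.large: 2 ENIs, 6 private IPv4 addresses per ENI.
print(shared_eni_pod_limit(2, 6) > 11)   # False: cannot be used
# ecs.g6.xlarge: 3 ENIs.
print(exclusive_eni_pod_limit(3) > 6)    # False: cannot be used
```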