This topic describes how to choose Elastic Compute Service (ECS) specifications when you create a Container Service for Kubernetes (ACK) cluster.
Cluster planning
If you choose small-sized ECS instances when you create an ACK cluster, the following issues may occur:
Network issues: Small-sized ECS instances can use only limited network resources.
Capacity issues: To ensure the stability and reliability of the cluster, the system reserves node resources, including CPUs, memory, and disks, to manage the cluster and run infrastructure components. Because these reserved resources account for a larger proportion of a small-sized ECS instance, fewer resources remain for your workloads, which may adversely affect the performance and availability of the cluster.
Resource fragmentation issues: When the system allocates node resources to containers, the idle resources that remain on a small-sized ECS instance are often too small to schedule another container, which results in resource waste. For example, if a pod on a 2-vCPU node requests 1.5 vCPUs, the remaining 0.5 vCPU is too small for most other pods and is wasted; see the sketch after this list.
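To illustrate the fragmentation effect, the following minimal Python sketch (the node sizes and the 1.5-vCPU pod request are hypothetical values, not ACK defaults) packs identical pods onto identical nodes and counts the leftover capacity that is too small to hold one more pod:

```python
def stranded_vcpus(node_vcpus: float, node_count: int, pod_vcpu_request: float) -> float:
    """Illustrative only: pods with identical vCPU requests are packed onto
    identical nodes; the leftover on each node that cannot fit one more pod
    is stranded (wasted)."""
    pods_per_node = int(node_vcpus // pod_vcpu_request)
    leftover_per_node = node_vcpus - pods_per_node * pod_vcpu_request
    return leftover_per_node * node_count

# The same 32 vCPUs of total capacity, packing pods that each request 1.5 vCPUs:
print(stranded_vcpus(2, 16, 1.5))    # 16 small nodes (2 vCPUs each)  -> 8.0 vCPUs stranded
print(stranded_vcpus(16, 2, 1.5))    # 2 large nodes (16 vCPUs each)  -> 2.0 vCPUs stranded
```

With the same total capacity, the larger nodes strand far less capacity, which is the fragmentation benefit described above.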
We recommend that you use large-sized ECS instances, which provide the following benefits:
Network benefits: Large-sized ECS instances provide higher network bandwidth, which improves resource utilization for bandwidth-heavy applications. In addition, more containers run on the same ECS instance and can communicate with each other locally, which reduces cross-node network traffic.
Image pulling benefits: Images are pulled more efficiently. An image needs to be pulled to a node only once and can then be used to create multiple containers on that node. If you choose small-sized ECS instances, the same image must be pulled onto each instance, and even more time is required if ECS instances must first be scaled out before containers can be created. As a result, the response latency of your application increases.
For more information about how to choose ECS specifications, see Choose ECS specifications to create master nodes and Choose ECS specifications to create worker nodes.
Choose ECS specifications to create master nodes
The master nodes of an ACK cluster run core components such as etcd, kube-apiserver, and kube-controller-manager. These components determine the stability of the cluster. Therefore, you must choose ECS specifications that suit the master nodes. The required specifications depend on the size of the ACK cluster: a larger cluster requires higher specifications.
The size of the ACK cluster is determined based on multiple factors, such as the number of nodes, the number of pods, the deployment frequency, and the number of requests. In this topic, the cluster size is determined based on the number of nodes.
For individual tests and learning, we recommend that you use small-sized ECS instances. The following table describes the ECS specifications suggested for master nodes deployed in production environments. The suggested specifications can maintain the loads of master nodes at a low level.
| Number of nodes | Suggested master node specification |
| --- | --- |
| 1 to 5 nodes | 4 vCPUs, 8 GB of memory (the 2 vCPUs, 4 GB specification is not recommended) |
| 6 to 20 nodes | 4 vCPUs, 16 GB of memory |
| 21 to 100 nodes | 8 vCPUs, 32 GB of memory |
| 100 to 200 nodes | 16 vCPUs, 64 GB of memory |
| 200 to 500 nodes (you need to estimate the blast radius) | 64 vCPUs, 128 GB of memory |
Choose ECS specifications to create worker nodes
You must select instance types that provide at least 4 vCPUs and 8 GiB of memory.
Determine the specifications of worker nodes in an ACK cluster based on the total number of vCPUs used by regular workloads and the failure ratio tolerated by the ACK cluster.
For example, if the ACK cluster requires 160 vCPUs and can tolerate a failure ratio of 10%, we recommend that you select at least 10 ECS instances, each of which provides 16 vCPUs, and make sure that no more than 144 (160 × 90%) vCPUs are used during peak hours. If the ACK cluster can tolerate a failure ratio of 20%, we recommend that you select at least five ECS instances, each of which provides 32 vCPUs, and make sure that no more than 128 (160 × 80%) vCPUs are used during peak hours. This way, when an ECS instance goes down, the remaining ECS instances can continue to serve your workloads and ensure high availability.
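The sizing rule above can be expressed as a short calculation. The following Python sketch is an illustration only; the function name and inputs are assumptions, not an ACK API. It computes how many ECS instances of a given size to use and the vCPU ceiling to observe during peak hours:

```python
import math

def plan_worker_nodes(required_vcpus: int, failure_ratio: float, vcpus_per_instance: int):
    """Hypothetical sizing helper based on the rule of thumb above.

    required_vcpus      -- total vCPUs that the regular workloads need
    failure_ratio       -- fraction of capacity the cluster must tolerate losing
    vcpus_per_instance  -- vCPUs provided by the chosen ECS instance type
    """
    # Losing one instance must not remove more than the tolerated fraction of capacity,
    # so the instance size must not exceed required_vcpus * failure_ratio.
    if vcpus_per_instance > required_vcpus * failure_ratio:
        raise ValueError("Instance type is too large for the tolerated failure ratio")

    # Buy enough instances to cover the full demand.
    instance_count = math.ceil(required_vcpus / vcpus_per_instance)

    # Keep peak usage below the capacity that survives the tolerated failure.
    peak_vcpu_ceiling = required_vcpus * (1 - failure_ratio)
    return instance_count, peak_vcpu_ceiling

# Example from the text: 160 vCPUs, 10% failure ratio, 16-vCPU instances.
print(plan_worker_nodes(160, 0.10, 16))   # -> (10, 144.0)
# 20% failure ratio with 32-vCPU instances.
print(plan_worker_nodes(160, 0.20, 32))   # -> (5, 128.0)
```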
If the ACK cluster requires 1,000 vCPUs, you can choose ECS Bare Metal instances. For more information, see Use scenarios and benefits of ECS Bare Metal instances.
Determine the vCPU-to-memory ratio based on the resource requests of your pods, such as 1:2 or 1:4. For memory-heavy applications such as Java applications, we recommend a vCPU-to-memory ratio of 1:8.
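If you are unsure which ratio to pick, you can derive it from the aggregate resource requests of your pods. The following Python sketch uses hypothetical pod requests (the numbers are illustrative, not measured values):

```python
# Hypothetical pod resource requests: (requested vCPUs, requested memory in GiB).
pod_requests = [
    (0.5, 2.0),   # e.g. a Java web service
    (1.0, 4.0),
    (0.25, 2.0),
]

total_vcpus = sum(cpu for cpu, _ in pod_requests)
total_mem_gib = sum(mem for _, mem in pod_requests)

# Aggregate memory per vCPU; choose the nearest standard ratio (1:2, 1:4, or 1:8)
# that provides at least this much memory per vCPU.
print(f"Requested ratio: 1:{total_mem_gib / total_vcpus:.1f}")   # 1:4.6 -> choose 1:8 instance types
```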
Use scenarios and benefits of ECS Bare Metal instances
Use scenarios
Large clusters: The ACK cluster requires up to 1,000 vCPUs to serve workloads. Each ECS Bare Metal instance provides at least 96 vCPUs, so you can build the cluster from only 10 or 11 ECS Bare Metal instances.
Rapid scale-out: In scenarios such as e-commerce promotional events, you can add ECS Bare Metal instances to your ACK cluster to handle traffic spikes. Each ECS Bare Metal instance can host a large number of containers.
Benefits
Ultra-high network performance: ECS Bare Metal instances support remote direct memory access (RDMA). Together with the Terway plug-in, which maximizes hardware performance, containers on different hosts can communicate with each other at a bandwidth higher than 9 Gbit/s.
Optimal computing performance without jitter: ECS Bare Metal instances use chips developed by Alibaba Cloud instead of software hypervisors. No resources are consumed to create or run virtual machines, which helps avoid resource contention.
High security: ECS Bare Metal instances support encryption at the physical layer and encryption based on Intel® Software Guard Extensions (Intel® SGX). This type of instance also provides trusted computing environments to support blockchain applications.
For more information, see Overview.