The performance and availability of Container Service for Kubernetes (ACK) clusters depend on the amount of cluster resources, the resource access frequency, and the access mode. The load and performance of the API server also vary with these factors. A large ACK Pro cluster usually contains more than 500 nodes or more than 10,000 pods. The cluster administrator needs to plan and use large ACK clusters based on the actual business scenario and pay close attention to monitoring metrics to ensure the stability and availability of the clusters.
Usage notes for large ACK clusters
Compared with using multiple clusters, using a single large cluster can effectively simplify cluster O&M and improve resource utilization. However, in complex business scenarios, we recommend that you split your services into multiple clusters by business logic or service demand. For example, you can separate non-production (testing and development) services from production services, or decouple database services from front-end applications.
We recommend that you use multiple clusters instead of creating a large cluster if you have the following requirements.
Requirement | Description |
Isolation | You can use multiple clusters to isolate the testing environment from the production environment so that a failure of the cluster where your businesses are deployed does not interrupt all of your businesses. This reduces the impact of single points of failure. |
Location | Some services must be deployed in a location that is close to the end users to improve service availability and reduce the response latency. In this scenario, we recommend that you deploy multiple clusters across regions. |
Cluster size | ACK managed control planes automatically adapt to the cluster size through auto scaling and cluster component optimization. However, the Kubernetes architecture has a performance bottleneck. The availability and performance of ultra-large clusters are not guaranteed. Before you use large clusters, read the Kubernetes scalability thresholds and Kubernetes scalability and performance SLIs/SLOs defined by the Kubernetes community and log on to the Quota Center console to view and increase quotas related to Container Service for Kubernetes. If your businesses exceed the limits of Kubernetes and ACK, split your businesses into multiple clusters. |
If you require multi-cluster management, such as application deployment, traffic management, job distribution, and global monitoring, we recommend that you enable Multi-cluster Fleets.
About this topic
This topic is intended for the developers and administrators of ACK Pro clusters. It provides suggestions on planning and using large ACK clusters. Adjust these suggestions based on your actual cluster environment and business requirements.
According to the shared responsibility model, ACK is responsible for the default security of the control plane components of clusters (including Kubernetes control plane components and etcd) and the Alibaba Cloud infrastructure on which cluster services run. You are responsible for the security of the applications deployed in the cloud and the security configuration and updates of your cloud resources. For more information, see Shared responsibility model.
Use new version clusters
As new Kubernetes versions are released, the list of Kubernetes versions supported by ACK also changes. New Kubernetes versions will be added to the list and outdated Kubernetes versions will be discontinued. ACK stops releasing new features, feature patches, or security patches for outdated Kubernetes versions. ACK only provides limited technical support for these versions.
You can learn information about new updates through documentation, console information, and internal messages, and read update notes for the desired Kubernetes version before you update your clusters. This helps you update your clusters at the earliest opportunity to mitigate security risks and fix stability issues. For more information about cluster updates, see Manually upgrade ACK clusters and Automatically upgrade a cluster. For more information about Kubernetes versions supported by ACK, see Support for Kubernetes versions.
Pay attention to cluster resource limits
The following table describes the limits for ensuring the availability, stability, and performance of large ACK clusters and the corresponding solutions.
Limit | Description | Suggested solution |
Maximum etcd size (DB size) | The maximum size of the etcd is 8 GB. When the etcd is excessively large, its performance is compromised, including the read and write latency, system resource usage, and election latency. Consequently, service and data restoration becomes difficult and time-consuming. | Make sure that the etcd size is smaller than 8 GB. |
Total size of each resource type in the etcd | If large numbers of resource objects exist, an excessive amount of system resources is consumed when a client accesses all resource objects. This may even cause the initialization of the API server or custom controller to fail. | Limit the total size of each resource type to less than 800 MB. |
Connections and bandwidth of the CLB instance used by an API server | Only Classic Load Balancer (CLB) instances are supported by API servers in ACK clusters. The maximum number of connections and bandwidth supported by a CLB instance are limited. For more information about the maximum number of connections supported by a CLB instance, see Overview of CLB instances. The maximum bandwidth of a CLB instance is 5,120 Mbit/s. When the connection or bandwidth limit of the CLB instance is exceeded, nodes enter the Not Ready state. | If your cluster contains more than 1,000 nodes, we recommend that you use pay-as-you-go CLB instances. Note To accelerate connection establishment and increase the bandwidth, use Elastic Network Interfaces (ENIs) to expose Services in the default namespace of a large cluster. By default, ENIs are used to expose Services in ACK clusters that are created after February 2023 and run Kubernetes versions later than 1.20. For other clusters, submit a ticket to use ENIs to expose Services. For more information, see Kube API Server. |
Number of Services per namespace | The kubelet stores Service information in environment variables and injects them into pods that run on the node. This allows pods to discover and communicate with the Services. If a namespace contains an excessive number of Services, a large number of environment variables are injected into pods. Consequently, pods may require a long period of time to launch or even fail to launch. | We recommend that you limit the number of Services per namespace to less than 5,000. You can choose not to inject these environment variables by setting enableServiceLinks in the pod spec to false. |
Total number of Services in a cluster | If you create an excessive number of Services, kube-proxy needs to handle large numbers of network rules. This compromises the performance of kube-proxy. When the number of LoadBalancer Services grows, the synchronization latency between LoadBalancer Services and Server Load Balancer (SLB) instances also increases. The latency may even reach more than one minute. | We recommend that you limit the total number of Services to less than 10,000. We recommend that you limit the number of LoadBalancer Services to less than 500. |
Maximum number of endpoints per Service | The kube-proxy component runs on each node to watch Service-related updates so that it can update network rules on the node at the earliest opportunity. When a Service has an excessive number of endpoints, a large number of Endpoints objects exist. Each Endpoints object update involves high-volume data transfer between kube-apiserver and kube-proxy. When the size of the cluster grows, more data needs to be updated and the impacts become larger. Note To resolve this issue, kube-proxy uses EndpointSlices to improve performance by default in ACK clusters that run Kubernetes versions later than 1.19. | We recommend that you limit the number of backend pods associated with an Endpoints object to less than 3,000. |
Total number of Service endpoints | If a cluster contains an excessive number of endpoints, the API server may be overloaded and the network performance may be compromised. | We recommend that you limit the total number of Service endpoints to less than 64,000. |
Number of pending pods | If an excessive number of pending pods exist, newly submitted pods may wait a long period of time before they can be scheduled. During the waiting time, the scheduler periodically generates events and creates an event storm. | We recommend that you limit the total number of pending pods to less than 10,000. |
Number of Secrets in a cluster that uses KMS to encrypt Kubernetes Secrets | When Key Management Service (KMS) v1 is used to encrypt data, each encryption generates a data encryption key (DEK). When a Kubernetes cluster starts up, the cluster needs to access and decrypt the Secrets stored in the etcd. If the cluster has an excessive number of secrets, the cluster needs to decrypt large amounts of data during startups or updates. This compromises the performance of the cluster. | We recommend that you limit the number of Secrets in a cluster that uses KMS v1 to encrypt Secrets to less than 2,000. |
Configure control plane component parameters properly
ACK Pro clusters allow you to customize the parameters of control plane components. You can customize the parameters of key managed components, such as kube-apiserver, kube-controller-manager, and kube-scheduler. In large clusters, you need to configure the throttling parameters of the control plane components properly.
kube-apiserver
To prevent large numbers of requests from overloading the control planes, kube-apiserver limits the number of concurrent requests that can be processed within a period of time. When the upper limit is exceeded, the API server triggers request throttling and returns HTTP status code 429 to the client. The status code indicates that an excessive number of requests are received and the client needs to retry later. If no throttling is configured on the server, the control planes may be overloaded by requests, which affects the stability and availability of the entire cluster. Therefore, we recommend that you configure request throttling on the server to protect the control planes.
Request throttling methods
kube-apiserver supports the following request throttling methods:
Versions earlier than v1.18: kube-apiserver can limit only the maximum concurrency. Requests are classified into read requests and write requests. kube-apiserver uses the startup parameters `--max-requests-inflight` and `--max-mutating-requests-inflight` to limit the maximum concurrency of read and write requests. This method does not handle requests based on their priorities. Slow requests with low priorities may occupy large amounts of resources and cause requests to accumulate on the API server. In this scenario, requests with high priorities or urgent requests cannot be handled promptly.
ACK Pro clusters allow you to customize the max-requests-inflight and max-mutating-requests-inflight parameters of kube-apiserver. For more information, see Customize the parameters of control plane components in ACK Pro clusters.
v1.18 and later: The API Priority and Fairness (APF) feature is introduced to manage requests in a more fine-grained manner. This feature can classify and isolate requests based on predefined rules and priorities to ensure that important and urgent requests are prioritized. This feature also uses a fair queuing algorithm to ensure that different types of requests are fairly handled. This feature reaches the Beta stage in Kubernetes 1.20 and is enabled by default.
Request throttling monitoring and suggested solutions
The client can determine whether the server triggers request throttling based on the status code 429 or the `apiserver_flowcontrol_rejected_requests_total` metric. When request throttling is triggered, use the following solutions.
Monitor the resource usage of the API server: When the resource usage is low, increase the sum of the `max-requests-inflight` and `max-mutating-requests-inflight` parameters to raise the total concurrency limit. For a cluster that contains more than 500 nodes, we recommend that you set the sum to a value between 2,000 and 3,000. For a cluster that contains more than 3,000 nodes, we recommend that you set the sum to a value between 3,000 and 5,000.
Reconfigure PriorityLevelConfigurations:
Requests with high priorities: Create a FlowSchema that matches the requests that you do not want to throttle to a high-priority PriorityLevelConfiguration, such as `workload-high` or `exempt`. Take note that requests with the `exempt` priority level are exempted from APF, so proceed with caution. You can also configure a new PriorityLevelConfiguration to allocate a larger share of the concurrency budget to requests with high priorities.
Requests with low priorities: When the resource usage of the API server is high or the API server responds slowly due to slow requests, you can create a FlowSchema to match these requests to a low-priority PriorityLevelConfiguration.
The kube-apiserver component is a managed component in ACK Pro clusters. By default, kube-apiserver uses at least two replicas deployed across zones to ensure high availability. When the resource usage of the control planes increases, the number of replicas is scaled out, up to a maximum of six.
Concurrency limit of kube-apiserver = Number of replicas × Concurrency limit of each replica.
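For example, applying this formula with three kube-apiserver replicas and a per-replica sum of max-requests-inflight and max-mutating-requests-inflight of 1,000, the cluster-wide concurrency limit is 3 × 1,000 = 3,000 concurrent requests. (The replica count and per-replica value in this example are illustrative.)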
Modifying the custom parameters of kube-apiserver triggers a rolling update of the API server. This may cause client controllers to re-run List-Watch operations. In large clusters, the API server may be overloaded, and your services may become temporarily unavailable.
kube-controller-manager and kube-scheduler
kube-controller-manager uses the kubeAPIQPS and kubeAPIBurst parameters and kube-scheduler uses the connectionQPS and connectionBurst parameters to control the QPS of communication with the API server. For more information, see Customize the parameters of control plane components in ACK Pro clusters and Custom parameters of kube-scheduler.
kube-controller-manager: For a cluster that contains more than 1,000 nodes, we recommend that you set kubeAPIQPS to a value greater than 300 and kubeAPIBurst to a value greater than 500.
kube-scheduler: No modification is needed in most cases. When the pod QPS exceeds 300/s, we recommend that you set connectionQPS to 800 and connectionBurst to 1000.
kubelet
The default values of the kubelet kube-api-qps and kube-api-burst parameters are 5 and 10, respectively. No modification is needed in most cases. If the status of pods in your cluster is updated slowly, pods are scheduled with a latency, or volumes are mounted slowly, we recommend that you increase the values of these parameters. For more information, see Customize the kubelet parameters of a node pool.
Increasing the values of the kubelet parameters also increases the QPS of the kubelet for communicating with the API server. When the kubelet sends large numbers of requests, the loads of the API server may increase. We recommend that you increase the values progressively and pay attention to the performance and resource usage of the API server to ensure the stability of the control planes.
You need to control the frequency of kubelet updates. To ensure the stability of the control planes during kubelet updates, ACK limits the maximum concurrency of each batch to 10 when you update the kubelet on nodes in a node pool.
Plan the cluster resource scaling frequency
Typically, when a large cluster runs stably, the control planes in the cluster do not have stress. When the cluster initiates an operation on a large scale, such as creating or deleting large amounts of resources or scaling out or scaling in large numbers of nodes, the control planes may be overloaded. As a result, the cluster performance is compromised, the response latency increases, and your services may be interrupted.
For example, assume that a cluster contains 5,000 nodes and a large number of pods run stably for long-term businesses. In this case, the loads of the control planes remain low. However, if a cluster contains 1,000 nodes and you create 10,000 short-lived jobs or add 2,000 nodes within one minute, the loads of the control planes spike.
Therefore, when you perform resource update operations in a large cluster, you need to limit the update frequency based on the status of the cluster to ensure the stability of the cluster and control planes.
We recommend that you perform update operations in the following ways.
The numbers in the following suggestions are for reference only because the actual limits depend on factors such as the specifications and loads of the control planes. Increase the update frequency progressively: make sure that the control planes respond normally before you increase the update frequency to the next level.
Node scaling: For a cluster that contains more than 2,000 nodes, we recommend that you limit the number of nodes in each batch to 100 or less when you manually scale a node pool, and limit the number of nodes in each batch to 300 or less when you manually scale multiple node pools.
Application pod scaling: If your application is associated with a Service, Endpoints and EndpointSlice updates are pushed to all nodes during scaling activities. The amount of data to be updated increases with the number of nodes in the cluster. If the cluster contains large numbers of nodes, a cluster-wide update storm may occur. For a cluster that contains more than 5,000 nodes, we recommend that you limit the update QPS of pods that are not associated with endpoints to 300/s or lower, and limit the update QPS of pods that are associated with endpoints to 10/s or lower. For example, when you define a rolling update policy in a Deployment, we recommend that you set `maxUnavailable` and `maxSurge` to small values to reduce the pod update frequency, as shown in the sketch after this list.
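As an illustration of a conservative rolling update policy, the following minimal Go sketch (the values are hypothetical; it uses the Kubernetes API types) sets small `maxUnavailable` and `maxSurge` values:

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Conservative rolling-update settings: replace pods slowly so that
	// Endpoints/EndpointSlice updates are spread out over time.
	maxUnavailable := intstr.FromInt(1)  // at most 1 pod unavailable at a time
	maxSurge := intstr.FromString("10%") // at most 10% extra pods during the update

	strategy := appsv1.DeploymentStrategy{
		Type: appsv1.RollingUpdateDeploymentStrategyType,
		RollingUpdate: &appsv1.RollingUpdateDeployment{
			MaxUnavailable: &maxUnavailable,
			MaxSurge:       &maxSurge,
		},
	}
	fmt.Printf("%+v\n", strategy)
}
```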
Optimize the mode in which clients access the cluster
In a Kubernetes cluster, clients such as applications or kubectl clients obtain cluster resource information from the API server. When the amount of resources in the cluster grows but the client still sends requests at the same frequency, the requests may overload the control planes. Consequently, the control planes may respond slowly or even crash. Therefore, you need to plan the size of resources to be accessed and the access frequency before you access resources deployed in a Kubernetes cluster. We recommend that you read the following suggestions when you want to access resources in a large cluster.
Preferably use informers to access the local cache
Preferably use client-go informers to obtain resources. Retrieve data from the local cache instead of sending LIST requests to the API server. This reduces the loads of the API server.
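A minimal client-go informer sketch (assuming a local kubeconfig at the default path; the namespace is illustrative) that performs one LIST plus a WATCH and then serves reads from the local cache:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig (path is an assumption; adjust for your environment).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// The shared informer factory performs one LIST plus a WATCH per resource type
	// and keeps a local cache in sync with the API server.
	factory := informers.NewSharedInformerFactory(client, 30*time.Minute)
	podLister := factory.Core().V1().Pods().Lister()

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	// Subsequent reads are served from the local cache instead of sending
	// LIST requests to the API server.
	pods, err := podLister.Pods("default").List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Printf("cached pods in the default namespace: %d\n", len(pods))
}
```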
Optimize the method used to retrieve resources from the API server
Requests that do not hit the local cache are still sent to the API server to retrieve resources. In this scenario, read the following suggestions.
Specify `resourceVersion=0` in LIST requests.
`resourceVersion` indicates the resource version. When the value is `0`, data is retrieved from the cache of the API server instead of the etcd. This reduces the frequency of communication between the API server and etcd, and LIST requests are handled much faster. Example: `k8sClient.CoreV1().Pods("").List(metav1.ListOptions{ResourceVersion: "0"})`
Avoid listing all resources.
To reduce the volume of the returned data, use filters to limit the scope of LIST requests. For example, use label-selector (filtering based on resource labels) or field-selector (filtering based on resource fields) to filter LIST requests.
Note: etcd is a key-value store and cannot filter data by label or field. The API server applies the specified filter conditions to the results. When you use filters, we recommend that you set `resourceVersion` to `0` for LIST requests. The requested data is then retrieved from the cache of the API server instead of the etcd, which reduces the loads of the etcd. A minimal Go sketch that combines these suggestions is shown after this list.
Use protobuf (not JSON) to access non-CRD resources.
The API server can return resource objects to clients in different formats, including JSON and Protobuf. By default, when a client sends a Kubernetes API request, Kubernetes returns a serialized JSON object whose content type (`Content-Type`) is `application/json`. The client can request Kubernetes to return data in the Protobuf format. Protobuf outperforms JSON in memory usage and data transfer. However, not all API resource types support the Protobuf format. You can specify multiple content types in the `Accept` request header, such as `application/json` and `application/vnd.kubernetes.protobuf`, so that resources are returned in the JSON format when the Protobuf format is not supported. For more information, see Alternate representations of resources. Example: `Accept: application/vnd.kubernetes.protobuf, application/json`
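The following minimal Go sketch combines these suggestions (assuming a recent client-go version and a local kubeconfig; the label selector and node name are hypothetical): it prefers Protobuf for built-in resources, sets `resourceVersion=0`, and narrows the LIST with label and field selectors.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig (path is an assumption; adjust for your environment).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}

	// Prefer Protobuf for built-in (non-CRD) resources; fall back to JSON.
	config.AcceptContentTypes = "application/vnd.kubernetes.protobuf,application/json"
	config.ContentType = "application/vnd.kubernetes.protobuf"

	client := kubernetes.NewForConfigOrDie(config)

	// LIST with ResourceVersion "0" (served from the API server cache) and
	// label/field selectors to narrow the result set.
	pods, err := client.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{
		ResourceVersion: "0",
		LabelSelector:   "app=nginx",            // hypothetical label
		FieldSelector:   "spec.nodeName=node-1", // hypothetical node name
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("listed %d pods\n", len(pods.Items))
}
```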
Use centralized controllers
You need to avoid creating a separate controller on each node to watch the cluster data. Otherwise, when the controllers start up, they send large numbers of LIST requests to the API server at the same time to synchronize the cluster status. This increases the loads of the control planes, compromises service stability, or even causes service interruptions.
To avoid this issue, we recommend that you use centralized controllers. You can run a single controller instance on one node, or a group of controller instances across a small number of nodes, for centralized management. A centralized controller needs to perform the initial LIST only once (or a few times) when it starts, and it only needs to maintain a small number of watch connections afterwards, which greatly reduces the loads of the API server.
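This topic does not prescribe a specific mechanism for coordinating such a group of controller instances. One common pattern is client-go leader election over a Lease, so that only one replica runs the control loops and maintains List-Watch connections at a time. A minimal sketch (the namespace, lock name, and callback behavior are hypothetical):

```go
package main

import (
	"context"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	hostname, _ := os.Hostname()
	// A Lease-based lock shared by all replicas of the controller.
	// The namespace and lock name are hypothetical.
	lock, err := resourcelock.New(
		resourcelock.LeasesResourceLock,
		"kube-system",
		"example-centralized-controller",
		client.CoreV1(),
		client.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: hostname},
	)
	if err != nil {
		panic(err)
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the elected leader runs the informer-based control loops,
				// so the cluster maintains a single set of List-Watch connections.
			},
			OnStoppedLeading: func() {
				os.Exit(0) // let the replica restart and rejoin the election
			},
		},
	})
}
```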
Plan large workloads properly
Disable the feature of automatically mounting the default service account
To ensure that Secrets in pods are synchronously updated, the kubelet creates a watch persistent connection for each Secret. The watch mechanism allows the kubelet to receive Secret update notifications in real time. When an excessive number of watches are created, the watch connections may affect the performance of the control planes.
In Kubernetes versions earlier than 1.22: When you create a pod, if you do not specify a service account, Kubernetes automatically mounts the default service account as the Secret of the pod. Applications in the pod can use the service account to securely communicate with the API server.
For pods of a batch system or pods that do not need to access the API server, we recommend that you explicitly disable the automatic mounting of the service account token. This way, the relevant Secrets and watches are not created. For more information, see automountServiceAccountToken. In a large cluster, this helps avoid creating unnecessary Secrets and API server watch connections, which reduces the loads of the control planes.
In Kubernetes 1.22 and later: You can use the TokenRequest API to obtain a temporary, automatically rotated token and mount the token by using a projected volume. This not only enhances the security of Secrets but also reduces the number of watch connections that the kubelet establishes for the Secret of each service account. This way, the performance of the cluster is guaranteed.
For more information about how to enable serviceAccountToken projected volumes, see Use ServiceAccount token volume projection.
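As an illustration of the recommendation above, the following minimal Go sketch (the pod name and image are hypothetical; it corresponds to setting `automountServiceAccountToken: false` in the pod spec) creates a pod without mounting the service account token:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	automount := false
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "batch-worker", Namespace: "default"}, // hypothetical name
		Spec: corev1.PodSpec{
			// The pod never calls the API server, so skip mounting the default
			// service account token; no extra Secret or kubelet watch is created.
			AutomountServiceAccountToken: &automount,
			Containers: []corev1.Container{
				{Name: "worker", Image: "registry.example.com/worker:v1"}, // hypothetical image
			},
			RestartPolicy: corev1.RestartPolicyNever,
		},
	}
	if _, err := client.CoreV1().Pods("default").Create(context.TODO(), pod, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```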
Control the number and size of Kubernetes objects
Delete idle Kubernetes resources, such as ConfigMaps, Secrets, and persistent volume claims (PVCs), in a timely manner to reduce system resource usage and maintain cluster performance. We recommend that you read the following suggestions.
Limit the number of historical ReplicaSets of Deployments: revisionHistoryLimit specifies the number of historical ReplicaSets kept for a Deployment. If the value is large, Kubernetes keeps an excessive number of historical ReplicaSets, which increases the loads of kube-controller-manager. In a large cluster in which you frequently update a large number of Deployments, decrease the value of revisionHistoryLimit so that historical ReplicaSets are deleted. The default value of revisionHistoryLimit for Deployments is 10.
Delete jobs and pods that you no longer need: If your cluster contains a large number of Job objects created by CronJobs or other mechanisms, set ttlSecondsAfterFinished so that finished Jobs and the pods they created are automatically deleted after the specified period (see the sketch after this list).
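A minimal Go sketch of the two fields mentioned above (the values are hypothetical; the specs are shown in isolation for brevity):

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	batchv1 "k8s.io/api/batch/v1"
)

func main() {
	// Keep fewer historical ReplicaSets for frequently updated Deployments.
	revisionHistoryLimit := int32(3) // hypothetical value; the Kubernetes default is 10
	deploymentSpec := appsv1.DeploymentSpec{
		RevisionHistoryLimit: &revisionHistoryLimit,
	}

	// Automatically delete a finished Job (and its pods) after one hour.
	ttl := int32(3600) // hypothetical value, in seconds
	jobSpec := batchv1.JobSpec{
		TTLSecondsAfterFinished: &ttl,
	}

	fmt.Printf("deployment: %+v\njob: %+v\n", deploymentSpec, jobSpec)
}
```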
Allocate resources to Informer components properly
Informer components are typically used to monitor and synchronize the status of resources in Kubernetes clusters. Informer components establish watch connections to watch the status of API server resources and maintain a local cache for each resource object. This way, changes in resource status can be quickly synchronized.
The memory usage of Informer-based components, such as controllers and kube-scheduler, depends on the size of the resources that they watch. In a large cluster, pay attention to the memory usage of these components to avoid out-of-memory (OOM) issues. If a component encounters OOM issues, it can no longer keep watching resources, and if it restarts frequently, each restart triggers a new List-Watch operation that increases the loads of the control planes (especially the API server).
Pay attention to the metrics of control planes
You can view the metrics of key control plane components and analyze abnormal metrics in the control plane component dashboards. In large clusters, you need to pay close attention to the following metrics. For more information about the usage notes and descriptions of the metrics, see Monitor control plane components.
Resource usage of control plane components
The following table describes the resource usage metrics of control plane components.
Metric | PromQL | Description |
Memory Usage | memory_utilization_byte{container="kube-apiserver"} | The memory usage of kube-apiserver. Unit: bytes. |
CPU Usage | cpu_utilization_core{container="kube-apiserver"}*1000 | The CPU usage of kube-apiserver. Unit: millicores. |
kube-apiserver
For more information about how to view the metrics and their descriptions, see Metrics of kube-apiserver.
Number of resource objects
Metric | PromQL | Description |
Number of resource objects | max by(resource)(apiserver_storage_objects) or max by(resource)(etcd_object_counts) | The metric name is apiserver_storage_objects if your ACK cluster runs Kubernetes 1.22 or later, and etcd_object_counts if your ACK cluster runs Kubernetes 1.22 or earlier. Note: Due to compatibility issues, both the apiserver_storage_objects and etcd_object_counts metrics exist in Kubernetes 1.22. |
Request latency
Metric | PromQL | Description |
GET read request delay P[0.9] | histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb="GET",resource!="",subresource!~"log|proxy"}[$interval])) by (pod, verb, resource, subresource, scope, le)) | The response time of GET requests, displayed based on the following dimensions: API server pods, verb (GET), resources, subresources, and scope. |
LIST read request delay P[0.9] | histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb="LIST"}[$interval])) by (pod_name, verb, resource, scope, le)) | The response time of LIST requests, displayed based on the following dimensions: API server pods, verb (LIST), resources, and scope. |
Write request delay P[0.9] | histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb!~"GET|WATCH|LIST|CONNECT"}[$interval])) by (cluster, pod_name, verb, resource, scope, le)) | The response time of mutating requests (verbs other than GET, WATCH, LIST, and CONNECT), displayed based on the following dimensions: API server pods, verbs, resources, and scope. |
Request throttling
Metric | PromQL | Description |
Request Limit Rate | sum(irate(apiserver_dropped_requests_total{request_kind="readOnly"}[$interval])) by (name) and sum(irate(apiserver_dropped_requests_total{request_kind="mutating"}[$interval])) by (name) | The throttling rate of kube-apiserver. No data or 0 indicates that request throttling is not triggered. |
kube-scheduler
For more information about how to view the metrics and their descriptions, see Metrics of kube-scheduler.
Number of pending pods
Metric | PromQL | Description |
Scheduler Pending Pods | scheduler_pending_pods{job="ack-scheduler"} | The number of pending pods, which consist of the following types: unschedulable (pods that cannot be scheduled), backoff (pods in the backoff queue, which failed to be scheduled for specific reasons), and active (pods in the active queue that are ready to be scheduled). |
Request latency
Metric | PromQL | Description |
Kube API Request Latency | histogram_quantile($quantile, sum(rate(rest_client_request_duration_seconds_bucket{job="ack-scheduler"}[$interval])) by (verb,url,le)) | The time interval between a request sent by kube-scheduler and a response returned by kube-apiserver, displayed based on verbs and URLs. |
kube-controller-manager
For more information about how to view the metrics and their descriptions, see Monitor kube-controller-manager.
Workqueue
Metric | PromQL | Description |
Workqueue depth | sum(rate(workqueue_depth{job="ack-kube-controller-manager"}[$interval])) by (name) | The change of the workqueue length in the specified interval. |
Workqueue processing delay | histogram_quantile($quantile, sum(rate(workqueue_queue_duration_seconds_bucket{job="ack-kube-controller-manager"}[5m])) by (name, le)) | The time that events stay in the workqueue before they are processed. |
etcd
For more information about how to view the metrics and their descriptions, see Metrics of etcd.
Total number of key-value pairs
Metric | PromQL | Description |
total kv | etcd_debugging_mvcc_keys_total | The total number of key-value pairs in the etcd cluster. |
etcd size (DB size)
Metric | PromQL | Description |
Disk Size | etcd_mvcc_db_total_size_in_bytes | The size of the etcd backend database. |
Disk Size | etcd_mvcc_db_total_size_in_use_in_bytes | The in-use size of the etcd backend database. |
References
For more information about quotas and limits on ACK clusters, see Quotas and limits.
For more information about how to plan a virtual private cloud (VPC) and container network, see Plan the network of an ACK cluster.
For more information about how to ensure the high reliability of ACK clusters and workloads, see Recommended workload configurations.
For more information about how to troubleshoot issues that occur when you use ACK clusters, see Troubleshooting and FAQ about cluster management.