The performance and availability of Container Service for Kubernetes (ACK) clusters depend on the amount of cluster resources, the resource access frequency, and the access mode. The load and performance of the API server also vary with these factors. A large ACK Pro cluster usually contains more than 500 nodes or more than 10,000 pods. The cluster administrator needs to plan and use large ACK clusters based on the actual business scenario and pay close attention to monitoring metrics to ensure the stability and availability of the clusters.
Usage notes for large ACK clusters
Compared with using multiple clusters, using a single large cluster can effectively simplify cluster O&M and improve resource utilization. However, in complex business scenarios, we recommend that you split your services into multiple clusters by business logic or service demand. For example, you can separate non-production (testing and development) services from production services, or decouple database services from front-end applications.
We recommend that you use multiple clusters instead of creating a large cluster if you have the following requirements.
Requirement | Description |
Isolation | You can use multiple clusters to isolate the testing environment from the production environment so that a failure of the cluster where your businesses are deployed does not interrupt all of your businesses. This reduces the impact of single points of failure. |
Location | Some services must be deployed in a location that is close to the end users to improve service availability and reduce the response latency. In this scenario, we recommend that you deploy multiple clusters across regions. |
Cluster size | ACK managed control planes automatically adapt to the cluster size through auto scaling and cluster component optimization. However, the Kubernetes architecture has a performance bottleneck. The availability and performance of ultra-large clusters are not guaranteed. Before you use large clusters, read the Kubernetes scalability thresholds and Kubernetes scalability and performance SLIs/SLOs defined by the Kubernetes community and log on to the Quota Center console to view and increase quotas related to Container Service for Kubernetes. If your businesses exceed the limits of Kubernetes and ACK, split your businesses into multiple clusters. |
If you require multi-cluster management, such as application deployment, traffic management, job distribution, and global monitoring, we recommend that you enable Multi-cluster Fleets.
About this topic
This topic is intended for the developers and administrators of ACK Pro clusters. It provides suggestions on planning and using large ACK clusters. Adjust these suggestions based on your actual cluster environment and business requirements.
According to the shared responsibility model, ACK is responsible for the default security of the control plane components of clusters (including Kubernetes control plane components and etcd) and the Alibaba Cloud infrastructure on which cluster services run. You are responsible for the security of the applications deployed in the cloud and the security configuration and updates of your cloud resources. For more information, see Shared responsibility model.
Use new version clusters
As new Kubernetes versions are released, the list of Kubernetes versions supported by ACK also changes. New Kubernetes versions will be added to the list and outdated Kubernetes versions will be discontinued. ACK stops releasing new features, feature patches, or security patches for outdated Kubernetes versions. ACK only provides limited technical support for these versions.
You can learn information about new updates through documentation, console information, and internal messages, and read update notes for the desired Kubernetes version before you update your clusters. This helps you update your clusters at the earliest opportunity to mitigate security risks and fix stability issues. For more information about cluster updates, see Manually upgrade ACK clusters and Automatically upgrade a cluster. For more information about Kubernetes versions supported by ACK, see Support for Kubernetes versions.
Pay attention to cluster resource limits
The following table describes the limits for ensuring the availability, stability, and performance of large ACK clusters and the corresponding solutions.
Limit | Description | Suggested solution |
Maximum etcd size (DB size) | The maximum size of the etcd is 8 GB. When the etcd is excessively large, its performance is compromised, including the read and write latency, system resource usage, and election latency. Consequently, service and data restoration becomes difficult and time-consuming. | Make sure that the etcd size is smaller than 8 GB. |
Total size of each resource type in the etcd | If large numbers of resource objects exist, an excessive amount of system resources is consumed when a client accesses all resource objects. This may even cause the initialization of the API server or custom controller to fail. | Limit the total size of each resource type to less than 800 MB. |
Connections and bandwidth of the CLB instance used by an API server | Only Classic Load Balancer (CLB) instances are supported by API servers in ACK clusters. The maximum number of connections and bandwidth supported by a CLB instance are limited. For more information about the maximum number of connections supported by a CLB instance, see Overview of CLB instances. The maximum bandwidth of a CLB instance is 5,120 Mbit/s. When the connection or bandwidth limit of the CLB instance is exceeded, nodes enter the Not Ready state. | If your cluster contains more than 1,000 nodes, we recommend that you use pay-as-you-go CLB instances. Note To accelerate connection establishment and increase the bandwidth, use Elastic Network Interfaces (ENIs) to expose Services in the default namespace of a large cluster. By default, ENIs are used to expose Services in ACK clusters that are created after February 2023 and run Kubernetes versions later than 1.20. For other clusters, submit a ticket to use ENIs to expose Services. For more information, see Kube API Server. |
Number of Services per namespace | The kubelet stores Service information in environment variables and injects them into pods that run on the node. This allows pods to discover and communicate with the Services. If a namespace contains an excessive number of Services, a large number of environment variables are injected into pods. Consequently, pods may require a long period of time to launch or even fail to launch. | We recommend that you limit the number of Services per namespace to less than 5,000. You can choose not to inject these environment variables by setting enableServiceLinks in the pod spec to false. |
Total number of Services in a cluster | If you create an excessive number of Services, kube-proxy needs to handle large numbers of network rules. This compromises the performance of kube-proxy. When the number of LoadBalancer Services grows, the synchronization latency between LoadBalancer Services and Server Load Balancer (SLB) instances also increases. The latency may even reach more than one minute. | We recommend that you limit the total number of Services to less than 10,000. We recommend that you limit the number of LoadBalancer Services to less than 500. |
Maximum number of endpoints per Service | The kube-proxy component runs on each node to watch Service-related updates so that it can update network rules on the node at the earliest opportunity. When a Service has an excessive number of endpoints, a large number of Endpoints objects exist. Each Endpoints object update involves high-volume data transfer between kube-apiserver and kube-proxy. When the size of the cluster grows, more data needs to be updated and the impacts become larger. Note To resolve this issue, kube-proxy uses EndpointSlices to improve performance by default in ACK clusters that run Kubernetes versions later than 1.19. | We recommend that you limit the number of backend pods associated with an Endpoints object to less than 3,000. |
Total number of Service endpoints | If a cluster contains an excessive number of endpoints, the API server may be overloaded and the network performance may be compromised. | We recommend that you limit the total number of Service endpoints to less than 64,000. |
Number of pending pods | If an excessive number of pending pods exist, newly submitted pods may wait a long period of time before they can be scheduled. During the waiting time, the scheduler periodically generates events and creates an event storm. | We recommend that you limit the total number of pending pods to less than 10,000. |
Number of Secrets in a cluster that uses KMS to encrypt Kubernetes Secrets | When Key Management Service (KMS) v1 is used to encrypt data, each encryption generates a data encryption key (DEK). When a Kubernetes cluster starts up, the cluster needs to access and decrypt the Secrets stored in the etcd. If the cluster has an excessive number of secrets, the cluster needs to decrypt large amounts of data during startups or updates. This compromises the performance of the cluster. | We recommend that you limit the number of Secrets in a cluster that uses KMS v1 to encrypt Secrets to less than 2,000. |
Configure control plane component parameters properly
ACK Pro clusters allow you to customize the parameters of control plane components. You can customize the parameters of key managed components, such as kube-apiserver, kube-controller-manager, and kube-scheduler. In large clusters, you need to configure the throttling parameters of the control plane components properly.
kube-apiserver
To prevent large numbers of requests from overloading the control planes, kube-apiserver limits the number of concurrent requests that can be processed within a period of time. When the upper limit is exceeded, the API server triggers request throttling and returns HTTP status code 429 to the client. The status code indicates that an excessive number of requests are received and the client needs to retry later. If no throttling is configured on the server, the control planes may be overloaded by requests, which affects the stability and availability of the entire cluster. Therefore, we recommend that you configure request throttling on the server to protect the control planes.
Request throttling methods
kube-apiserver supports the following request throttling methods:
Versions earlier than v1.18: kube-apiserver can limit only the maximum concurrency. Requests are classified into read requests and write requests. kube-apiserver uses the startup parameters `--max-requests-inflight` and `--max-mutating-requests-inflight` to limit the maximum concurrency of read and write requests. This method does not handle requests based on their priorities. Slow requests with low priorities may occupy large amounts of resources and cause requests to accumulate on the API server. In this scenario, requests with high priorities or urgent requests cannot be handled promptly.
ACK Pro clusters allow you to customize the max-requests-inflight and max-mutating-requests-inflight parameters of kube-apiserver. For more information, see Customize the parameters of control plane components in ACK Pro clusters.
v1.18 and later: The API Priority and Fairness (APF) feature is introduced to manage requests in a more fine-grained manner. This feature can classify and isolate requests based on predefined rules and priorities to ensure that important and urgent requests are prioritized. This feature also uses a fair queuing algorithm to ensure that different types of requests are fairly handled. This feature reaches the Beta stage in Kubernetes 1.20 and is enabled by default.
Request throttling monitoring and suggested solutions
The client can determine whether the server triggers request throttling based on the status code 429 or the `apiserver_flowcontrol_rejected_requests_total` metric. When request throttling is triggered, use the following solutions.
Monitor the resource usage of the API server: When the resource usage is low, increase the sum of the `max-requests-inflight` and `max-mutating-requests-inflight` parameters to raise the total concurrency limit. For a cluster that contains more than 500 nodes, we recommend that you set the sum to a value between 2,000 and 3,000. For a cluster that contains more than 3,000 nodes, we recommend that you set the sum to a value between 3,000 and 5,000.
Reconfigure PriorityLevelConfigurations:
Requests with high priorities: Create a FlowSchema that matches the requests that you do not want to throttle to a high-priority PriorityLevelConfiguration, such as `workload-high` or `exempt`. Take note that requests with the `exempt` priority level are exempted from APF, so proceed with caution. You can also configure a new PriorityLevelConfiguration to allocate a larger share of the concurrency budget to requests with high priorities.
Requests with low priorities: When the resource usage of the API server is high or the API server responds slowly due to slow requests, you can create a FlowSchema to match these requests to a low-priority PriorityLevelConfiguration.
The kube-apiserver component is a managed component in ACK Pro clusters. By default, kube-apiserver uses at least two replicas deployed across zones to ensure high availability. When the resource usage of the control planes increases, the number of replicas is scaled out, up to a maximum of six.
Concurrency limit of kube-apiserver = Number of replicas × Concurrency limit of each replica.
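For example, applying this formula with three kube-apiserver replicas and a per-replica sum of max-requests-inflight and max-mutating-requests-inflight of 1,000, the cluster-wide concurrency limit is 3 × 1,000 = 3,000 concurrent requests. (The replica count and per-replica value in this example are illustrative.)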
Modifying the custom parameters of kube-apiserver triggers a rolling update of the API server. This may cause client controllers to re-run List-Watch operations. In large clusters, the API server may be overloaded, and your services may become temporarily unavailable.
kube-controller-manager and kube-scheduler
kube-controller-manager uses the kubeAPIQPS and kubeAPIBurst parameters and kube-scheduler uses the connectionQPS and connectionBurst parameters to control the QPS of communication with the API server. For more information, see Customize the parameters of control plane components in ACK Pro clusters and Custom parameters of kube-scheduler.
kube-controller-manager: For a cluster that contains more than 1,000 nodes, we recommend that you set kubeAPIQPS to a value greater than 300 and kubeAPIBurst to a value greater than 500.
kube-scheduler: No modification is needed in most cases. When the pod QPS exceeds 300/s, we recommend that you set connectionQPS to 800 and connectionBurst to 1000.
kubelet
The default values of the kubelet kube-api-qps and kube-api-burst parameters are 5 and 10, respectively. No modification is needed in most cases. If the status of pods in your cluster is updated slowly, pods are scheduled with a latency, or volumes are mounted slowly, we recommend that you increase the values of these parameters. For more information, see Customize the kubelet parameters of a node pool.
Increasing the values of the kubelet parameters also increases the QPS of the kubelet for communicating with the API server. When the kubelet sends large numbers of requests, the loads of the API server may increase. We recommend that you increase the values progressively and pay attention to the performance and resource usage of the API server to ensure the stability of the control planes.
You need to control the frequency of kubelet updates. To ensure the stability of the control planes during kubelet updates, ACK limits the maximum concurrency of each batch to 10 when you update the kubelet on nodes in a node pool.
Plan the cluster resource scaling frequency
Typically, when a large cluster runs stably, the control planes in the cluster do not have stress. When the cluster initiates an operation on a large scale, such as creating or deleting large amounts of resources or scaling out or scaling in large numbers of nodes, the control planes may be overloaded. As a result, the cluster performance is compromised, the response latency increases, and your services may be interrupted.
For example, assume that a cluster contains 5,000 nodes and a large number of pods run stably for long-term businesses. In this case, the loads of the control planes remain low. However, if a cluster contains 1,000 nodes and you create 10,000 short-lived jobs or add 2,000 nodes within one minute, the loads of the control planes spike.
Therefore, when you perform resource update operations in a large cluster, you need to limit the update frequency based on the status of the cluster to ensure the stability of the cluster and control planes.
We recommend that you perform update operations in the following ways.
The numbers in the following suggestions are for reference only because the actual limits depend on factors such as the specifications and loads of the control planes. Increase the update frequency progressively: make sure that the control planes respond normally before you increase the update frequency to the next level.
Node scaling: For a cluster that contains more than 2,000 nodes, we recommend that you limit the number of nodes in each batch to 100 or less when you manually scale a node pool, and limit the number of nodes in each batch to 300 or less when you manually scale multiple node pools.
Application pod scaling: If your application is associated with a Service, Endpoints and EndpointSlice updates are pushed to all nodes during scaling activities. The amount of data to be updated increases with the number of nodes in the cluster. If the cluster contains large numbers of nodes, a cluster-wide update storm may occur. For a cluster that contains more than 5,000 nodes, we recommend that you limit the update QPS of pods that are not associated with endpoints to 300/s or lower, and limit the update QPS of pods that are associated with endpoints to 10/s or lower. For example, when you define a rolling update policy in a Deployment, we recommend that you set `maxUnavailable` and `maxSurge` to small values to reduce the pod update frequency, as shown in the sketch after this list.
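As an illustration of a conservative rolling update policy, the following minimal Go sketch (the values are hypothetical; it uses the Kubernetes API types) sets small `maxUnavailable` and `maxSurge` values:

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Conservative rolling-update settings: replace pods slowly so that
	// Endpoints/EndpointSlice updates are spread out over time.
	maxUnavailable := intstr.FromInt(1)  // at most 1 pod unavailable at a time
	maxSurge := intstr.FromString("10%") // at most 10% extra pods during the update

	strategy := appsv1.DeploymentStrategy{
		Type: appsv1.RollingUpdateDeploymentStrategyType,
		RollingUpdate: &appsv1.RollingUpdateDeployment{
			MaxUnavailable: &maxUnavailable,
			MaxSurge:       &maxSurge,
		},
	}
	fmt.Printf("%+v\n", strategy)
}
```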
Optimize the mode in which clients access the cluster
In a Kubernetes cluster, clients such as applications or kubectl clients obtain cluster resource information from the API server. When the amount of resources in the cluster grows but the client still sends requests at the same frequency, the requests may overload the control planes. Consequently, the control planes may respond slowly or even crash. Therefore, you need to plan the size of resources to be accessed and the access frequency before you access resources deployed in a Kubernetes cluster. We recommend that you read the following suggestions when you want to access resources in a large cluster.
Preferably use informers to access the local cache
Preferably use client-go informers to obtain resources. Retrieve data from the local cache instead of sending LIST requests to the API server. This reduces the loads of the API server.
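A minimal client-go informer sketch (assuming a local kubeconfig at the default path; the namespace is illustrative) that performs one LIST plus a WATCH and then serves reads from the local cache:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig (path is an assumption; adjust for your environment).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// The shared informer factory performs one LIST plus a WATCH per resource type
	// and keeps a local cache in sync with the API server.
	factory := informers.NewSharedInformerFactory(client, 30*time.Minute)
	podLister := factory.Core().V1().Pods().Lister()

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)

	// Subsequent reads are served from the local cache instead of sending
	// LIST requests to the API server.
	pods, err := podLister.Pods("default").List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Printf("cached pods in the default namespace: %d\n", len(pods))
}
```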
Optimize the method used to retrieve resources from the API server
Requests that do not hit the local cache are still sent to the API server to retrieve resources. In this scenario, read the following suggestions.
Specify `resourceVersion=0` in LIST requests.
`resourceVersion` indicates the resource version. When the value is `0`, data is retrieved from the cache of the API server instead of the etcd. This reduces the frequency of communication between the API server and etcd, and LIST requests are handled much faster. Example: `k8sClient.CoreV1().Pods("").List(metav1.ListOptions{ResourceVersion: "0"})`
Avoid listing all resources.
To reduce the volume of the returned data, use filters to limit the scope of LIST requests. For example, use label-selector (filtering based on resource labels) or field-selector (filtering based on resource fields) to filter LIST requests.
Note: etcd is a key-value store and cannot filter data by label or field. The API server applies the specified filter conditions to the results. When you use filters, we recommend that you set `resourceVersion` to `0` for LIST requests. The requested data is then retrieved from the cache of the API server instead of the etcd, which reduces the loads of the etcd. A minimal Go sketch that combines these suggestions is shown after this list.
Use protobuf (not JSON) to access non-CRD resources.
The API server can return resource objects to clients in different formats, including JSON and Protobuf. By default, when a client sends a Kubernetes API request, Kubernetes returns a serialized JSON object whose content type (`Content-Type`) is `application/json`. The client can request Kubernetes to return data in the Protobuf format. Protobuf outperforms JSON in memory usage and data transfer. However, not all API resource types support the Protobuf format. You can specify multiple content types in the `Accept` request header, such as `application/json` and `application/vnd.kubernetes.protobuf`, so that resources are returned in the JSON format when the Protobuf format is not supported. For more information, see Alternate representations of resources. Example: `Accept: application/vnd.kubernetes.protobuf, application/json`
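The following minimal Go sketch combines these suggestions (assuming a recent client-go version and a local kubeconfig; the label selector and node name are hypothetical): it prefers Protobuf for built-in resources, sets `resourceVersion=0`, and narrows the LIST with label and field selectors.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig (path is an assumption; adjust for your environment).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}

	// Prefer Protobuf for built-in (non-CRD) resources; fall back to JSON.
	config.AcceptContentTypes = "application/vnd.kubernetes.protobuf,application/json"
	config.ContentType = "application/vnd.kubernetes.protobuf"

	client := kubernetes.NewForConfigOrDie(config)

	// LIST with ResourceVersion "0" (served from the API server cache) and
	// label/field selectors to narrow the result set.
	pods, err := client.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{
		ResourceVersion: "0",
		LabelSelector:   "app=nginx",            // hypothetical label
		FieldSelector:   "spec.nodeName=node-1", // hypothetical node name
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("listed %d pods\n", len(pods.Items))
}
```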
Use centralized controllers
You need to avoid creating a separate controller on each node to watch the cluster data. Otherwise, when the controllers start up, they send large numbers of LIST requests to the API server at the same time to synchronize the cluster status. This increases the loads of the control planes, compromises service stability, or even causes service interruptions.
To avoid this issue, we recommend that you use centralized controllers. You can run a single controller instance on one node, or a group of controller instances across a small number of nodes, for centralized management. A centralized controller needs to perform the initial LIST only once (or a few times) when it starts, and it only needs to maintain a small number of watch connections afterwards, which greatly reduces the loads of the API server.
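This topic does not prescribe a specific mechanism for coordinating such a group of controller instances. One common pattern is client-go leader election over a Lease, so that only one replica runs the control loops and maintains List-Watch connections at a time. A minimal sketch (the namespace, lock name, and callback behavior are hypothetical):

```go
package main

import (
	"context"
	"os"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	hostname, _ := os.Hostname()
	// A Lease-based lock shared by all replicas of the controller.
	// The namespace and lock name are hypothetical.
	lock, err := resourcelock.New(
		resourcelock.LeasesResourceLock,
		"kube-system",
		"example-centralized-controller",
		client.CoreV1(),
		client.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: hostname},
	)
	if err != nil {
		panic(err)
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the elected leader runs the informer-based control loops,
				// so the cluster maintains a single set of List-Watch connections.
			},
			OnStoppedLeading: func() {
				os.Exit(0) // let the replica restart and rejoin the election
			},
		},
	})
}
```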
Plan large workloads properly
Disable the feature of automatically mounting the default service account
To ensure that Secrets in pods are synchronously updated, the kubelet creates a watch persistent connection for each Secret. The watch mechanism allows the kubelet to receive Secret update notifications in real time. When an excessive number of watches are created, the watch connections may affect the performance of the control planes.
In Kubernetes versions earlier than 1.22: When you create a pod, if you do not specify a service account, Kubernetes automatically mounts the default service account as the Secret of the pod. Applications in the pod can use the service account to securely communicate with the API server.
For pods of a batch system or pods that do not need to access the API server, we recommend that you explicitly disable the automatic mounting of the service account token. This way, the relevant Secrets and watches are not created. For more information, see automountServiceAccountToken. In a large cluster, this helps avoid creating unnecessary Secrets and API server watch connections, which reduces the loads of the control planes.
In Kubernetes 1.22 and later: You can use the TokenRequest API to obtain a temporary, automatically rotated token and mount the token by using a projected volume. This not only enhances the security of Secrets but also reduces the number of watch connections that the kubelet establishes for the Secret of each service account. This way, the performance of the cluster is guaranteed.
For more information about how to enable serviceAccountToken projected volumes, see Use ServiceAccount token volume projection.
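As an illustration of the recommendation above, the following minimal Go sketch (the pod name and image are hypothetical; it corresponds to setting `automountServiceAccountToken: false` in the pod spec) creates a pod without mounting the service account token:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	automount := false
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "batch-worker", Namespace: "default"}, // hypothetical name
		Spec: corev1.PodSpec{
			// The pod never calls the API server, so skip mounting the default
			// service account token; no extra Secret or kubelet watch is created.
			AutomountServiceAccountToken: &automount,
			Containers: []corev1.Container{
				{Name: "worker", Image: "registry.example.com/worker:v1"}, // hypothetical image
			},
			RestartPolicy: corev1.RestartPolicyNever,
		},
	}
	if _, err := client.CoreV1().Pods("default").Create(context.TODO(), pod, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```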
Control the number and size of Kubernetes objects
Delete idle Kubernetes resources, such as ConfigMaps, Secrets, and persistent volume claims (PVCs), in a timely manner to reduce system resource usage and maintain cluster performance. We recommend that you read the following suggestions.
Limit the number of historical ReplicaSets of Deployments: revisionHistoryLimit specifies the number of historical ReplicaSets kept for a Deployment. If the value is large, Kubernetes keeps an excessive number of historical ReplicaSets, which increases the loads of kube-controller-manager. In a large cluster in which you frequently update a large number of Deployments, decrease the value of revisionHistoryLimit so that historical ReplicaSets are deleted. The default value of revisionHistoryLimit for Deployments is 10.
Delete jobs and pods that you no longer need: If your cluster contains a large number of Job objects created by CronJobs or other mechanisms, set ttlSecondsAfterFinished so that finished Jobs and the pods they created are automatically deleted after the specified period (see the sketch after this list).
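A minimal Go sketch of the two fields mentioned above (the values are hypothetical; the specs are shown in isolation for brevity):

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	batchv1 "k8s.io/api/batch/v1"
)

func main() {
	// Keep fewer historical ReplicaSets for frequently updated Deployments.
	revisionHistoryLimit := int32(3) // hypothetical value; the Kubernetes default is 10
	deploymentSpec := appsv1.DeploymentSpec{
		RevisionHistoryLimit: &revisionHistoryLimit,
	}

	// Automatically delete a finished Job (and its pods) after one hour.
	ttl := int32(3600) // hypothetical value, in seconds
	jobSpec := batchv1.JobSpec{
		TTLSecondsAfterFinished: &ttl,
	}

	fmt.Printf("deployment: %+v\njob: %+v\n", deploymentSpec, jobSpec)
}
```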
Allocate resources to Informer components properly
Informer components are typically used to monitor and synchronize the status of resources in Kubernetes clusters. Informer components establish watch connections to watch the status of API server resources and maintain a local cache for each resource object. This way, changes in resource status can be quickly synchronized.
The memory usage of Informer-based components, such as controllers and kube-scheduler, depends on the size of the resources that they watch. In a large cluster, pay attention to the memory usage of these components to avoid out-of-memory (OOM) issues. If a component encounters OOM issues, it can no longer keep watching resources, and if it restarts frequently, each restart triggers a new List-Watch operation that increases the loads of the control planes (especially the API server).
Pay attention to the metrics of control planes
You can view the metrics of key control plane components and analyze abnormal metrics in the control plane component dashboards. In large clusters, you need to pay close attention to the following metrics. For more information about the usage notes and descriptions of the metrics, see Monitor control plane components.
Resource usage of control plane components
The following table describes the resource usage metrics of control plane components.
Metric | PromQL | Description |
Memory Usage | memory_utilization_byte{container="kube-apiserver"} | The memory usage of kube-apiserver. Unit: bytes. |
CPU Usage | cpu_utilization_core{container="kube-apiserver"}*1000 | The CPU usage of kube-apiserver. Unit: millicores. |
kube-apiserver
For more information about how to view the metrics and their descriptions, see Metrics of kube-apiserver.
Number of resource objects
Metric | PromQL | Description |
Number of resource objects | max by(resource)(apiserver_storage_objects) or max by(resource)(etcd_object_counts) | The metric name is apiserver_storage_objects if your ACK cluster runs Kubernetes 1.22 or later, and etcd_object_counts if your ACK cluster runs Kubernetes 1.22 or earlier. Note: Due to compatibility issues, both the apiserver_storage_objects and etcd_object_counts metrics exist in Kubernetes 1.22. |
Request latency
Metric | PromQL | Description |
GET read request delay P[0.9] | histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb="GET",resource!="",subresource!~"log|proxy"}[$interval])) by (pod, verb, resource, subresource, scope, le)) | The response time of GET requests, displayed based on the following dimensions: API server pods, verb (GET), resources, subresources, and scope. |
LIST read request delay P[0.9] | histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb="LIST"}[$interval])) by (pod_name, verb, resource, scope, le)) | The response time of LIST requests, displayed based on the following dimensions: API server pods, verb (LIST), resources, and scope. |
Write request delay P[0.9] | histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb!~"GET|WATCH|LIST|CONNECT"}[$interval])) by (cluster, pod_name, verb, resource, scope, le)) | The response time of mutating requests (verbs other than GET, WATCH, LIST, and CONNECT), displayed based on the following dimensions: API server pods, verbs, resources, and scope. |
Request throttling
Metric | PromQL | Description |
Request Limit Rate | sum(irate(apiserver_dropped_requests_total{request_kind="readOnly"}[$interval])) by (name) and sum(irate(apiserver_dropped_requests_total{request_kind="mutating"}[$interval])) by (name) | The throttling rate of kube-apiserver. No data or 0 indicates that request throttling is not triggered. |
kube-scheduler
For more information about how to view the metrics and their descriptions, see Metrics of kube-scheduler.
Number of pending pods
Metric | PromQL | Description |
Scheduler Pending Pods | scheduler_pending_pods{job="ack-scheduler"} | The number of pending pods, which consist of the following types: unschedulable (pods that cannot be scheduled), backoff (pods in the backoff queue, which failed to be scheduled for specific reasons), and active (pods in the active queue that are ready to be scheduled). |
Request latency
Metric | PromQL | Description |
Kube API Request Latency | histogram_quantile($quantile, sum(rate(rest_client_request_duration_seconds_bucket{job="ack-scheduler"}[$interval])) by (verb,url,le)) | The time interval between a request sent by kube-scheduler and a response returned by kube-apiserver, displayed based on verbs and URLs. |
kube-controller-manager
For more information about how to view the metrics and their descriptions, see Monitor kube-controller-manager.
Workqueue
Metric | PromQL | Description |
Workqueue depth | sum(rate(workqueue_depth{job="ack-kube-controller-manager"}[$interval])) by (name) | The change of the workqueue length in the specified interval. |
Workqueue processing delay | histogram_quantile($quantile, sum(rate(workqueue_queue_duration_seconds_bucket{job="ack-kube-controller-manager"}[5m])) by (name, le)) | The time that events stay in the workqueue before they are processed. |
etcd
For more information about how to view the metrics and their descriptions, see Metrics of etcd.
Total number of key-value pairs
Metric | PromQL | Description |
total kv | etcd_debugging_mvcc_keys_total | The total number of key-value pairs in the etcd cluster. |
etcd size (DB size)
Metric | PromQL | Description |
Disk Size | etcd_mvcc_db_total_size_in_bytes | The size of the etcd backend database. |
Disk Size | etcd_mvcc_db_total_size_in_use_in_bytes | The in-use size of the etcd backend database. |
References
For more information about quotas and limits on ACK clusters, see Quotas and limits.
For more information about how to plan a virtual private cloud (VPC) and container network, see Plan the network of an ACK cluster.
For more information about how to ensure the high reliability of ACK clusters and workloads, see Recommended workload configurations.
For more information about how to troubleshoot issues that occur when you use ACK clusters, see Troubleshooting and FAQ about cluster management.