The performance and availability of Container Service for Kubernetes (ACK) clusters depend on the amount of cluster resources, the resource access frequency, and the access mode. The loads and performance of the API server also vary based on these factors. A large ACK Pro cluster usually contains more than 500 nodes or more than 10,000 pods. The administrator of a large ACK cluster needs to plan and use the cluster based on the actual business scenario and pay close attention to monitoring metrics to ensure the stability and availability of the cluster.
Usage notes for large ACK clusters
Compared with using multiple clusters, using a single large cluster simplifies cluster O&M and improves resource utilization. However, in complex business scenarios, we recommend that you split your services into multiple clusters by business logic or service requirement. For example, you can separate non-production services (such as testing and development services) from production services, or decouple database services from front-end applications.
We recommend that you use multiple clusters instead of creating a large cluster if you have the following requirements.
Requirement | Description |
Isolation | You can use multiple clusters to isolate the testing environment from the production environment. This prevents all of your businesses from being interrupted when a single cluster is down and reduces the impact of single points of failure. |
Location | Some services must be deployed in a location that is close to the end users to improve service availability and reduce the response latency. In this scenario, we recommend that you deploy multiple clusters across regions. |
Cluster size | ACK managed control planes automatically adapt to the cluster size through auto scaling and cluster component optimization. However, the Kubernetes architecture has a performance bottleneck. The availability and performance of ultra-large clusters are not guaranteed. Before you use large clusters, read the Kubernetes scalability thresholds and Kubernetes scalability and performance SLIs/SLOs defined by the Kubernetes community and log on to the Quota Center console to view and increase quotas related to Container Service for Kubernetes. If your businesses exceed the limits of Kubernetes and ACK, split your businesses into multiple clusters. |
If you require multi-cluster management, such as application deployment, traffic management, job distribution, and global monitoring, we recommend that you enable Multi-cluster Fleets.
About this topic
This topic is intended for the developers and administrators of ACK Pro clusters and offers general recommendations for planning and operating large ACK clusters. Adjust the recommendations based on your actual cluster environment and business requirements.
Note
Under the shared responsibility model, ACK is responsible for the default security of control plane components (including Kubernetes control plane components and etcd) and the Alibaba Cloud infrastructure related to cluster services. You are responsible for the security of the applications deployed in the cloud and for the security configuration and updates of your cloud resources. For more information, see Shared responsibility model.
Use new version clusters
ACK periodically releases new Kubernetes versions and gradually phases out technical support for deprecated versions. For deprecated versions, ACK will:
Discontinue new feature releases
Cease bug fixes and security patches
Provide only limited technical support
You can learn information about new updates through documentation, console information, and internal messages, and read update notes for the desired Kubernetes version before you update your clusters. This helps you update your clusters at the earliest opportunity to mitigate security risks and fix stability issues. For more information about cluster updates, see Manually update ACK clusters and Automatically upgrade a cluster. For more information about Kubernetes versions supported by ACK, see Support for Kubernetes versions.
Pay attention to cluster resource limits
The following table describes the limits for ensuring the availability, stability, and performance of large ACK clusters and the corresponding solutions.
Limit | Description | Suggested solution |
Maximum etcd size (DB size) | The maximum size of the etcd database is 8 GB. When the database is excessively large, the performance of etcd is compromised, including the read and write latency, system resource usage, and leader election latency. In addition, service and data restoration becomes difficult and time-consuming. | Make sure that the etcd database size is smaller than 8 GB. Control the total amount of cluster resources and release idle resources in a timely manner. For resources that are frequently updated, we recommend that you limit the size of each resource to less than 100 KB. In etcd, each update to a key-value pair generates a historical version. In big data computing scenarios that involve frequent updates, the historical versions stored in etcd occupy a large amount of space. |
Total size of each resource type in etcd | If a large number of resource objects exist, an excessive amount of system resources is consumed when a client accesses all objects of a resource type. This may even cause the initialization of the API server or custom controllers to fail. | Limit the total size of each resource type to less than 800 MB. When you define a new CustomResourceDefinition (CRD), estimate the expected number of CustomResources (CRs) in advance to keep the total size of each CRD controllable. When you deploy Helm charts, Helm automatically creates releases to track the deployment progress. By default, Helm uses Secrets to store release information. In large ACK clusters, the amount of release information may exceed the maximum Secret size defined by Kubernetes. In this scenario, use the Helm SQL storage backend instead. |
Connections and bandwidth of the CLB instance used by an API server | Only Classic Load Balancer (CLB) instances are supported by API servers in ACK clusters. The maximum number of connections and bandwidth supported by a CLB instance are limited. For more information about the maximum number of connections supported by a CLB instance, see CLB instances. The maximum bandwidth of a CLB instance is 5,120 Mbit/s. When the connection or bandwidth limit of the CLB instance is exceeded, nodes enter the Not Ready state. | If your cluster contains more than 1,000 nodes, we recommend that you use pay-as-you-go CLB instances. Note To accelerate connection establishment and increase the bandwidth, use Elastic Network Interfaces (ENIs) to expose Services in the default namespace of a large cluster. By default, ENIs are used to expose Services in ACK clusters that are created after February 2023 and run Kubernetes versions later than 1.20. For other clusters, submit a ticket to use ENIs to expose Services. For more information, see Kube API Server. |
Number of Services per namespace | The kubelet stores Service information in environment variables and injects them into the pods that run on the node. This allows pods to discover and communicate with the Services. If a namespace contains an excessive number of Services, a large number of environment variables are injected into each pod. Consequently, pods may require a long period of time to launch or even fail to launch. | We recommend that you limit the number of Services per namespace to less than 5,000. You can choose not to inject these environment variables by setting enableServiceLinks in the podSpec section to false. For an example, see the sketch after this table. For more information, see Accessing the Service. |
Total number of Services in a cluster | If you create an excessive number of Services, kube-proxy needs to handle large numbers of network rules. This compromises the performance of kube-proxy. When the number of LoadBalancer Services grows, the synchronization latency between LoadBalancer Services and Server Load Balancer (SLB) instances also increases. The latency may even reach more than one minute. | We recommend that you limit the total number of Services to less than 10,000. We recommend that you limit the number of LoadBalancer Services to less than 500. |
Maximum number of endpoints per Service | The kube-proxy component runs on each node and watches Service-related updates so that it can update the network rules on the node at the earliest opportunity. When a Service has an excessive number of endpoints, the corresponding Endpoints objects are large. Each update to an Endpoints object involves high-volume data transfer between kube-apiserver and kube-proxy. As the cluster grows, more data needs to be updated and the impact becomes larger. Note To resolve this issue, kube-proxy uses EndpointSlices by default to improve performance in ACK clusters that run Kubernetes versions later than 1.19. | We recommend that you limit the number of backend pods associated with an Endpoints object to less than 3,000. In large clusters, use EndpointSlices instead of Endpoints to split and manage network endpoints. The splitting efficiently reduces the volume of data transferred for each update. If your custom controller relies on Endpoints objects to make routing decisions, you can keep using Endpoints objects. In this case, make sure that the number of backend pods associated with an Endpoints object is less than 1,000. When this upper limit is exceeded, data in the Endpoints object is automatically truncated. For more information, see Over-capacity endpoints. |
Total number of Service endpoints | If a cluster contains an excessive number of endpoints, the API server may be overloaded and the network performance may be compromised. | We recommend that you limit the total number of Service endpoints to less than 64,000. |
Number of pending pods | If an excessive number of pending pods exist, newly submitted pods may wait a long period of time before they can be scheduled. During the waiting time, the scheduler periodically generates events and creates an event storm. | We recommend that you limit the total number of pending pods to less than 10,000. |
Number of Secrets in a cluster that uses KMS to encrypt Kubernetes Secrets | When Key Management Service (KMS) v1 is used to encrypt data, each encryption generates a data encryption key (DEK). When a Kubernetes cluster starts up, the cluster needs to access and decrypt the Secrets stored in etcd. If the cluster has an excessive number of Secrets, the cluster needs to decrypt large amounts of data during startups or updates. This compromises the performance of the cluster. | We recommend that you limit the number of Secrets in a cluster that uses KMS v1 to encrypt Secrets to less than 2,000. |
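The following is a minimal sketch of a pod configuration that disables the injection of Service environment variables, as mentioned in the Number of Services per namespace row. The pod name and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: app-without-service-links        # Placeholder name.
spec:
  enableServiceLinks: false               # Do not inject Service environment variables into the pod.
  containers:
  - name: app
    image: registry.example.com/app:latest   # Placeholder image.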
Configure control plane component parameters properly
ACK Pro clusters allow you to customize the parameters of control plane components. You can customize the parameters of key managed components, such as kube-apiserver, kube-controller-manager, and kube-scheduler. In large clusters, you need to configure the throttling parameters of the control plane components properly.
kube-apiserver
To prevent large numbers of requests from overloading the control planes, kube-apiserver limits the number of concurrent requests that can be processed within a period of time. When the upper limit is exceeded, the API server triggers request throttling and returns HTTP status code 429 to the client. The status code indicates that an excessive number of requests are received and the client has to try again later. If no throttling is configured for the server, the control planes may be overloaded by requests. Consequently, the stability and availability of the entire service cluster are affected. Therefore, we recommend that you configure request throttling on the server to protect the control planes.
Request throttling methods
kube-apiserver supports the following request throttling methods:
Versions earlier than v1.18: kube-apiserver can limit only the maximum concurrency. Requests are classified into read requests and write requests. kube-apiserver uses the startup parameters --max-requests-inflight and --max-mutating-requests-inflight to limit the maximum concurrency of read and write requests. This method does not handle requests based on their priorities. Slow requests with low priorities may occupy large amounts of resources and cause requests to accumulate on the API server. In this scenario, requests with high priorities or urgent requests cannot be handled promptly.
ACK Pro clusters allow you to customize the max-requests-inflight and max-mutating-requests-inflight parameters of kube-apiserver. For more information, see Customize the parameters of control plane components in ACK Pro clusters.
v1.18 and later: The API Priority and Fairness (APF) feature is introduced to manage requests in a more fine-grained manner. This feature can classify and isolate requests based on predefined rules and priorities to ensure that important and urgent requests are prioritized. This feature also uses a fair queuing algorithm to ensure that different types of requests are fairly handled. This feature reaches the Beta stage in Kubernetes 1.20 and is enabled by default.
In ACK clusters that run Kubernetes 1.20 and later, the maximum number of concurrent requests processed by kube-apiserver is determined by the sum of the --max-requests-inflight and --max-mutating-requests-inflight parameters. kube-apiserver uses FlowSchema and PriorityLevelConfiguration objects to control the concurrency of each type of request and throttle requests in a fine-grained manner.
PriorityLevelConfiguration: defines a priority level. This determines the share of the available concurrency budget that each priority level can handle.
FlowSchema: matches requests to a single PriorityLevelConfiguration.
PriorityLevelConfigurations and FlowSchemas are maintained by kube-apiserver. Kubernetes clusters automatically generate default PriorityLevelConfigurations and FlowSchemas based on the current Kubernetes version. You can run the following commands to query PriorityLevelConfigurations and FlowSchemas.
Run the following command to query PriorityLevelConfigurations:
kubectl get PriorityLevelConfiguration
NAME TYPE ASSUREDCONCURRENCYSHARES QUEUES HANDSIZE QUEUELENGTHLIMIT AGE
catch-all Limited 5 <none> <none> <none> 4m20s
exempt Exempt <none> <none> <none> <none> 4m20s
global-default Limited 20 128 6 50 4m20s
leader-election Limited 10 16 4 50 4m20s
node-high Limited 40 64 6 50 4m20s
system Limited 30 64 6 50 4m20s
workload-high Limited 40 128 6 50 4m20s
workload-low Limited 100 128 6 50 4m20s
Run the following command to query FlowSchemas:
Note
ACK adds the ack-system-leader-election and ack-default FlowSchemas for key ACK components. The other FlowSchemas are the same as those in open source Kubernetes.
kubectl get flowschemas
NAME PRIORITYLEVEL MATCHINGPRECEDENCE DISTINGUISHERMETHOD AGE MISSINGPL
exempt exempt 1 <none> 4d18h False
probes exempt 2 <none> 4d18h False
system-leader-election leader-election 100 ByUser 4d18h False
endpoint-controller workload-high 150 ByUser 4d18h False
workload-leader-election leader-election 200 ByUser 4d18h False
system-node-high node-high 400 ByUser 4d18h False
system-nodes system 500 ByUser 4d18h False
ack-system-leader-election leader-election 700 ByNamespace 4d18h False
ack-default workload-high 800 ByNamespace 4d18h False
kube-controller-manager workload-high 800 ByNamespace 4d18h False
kube-scheduler workload-high 800 ByNamespace 4d18h False
kube-system-service-accounts workload-high 900 ByNamespace 4d18h False
service-accounts workload-low 9000 ByUser 4d18h False
global-default global-default 9900 ByUser 4d18h False
catch-all catch-all 10000 ByUser 4d18h False
Request throttling monitoring and suggested solutions
The client can determine whether the server triggers request throttling based on the HTTP status code 429 or the apiserver_flowcontrol_rejected_requests_total metric. When request throttling is triggered, use the following solutions.
Monitor the resource usage of the API server: When the resource usage is low, increase the sum of the max-requests-inflight and max-mutating-requests-inflight parameters to raise the total concurrency limit. For a cluster that contains more than 500 nodes, we recommend that you set the sum to a value between 2,000 and 3,000. For a cluster that contains more than 3,000 nodes, we recommend that you set the sum to a value between 3,000 and 5,000.
Reconfigure PriorityLevelConfigurations:
Requests with high priorities: Create a FlowSchema to match requests that you do not want to throttle to a high-priority PriorityLevelConfiguration, such as workload-high or exempt. Take note that requests at the exempt priority level are exempted from APF. Proceed with caution. You can also configure a new PriorityLevelConfiguration to allocate a larger share of the concurrency budget to requests with high priorities. For an example, see the sketch after this list.
Requests with low priorities: When the resource usage of the API server is high or the API server responds slowly due to slow requests, you can create a FlowSchema to match these requests to a low-priority PriorityLevelConfiguration.
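The following is a hedged sketch of a FlowSchema that matches requests from a specific ServiceAccount to the existing workload-high priority level. The FlowSchema name, ServiceAccount name, and namespace are placeholders, and the apiVersion (v1beta3 here) varies with the Kubernetes version of your cluster.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3   # Use the flowcontrol API version that matches your Kubernetes version.
kind: FlowSchema
metadata:
  name: critical-operator                 # Placeholder name.
spec:
  priorityLevelConfiguration:
    name: workload-high                   # Existing high-priority PriorityLevelConfiguration.
  matchingPrecedence: 1000                # Lower values are evaluated first.
  distinguisherMethod:
    type: ByUser
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: critical-operator           # Placeholder ServiceAccount.
        namespace: ops                    # Placeholder namespace.
    resourceRules:
    - verbs: ["get", "list", "watch"]
      apiGroups: ["*"]
      resources: ["*"]
      clusterScope: true
      namespaces: ["*"]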
Important
The kube-apiserver component is a managed component in ACK Pro clusters. By default, kube-apiserver uses at least two replicas deployed across zones to ensure high availability. When the resource usage of the control planes increases, the number of replicas is scaled to at most six. Concurrency limit of kube-apiserver = Number of replicas × Concurrency limit of each replica.
Modifying the custom parameters of kube-apiserver triggers an API server rolling update. This may cause the client controller to reperform the List-Watch operation. In large clusters, the API server may be overloaded. If this issue occurs, your service becomes temporarily unavailable.
kube-controller-manager and kube-scheduler
kube-controller-manager uses the kubeAPIQPS and kubeAPIBurst parameters and kube-scheduler uses the connectionQPS and connectionBurst parameters to control the QPS of communication with the API server. For more information, see Customize the parameters of control plane components in ACK Pro clusters and Custom parameters of kube-scheduler.
kube-controller-manager: For a cluster that contains more than 1,000 nodes, we recommend that you set kubeAPIQPS to a value greater than 300 and kubeAPIBurst to a value greater than 500.
kube-scheduler: No modification is needed in most cases. When the pod QPS exceeds 300/s, we recommend that you set connectionQPS to 800 and connectionBurst to 1000.
kubelet
The default values of the kube-api-qps and kube-api-burst parameters of the kubelet are 5 and 10, respectively. No modification is needed in most cases. When the status of pods in your cluster is updated slowly, pods are scheduled with a latency, or volumes are mounted slowly, we recommend that you increase the values of the parameters. For more information, see Customize the kubelet parameters of a node pool.
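For reference, the following sketch shows the fields that the kube-api-qps and kube-api-burst parameters map to in the open source KubeletConfiguration format. The values are examples only. In ACK, modify these parameters through the custom kubelet parameters of a node pool instead of editing the configuration file on nodes.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeAPIQPS: 50      # Example value. Increase progressively from the default value 5.
kubeAPIBurst: 100   # Example value. Increase progressively from the default value 10.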
Important
Increasing the values of the kubelet parameters also increases the QPS of the kubelet for communicating with the API server. When the kubelet sends large numbers of requests, the loads of the API server may increase. We recommend that you increase the values progressively and pay attention to the performance and resource usage of the API server to ensure the stability of the control planes.
You need to control the frequency of kubelet updates. To ensure the stability of the control planes during kubelet updates, ACK limits the maximum concurrency of each batch to 10 when you update the kubelet on nodes in a node pool.
Plan the cluster resource scaling frequency
In large-scale Kubernetes clusters, the control plane typically operates under minimal load during stable states. However, when the cluster initiates an operation on a large scale, such as creating or deleting large amounts of resources or scaling out or scaling in large numbers of nodes, the control planes may be overloaded. As a result, the cluster performance is compromised, the response latency increases, and your services may be interrupted.
For example, if a cluster contains 5,000 nodes and a large number of pods run stably in the cluster for long-term businesses, the loads of the control planes do not increase significantly. However, if a cluster that contains 1,000 nodes needs to create 10,000 temporary jobs or add 2,000 nodes within 1 minute, the loads of the control planes spike.
Therefore, when you perform resource update operations in a large cluster, you need to limit the update frequency based on the status of the cluster to ensure the stability of the cluster and control planes.
We recommend that you perform update operations in the following ways.
Important
The numbers in the following suggestions are only for reference because the actual limits vary based on factors such as the specifications and status of the control planes. Increase the update frequency progressively. Make sure that the control planes can respond as normal before you increase the update frequency to the next level.
Node scaling: For a cluster that contains more than 2,000 nodes, we recommend that you limit the number of nodes in each batch to 100 or less when you manually scale a node pool, and limit the number of nodes in each batch to 300 or less when you manually scale multiple node pools.
Application pod scaling: If your application is associated with a Service, Endpoints and EndpointSlice updates are pushed to all nodes during scaling activities. The amount of data to be updated increases with the number of nodes in the cluster. If the cluster contains a large number of nodes, an update storm may occur. For a cluster that contains more than 5,000 nodes, we recommend that you limit the update QPS of pods that are not associated with endpoints to 300/s or lower, and limit the update QPS of pods that are associated with endpoints to 10/s or lower. For example, when you declare a rolling update policy for a Deployment, we recommend that you set maxUnavailable and maxSurge to small values to reduce the pod update frequency, as shown in the sketch after this list.
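The following is a minimal sketch of a conservative rolling update policy for a hypothetical web application. The Deployment name, image, and replica count are placeholders; tune maxSurge and maxUnavailable based on the size of your cluster and the number of replicas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                        # Placeholder name.
spec:
  replicas: 100
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 5                  # Create at most 5 extra pods at a time.
      maxUnavailable: 5            # Take at most 5 pods offline at a time.
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.example.com/web:latest   # Placeholder image.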
Optimize the mode in which clients access the cluster
In a Kubernetes cluster, clients obtain cluster resource information from the API server. As the number of resources in the cluster grows, frequent client queries may overload the control planes. Consequently, the control planes may respond slowly or even crash. Therefore, you must plan the size of the resources to be accessed and the access frequency. We recommend that you read the following suggestions:
Preferably use informers to access the local cache
Preferably use client-go informers to obtain resources. Retrieve data from the local cache instead of sending LIST requests to the API server. This reduces the loads of the API server.
Optimize the method used to retrieve resources from the API server
Requests that do not hit the local cache are still sent to the API server to retrieve resources. In this scenario, read the following suggestions.
Specify resourceVersion=0 in LIST requests. resourceVersion indicates the resource version. When the value is 0, data is retrieved from the cache of the API server instead of etcd. This reduces the frequency of communication between the API server and etcd and allows LIST requests to be handled much faster. Example:
k8sClient.CoreV1().Pods("").List(metav1.ListOptions{ResourceVersion: "0"})
Avoid listing all resources.
To reduce the volume of the returned data, use a filter to limit the scope of LIST requests. For example, use label-selector (filter based on resource labels) or field-selector (filter based on resource fields) to filter the results of LIST requests.
Note
etcd is a key-value storage system and cannot filter data by label or field. The API server filters the returned data based on the specified filter conditions. When you use filters, we recommend that you set resourceVersion to 0 for LIST requests so that the requested data is retrieved from the cache of the API server instead of etcd. This reduces the loads of etcd.
Use protobuf (not JSON) to access non-CRD resources.
The API server can return resource objects to clients in different formats, including JSON and protobuf. By default, when a client sends a Kubernetes API request, Kubernetes returns a serialized JSON object with the content type (Content-Type) application/json. The client can request Kubernetes to return data in the protobuf format. Protobuf outperforms JSON in memory usage and data transfer.
However, not all API resource types support the protobuf format. You can specify multiple content types in the Accept request header, such as application/json and application/vnd.kubernetes.protobuf. This way, resources in the JSON format are returned when the protobuf format is not supported. For more information, see Alternate representations of resources. Example:
Accept: application/vnd.kubernetes.protobuf, application/json
Use centralized controllers
You need to avoid creating a separate controller on each node to watch the cluster data. Otherwise, when the controllers start up, they send large numbers of LIST requests to the API server at the same time to synchronize the cluster status. This increases the loads of the control planes, compromises service stability, or even causes service interruptions.
To avoid this issue, we recommend that you use centralized controllers. You can create a controller instance that runs on one node, or a group of controller instances that run on a small number of nodes, for centralized management. Centralized controllers need to send LIST requests only once or a few times when they start up, and need to maintain only a small number of watch connections. This greatly reduces the loads of the API server.
Plan large workloads properly
Disable the feature of automatically mounting the default service account token
To ensure that Secrets in pods are synchronously updated, the kubelet creates a watch persistent connection for each Secret. The watch mechanism allows the kubelet to receive Secret update notifications in real time. When an excessive number of watches are created, the watch connections may affect the performance of the control planes.
In Kubernetes versions earlier than 1.22: When you create a pod, if you do not specify a service account, Kubernetes automatically mounts the token Secret of the default service account to the pod. Applications in the pod can use the service account to securely communicate with the API server.
For pods of a batch system or pods that do not need to access the API server, we recommend that you explicitly disable automatic mounting of the service account token so that the related Secrets and watches are not created. For more information, see automountServiceAccountToken. In a large cluster, this helps avoid creating unnecessary Secrets and API server watch connections, which reduces the loads of the control planes. For a configuration example, see the sketch at the end of this section.
In Kubernetes 1.22 and later: You can use the TokenRequest API to obtain a temporary and automatically-rotated token, and use a projected volume to mount the token. This operation not only enhances the security of Secrets but also reduces watch connections established by the kubelet for the Secret of each service account. This way, the performance of the cluster is guaranteed.
For more information about how to enable serviceAccountToken projected volumes, see Use ServiceAccount token volume projection.
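The following are two hedged sketches. The first disables automatic mounting of the service account token for a pod that does not need to access the API server. The second mounts a short-lived, automatically rotated token through a serviceAccountToken projected volume in Kubernetes 1.22 and later. The names, images, audience, and expiration time are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker                        # Placeholder name.
spec:
  automountServiceAccountToken: false       # Do not create the token Secret or the related kubelet watch.
  containers:
  - name: worker
    image: registry.example.com/worker:latest   # Placeholder image.
---
apiVersion: v1
kind: Pod
metadata:
  name: api-client                          # Placeholder name.
spec:
  serviceAccountName: api-client            # Placeholder ServiceAccount.
  containers:
  - name: client
    image: registry.example.com/client:latest   # Placeholder image.
    volumeMounts:
    - name: sa-token
      mountPath: /var/run/secrets/tokens
  volumes:
  - name: sa-token
    projected:
      sources:
      - serviceAccountToken:
          path: token
          expirationSeconds: 3600           # The kubelet rotates the token before it expires.
          audience: api-client              # Placeholder audience.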
Control the number and size of Kubernetes objects
Delete idle Kubernetes resources, such as ConfigMaps, Secrets, and persistent volume claims (PVCs), in a timely manner to reduce system resource usage and ensure cluster performance. We recommend that you read the following suggestions.
Limit the number of historical Deployment ReplicaSets: revisionHistoryLimit specifies the number of historical ReplicaSets retained for a Deployment. If the value is large, Kubernetes keeps an excessive number of historical ReplicaSets, which increases the loads of kube-controller-manager. In a large cluster, if you need to frequently update a large number of Deployments, decrease the value of revisionHistoryLimit for the Deployments and delete historical ReplicaSets that are no longer needed. The default value of revisionHistoryLimit for Deployments is 10. For an example, see the sketch after this list.
Delete Jobs and pods that you no longer need: If your cluster contains a large number of Job objects created by CronJobs or other mechanisms, use ttlSecondsAfterFinished to automatically delete finished Jobs and the pods that they created in the previous cycle.
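The following are hedged sketches of the two settings described above. The names, images, and values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frequently-updated-app       # Placeholder name.
spec:
  revisionHistoryLimit: 2            # Keep only 2 historical ReplicaSets. The default value is 10.
  replicas: 3
  selector:
    matchLabels:
      app: frequently-updated-app
  template:
    metadata:
      labels:
        app: frequently-updated-app
    spec:
      containers:
      - name: app
        image: registry.example.com/app:latest   # Placeholder image.
---
apiVersion: batch/v1
kind: Job
metadata:
  name: one-off-task                 # Placeholder name.
spec:
  ttlSecondsAfterFinished: 3600      # Delete the Job and its pods 1 hour after the Job finishes.
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: task
        image: registry.example.com/task:latest   # Placeholder image.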
Allocate resources to Informer components properly
Informer components are typically used to monitor and synchronize the status of resources in Kubernetes clusters. Informer components establish watch connections to watch the status of API server resources and maintain a local cache for each resource object. This way, changes in resource status can be quickly synchronized.
The memory usage of Informer components, such as controllers and kube-scheduler, depends on the size of the resources that they watch. In a large cluster, pay attention to the memory usage of Informer components to avoid out-of-memory (OOM) errors. If an Informer component frequently encounters OOM errors, the resources that it watches may fail to be synchronized. If the component frequently restarts, each List-Watch operation performed after a restart also increases the loads of the control planes, especially the API server.
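The following is a minimal sketch of resource requests and limits for a self-managed controller Deployment, assuming a hypothetical controller image. Set the memory limit based on the observed cache size of the resources that the controller watches, and increase it before OOM errors occur.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-controller            # Placeholder name.
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sample-controller
  template:
    metadata:
      labels:
        app: sample-controller
    spec:
      containers:
      - name: controller
        image: registry.example.com/sample-controller:latest   # Placeholder image.
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            memory: 4Gi              # Reserve headroom for the informer cache in a large cluster.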
Pay attention to the metrics of control planes
You can view the metrics of key control plane components and analyze abnormal metrics in the control plane component dashboards. In large clusters, you need to pay close attention to the following metrics. For more information about the usage notes and descriptions of the metrics, see Monitor control plane components.
Resource usage of control plane components
The following table describes the resource usage metrics of control plane components.
Metric | PromQL | Description |
Memory Usage | memory_utilization_byte{container="kube-apiserver"} | The memory usage of kube-apiserver. Unit: bytes. |
CPU Usage | cpu_utilization_core{container="kube-apiserver"}*1000 | The CPU usage of kube-apiserver. Unit: millicores. |
kube-apiserver
For more information about how to view the metrics and their descriptions, see Metrics of kube-apiserver.
Request throttling
Metric | PromQL | Description |
Request Limit Rate | sum(irate(apiserver_dropped_requests_total{request_kind="readOnly"}[$interval])) by (name) sum(irate(apiserver_dropped_requests_total{request_kind="mutating"}[$interval])) by (name) | The throttling rate of kube-apiserver. No data or 0 indicates that request throttling is not triggered. |
kube-scheduler
For more information about how to view the metrics and their descriptions, see Metrics of kube-scheduler.
Number of pending pods
Metric | PromQL | Description |
Scheduler Pending Pods | scheduler_pending_pods{job="ack-scheduler"} | The number of pending pods. Pending pods consist of the following types: unschedulable: pods that cannot be scheduled. backoff: pods in the backoff queue, which failed to be scheduled due to specific reasons. active: pods in the active queue, which are ready to be scheduled. |
Request latency
Metric | PromQL | Description |
Kube API Request Latency | histogram_quantile($quantile, sum(rate(rest_client_request_duration_seconds_bucket{job="ack-scheduler"}[$interval])) by (verb,url,le)) | The time interval between a request sent by kube-scheduler and a response returned by kube-apiserver. The latency is calculated based on Verbs and URLs. |
kube-controller-manager
For more information about how to view the metrics and their descriptions, see Monitor kube-controller-manager.
Workqueue
Metric | PromQL | Description |
Workqueue depth | sum(rate(workqueue_depth{job="ack-kube-controller-manager"}[$interval])) by (name) | The change of the workqueue length in the specified interval. |
Workqueue processing delay | histogram_quantile($quantile, sum(rate(workqueue_queue_duration_seconds_bucket{job="ack-kube-controller-manager"}[5m])) by (name, le)) | The duration of the events in the workqueue. |
etcd
For more information about how to view the metrics and their descriptions, see Metrics of etcd.