Container Service for Kubernetes:Suggestions on using large-scale clusters

Last Updated:Oct 16, 2024

The performance and availability of Container Service for Kubernetes (ACK) clusters depend on the amount of cluster resources, the resource access frequency, and the access mode, and the loads and performance of the API server vary with these factors. A large ACK Pro cluster usually contains more than 500 nodes or more than 10,000 pods. The cluster administrator needs to plan and use large ACK clusters based on the actual business scenario and pay close attention to monitoring metrics to ensure the stability and availability of the clusters.

Usage notes for large ACK clusters

Compared with using multiple clusters, using a single large cluster can effectively simplify cluster O&M and improve resource utilization. However, in complex business scenarios, we recommend that you split your workloads into multiple clusters by business logic or service requirement. For example, you can separate non-production (testing and development) services from production services, or decouple database services from front-end applications.

We recommend that you use multiple clusters instead of creating a large cluster if you have the following requirements.

Requirement

Description

Isolation

You can use multiple clusters to isolate the testing environment from the production environment. This prevents all of your businesses from being interrupted when the single cluster in which they are deployed goes down, and reduces the impact of single points of failure.

Location

Some services must be deployed in a location that is close to the end users to improve service availability and reduce the response latency. In this scenario, we recommend that you deploy multiple clusters across regions.

Cluster size

ACK managed control planes automatically adapt to the cluster size through auto scaling and cluster component optimization. However, the Kubernetes architecture has a performance bottleneck. The availability and performance of ultra-large clusters are not guaranteed. Before you use large clusters, read the Kubernetes scalability thresholds and Kubernetes scalability and performance SLIs/SLOs defined by the Kubernetes community and log on to the Quota Center console to view and increase quotas related to Container Service for Kubernetes. If your businesses exceed the limits of Kubernetes and ACK, split your businesses into multiple clusters.

If you require multi-cluster management, such as application deployment, traffic management, job distribution, and global monitoring, we recommend that you enable Multi-cluster Fleets.

About this topic

This topic is intended for developers and administrators of ACK Pro clusters and provides suggestions on planning and using large ACK clusters. Adjust the suggestions based on your actual cluster environment and business requirements.

Note

The shared responsibility model defines that ACK is responsible for the default security of the control plane components (including the Kubernetes control plane components and etcd) and the Alibaba Cloud infrastructure that the cluster relies on. You are responsible for the security of the applications deployed in the cloud and for the security configuration and updates of your cloud resources. For more information, see Shared responsibility model.


Use clusters that run new Kubernetes versions

As new Kubernetes versions are released, the list of Kubernetes versions supported by ACK also changes. New Kubernetes versions will be added to the list and outdated Kubernetes versions will be discontinued. ACK stops releasing new features, feature patches, or security patches for outdated Kubernetes versions. ACK only provides limited technical support for these versions.

You can learn information about new updates through documentation, console information, and internal messages, and read update notes for the desired Kubernetes version before you update your clusters. This helps you update your clusters at the earliest opportunity to mitigate security risks and fix stability issues. For more information about cluster updates, see Manually upgrade ACK clusters and Automatically upgrade a cluster. For more information about Kubernetes versions supported by ACK, see Support for Kubernetes versions.

Pay attention to cluster resource limits

The following table describes the limits for ensuring the availability, stability, and performance of large ACK clusters and the corresponding solutions.

Limit

Description

Suggested solution

Maximum etcd size (DB size)

The maximum size of the etcd database is 8 GB. When the etcd database becomes excessively large, its performance is compromised: read and write latency, system resource usage, and election latency all increase. In addition, service and data restoration becomes difficult and time-consuming.

Make sure that the etcd size is smaller than 8 GB.

  • Control the total amount of cluster resources and release idle resources.

  • For resources that are frequently updated, we recommend that you limit the size of each resource to less than 100 KB. In the etcd, each update to a key-value pair generates a historical version. In big data computing scenarios that involve frequent updates, historical versions stored in the etcd usually occupy more resources.

Total size of each resource type in the etcd

If large numbers of resource objects exist, an excessive amount of system resources is consumed when a client accesses all resource objects. This may even cause the initialization of the API server or custom controller to fail.

Limit the total size of each resource type to less than 800 MB.

  • When you define a new type of CustomResourceDefinition (CRD), determine the expected number of CustomResources (CRs) in advance to ensure that the size of each CRD is controllable.

  • When you deploy Helm charts, Helm automatically creates releases to track the deployment progress. By default, Helm uses Secrets to store version information. In large ACK clusters, the amount of version information may exceed the maximum Secret size defined by Kubernetes. In this scenario, use the Helm SQL storage backend instead.

Connections and bandwidth of the CLB instance used by an API server

Only Classic Load Balancer (CLB) instances are supported by API servers in ACK clusters. The maximum number of connections and bandwidth supported by a CLB instance are limited. For more information about the maximum number of connections supported by a CLB instance, see Overview of CLB instances. The maximum bandwidth of a CLB instance is 5,120 Mbit/s.

When the connection or bandwidth limit of the CLB instance is exceeded, nodes enter the Not Ready state.

If your cluster contains more than 1,000 nodes, we recommend that you use pay-as-you-go CLB instances.

Note

To accelerate connection establishment and increase the bandwidth, use Elastic Network Interfaces (ENIs) to expose Services in the default namespace of a large cluster. By default, ENIs are used to expose Services in ACK clusters that are created after February 2023 and run Kubernetes versions later than 1.20. For other clusters, submit a ticket to use ENIs to expose Services. For more information, see Kube API Server.

Number of Services per namespace

The kubelet stores Service information in environment variables and injects them into pods that run on the node. This allows pods to discover and communicate with the Services.

If a namespace contains an excessive number of Services, a large number of environment variables are injected into the pods in that namespace. Consequently, pods may require a long period of time to launch or may even fail to launch.

We recommend that you limit the number of Services per namespace to less than 5,000.

To prevent these environment variables from being injected, set enableServiceLinks in the pod spec to false (see the sketch after this table). For more information, see Accessing the Service.

Total number of Services in a cluster

If you create an excessive number of Services, kube-proxy needs to handle large numbers of network rules. This compromises the performance of kube-proxy.

When the number of LoadBalancer Services grows, the synchronization latency between LoadBalancer Services and Server Load Balancer (SLB) instances also increases. The latency may even reach more than one minute.

We recommend that you limit the total number of Services to less than 10,000.

We recommend that you limit the number of LoadBalancer Services to less than 500.

Maximum number of endpoints per Service

The kube-proxy component runs on each node and watches Service-related updates so that it can update the network rules on the node at the earliest opportunity. When a Service has an excessive number of endpoints, the corresponding Endpoints object becomes very large, and each update to the object involves high-volume data transfer between kube-apiserver and kube-proxy. As the size of the cluster grows, more data needs to be updated and the impact becomes larger.

Note

To resolve this issue, kube-proxy uses EndpointSlices to improve performance by default in ACK clusters that run Kubernetes versions later than 1.19.

We recommend that you limit the number of backend pods associated with an Endpoints object to less than 3,000.

  • In large clusters, use EndpointSlices instead of Endpoints to split and manage network endpoints. The splitting can efficiently reduce the volume of data transfer for each update.

  • If your custom controller relies on Endpoints objects to make routing decisions, you can keep the Endpoints objects. Make sure that the number of backend pods associated with an Endpoints object is less than 1,000. When the upper limit is exceeded, data in the Endpoints object is automatically truncated. For more information, see Over-capacity endpoints.

Total number of Service endpoints

If a cluster contains an excessive number of endpoints, the API server may be overloaded and the network performance may be compromised.

We recommend that you limit the total number of Service endpoints to less than 64,000.

Number of pending pods

If an excessive number of pending pods exist, newly submitted pods may wait a long period of time before they can be scheduled. During the waiting time, the scheduler periodically generates events and creates an event storm.

We recommend that you limit the total number of pending pods to less than 10,000.

Number of Secrets in a cluster that uses KMS to encrypt Kubernetes Secrets

When Key Management Service (KMS) v1 is used to encrypt data, each encryption generates a data encryption key (DEK). When a Kubernetes cluster starts up, the cluster needs to access and decrypt the Secrets stored in the etcd. If the cluster has an excessive number of secrets, the cluster needs to decrypt large amounts of data during startups or updates. This compromises the performance of the cluster.

We recommend that you limit the number of Secrets in a cluster that uses KMS v1 to encrypt Secrets to less than 2,000.
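
For reference, the following client-go sketch shows how the enableServiceLinks setting mentioned in the table can be applied when you create a pod. The pod name, namespace, and image are hypothetical, and the sketch assumes a kubeconfig at the default path.

    // Minimal sketch (hypothetical names): create a pod with enableServiceLinks
    // set to false so that Service environment variables are not injected.
    package main

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)

        disableLinks := false
        pod := &corev1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: "batch-worker", Namespace: "default"}, // hypothetical
            Spec: corev1.PodSpec{
                EnableServiceLinks: &disableLinks, // do not inject Service environment variables
                Containers: []corev1.Container{
                    {Name: "worker", Image: "registry.example.com/worker:latest"}, // hypothetical image
                },
            },
        }
        if _, err := client.CoreV1().Pods(pod.Namespace).Create(context.TODO(), pod, metav1.CreateOptions{}); err != nil {
            panic(err)
        }
    }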

Configure control plane component parameters properly

ACK Pro clusters allow you to customize the parameters of control plane components. You can customize the parameters of key managed components, such as kube-apiserver, kube-controller-manager, and kube-scheduler. In large clusters, you need to configure the throttling parameters of the control plane components properly.

kube-apiserver

To prevent large numbers of requests from overloading the control planes, kube-apiserver limits the number of concurrent requests that can be processed within a period of time. When the upper limit is exceeded, the API server triggers request throttling and returns HTTP status code 429 to the client. The status code indicates that an excessive number of requests are received and the client has to try again later. If no throttling is configured for the server, the control planes may be overloaded by requests. Consequently, the stability and availability of the entire service cluster are affected. Therefore, we recommend that you configure request throttling on the server to protect the control planes.
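
On the client side, when a request receives HTTP status code 429, the client should back off and retry instead of immediately resending the request. The following minimal client-go sketch shows this pattern; it assumes a kubeconfig at the default path and uses a generic exponential backoff.

    // Minimal sketch: detect HTTP 429 responses from the API server and retry
    // with backoff instead of hammering an already-throttled server.
    package main

    import (
        "context"
        "fmt"

        apierrors "k8s.io/apimachinery/pkg/api/errors"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
        "k8s.io/client-go/util/retry"
    )

    func main() {
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)

        var count int
        err = retry.OnError(retry.DefaultBackoff, apierrors.IsTooManyRequests, func() error {
            pods, listErr := client.CoreV1().Pods("kube-system").List(context.TODO(), metav1.ListOptions{ResourceVersion: "0"})
            if listErr != nil {
                return listErr // retried only when the API server returns 429
            }
            count = len(pods.Items)
            return nil
        })
        if err != nil {
            panic(err)
        }
        fmt.Println("pods:", count)
    }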

Request throttling methods

kube-apiserver supports the following request throttling methods:

  • Versions earlier than v1.18: kube-apiserver can limit only the maximum concurrency. Requests are classified into read requests and write requests. kube-apiserver uses the boot parameters --max-requests-inflight and --max-mutating-requests-inflight to limit the maximum concurrency of read and write requests. This method does not handle requests based on their priorities. Slow requests with low priorities may occupy large amounts of resources and cause API server requests to accumulate. In this scenario, requests with high priorities or urgent requests cannot be handled promptly.

    ACK Pro clusters allow you to customize the max-requests-inflight and max-mutating-requests-inflight parameters of kube-apiserver. For more information, see Customize the parameters of control plane components in ACK Pro clusters.

  • v1.18 and later: The API Priority and Fairness (APF) feature is introduced to manage requests in a more fine-grained manner. This feature can classify and isolate requests based on predefined rules and priorities to ensure that important and urgent requests are prioritized. This feature also uses a fair queuing algorithm to ensure that different types of requests are fairly handled. This feature reaches the Beta stage in Kubernetes 1.20 and is enabled by default.

    In ACK clusters that run Kubernetes 1.20 and later, the maximum number of concurrent requests that kube-apiserver processes is determined by the sum of the --max-requests-inflight and --max-mutating-requests-inflight parameters. kube-apiserver uses FlowSchema and PriorityLevelConfiguration objects to control the concurrency of each type of request and implement fine-grained request throttling.

    • PriorityLevelConfiguration: defines a priority level and the share of the available concurrency budget that is allocated to that level.

    • FlowSchema: matches requests to a single PriorityLevelConfiguration.

    PriorityLevelConfigurations and FlowSchemas are maintained by kube-apiserver. Kubernetes clusters automatically generate default PriorityLevelConfigurations and FlowSchemas based on the current Kubernetes version. You can run the following commands to query PriorityLevelConfigurations and FlowSchemas.

    Run the following command to query PriorityLevelConfigurations:

    kubectl get PriorityLevelConfiguration
    # Expected output:
    NAME              TYPE      ASSUREDCONCURRENCYSHARES   QUEUES   HANDSIZE   QUEUELENGTHLIMIT   AGE
    catch-all         Limited   5                          <none>   <none>     <none>             4m20s
    exempt            Exempt    <none>                     <none>   <none>     <none>             4m20s
    global-default    Limited   20                         128      6          50                 4m20s
    leader-election   Limited   10                         16       4          50                 4m20s
    node-high         Limited   40                         64       6          50                 4m20s
    system            Limited   30                         64       6          50                 4m20s
    workload-high     Limited   40                         128      6          50                 4m20s
    workload-low      Limited   100                        128      6          50                 4m20s

    Run the following command to query FlowSchemas:

    Note

    ACK adds the ack-system-leader-election and ack-default FlowSchemas for key ACK components. The other FlowSchemas are the same as those in open source Kubernetes.

    kubectl get flowschemas
    # Expected output:
    NAME                           PRIORITYLEVEL     MATCHINGPRECEDENCE   DISTINGUISHERMETHOD   AGE     MISSINGPL
    exempt                         exempt            1                    <none>                4d18h   False
    probes                         exempt            2                    <none>                4d18h   False
    system-leader-election         leader-election   100                  ByUser                4d18h   False
    endpoint-controller            workload-high     150                  ByUser                4d18h   False
    workload-leader-election       leader-election   200                  ByUser                4d18h   False
    system-node-high               node-high         400                  ByUser                4d18h   False
    system-nodes                   system            500                  ByUser                4d18h   False
    ack-system-leader-election     leader-election   700                  ByNamespace           4d18h   False
    ack-default                    workload-high     800                  ByNamespace           4d18h   False
    kube-controller-manager        workload-high     800                  ByNamespace           4d18h   False
    kube-scheduler                 workload-high     800                  ByNamespace           4d18h   False
    kube-system-service-accounts   workload-high     900                  ByNamespace           4d18h   False
    service-accounts               workload-low      9000                 ByUser                4d18h   False
    global-default                 global-default    9900                 ByUser                4d18h   False
    catch-all                      catch-all         10000                ByUser                4d18h   False

Request throttling monitoring and suggested solutions

The client can determine whether the server triggers request throttling based on the status code 429 or the apiserver_flowcontrol_rejected_requests_total metric. When request throttling is triggered, use the following solutions.

  • Monitor the resource usage of the API server: When the resource usage is low, increase the max-requests-inflight and max-mutating-requests-inflight parameters to raise the total concurrency limit.

    For a cluster that contains more than 500 nodes, we recommend that you set the sum of the two parameters to a value between 2,000 and 3,000. For a cluster that contains more than 3,000 nodes, we recommend that you set the sum to a value between 3,000 and 5,000.

  • Reconfigure PriorityLevelConfigurations:

    • Requests with high priorities: Create a FlowSchema to match requests that you do not want to throttle to a high-priority PriorityLevelConfiguration, such as workload-high or exempt. Take note that requests at the exempt priority level are exempted from APF, so proceed with caution. You can also configure a new PriorityLevelConfiguration to allocate a larger share of the concurrency budget to high-priority requests. A sketch of how to create such a FlowSchema follows this list.

    • Requests with low priorities: When the resource usage of the API server is high or the API server responds slowly due to slow requests, you can create a FlowSchema to match these requests to a low-priority PriorityLevelConfiguration.
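
    The following is a minimal client-go sketch of the first approach: it creates a FlowSchema that routes requests from a specific controller to the existing workload-high priority level. The FlowSchema name, namespace, and ServiceAccount are hypothetical, and the flowcontrol API version (v1beta3 here) must match the Kubernetes version of your cluster.

    // Minimal sketch (hypothetical names): route requests from a specific
    // controller ServiceAccount to the existing workload-high priority level.
    package main

    import (
        "context"
        "fmt"

        flowv1beta3 "k8s.io/api/flowcontrol/v1beta3"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)

        fs := &flowv1beta3.FlowSchema{
            ObjectMeta: metav1.ObjectMeta{Name: "my-critical-controller"}, // hypothetical
            Spec: flowv1beta3.FlowSchemaSpec{
                PriorityLevelConfiguration: flowv1beta3.PriorityLevelConfigurationReference{
                    Name: "workload-high", // existing high-priority level
                },
                MatchingPrecedence: 1000, // lower values are evaluated first
                DistinguisherMethod: &flowv1beta3.FlowDistinguisherMethod{
                    Type: flowv1beta3.FlowDistinguisherMethodByUserType,
                },
                Rules: []flowv1beta3.PolicyRulesWithSubjects{{
                    Subjects: []flowv1beta3.Subject{{
                        Kind: flowv1beta3.SubjectKindServiceAccount,
                        ServiceAccount: &flowv1beta3.ServiceAccountSubject{
                            Namespace: "kube-system",
                            Name:      "my-critical-controller", // hypothetical ServiceAccount
                        },
                    }},
                    ResourceRules: []flowv1beta3.ResourcePolicyRule{{
                        Verbs:      []string{"*"},
                        APIGroups:  []string{"*"},
                        Resources:  []string{"*"},
                        Namespaces: []string{"*"},
                    }},
                }},
            },
        }

        created, err := client.FlowcontrolV1beta3().FlowSchemas().Create(context.TODO(), fs, metav1.CreateOptions{})
        if err != nil {
            panic(err)
        }
        fmt.Println("created FlowSchema", created.Name)
    }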

Important
  • The kube-apiserver component is a managed component in ACK Pro clusters. By default, kube-apiserver uses at least two replicas that are deployed across zones to ensure high availability. When the resource usage of the control planes increases, the number of replicas is scaled out to at most six. Concurrency limit of kube-apiserver = Number of replicas × Concurrency limit of each replica.

  • Modifying the custom parameters of kube-apiserver triggers a rolling update of the API server. This may cause client controllers to re-perform List-Watch operations. In a large cluster, the resulting requests may overload the API server, and your services may become temporarily unavailable.

kube-controller-manager and kube-scheduler

kube-controller-manager uses the kubeAPIQPS and kubeAPIBurst parameters and kube-scheduler uses the connectionQPS and connectionBurst parameters to control the QPS of communication with the API server. For more information, see Customize the parameters of control plane components in ACK Pro clusters and Custom parameters of kube-scheduler.

  • kube-controller-manager: For a cluster that contains more than 1,000 nodes, we recommend that you set kubeAPIQPS to a value greater than 300 and kubeAPIBurst to a value greater than 500.

  • kube-scheduler: No modification is needed in most cases. When the pod throughput exceeds 300 pods per second, we recommend that you set connectionQPS to 800 and connectionBurst to 1000.

kubelet

The default values of the kubelet's kube-api-qps and kube-api-burst parameters are 5 and 10, respectively. No modification is needed in most cases. If the status of pods in your cluster is updated slowly, pods are scheduled with a latency, or volumes are mounted slowly, we recommend that you increase the values of these parameters. For more information, see Customize the kubelet parameters of a node pool.

Important
  • Increasing the values of the kubelet parameters also increases the QPS of the kubelet for communicating with the API server. When the kubelet sends large numbers of requests, the loads of the API server may increase. We recommend that you increase the values progressively and pay attention to the performance and resource usage of the API server to ensure the stability of the control planes.

  • You need to control the frequency of kubelet updates. To ensure the stability of the control planes during kubelet updates, ACK limits the maximum concurrency of each batch to 10 when you update the kubelet on nodes in a node pool.

Plan the cluster resource scaling frequency

Typically, when a large cluster runs stably, the control planes are not under heavy loads. However, when the cluster performs a large-scale operation, such as creating or deleting a large amount of resources or scaling large numbers of nodes in or out, the control planes may be overloaded. As a result, the cluster performance is compromised, the response latency increases, and your services may be interrupted.

For example, in a cluster that contains 5,000 nodes where a large number of pods run stably for long-term businesses, the loads of the control planes do not increase significantly. However, if a cluster contains 1,000 nodes and you create 10,000 temporary jobs or add 2,000 nodes within 1 minute, the loads of the control planes spike.

Therefore, when you perform resource update operations in a large cluster, you need to limit the update frequency based on the status of the cluster to ensure the stability of the cluster and control planes.

We recommend that you perform update operations in the following ways.

Important

The numbers in the following suggestions are for reference only because the actual limits depend on factors such as the specifications and loads of the control planes. Increase the update frequency progressively, and make sure that the control planes respond as expected before you increase the frequency to the next level.

  • Node scaling: For a cluster that contains more than 2,000 nodes, we recommend that you limit the number of nodes in each batch to 100 or less when you manually scale a node pool, and limit the number of nodes in each batch to 300 or less when you manually scale multiple node pools.

  • Application pod scaling: If your application is associated with a Service, Endpoints and EndpointSlice updates are pushed to all nodes during scaling activities. The amount of data to be updated increases with the number of nodes in the cluster, and in very large clusters the resulting update traffic can overwhelm the cluster. For a cluster that contains more than 5,000 nodes, we recommend that you limit the update QPS of pods that are not associated with endpoints to 300/s or lower, and limit the update QPS of pods that are associated with endpoints to 10/s or lower. For example, when you configure a rolling update policy for a Deployment, we recommend that you set maxUnavailable and maxSurge to small values to reduce the pod update frequency, as shown in the sketch below.
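
    The following sketch shows a Deployment whose rolling update strategy uses small maxUnavailable and maxSurge values to throttle Endpoints and EndpointSlice churn during updates. The names, image, and exact values are hypothetical; choose values that match your cluster size and availability requirements.

    // Minimal sketch (hypothetical names and values): a Deployment whose rolling
    // update strategy limits how many pods are replaced at a time, which also
    // limits Endpoints/EndpointSlice churn during the update.
    package main

    import (
        "fmt"

        appsv1 "k8s.io/api/apps/v1"
        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/util/intstr"
        "sigs.k8s.io/yaml"
    )

    func main() {
        replicas := int32(2000)
        maxUnavailable := intstr.FromString("5%") // replace at most 5% of pods at a time
        maxSurge := intstr.FromInt(20)            // create at most 20 extra pods at a time

        deploy := appsv1.Deployment{
            ObjectMeta: metav1.ObjectMeta{Name: "web", Namespace: "prod"}, // hypothetical
            Spec: appsv1.DeploymentSpec{
                Replicas: &replicas,
                Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "web"}},
                Strategy: appsv1.DeploymentStrategy{
                    Type: appsv1.RollingUpdateDeploymentStrategyType,
                    RollingUpdate: &appsv1.RollingUpdateDeployment{
                        MaxUnavailable: &maxUnavailable,
                        MaxSurge:       &maxSurge,
                    },
                },
                Template: corev1.PodTemplateSpec{
                    ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "web"}},
                    Spec: corev1.PodSpec{
                        Containers: []corev1.Container{{Name: "web", Image: "registry.example.com/web:v2"}}, // hypothetical
                    },
                },
            },
        }
        out, _ := yaml.Marshal(deploy)
        fmt.Println(string(out))
    }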

Optimize the mode in which clients access the cluster

In a Kubernetes cluster, clients such as applications or kubectl clients obtain cluster resource information from the API server. When the amount of resources in the cluster grows but the client still sends requests at the same frequency, the requests may overload the control planes. Consequently, the control planes may respond slowly or even crash. Therefore, you need to plan the size of resources to be accessed and the access frequency before you access resources deployed in a Kubernetes cluster. We recommend that you read the following suggestions when you want to access resources in a large cluster.

Preferably use informers to access the local cache

Preferably use client-go informers to obtain resources and read them from the local cache instead of sending LIST requests directly to the API server. This reduces the loads of the API server.
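
The following is a minimal sketch of this pattern with a shared informer factory; it assumes a kubeconfig at the default path, and the resync interval is only an example.

    // Minimal sketch: use a shared informer factory so that reads are served
    // from a local cache instead of repeated LIST requests to the API server.
    package main

    import (
        "fmt"
        "time"

        "k8s.io/apimachinery/pkg/labels"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/cache"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)

        // One shared factory per process; a long resync period limits control plane load.
        factory := informers.NewSharedInformerFactory(client, 12*time.Hour)
        podInformer := factory.Core().V1().Pods()

        stop := make(chan struct{})
        defer close(stop)
        factory.Start(stop)
        // Wait until the initial LIST and WATCH have populated the local cache.
        if !cache.WaitForCacheSync(stop, podInformer.Informer().HasSynced) {
            panic("cache did not sync")
        }

        // Subsequent reads hit the local cache, not the API server.
        pods, err := podInformer.Lister().List(labels.Everything())
        if err != nil {
            panic(err)
        }
        fmt.Println("pods in cache:", len(pods))
    }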

Optimize the method used to retrieve resources from the API server

Requests that do not hit the local cache are still sent to the API server to retrieve resources. In this scenario, read the following suggestions.

  • Specify resourceVersion=0 in LIST requests.

    resourceVersion indicates the version of a resource. When the value is set to 0, the request is served from the API server cache instead of etcd. This reduces the frequency of communication between the API server and etcd, and LIST requests are handled much faster. Example:

    k8sClient.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{ResourceVersion: "0"})
  • Avoid listing all resources.

    To reduce the volume of the returned data, use a filter to limit the scope of LIST requests. For example, use label-selector (filter by resource labels) or field-selector (filter by resource fields) in LIST requests, as shown in the sketch after the following note.

    Note

    etcd is a key-value store and cannot filter data by label or field. The API server filters the full result set based on the specified conditions. When you use filters, we recommend that you also set resourceVersion to 0 in LIST requests so that the requested data is retrieved from the API server cache instead of etcd, which reduces the loads of etcd.
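
    A minimal sketch that combines a label selector, a field selector, and resourceVersion=0 is shown below. The namespace, labels, and field values are hypothetical.

    // Minimal sketch (hypothetical namespace, labels, and fields): a filtered
    // LIST that is served from the API server cache and returns only the
    // matching objects.
    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)

        pods, err := client.CoreV1().Pods("prod").List(context.TODO(), metav1.ListOptions{
            LabelSelector:   "app=web,tier=frontend", // filter by labels
            FieldSelector:   "status.phase=Running",  // filter by fields
            ResourceVersion: "0",                     // serve from the API server cache
        })
        if err != nil {
            panic(err)
        }
        fmt.Println("matching pods:", len(pods.Items))
    }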

  • Use protobuf (not JSON) to access non-CRD resources.

    The API server can return resource objects in different formats to clients, including JSON and protobuf. By default, when a client sends a Kubernetes API request, Kubernetes returns a serialized JSON object. The content type (Content-Type) of the object is application/json. The client can request Kubernetes to return data in the protobuf format. Protobuf outperforms JSON in memory usage and data transfer.

    However, not all API resource types support the protobuf format. You can specify multiple content types in the Accept request header, such as application/json and application/vnd.kubernetes.protobuf. This way, resources in JSON format are returned when the protobuf format is not supported. For more information, see Alternate representations of resources. Example:

    Accept: application/vnd.kubernetes.protobuf, application/json
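
    With client-go, you can express the same preference through the rest.Config, as in the following minimal sketch (kubeconfig path assumed to be the default):

    // Minimal sketch: configure a client-go rest.Config to prefer protobuf for
    // built-in resources while still accepting JSON as a fallback.
    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        // Ask the API server for protobuf; fall back to JSON for resources
        // (such as CRDs) that do not support protobuf.
        cfg.AcceptContentTypes = "application/vnd.kubernetes.protobuf,application/json"
        cfg.ContentType = "application/vnd.kubernetes.protobuf"

        client := kubernetes.NewForConfigOrDie(cfg)
        nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{ResourceVersion: "0"})
        if err != nil {
            panic(err)
        }
        fmt.Println("nodes:", len(nodes.Items))
    }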

Use centralized controllers

You need to avoid creating a separate controller on each node to watch the cluster data. Otherwise, when the controllers start up, they send large numbers of LIST requests to the API server at the same time to synchronize the cluster status. This increases the loads of the control planes, compromises service stability, or even causes service interruptions.

To avoid this issue, we recommend that you use centralized controllers. Run a single controller instance, or a group of instances across a small number of nodes, to manage resources in a centralized manner. A centralized controller performs the initial LIST operation only once (or a few times) at startup and maintains only a small number of watch connections, which greatly reduces the loads of the API server. A hedged leader-election sketch is shown below.
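
The following minimal sketch uses Lease-based leader election from client-go so that only one replica of a centralized controller performs the List-Watch work at a time. The lock name, namespace, and timing values are hypothetical.

    // Minimal sketch (hypothetical names): run a centralized controller on a few
    // replicas and use Lease-based leader election so that only the leader
    // performs the List-Watch work.
    package main

    import (
        "context"
        "os"
        "time"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
        "k8s.io/client-go/tools/leaderelection"
        "k8s.io/client-go/tools/leaderelection/resourcelock"
    )

    func main() {
        cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            panic(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)
        hostname, _ := os.Hostname()

        lock := &resourcelock.LeaseLock{
            LeaseMeta:  metav1.ObjectMeta{Name: "my-central-controller", Namespace: "kube-system"}, // hypothetical
            Client:     client.CoordinationV1(),
            LockConfig: resourcelock.ResourceLockConfig{Identity: hostname},
        }

        leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
            Lock:            lock,
            ReleaseOnCancel: true,
            LeaseDuration:   15 * time.Second,
            RenewDeadline:   10 * time.Second,
            RetryPeriod:     2 * time.Second,
            Callbacks: leaderelection.LeaderCallbacks{
                OnStartedLeading: func(ctx context.Context) {
                    // Start informers and reconcile loops here; only the leader runs them.
                    <-ctx.Done()
                },
                OnStoppedLeading: func() {
                    // Stop work promptly when leadership is lost.
                },
            },
        })
    }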

Plan large workloads properly

Disable the feature of automatically mounting the default service account

To keep the Secrets mounted in pods up to date, the kubelet creates a persistent watch connection for each of these Secrets. The watch mechanism allows the kubelet to receive Secret update notifications in real time. When an excessive number of watches are created, the watch connections may affect the performance of the control planes.

  • In Kubernetes versions earlier than 1.22: When you create a pod without specifying a service account, Kubernetes automatically mounts the token Secret of the default service account into the pod. Applications in the pod can use this service account to securely communicate with the API server.

    For pods of a batch system or pods that do not need to access the API server, we recommend that you explicitly disable the automatic mounting of the service account token so that the related Secrets and watch connections are not created. For more information, see automountServiceAccountToken. In a large cluster, this helps avoid unnecessary Secrets and API server watch connections, which reduces the loads of the control planes. A sketch is provided after this list.

  • In Kubernetes 1.22 and later: You can use the TokenRequest API to obtain a temporary, automatically rotated token and mount it through a projected volume. This enhances the security of Secrets and reduces the number of watch connections that the kubelet establishes for the Secret of each service account, which helps maintain cluster performance.

    For more information about how to enable serviceAccountToken projected volumes, see Use ServiceAccount token volume projection.
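
The following sketch illustrates both suggestions: a pod that disables automatic service account token mounting, and a pod that mounts a short-lived, automatically rotated token through a projected volume. All names and images are hypothetical.

    // Minimal sketch (hypothetical names): one pod that disables automatic
    // service account token mounting, and one pod that mounts a short-lived,
    // automatically rotated token through a projected volume instead.
    package main

    import (
        "fmt"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "sigs.k8s.io/yaml"
    )

    func main() {
        no := false
        batchPod := corev1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: "batch-worker"}, // does not call the API server
            Spec: corev1.PodSpec{
                AutomountServiceAccountToken: &no, // no token Secret, no extra kubelet watch
                Containers:                   []corev1.Container{{Name: "worker", Image: "registry.example.com/worker:latest"}},
            },
        }

        expiry := int64(3600)
        apiClientPod := corev1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: "api-client"}, // needs API server access (Kubernetes 1.22+)
            Spec: corev1.PodSpec{
                AutomountServiceAccountToken: &no,
                Containers: []corev1.Container{{
                    Name:         "client",
                    Image:        "registry.example.com/client:latest",
                    VolumeMounts: []corev1.VolumeMount{{Name: "sa-token", MountPath: "/var/run/secrets/tokens"}},
                }},
                Volumes: []corev1.Volume{{
                    Name: "sa-token",
                    VolumeSource: corev1.VolumeSource{
                        Projected: &corev1.ProjectedVolumeSource{
                            Sources: []corev1.VolumeProjection{{
                                ServiceAccountToken: &corev1.ServiceAccountTokenProjection{
                                    Path:              "token",
                                    ExpirationSeconds: &expiry, // rotated automatically by the kubelet
                                },
                            }},
                        },
                    },
                }},
            },
        }
        for _, p := range []corev1.Pod{batchPod, apiClientPod} {
            out, _ := yaml.Marshal(p)
            fmt.Println(string(out))
        }
    }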

Control the number and size of Kubernetes objects

Delete idle Kubernetes resources, such as ConfigMaps, Secrets, and persistent volume claims (PVCs), in a timely manner to reduce system resource usage and maintain cluster performance. We recommend that you read the following suggestions.

  • Limit the number of historical ReplicaSets of a Deployment: revisionHistoryLimit specifies the number of historical ReplicaSets that are retained for a Deployment. If the value is large, Kubernetes retains an excessive number of historical ReplicaSets, which increases the loads of kube-controller-manager. In a large cluster in which you frequently update large numbers of Deployments, decrease the value of revisionHistoryLimit and delete historical ReplicaSets that you no longer need. The default value of revisionHistoryLimit for Deployments is 10.

  • Delete Jobs and pods that you no longer need: If your cluster contains a large number of Job objects created by CronJobs or other mechanisms, set ttlSecondsAfterFinished to automatically clean up finished Jobs and the pods that they created, as shown in the sketch below.
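
The following sketch shows a Job that uses ttlSecondsAfterFinished so that the Job and its pods are cleaned up automatically after completion. The names, image, and TTL value are hypothetical.

    // Minimal sketch (hypothetical names): a Job that is deleted automatically,
    // together with its pods, 10 minutes after it finishes.
    package main

    import (
        "fmt"

        batchv1 "k8s.io/api/batch/v1"
        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "sigs.k8s.io/yaml"
    )

    func main() {
        ttl := int32(600) // seconds after the Job completes or fails
        job := batchv1.Job{
            ObjectMeta: metav1.ObjectMeta{Name: "nightly-report", Namespace: "batch"}, // hypothetical
            Spec: batchv1.JobSpec{
                TTLSecondsAfterFinished: &ttl,
                Template: corev1.PodTemplateSpec{
                    Spec: corev1.PodSpec{
                        RestartPolicy: corev1.RestartPolicyNever,
                        Containers:    []corev1.Container{{Name: "report", Image: "registry.example.com/report:latest"}},
                    },
                },
            },
        }
        out, _ := yaml.Marshal(job)
        fmt.Println(string(out))
    }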

Allocate resources to Informer components properly

Informer components are typically used to monitor and synchronize the status of resources in Kubernetes clusters. Informer components establish watch connections to watch the status of API server resources and maintain a local cache for each resource object. This way, changes in resource status can be quickly synchronized.

The memory usage of informer-based components, such as controllers and kube-scheduler, depends on the size of the resources that they watch. In a large cluster, pay attention to the memory usage of these components to avoid out-of-memory (OOM) errors. If a component frequently encounters OOM errors, its watches on resources become unreliable, and each restart triggers a new List-Watch operation, which increases the loads of the control planes (especially the API server).

Pay attention to the metrics of control planes

You can view the metrics of key control plane components and analyze abnormal metrics in the control plane component dashboards. In large clusters, you need to pay close attention to the following metrics. For more information about the usage notes and descriptions of the metrics, see Monitor control plane components.

Resource usage of control plane components

The following table describes the resource usage metrics of control plane components.

Metric

PromQL

Description

Memory Usage

memory_utilization_byte{container="kube-apiserver"}

The memory usage of kube-apiserver. Unit: bytes.

CPU Usage

cpu_utilization_core{container="kube-apiserver"}*1000

The CPU usage of kube-apiserver. Unit: millicores.

kube-apiserver

For more information about how to view the metrics and their descriptions, see Metrics of kube-apiserver.

  • Number of resource objects

    Metric

    PromQL

    Description

    Number of resource objects

    • max by(resource)(apiserver_storage_objects)

    • max by(resource)(etcd_object_counts)

    • The metric name is apiserver_storage_objects if your ACK cluster runs Kubernetes 1.22 or later.

    • The metric name is etcd_object_counts if your ACK cluster runs a Kubernetes version earlier than 1.22.

    Note

    Due to compatibility issues, both the apiserver_storage_objects and etcd_object_counts metrics exist in Kubernetes 1.22.

  • Request latency

    Metric

    PromQL

    Description

    GET read request delay P[0.9]

    histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb="GET",resource!="",subresource!~"log|proxy"}[$interval])) by (pod, verb, resource, subresource, scope, le))

    The response time of GET requests displayed based on the following dimensions: API server pods, GET verb, resources, and scope.

    LIST read request delay P[0.9]

    histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb="LIST"}[$interval])) by (pod_name, verb, resource, scope, le))

    The response time of LIST requests displayed based on the following dimensions: API server pods, LIST verb, resources, and scope.

    Write request delay P[0.9]

    histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb!~"GET|WATCH|LIST|CONNECT"}[$interval])) by (cluster, pod_name, verb, resource, scope, le))

    The response time of mutating requests displayed based on the following dimensions: API server pods, verbs (excluding GET, WATCH, LIST, and CONNECT), resources, and scope.

  • Request throttling

    Metric

    PromQL

    Description

    Request Limit Rate

    sum(irate(apiserver_dropped_requests_total{request_kind="readOnly"}[$interval])) by (name)

    sum(irate(apiserver_dropped_requests_total{request_kind="mutating"}[$interval])) by (name)

    The throttling rate of kube-apiserver. No data or 0 indicates that request throttling is not triggered.

kube-scheduler

For more information about how to view the metrics and their descriptions, see Metrics of kube-scheduler.

  • Number of pending pods

    Metric

    PromQL

    Description

    Scheduler Pending Pods

    scheduler_pending_pods{job="ack-scheduler"}

    The number of pending pods. Pending pods consist of the following types:

    • unschedulable: unschedulable pods.

    • backoff: backoff queue pods, which are the pods that fail to be scheduled due to specific reasons.

    • active: active queue pods, which are the pods ready to be scheduled.

  • Request latency

    Metric

    PromQL

    Description

    Kube API Request Latency

    histogram_quantile($quantile, sum(rate(rest_client_request_duration_seconds_bucket{job="ack-scheduler"}[$interval])) by (verb,url,le))

    The time interval between a request sent by kube-scheduler and a response returned by kube-apiserver. The latency is calculated based on Verbs and URLs.

kube-controller-manager

For more information about how to view the metrics and their descriptions, see Monitor kube-controller-manager.

Workqueue

Metric

PromQL

Description

Workqueue depth

sum(rate(workqueue_depth{job="ack-kube-controller-manager"}[$interval])) by (name)

The change of the workqueue length in the specified interval.

Workqueue processing delay

histogram_quantile($quantile, sum(rate(workqueue_queue_duration_seconds_bucket{job="ack-kube-controller-manager"}[5m])) by (name, le))

The duration for which events stay in the workqueue before they are processed.

etcd

For more information about how to view the metrics and their descriptions, see Metrics of etcd.

  • Total number of key-value pairs

    Metric

    PromQL

    Description

    total kv

    etcd_debugging_mvcc_keys_total

    The total number of key-value pairs in the etcd cluster.

  • etcd size (DB size)

    Metric

    PromQL

    Description

    Disk Size

    etcd_mvcc_db_total_size_in_bytes

    The size of the etcd backend database.

    etcd_mvcc_db_total_size_in_use_in_bytes

    The size of the etcd backend database that is in use.

References