Monitoring metrics and dashboards of kube-apiserver

Updated at: 2024-09-02 09:56

The kube-apiserver component provides RESTful APIs of Kubernetes to allow external clients and other components in a Container Service for Kubernetes (ACK) cluster to interact with the ACK cluster. This topic describes the monitoring metrics of the kube-apiserver component. This topic also describes how to use monitoring dashboards and how to handle metric anomalies.

Usage notes

Access method

For more information, see View control plane component dashboards in ACK Pro clusters.


Metrics can indicate the status and parameter settings of a component. The following table describes the metrics supported by kube-apiserver.









The latency between a request sent to kube-apiserver and a response returned by kube-apiserver.

Requests are classified based on the following dimensions:

  • Verb: the type of the request, such as GET, POST, PUT, and DELETE.

  • Group: the API group, which contains related API operations used to extend the Kubernetes API.

  • Version: the API version, such as v1 and v1beta1.

  • Resource: the type of the resource that the request is sent to access, such as pod, Service, and lease.

  • Subresource: the subresources of the resource, such as pod details and pod logs.

  • Scope: the scope of the request, such as resources in a namespace or resources in a cluster.

  • Component: the name of the component that initiates the request, such as kube-controller-manager, kube-scheduler, or cloud-controller-manager.

  • Client: the client that sends the request, which may be an internal component or an external service.

The buckets in the histogram include 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, and 60. Unit: seconds.



The number of requests received by kube-apiserver. Requests are classified based on verbs, groups, versions, resources, scopes, components, HTTP content types, HTTP status codes, and clients.



The number of LIST requests that are sent to kube-apiserver and for which the resourceVersion parameter is not specified. This metric is used to check whether an excessive number of LIST requests of the quorum read type are sent to kube-apiserver and locate the client that sends such requests. This can help optimize client behavior and improve cluster performance. Requests are classified based on groups, versions, resources, scopes, and clients.



The number of requests that are being processed by kube-apiserver. Requests are classified into the following types:

  • ReadOnly: Requests of this type do not change the state of the cluster. In most cases, they read resources in the cluster, for example, to query pods or node status.

  • Mutating: Requests of this type change the state of the cluster. In most cases, they create, update, or delete resources, for example, to create a pod or update the configuration of a Service.



The number of requests that are dropped when throttling is performed on kube-apiserver. A dropped request receives the HTTP status code 429 'Try again later'.



The latency between a request sent from kube-apiserver and a response returned by etcd.

Requests are classified based on operations and operation types.

The buckets in the histogram include 0.005, 0.025, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.25, 1.5, 2, 3, 4, 5, 6, 8, 10, 15, 20, 30, 45, and 60. Unit: seconds.



The processing latency of the admission controller. The histogram is identified by the admission controller name, operation such as CREATE, UPDATE, or CONNECT, API resource, operation type such as validate or admit, and whether the request is denied.

The buckets of the histogram include 0.005, 0.025, 0.1, 0.5, and 2.5. Unit: seconds.



The processing latency of the admission webhook. The histogram is identified by the admission webhook name, operation such as CREATE, UPDATE, or CONNECT, API resource, operation type such as validating or admit, and whether the request is denied.

The buckets of the histogram include 0.005, 0.025, 0.1, 0.5, and 2.5. Unit: seconds.



The number of requests processed by the admission webhook. Requests are classified by the admission webhook name, operation such as CREATE, UPDATE, or CONNECT, API resource, operation type such as validating or admit, and whether the request is denied.



The number of used CPU cores. Unit: cores.



The amount of used memory. Unit: bytes.



Indicates whether kube-apiserver is available.

  • 1: kube-apiserver is available.

  • 0: kube-apiserver is unavailable.


The following resource utilization metrics are deprecated. Remove any alerts and monitoring data that depend on these metrics at the earliest opportunity:

  • cpu_utilization_ratio: CPU utilization.

  • memory_utilization_ratio: Memory utilization.

Usage notes for dashboards

Dashboards are generated based on metrics and Prometheus Query Language (PromQL). The following sections describe the kube-apiserver dashboards for key metrics, cluster-level overview, resource analysis, queries per second (QPS) and latency, admission controller and webhook, and client analysis.

In most cases, these dashboards are used in the following sequence:

  1. View the key metrics dashboards to quickly view cluster performance statistics.

  2. View the cluster-level overview dashboards to analyze the response latency of kube-apiserver, the number of requests that are being processed by kube-apiserver, and whether request throttling is triggered.

  3. View the resource analysis dashboards to check the resource usage of the managed components.

  4. View the QPS and latency dashboards to analyze the QPS and response time based on multiple dimensions.

  5. View the admission controller and webhook dashboards to analyze the QPS and response time of the admission controller and webhook.

  6. View the client analysis dashboards to analyze the client QPS based on multiple dimensions.


Multiple filters are displayed above the dashboards. You can use the following filters to filter requests sent to kube-apiserver based on verbs and resources, modify the quantile, and change the PromQL sampling interval.
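The dashboard queries in this topic pass the selected sampling interval to irate() as the range window [$interval]. As a rough illustration of what that function computes, the following Python sketch derives a per-second rate from the last two samples in a window and treats a decrease as a counter reset. The function name instant_rate and the sample values are made up for illustration. This is not the Prometheus implementation.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def instant_rate(samples: Sequence[Sample]) -> float:
    """Per-second rate computed from the last two samples in the window."""
    if len(samples) < 2:
        raise ValueError("need at least two samples in the window")
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    if t_last <= t_prev:
        raise ValueError("timestamps must be strictly increasing")
    # A counter that went down is treated as a reset: count up from zero.
    delta = v_last if v_last < v_prev else v_last - v_prev
    return delta / (t_last - t_prev)


# Hypothetical request counter scraped every 15 seconds.
window = [(0.0, 100.0), (15.0, 160.0), (30.0, 250.0)]
qps = instant_rate(window)  # only the last two samples are used
```

Because only the last two samples count, a longer sampling interval mainly widens the window in which those two samples must be found. It does not average the rate over the whole window.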


To change the quantile, use the quantile filter. For example, if you select 0.9, the dashboards display the P90 value of a histogram metric, which is the value that 90% of the samples do not exceed. P90 reduces the impact of long-tail samples, which account for only a small portion of all samples. A value of 0.99 (P99) includes most long-tail samples.
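To make the effect of the quantile filter concrete, the following Python sketch estimates a quantile from cumulative histogram buckets by linear interpolation inside the bucket that contains the target rank. This is the general approach used for bucketed histograms. The function name estimate_quantile and all bucket counts are assumptions for illustration, not values from a real cluster.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper bound, count) buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        # The sample lies beyond the last finite bound, so report that bound.
        return buckets[-2][0]
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    if count == below:
        return upper
    # Assume samples are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical cumulative counts: most requests are fast, a few are slow.
latency_buckets = [(0.1, 600), (0.5, 850), (1.0, 950), (5.0, 990), (math.inf, 1000)]
p90 = estimate_quantile(0.9, latency_buckets)
p99 = estimate_quantile(0.99, latency_buckets)
```

With these made-up counts, P90 comes out near 0.75 seconds while P99 is 5 seconds, which shows how a higher quantile exposes the long tail.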


The following filters are used to specify the time range and update interval.

Key metrics











The QPS of kube-apiserver.

Read Request Success Rate


The success rate of read requests sent to kube-apiserver.

Write Request Success Rate


The success rate of write requests sent to kube-apiserver.

Number of read requests processed


The number of read requests that are being processed by kube-apiserver.

Number of write requests processed


The number of write requests that are being processed by kube-apiserver.

Request Limit Rate


The ratio of the number of dropped requests to the total number of requests sent to kube-apiserver when the request throttling is performed on kube-apiserver.

Cluster-level overview









GET read request delay P[0.9]

histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb="GET",resource!="",subresource!~"log|proxy"}[$interval])) by (pod, verb, resource, subresource, scope, le))

The response time of GET requests displayed based on the following dimensions: API server pods, GET verb, resources, subresources, and scope.

LIST read request delay P[0.9]

histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb="LIST"}[$interval])) by (pod_name, verb, resource, scope, le))

The response time of LIST requests displayed based on the following dimensions: API server pods, LIST verb, resources, and scope.

Write request delay P[0.9]

histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb!~"GET|WATCH|LIST|CONNECT"}[$interval])) by (cluster, pod_name, verb, resource, scope, le))

The response time of Mutating requests displayed based on the following dimensions: API server pods, verbs other than GET, WATCH, LIST, and CONNECT (such as POST, PUT, PATCH, and DELETE), resources, and scope.

Number of read requests processed


The number of read requests that are being processed by kube-apiserver.

Number of write requests processed


The number of write requests that are being processed by kube-apiserver.

Request Limit Rate

sum(irate(apiserver_dropped_requests_total{request_kind="readOnly"}[$interval])) by (name)

sum(irate(apiserver_dropped_requests_total{request_kind="mutating"}[$interval])) by (name)

The throttling rate of kube-apiserver. No data or 0 indicates that request throttling is not triggered.

Resource analysis









Memory Usage


The memory usage of kube-apiserver. Unit: bytes.

CPU Usage


The CPU usage of kube-apiserver. Unit: millicores.

Number of resource objects

  • max by(resource)(apiserver_storage_objects)

  • max by(resource)(etcd_object_counts)

  • The metric name is apiserver_storage_objects if your ACK cluster runs Kubernetes 1.22 or later.

  • The metric name is etcd_object_counts if your ACK cluster runs Kubernetes 1.22 or earlier.


Due to compatibility issues, both the apiserver_storage_objects and etcd_object_counts metrics exist in Kubernetes 1.22.
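If you build your own queries or alerts on the object count, the version rule above can be encoded as a small lookup. The following Python sketch is one way to do that. The function name object_count_metrics is hypothetical, and the version strings are examples only.

```python
from typing import List


def object_count_metrics(kubernetes_version: str) -> List[str]:
    """Return the object-count metric names expected for a cluster version."""
    parts = kubernetes_version.lstrip("v").split(".")[:2]
    major, minor = (int(part) for part in parts)
    if (major, minor) > (1, 22):
        return ["apiserver_storage_objects"]
    if (major, minor) < (1, 22):
        return ["etcd_object_counts"]
    # Both metric names exist in Kubernetes 1.22.
    return ["apiserver_storage_objects", "etcd_object_counts"]


print(object_count_metrics("1.20.11"))
print(object_count_metrics("v1.22.3"))
print(object_count_metrics("1.28.3-aliyun.1"))
```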

QPS and latency









Analyze QPS [All] P[0.9] by Verb dimension


The QPS calculated based on verbs.

Analyze QPS [All] P[0.9] by Verb Resource dimension


The QPS calculated based on verbs and resources.

Analyze request latency by Verb dimension [All] P[0.9]

histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb=~"$verb", verb!~"WATCH|CONNECT",resource!=""}[$interval])) by (le,verb))

The response latency calculated based on verbs.

Analyze request latency by Verb Resource dimension [All] P[0.9]

histogram_quantile($quantile, sum(irate(apiserver_request_duration_seconds_bucket{verb=~"$verb", verb!~"WATCH|CONNECT", resource=~"$resource",resource!=""}[$interval])) by (le,verb,resource))

The response latency calculated based on verbs and resources.

Read request QPS [5m] for non-2xx return values

sum(irate(apiserver_request_total{verb=~"GET|LIST",resource=~"$resource",code!~"2.*"}[$interval])) by (verb,resource,code)

The QPS of read requests for which HTTP status codes other than 2xx, such as 4xx or 5xx, are returned.

QPS [5m] for write requests with non-2xx return values

sum(irate(apiserver_request_total{verb!~"GET|LIST|WATCH",verb=~"$verb",resource=~"$resource",code!~"2.*"}[$interval])) by (verb,resource,code)

The QPS of write requests for which HTTP status codes other than 2xx, such as 4xx or 5xx, are returned.
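PromQL regular expression matchers such as code!~"2.*" are fully anchored, so the pattern must match the whole label value. The following Python sketch imitates the verb and code matchers of the write-request query with re.fullmatch to show which series the panel keeps. The helper names and the sample series are assumptions for illustration. Python's re module is not the RE2 engine that Prometheus uses, although the two behave the same for these simple patterns.

```python
import re


def matches(pattern: str, value: str) -> bool:
    """Anchored match: the pattern must cover the whole label value."""
    return re.fullmatch(pattern, value) is not None


def is_failed_write(verb: str, code: str) -> bool:
    # Mirrors verb!~"GET|LIST|WATCH" and code!~"2.*" from the panel query.
    return not matches("GET|LIST|WATCH", verb) and not matches("2.*", code)


# Hypothetical (verb, code) label pairs.
series = [
    ("GET", "200"),
    ("GET", "404"),
    ("POST", "201"),
    ("POST", "409"),
    ("DELETE", "500"),
    ("WATCH", "410"),
]
failed_writes = [pair for pair in series if is_failed_write(*pair)]
print(failed_writes)
```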

Apiserver to etcd request latency [5m]

histogram_quantile($quantile, sum(irate(etcd_request_duration_seconds_bucket[$interval])) by (le,operation,type,instance))

The latency of requests sent from kube-apiserver to etcd.

Admission controller and webhook









Admission controller delay [admit]

histogram_quantile($quantile, sum by(operation, name, le, type, rejected) (irate(apiserver_admission_controller_admission_duration_seconds_bucket{type="admit"}[$interval])) )

The processing latency of admission controllers of the admit type, broken down by admission controller name, operation, and whether the request is denied.

The buckets of the histogram include 0.005, 0.025, 0.1, 0.5, and 2.5. Unit: seconds.

Admission Controller Delay [validate]

histogram_quantile($quantile, sum by(operation, name, le, type, rejected) (irate(apiserver_admission_controller_admission_duration_seconds_bucket{type="validate"}[$interval])) )

The processing latency of admission controllers of the validate type, broken down by admission controller name, operation, and whether the request is denied.

The buckets of the histogram include 0.005, 0.025, 0.1, 0.5, and 2.5. Unit: seconds.

Admission Webhook delay [admit]

histogram_quantile($quantile, sum by(operation, name, le, type, rejected) (irate(apiserver_admission_webhook_admission_duration_seconds_bucket{type="admit"}[$interval])) )

The processing latency of admission webhooks of the admit type, broken down by admission webhook name, operation, and whether the request is denied.

The buckets of the histogram include 0.005, 0.025, 0.1, 0.5, and 2.5. Unit: seconds.

Admission Webhook Delay [validating]

histogram_quantile($quantile, sum by(operation, name, le, type, rejected) (irate(apiserver_admission_webhook_admission_duration_seconds_bucket{type="validating"}[$interval])) )

The processing latency of admission webhooks of the validating type, broken down by admission webhook name, operation, and whether the request is denied.

The buckets of the histogram include 0.005, 0.025, 0.1, 0.5, and 2.5. Unit: seconds.

Admission Webhook Request QPS


The QPS of the admission webhook.

Client analysis









Analyze QPS by Client dimension

sum(irate(apiserver_request_total{client!=""}[$interval])) by (client)

The QPS statistics based on clients. This can help you analyze the clients that access kube-apiserver and the QPS.

Analyze QPS by Verb Resource Client dimension [All]

sum(irate(apiserver_request_total{client!="",verb=~"$verb", resource=~"$resource"}[$interval]))by(verb,resource,client)

The QPS statistics based on verbs, resources, and clients.

Analyze LIST request QPS by Verb Resource Client dimension (no resourceVersion field)


  • The QPS of LIST requests based on verbs, resources, and clients. The resourceVersion parameter is not specified in such requests.

  • You can analyze and optimize the LIST operations performed by clients based on the LIST requests sent to kube-apiserver and the LIST requests that retrieve data from etcd.

Common metric anomalies

If the metrics of kube-apiserver become abnormal, check whether the metric anomalies described in the following sections exist. If metric anomalies that are not described in the following sections occur, submit a ticket.

Success rate of read/write requests

Case description







Normal: The values of Read Request Success Rate and Write Request Success Rate are close to 100%.

Abnormal: The values of Read Request Success Rate and Write Request Success Rate are low, for example, lower than 90%.

Description: A large number of requests return HTTP status codes other than 2xx.

Recommended solution

Check Read request QPS [5m] for non-2xx return values and QPS [5m] for write requests with non-2xx return values to find the request types and resources for which kube-apiserver returns HTTP status codes other than 2xx. Evaluate whether such requests meet your expectations and optimize the requests based on the evaluation results. For example, a GET/deployment 404 entry indicates that some GET requests for Deployments return the HTTP status code 404, which decreases the value of Read Request Success Rate.
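The GET/deployment 404 example can be worked through with numbers. The following Python sketch assumes that the success rate is the share of requests that return a 2xx status code. The exact dashboard query is not shown in this topic, so treat that definition, the function names, and the request counts as assumptions.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (verb, resource, HTTP status code)


def success_rate(counts: Dict[Key, float]) -> float:
    """Share of requests that returned a 2xx status code."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no requests in the sample")
    succeeded = sum(n for (_, _, code), n in counts.items() if code.startswith("2"))
    return succeeded / total


def top_failures(counts: Dict[Key, float], limit: int = 3) -> List[Tuple[Key, float]]:
    """Non-2xx entries sorted by request count, largest first."""
    failures = [(key, n) for key, n in counts.items() if not key[2].startswith("2")]
    return sorted(failures, key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical read request counts over one interval.
read_counts = {
    ("GET", "deployments", "200"): 900.0,
    ("GET", "deployments", "404"): 80.0,
    ("LIST", "pods", "200"): 15.0,
    ("LIST", "pods", "500"): 5.0,
}
rate = success_rate(read_counts)
worst = top_failures(read_counts, limit=1)
```

In this made-up sample, 915 of 1,000 read requests succeed, and the GET/deployment 404 entry accounts for most of the gap.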

Latency of GET/LIST requests and latency of write requests

Case description







Normal: The values of GET read request delay P[0.9], LIST read request delay P[0.9], and Write request delay P[0.9] vary based on the amount of resources to be accessed in the cluster and the cluster size. Therefore, no specific threshold can be used to identify anomalies. All cases are acceptable if your workloads are not adversely affected. For example, if the number of requests that are sent to access a specific type of resource increases, the latency of LIST requests increases. In most cases, the values of GET read request delay P[0.9] and Write request delay P[0.9] are smaller than 1 second, and the value of LIST read request delay P[0.9] is smaller than 5 seconds.

Abnormal:

  • The values of GET read request delay P[0.9] and Write request delay P[0.9] are greater than 1 second.

  • The value of LIST read request delay P[0.9] is greater than 5 seconds.

Description: Check whether the response latency increases because an admission webhook cannot respond promptly or because clients send more requests to access the resources.

Recommended solution

  • Check GET read request delay P[0.9], LIST read request delay P[0.9], and Write request delay P[0.9] for the request types and resources that have high response latency. Evaluate whether such requests meet your expectations and optimize the requests based on the evaluation results.

    The upper limit of the apiserver_request_duration_seconds_bucket metric is 60 seconds. Response latencies that are longer than 60 seconds are recorded as 60 seconds. Pod exec requests (POST pod/exec) and log retrieval requests create persistent connections, so their response latency is longer than 60 seconds. You can ignore these requests when you analyze latency.

  • Analyze whether the response latency of kube-apiserver increases due to the admission webhook that cannot promptly respond. For more information, see the Admission webhook latency section of this topic.
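When you export latency data for offline analysis, the note above on the 60-second upper limit suggests filtering out long-lived requests first. The following Python sketch shows one way to do that. The subresource values exec and log, the 1-second threshold, and all sample rows are assumptions for illustration. Check the actual label values in your own metrics.

```python
from typing import Iterable, List, NamedTuple


class LatencySeries(NamedTuple):
    verb: str
    resource: str
    subresource: str
    p90_seconds: float


# Assumed label values for requests that hold persistent connections.
LONG_LIVED_SUBRESOURCES = {"exec", "log"}


def worth_investigating(
    series: Iterable[LatencySeries], threshold_seconds: float = 1.0
) -> List[LatencySeries]:
    """Keep slow series, skipping long-lived exec and log requests."""
    return [
        s
        for s in series
        if s.subresource not in LONG_LIVED_SUBRESOURCES
        and s.p90_seconds > threshold_seconds
    ]


# Hypothetical P90 values read from the dashboard.
observed = [
    LatencySeries("POST", "pods", "exec", 60.0),
    LatencySeries("GET", "pods", "log", 60.0),
    LatencySeries("GET", "configmaps", "", 0.02),
    LatencySeries("POST", "pods", "", 1.8),
]
slow = worth_investigating(observed)
```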

Number of read or write requests that are being processed and dropped requests

Case description







Normal: In most cases, if the values of Number of read requests processed and Number of write requests processed are smaller than 100, and the value of Request Limit Rate is 0, no anomaly occurs.

Abnormal:

  • The values of Number of read requests processed and Number of write requests processed are greater than 100.

  • The value of Request Limit Rate is greater than 0.

Description: The request queue is full. Check whether the issue is caused by temporary request spikes or by an admission webhook that cannot respond promptly. If the number of pending requests exceeds the length of the queue, kube-apiserver triggers request throttling and the value of Request Limit Rate exceeds 0. As a result, the stability of the cluster is affected.

Recommended solution

  • View the QPS and latency and client analysis dashboards. Check whether the top requests are necessary. If the requests are generated by workloads, check whether you can reduce the number of similar requests.

  • Analyze whether the response latency of kube-apiserver increases due to the admission webhook that cannot promptly respond. For more information, see the Admission webhook latency section of this topic.

  • If the value of Request Limit Rate remains greater than 0, submit a ticket for technical support.
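The rule of thumb in this section can be expressed as a simple check. The following Python sketch flags each condition independently, which is a simplification. The function name throttling_findings and the sample values are hypothetical, and the limit of 100 is the approximate figure from the case description rather than a hard threshold.

```python
from typing import List

INFLIGHT_LIMIT = 100  # rule-of-thumb value from the case description


def throttling_findings(
    inflight_read: float, inflight_write: float, request_limit_rate: float
) -> List[str]:
    """Return human-readable findings for one snapshot of the dashboards."""
    findings = []
    if inflight_read > INFLIGHT_LIMIT:
        findings.append("read requests in flight exceed 100")
    if inflight_write > INFLIGHT_LIMIT:
        findings.append("write requests in flight exceed 100")
    if request_limit_rate > 0:
        findings.append("requests are being throttled")
    return findings


print(throttling_findings(40, 12, 0.0))
print(throttling_findings(250, 30, 0.4))
```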

Admission webhook latency

Case description







Normal: The value of Admission Webhook Delay is smaller than 0.5 seconds.

Abnormal: The value of Admission Webhook Delay remains greater than 0.5 seconds.

Description: If the admission webhook cannot promptly respond, the response latency of kube-apiserver increases.

Recommended solution

Analyze the admission webhook logs and check whether the webhooks work as expected. If you no longer need a webhook, uninstall it.
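The abnormal case is defined by latency that remains above 0.5 seconds, not by a single spike. The following Python sketch is one way to tell the two apart in exported data. It assumes that "remains" means every one of the most recent samples is above the threshold. The function name and the min_points parameter are hypothetical.

```python
from typing import Sequence


def remains_above(
    samples: Sequence[float], threshold: float = 0.5, min_points: int = 5
) -> bool:
    """True if each of the last min_points samples exceeds the threshold."""
    if min_points < 1:
        raise ValueError("min_points must be at least 1")
    recent = samples[-min_points:]
    return len(recent) == min_points and all(value > threshold for value in recent)


# Hypothetical webhook latency samples in seconds, oldest first.
spiky = [0.1, 0.9, 0.2, 0.8, 0.7, 0.6]
sustained = [0.2, 0.6, 0.7, 0.9, 0.8, 0.6]
print(remains_above(spiky), remains_above(sustained))
```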


