Use metrics and dashboards of cloud-controller-manager - Container Service for Kubernetes

cloud-controller-manager allows Kubernetes core components to interact with cloud service providers by using the Kubernetes API. This topic describes the metrics supported by cloud-controller-manager, provides usage notes for the dashboards of cloud-controller-manager, and provides suggestions on how to troubleshoot common metric anomalies.

Usage notes

Dashboard access

For more information, see View control plane component dashboards in ACK Pro clusters.

Metrics

Metrics can indicate the status and parameter settings of a component. The following table describes the metrics supported by cloud-controller-manager.

Metric	Type	Description
ccm_slb_latency_ms	Histogram	The Classic Load Balancer (CLB) synchronization delay. Unit: milliseconds. The bucket thresholds are defined as the set `{100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000}`.
ccm_node_latency_ms	Histogram	The node synchronization delay. Unit: milliseconds. The bucket thresholds are defined as the set `{100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000}`.
ccm_route_latency_ms	Histogram	The route synchronization delay. Unit: milliseconds. The bucket thresholds are defined as the set `{100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000}`.
workqueue_adds_total	Counter	The total number of Add events handled by the workqueue.
workqueue_depth	Gauge	The length of the workqueue. If the workqueue length remains at a high level for an extended period of time, the controller cannot process tasks in the workqueue in a timely manner, which results in task accumulation.
workqueue_queue_duration_seconds_bucket	Histogram	The duration for which a task remains in the workqueue. The bucket thresholds are defined as the set {10^-8, 10^-7, 10^-6, 10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 1, 10}. Unit: seconds.
memory_utilization_byte	Gauge	The memory usage. Unit: bytes.
cpu_utilization_core	Gauge	The used CPU capacity. Unit: core.
rest_client_requests_total	Counter	The number of HTTP requests, broken down by status code, method, and host.
rest_client_request_duration_seconds_bucket	Histogram	The HTTP response latency, broken down by verb and URL.

Note

The following resource utilization metrics are deprecated. Remove any alerts and monitoring data that depend on these metrics at the earliest opportunity:

cpu_utilization_ratio: CPU utilization.
memory_utilization_ratio: Memory utilization.

Usage notes for dashboards

Dashboards are generated based on metrics and Prometheus Query Language (PromQL). The following section describes the observability and features of the dashboards of cloud-controller-manager.

CCM

Observability

Features

Dashboard	PromQL	Description
routing synchronization delay	histogram_quantile($quantile, sum(rate(ccm_route_latencies_duration_milliseconds_bucket[$interval])) by (verb, le))	The route synchronization delay. Unit: milliseconds.
node synchronization delay	histogram_quantile($quantile, sum(rate(ccm_node_latencies_duration_milliseconds_bucket[$interval])) by (verb, le))	The node synchronization delay. Unit: milliseconds.
CLB (Classic Load Balancer) synchronization delay	histogram_quantile($quantile, sum(rate(ccm_slb_latencies_duration_milliseconds_bucket[$interval])) by (verb, le))	The CLB synchronization delay. Unit: milliseconds.
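
The three panels above all apply histogram_quantile to cumulative bucket counts. The following Python sketch illustrates the linear interpolation that Prometheus documents for classic histograms. It is an approximation for illustration, not the Prometheus implementation. The function name and the sample counts are hypothetical; only the bucket bounds (in milliseconds) are taken from the metrics table.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs that includes
    the +Inf bucket. This is a simplified sketch of the interpolation
    rule, with the lowest bucket assumed to start at 0.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the
                # highest finite upper bound instead.
                return buckets[-2][0]
            # Interpolate linearly inside the bucket that holds the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan

# Hypothetical cumulative counts for a latency histogram (bounds in ms).
sample = [(100, 50), (200, 90), (300, 100), (float("inf"), 100)]
p50 = histogram_quantile(0.5, sample)
p95 = histogram_quantile(0.95, sample)
```

Because the estimate is interpolated within a bucket, its precision is limited by the bucket width. With the thresholds listed in the metrics table, delays above 1,000 ms are resolved only to the nearest 500 ms or 1,000 ms step.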

Queue

Observability

Features

Dashboard	PromQL	Description
Workqueue enqueue rate	sum(rate(workqueue_adds_total{job="ack-cloud-controller-manager"}[$interval])) by (name)	The per-second rate at which Add events are added to the workqueue over the specified interval.
Workqueue depth	workqueue_depth{job="ack-cloud-controller-manager"}	The workqueue length over the specified interval.
Workqueue processing delay	histogram_quantile($quantile, sum(rate(workqueue_queue_duration_seconds_bucket{job="ack-cloud-controller-manager"}[$interval])) by (name, le))	The duration of the events in the workqueue.
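
The enqueue rate panel applies rate() to the workqueue_adds_total counter. As a rough illustration of what a per-second rate over a window means, the following Python sketch computes the average increase per second from raw counter samples and treats a drop in value as a counter reset. It is a simplification: the PromQL rate() function also extrapolates to the window boundaries, which this sketch does not do. The function name and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Average per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the counter was reset (for example, after a
        # restart), so the new value itself is counted as the increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None

# Hypothetical samples scraped every 30 seconds.
steady = simple_rate([(0, 100), (30, 130), (60, 160)])
after_reset = simple_rate([(0, 100), (30, 10), (60, 40)])
```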

Resources

Observability

Features

Dashboard	PromQL	Description
Memory Usage	memory_utilization_byte{container="cloud-controller-manager"}	The memory usage. Unit: bytes.
CPU Usage	cpu_utilization_core{container="cloud-controller-manager"}*1000	The used CPU capacity. Unit: millicore.

Kube API

Observability

Features

Dashboard	PromQL	Description
Kube API Request QPS	sum(rate(rest_client_requests_total{job="ack-cloud-controller-manager",code=~"2.."}[$interval])) by (method,code) sum(rate(rest_client_requests_total{job="ack-cloud-controller-manager",code=~"3.."}[$interval])) by (method,code) sum(rate(rest_client_requests_total{job="ack-cloud-controller-manager",code=~"4.."}[$interval])) by (method,code) sum(rate(rest_client_requests_total{job="ack-cloud-controller-manager",code=~"5.."}[$interval])) by (method,code)	The number of HTTP requests that cloud-controller-manager sends to kube-apiserver per second. The panel runs four separate queries, one for each status code class (2xx, 3xx, 4xx, and 5xx), and groups the queries per second (QPS) value by method and status code.
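
The PromQL cell above holds four separate queries that differ only in the status code pattern. The following Python sketch generates them from one template, which can help if you want to reproduce the panel in your own tooling. The helper name qps_query and the dictionary keys are hypothetical; the job label and the query text are copied from the table.

```python
JOB = "ack-cloud-controller-manager"

def qps_query(code_class, interval="$interval"):
    """Build the QPS query for one status code class, such as "2" for 2xx."""
    return (
        "sum(rate(rest_client_requests_total{"
        f'job="{JOB}",code=~"{code_class}.."'
        f"}}[{interval}])) by (method,code)"
    )

# One query per status code class: 2xx, 3xx, 4xx, and 5xx.
queries = {f"{c}xx": qps_query(c) for c in "2345"}

for name, query in queries.items():
    print(name, query)
```

Replace the default $interval placeholder with a concrete range such as 5m when you run a query outside the dashboard.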

Common metric anomalies

If the metrics of cloud-controller-manager become abnormal, check whether the metric anomalies described in the following sections exist. If metric anomalies that are not described in the following sections occur, submit a ticket.

CLB (Classic Load Balancer) synchronization delay

Normal case	Anomaly	Anomaly description	Suggestion
The CLB synchronization delay is shorter than 10 seconds.	The CLB synchronization delay is longer than 10 seconds.	CLB synchronization is time-consuming.	Check whether abnormal Service events exist.

Workqueue depth

Normal case	Anomaly	Anomaly description	Suggestion
The workqueue depth is less than 10.	The workqueue depth is greater than 10.	A large number of Services are pending synchronization in the workqueue.	If the workqueue is too long, Service synchronization slows down. Reduce how often you update the nodes, pods, and Services in the cluster.
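
The two thresholds in the tables above can be checked programmatically. The following Python sketch is a minimal illustration, assuming that you have already read the current CLB synchronization delay (in milliseconds) and the workqueue depth from your monitoring system. The function and constant names are hypothetical; the limits and suggestions are taken from the tables.

```python
CLB_SYNC_DELAY_LIMIT_MS = 10_000  # 10 seconds, expressed in milliseconds
WORKQUEUE_DEPTH_LIMIT = 10

def find_anomalies(clb_sync_delay_ms, workqueue_depth):
    """Return the suggestions for every threshold that is exceeded."""
    findings = []
    if clb_sync_delay_ms > CLB_SYNC_DELAY_LIMIT_MS:
        findings.append(
            "CLB synchronization delay: check whether abnormal Service events exist."
        )
    if workqueue_depth > WORKQUEUE_DEPTH_LIMIT:
        findings.append(
            "Workqueue depth: update nodes, pods, and Services less frequently."
        )
    return findings

# Hypothetical readings: a 12-second CLB sync delay and a shallow queue.
result = find_anomalies(clb_sync_delay_ms=12_000, workqueue_depth=3)
```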

References

For more information about the metrics, dashboard usage notes, and suggestions on how to troubleshoot common metric anomalies for other control plane components, see the following topics: Metrics of kube-apiserver, Metrics of etcd, Metrics of kube-scheduler, and Metrics of kube-controller-manager.