cloud-controller-manager allows Kubernetes core components to interact with cloud service providers by using the Kubernetes API. This topic describes the metrics supported by cloud-controller-manager, explains how to use the cloud-controller-manager dashboards, and provides suggestions on how to troubleshoot common metric anomalies.
Usage notes
Dashboard access
For more information, see View control plane component dashboards in ACK Pro clusters.
Metrics
Metrics can indicate the status and parameter settings of a component. The following table describes the metrics supported by cloud-controller-manager.
Metric | Type | Description |
ccm_slb_latency_ms | Histogram | The Classic Load Balancer (CLB) synchronization delay. Unit: millisecond. The bucket thresholds are defined as the set |
ccm_node_latency_ms | Histogram | The node synchronization delay. Unit: millisecond. The bucket thresholds are defined as the set |
ccm_route_latency_ms | Histogram | The route synchronization delay. Unit: millisecond. The bucket thresholds are defined as the set |
workqueue_adds_total | Counter | The number of Adds events processed by the workqueue. |
workqueue_depth | Gauge | The length of the workqueue. If the workqueue length remains at a high level for an extended period of time, the controller cannot process tasks in the workqueue in a timely manner, which results in task accumulation. |
workqueue_queue_duration_seconds_bucket | Histogram | The duration for which a task remains in the workqueue. Unit: seconds. The bucket thresholds are defined as the set {10^-8, 10^-7, 10^-6, 10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 1, 10}. |
memory_utilization_byte | Gauge | The memory usage. Unit: bytes. |
cpu_utilization_core | Gauge | The used CPU capacity. Unit: core. |
rest_client_requests_total | Counter | The number of HTTP requests calculated based on status codes, methods, and hosts. |
rest_client_request_duration_seconds_bucket | Histogram | The HTTP response delay calculated based on Verbs and URLs. |
The following resource utilization metrics are deprecated. Remove any alerts and monitoring data that depend on these metrics at the earliest opportunity:
cpu_utilization_ratio: CPU utilization.
memory_utilization_ratio: Memory utilization.
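The bucket thresholds listed for workqueue_queue_duration_seconds_bucket form a power-of-ten series from 10^-8 seconds to 10 seconds. As a rough illustration, the following Python sketch rebuilds that series and finds the bucket into which an observed duration falls. The function names are hypothetical and are not part of cloud-controller-manager.

```python
import bisect
import math


def workqueue_duration_bounds(start_exp=-8, count=10):
    """Return `count` upper bounds in seconds: 10**start_exp, then 10x steps."""
    # Parsing "1e-8" style literals avoids rounding error from repeated multiplication.
    return [float(f"1e{start_exp + i}") for i in range(count)]


def bucket_for(duration_seconds, bounds):
    """Return the smallest upper bound that is >= the observed duration.

    Histogram buckets are inclusive ("le" means less than or equal), so an
    observation equal to a bound belongs to that bound. Observations above
    the largest bound fall into the implicit +Inf bucket.
    """
    index = bisect.bisect_left(bounds, duration_seconds)
    return bounds[index] if index < len(bounds) else math.inf


bounds = workqueue_duration_bounds()
print(bounds)
print(bucket_for(0.05, bounds))
```

Because the series tops out at 10 seconds, any task that waits longer than that is counted only in the +Inf bucket, so quantiles computed from this histogram cannot resolve waits beyond 10 seconds.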
Usage notes for dashboards
Dashboards are generated based on metrics and Prometheus Query Language (PromQL). The following sections describe the observability and features of the cloud-controller-manager dashboards.
CCM
Observability
Features
Dashboard | PromQL | Description |
routing synchronization delay | histogram_quantile($quantile, sum(rate(ccm_route_latencies_duration_milliseconds_bucket[$interval])) by (verb, le)) | The route synchronization delay. Unit: millisecond. |
node synchronization delay | histogram_quantile($quantile, sum(rate(ccm_node_latencies_duration_milliseconds_bucket[$interval])) by (verb, le)) | The node synchronization delay. Unit: millisecond. |
CLB (Classic Load Balancer) synchronization delay | histogram_quantile($quantile, sum(rate(ccm_slb_latencies_duration_milliseconds_bucket[$interval])) by (verb, le)) | The CLB synchronization delay. Unit: millisecond. |
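The three delay dashboards all rely on histogram_quantile, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The following Python sketch approximates that calculation for a single series. It is a simplified illustration rather than the Prometheus implementation, and the sample bucket counts are invented for the example.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    `buckets` must include the (math.inf, total_count) bucket. The lower
    bound of the first bucket is assumed to be 0.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf or buckets[-1][1] == 0:
        return math.nan

    rank = q * buckets[-1][1]
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf or count == prev_count:
                # No finite upper bound (or an empty bucket): report the
                # bound of the previous bucket instead of interpolating.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical cumulative counts for a latency histogram, bounds in milliseconds.
sample = [(100, 50), (500, 90), (1000, 100), (math.inf, 100)]
print(histogram_quantile(0.5, sample))
print(histogram_quantile(0.95, sample))
```

Because the result is interpolated, its precision is limited by the bucket width: a reported delay can lie anywhere between the lower and upper bounds of the bucket that contains the requested quantile.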
Queue
Observability
Features
Dashboard | PromQL | Description |
Workqueue enqueue rate | sum(rate(workqueue_adds_total{job="ack-cloud-controller-manager"}[$interval])) by (name) | The number of Adds events that are added to the workqueue per second, averaged over the specified interval. |
Workqueue depth | workqueue_depth{job="ack-cloud-controller-manager"} | The change in the workqueue length over the specified interval. |
Workqueue processing delay | histogram_quantile($quantile, sum(rate(workqueue_queue_duration_seconds_bucket{job="ack-cloud-controller-manager"}[$interval])) by (name, le)) | The duration of the events in the workqueue. |
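The enqueue rate above is derived from a counter, so the dashboard value is a per-second average over the selected interval, not a raw count. The following Python sketch shows the idea in simplified form. It handles counter resets but omits the window-boundary extrapolation that PromQL's rate() performs, and the function name and sample values are hypothetical.

```python
def simple_rate(samples):
    """Return the average per-second increase of a counter.

    `samples` is a time-ordered list of (timestamp_seconds, counter_value).
    Returns None when fewer than two samples or no elapsed time exist.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero (for example, after
        # a process restart), so the new value is the whole increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Three scrapes of a hypothetical workqueue_adds_total series, 30 seconds apart.
print(simple_rate([(0, 100), (30, 130), (60, 160)]))
```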
Resources
Observability
Features
Dashboard | PromQL | Description |
Memory Usage | memory_utilization_byte{container="cloud-controller-manager"} | The memory usage. Unit: bytes. |
CPU Usage | cpu_utilization_core{container="cloud-controller-manager"}*1000 | The used CPU capacity. Unit: millicore. |
Kube API
Observability
Features
Dashboard | PromQL | Description |
Kube API Request QPS | | The number of HTTP requests sent by cloud-controller-manager to kube-apiserver per second. The queries per second (QPS) value is calculated based on Verbs and URLs. |
Common metric anomalies
If the metrics of cloud-controller-manager become abnormal, check whether the metric anomalies described in the following sections exist. If metric anomalies that are not described in the following sections occur, submit a ticket.
CLB (Classic Load Balancer) synchronization delay
Normal case | Anomaly | Anomaly description | Suggestion |
The CLB synchronization delay is shorter than 10 seconds. | The CLB synchronization delay is longer than 10 seconds. | CLB synchronization is time-consuming. | Check whether abnormal Service events exist. |
Workqueue depth
Normal case | Anomaly | Anomaly description | Suggestion |
The workqueue depth is smaller than 10. | The workqueue depth is greater than 10. | A large number of Services in the workqueue are waiting to be synchronized. | A long workqueue slows down Service synchronization. Update the nodes, pods, and Services in the cluster less frequently. |
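The two thresholds above can be expressed as a simple check. The following Python sketch is one hypothetical way to do that. It assumes the CLB delay is supplied in milliseconds (the unit of the delay metrics) and treats a value of exactly 10 as normal, which the tables above do not specify. The function and constant names are illustrative only.

```python
# Thresholds taken from the tables above. The CLB limit is converted from
# 10 seconds to milliseconds because the delay metrics are in milliseconds.
CLB_SYNC_DELAY_LIMIT_MS = 10 * 1000
WORKQUEUE_DEPTH_LIMIT = 10


def find_anomalies(clb_sync_delay_ms, workqueue_depth):
    """Return a list of suggestions for values that exceed the thresholds."""
    findings = []
    if clb_sync_delay_ms > CLB_SYNC_DELAY_LIMIT_MS:
        findings.append(
            "CLB synchronization delay: check whether abnormal Service events exist."
        )
    if workqueue_depth > WORKQUEUE_DEPTH_LIMIT:
        findings.append(
            "Workqueue depth: update nodes, pods, and Services less frequently."
        )
    return findings


for finding in find_anomalies(clb_sync_delay_ms=12500, workqueue_depth=4):
    print(finding)
```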
References
For more information about the metrics, dashboards, and troubleshooting suggestions for other control plane components, see the following topics: Metrics of kube-apiserver, Metrics of etcd, Metrics of kube-scheduler, and Metrics of kube-controller-manager.