Container Service for Kubernetes: Use metrics and dashboards of cloud-controller-manager

Last Updated: Sep 02, 2024

cloud-controller-manager allows Kubernetes core components to interact with cloud service providers by using the Kubernetes API. This topic describes the metrics supported by cloud-controller-manager, explains how to use its dashboards, and suggests how to troubleshoot common metric anomalies.

Usage notes

Dashboard access

For more information about how to access the dashboards, see View control plane component dashboards in ACK Pro clusters.

Metrics

Metrics indicate the status and parameter settings of a component. The following table describes the metrics supported by cloud-controller-manager.

| Metric | Type | Description |
| --- | --- | --- |
| ccm_slb_latency_ms | Histogram | The Classic Load Balancer (CLB) synchronization delay. Unit: milliseconds. Bucket thresholds: {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000}. |
| ccm_node_latency_ms | Histogram | The node synchronization delay. Unit: milliseconds. Bucket thresholds: {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000}. |
| ccm_route_latency_ms | Histogram | The route synchronization delay. Unit: milliseconds. Bucket thresholds: {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1500, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000}. |
| workqueue_adds_total | Counter | The number of Adds events processed by the workqueue. |
| workqueue_depth | Gauge | The length of the workqueue. If the length stays high for an extended period of time, the controller cannot process tasks in the workqueue in a timely manner and tasks accumulate. |
| workqueue_queue_duration_seconds_bucket | Histogram | The duration for which a task remains in the workqueue. Unit: seconds. Bucket thresholds: {10^-8, 10^-7, 10^-6, 10^-5, 10^-4, 10^-3, 10^-2, 10^-1, 1, 10}. |
| memory_utilization_byte | Gauge | The memory usage. Unit: bytes. |
| cpu_utilization_core | Gauge | The used CPU capacity. Unit: cores. |
| rest_client_requests_total | Counter | The number of HTTP requests, broken down by status code, method, and host. |
| rest_client_request_duration_seconds_bucket | Histogram | The HTTP request latency, broken down by verb and URL. |

Note

The following resource utilization metrics are deprecated. Remove any alerts and monitoring data that depend on these metrics at the earliest opportunity:

  • cpu_utilization_ratio: CPU utilization.

  • memory_utilization_ratio: Memory utilization.
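The cleanup described in the note can be partly automated. The following Python sketch is one way to find alert rules that still reference the deprecated metrics. The rule names and expressions in `example_rules` are hypothetical, and the sketch assumes that you can export your rules as a mapping from rule name to PromQL expression.

```python
import re

# Deprecated metric names listed in the note above.
DEPRECATED_METRICS = ("cpu_utilization_ratio", "memory_utilization_ratio")


def find_deprecated(rules):
    """Return {rule_name: [deprecated metric names]} for rules that need updating.

    `rules` maps a rule name to its PromQL expression (hypothetical input shape).
    """
    hits = {}
    for name, expr in rules.items():
        used = []
        for metric in DEPRECATED_METRICS:
            # Match the whole metric name only, so that a longer name that
            # merely contains the deprecated one is not flagged.
            pattern = rf"(?<![A-Za-z0-9_:]){re.escape(metric)}(?![A-Za-z0-9_:])"
            if re.search(pattern, expr):
                used.append(metric)
        if used:
            hits[name] = used
    return hits


# Hypothetical alert rules, for illustration only.
example_rules = {
    "ccm-cpu-high": 'cpu_utilization_ratio{container="cloud-controller-manager"} > 0.8',
    "ccm-mem-high": 'memory_utilization_byte{container="cloud-controller-manager"} > 1e9',
}

print(find_deprecated(example_rules))
```

In this example only `ccm-cpu-high` is reported, because `memory_utilization_byte` is still supported.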

Usage notes for dashboards

Dashboards are generated based on metrics and Prometheus Query Language (PromQL). The following sections describe the observability and features of the dashboards of cloud-controller-manager.

CCM

Observability

[Figure: ccm1]

Features

| Dashboard | PromQL | Description |
| --- | --- | --- |
| Route synchronization delay | histogram_quantile($quantile, sum(rate(ccm_route_latencies_duration_milliseconds_bucket[$interval])) by (verb, le)) | The route synchronization delay. Unit: milliseconds. |
| Node synchronization delay | histogram_quantile($quantile, sum(rate(ccm_node_latencies_duration_milliseconds_bucket[$interval])) by (verb, le)) | The node synchronization delay. Unit: milliseconds. |
| Classic Load Balancer (CLB) synchronization delay | histogram_quantile($quantile, sum(rate(ccm_slb_latencies_duration_milliseconds_bucket[$interval])) by (verb, le)) | The CLB synchronization delay. Unit: milliseconds. |
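The delay dashboards all rely on `histogram_quantile`, which estimates a quantile from cumulative bucket counts. The following Python sketch shows the idea: find the bucket that contains the requested rank, then interpolate linearly inside it. It is a simplified illustration and not the Prometheus implementation. The bucket counts in `example_buckets` are made up, and only a few of the bucket thresholds are used to keep the example short.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound_ms, cumulative_count) pairs.

    Assumes 0 < q <= 1 and that the last bucket is the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up cumulative counts for a latency histogram (upper bounds in ms).
example_buckets = [(100, 50), (200, 80), (300, 95), (400, 100), (math.inf, 100)]

p90 = histogram_quantile(0.9, example_buckets)
print(f"estimated p90: {p90:.1f} ms")
```

Because the result is interpolated within a bucket, its precision is limited by the bucket width, which is 100 ms for delays under one second and 1,000 ms for most of the higher range.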

Queue

Observability

[Figure: ccm2]

Features

| Dashboard | PromQL | Description |
| --- | --- | --- |
| Workqueue enqueue rate | sum(rate(workqueue_adds_total{job="ack-cloud-controller-manager"}[$interval])) by (name) | The per-second rate at which Adds events are added to the workqueue, averaged over the specified interval. |
| Workqueue depth | workqueue_depth{job="ack-cloud-controller-manager"} | The length of the workqueue over time. |
| Workqueue processing delay | histogram_quantile($quantile, sum(rate(workqueue_queue_duration_seconds_bucket{job="ack-cloud-controller-manager"}[$interval])) by (name, le)) | The duration for which events remain in the workqueue. |
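The queries above contain the placeholders `$quantile` and `$interval`, which must be replaced with concrete values before a query can be run outside the dashboard. The following Python sketch fills them in with `string.Template`, which happens to use the same `$name` syntax. The values `0.99` and `5m` are example choices and not defaults taken from the dashboards.

```python
from string import Template

# The workqueue processing delay query from the table above, with its placeholders.
QUEUE_DELAY = Template(
    "histogram_quantile($quantile, sum(rate("
    'workqueue_queue_duration_seconds_bucket{job="ack-cloud-controller-manager"}'
    "[$interval])) by (name, le))"
)


def render(template, quantile, interval):
    """Return the PromQL string with $quantile and $interval filled in."""
    return template.substitute(quantile=quantile, interval=interval)


# Example values only: the 99th percentile over a 5-minute rate window.
query = render(QUEUE_DELAY, quantile="0.99", interval="5m")
print(query)
```

`substitute` raises `KeyError` if a placeholder is left without a value, which helps catch incomplete queries early.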

Resources

Observability

[Figure: ccm3]

Features

| Dashboard | PromQL | Description |
| --- | --- | --- |
| Memory Usage | memory_utilization_byte{container="cloud-controller-manager"} | The memory usage. Unit: bytes. |
| CPU Usage | cpu_utilization_core{container="cloud-controller-manager"}*1000 | The used CPU capacity. Unit: millicores. |

Kube API

Observability

[Figure: ccm4]

Features

Dashboard

PromQL

Description

Kube API Request QPS

  • sum(rate(rest_client_requests_total{job="ack-cloud-controller-manager",code=~"2.."}[$interval])) by (method,code)

  • sum(rate(rest_client_requests_total{job="ack-cloud-controller-manager",code=~"3.."}[$interval])) by (method,code)

  • sum(rate(rest_client_requests_total{job="ack-cloud-controller-manager",code=~"4.."}[$interval])) by (method,code)

  • sum(rate(rest_client_requests_total{job="ack-cloud-controller-manager",code=~"5.."}[$interval])) by (method,code)

The number of HTTP requests that cloud-controller-manager sends to kube-apiserver per second. The queries per second (QPS) value is grouped by request method and status code, with one query for each status code class (2xx, 3xx, 4xx, and 5xx).
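To make the QPS calculation concrete, the following Python sketch derives a per-second rate from two samples of a request counter and groups the result by status code class, similar in spirit to the four queries above. It is a simplification: PromQL `rate` also extrapolates to the window boundaries, which this sketch does not do. The sample values are made up.

```python
from collections import defaultdict


def request_qps(before, after, seconds):
    """Return {code_class: {(method, code): qps}} from two counter snapshots.

    `before` and `after` map (method, code) to a cumulative request count.
    """
    result = defaultdict(dict)
    for key, end_value in after.items():
        method, code = key
        delta = end_value - before.get(key, 0)
        if delta < 0:
            # The counter was reset between samples: count from zero.
            delta = end_value
        result[code[0] + "xx"][(method, code)] = delta / seconds
    return dict(result)


# Made-up counter samples taken 60 seconds apart.
sample_before = {("GET", "200"): 1000, ("PUT", "409"): 10}
sample_after = {("GET", "200"): 1600, ("PUT", "409"): 13, ("GET", "500"): 6}

qps = request_qps(sample_before, sample_after, seconds=60)
print(qps)
```

A rising rate in the 4xx or 5xx groups is usually more informative than the total QPS, because it shows which requests to kube-apiserver are failing.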

Common metric anomalies

If the metrics of cloud-controller-manager become abnormal, check whether the metric anomalies described in the following sections exist. If metric anomalies that are not described in the following sections occur, submit a ticket.

Classic Load Balancer (CLB) synchronization delay

| Normal case | Anomaly | Anomaly description | Suggestion |
| --- | --- | --- | --- |
| The CLB synchronization delay is shorter than 10 seconds. | The CLB synchronization delay is longer than 10 seconds. | CLB synchronization is time-consuming. | Check whether abnormal Service events exist. |

Workqueue depth

| Normal case | Anomaly | Anomaly description | Suggestion |
| --- | --- | --- | --- |
| The workqueue depth is smaller than 10. | The workqueue depth is greater than 10. | A large number of Services are pending synchronization in the workqueue. | If the workqueue is too long, Service synchronization slows down. Reduce how often you update the nodes, pods, and Services in the cluster. |
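The two checks in this section can be expressed as a small function. The following Python sketch applies the thresholds stated above to a pair of observed values and returns the matching suggestions. The function and constant names are hypothetical, and treating a value of exactly 10 as normal is an assumption, because this topic does not classify that boundary.

```python
# Thresholds from the tables above.
CLB_DELAY_LIMIT_MS = 10_000  # 10 seconds, expressed in milliseconds
WORKQUEUE_DEPTH_LIMIT = 10


def check_ccm(clb_delay_ms, workqueue_depth):
    """Return a list of suggestions for the anomalies that are present."""
    findings = []
    if clb_delay_ms > CLB_DELAY_LIMIT_MS:
        findings.append(
            "CLB synchronization is time-consuming: "
            "check whether abnormal Service events exist."
        )
    if workqueue_depth > WORKQUEUE_DEPTH_LIMIT:
        findings.append(
            "Many Services are pending synchronization: "
            "update nodes, pods, and Services less frequently."
        )
    return findings


# Made-up observations: a 12-second CLB delay and 25 queued items.
for finding in check_ccm(clb_delay_ms=12_000, workqueue_depth=25):
    print(finding)
```

An empty list means that neither anomaly is present. Anomalies that this topic does not describe still require a ticket.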

References

For more information about the metrics, dashboards, and troubleshooting suggestions for other control plane components, see the following topics: Metrics of kube-apiserver, Metrics of etcd, Metrics of kube-scheduler, and Metrics of kube-controller-manager.