In addition to the data collection endpoint <nodeIP>:10250/metrics/cadvisor, Container Service for Kubernetes (ACK) provides the endpoint <nodeIP>:10250/metrics/cadvisor?nodeName=<nodeName>, which lets you collect the metrics of a single virtual node by specifying its name. You can modify the configuration of Prometheus to collect the metrics of a specified virtual node.
Introduction
In the virtual node architecture, multiple virtual nodes in a cluster can share the same node IP address. As a result, when you collect the metrics of one virtual node from <nodeIP>:10250/metrics/cadvisor, the metrics of all virtual nodes that share that IP address are returned. Prometheus usually collects the metrics of all nodes through the kubelet Service, so it scrapes the shared IP address once for each virtual node. Therefore, duplicate metrics are collected when the cluster contains more than one virtual node.
To address this issue, ACK provides the endpoint <nodeIP>:10250/metrics/cadvisor?nodeName=<nodeName> in addition to <nodeIP>:10250/metrics/cadvisor. When you set the nodeName parameter to the name of a virtual node, the endpoint returns only the monitoring data of the pods managed by that virtual node.
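The duplication problem and the effect of the nodeName parameter can be modeled in a few lines. The following Python sketch is a hypothetical model, not ACK code: two virtual nodes share one IP address, a fake endpoint returns every pod's metrics unless nodeName is set, and the scraper visits the shared IP once per virtual node, as kubelet Service discovery does. The IP address, node names, and pod names are all made up for illustration.

```python
"""Hypothetical model of a cAdvisor endpoint shared by several virtual nodes."""
from collections import Counter
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical cluster: pod name -> virtual node that manages the pod.
POD_TO_NODE = {
    "web-1": "virtual-kubelet-cn-hangzhou-h",
    "web-2": "virtual-kubelet-cn-hangzhou-h",
    "job-1": "virtual-kubelet-cn-hangzhou-i",
}
SHARED_IP = "192.168.0.10"  # hypothetical IP address shared by both virtual nodes


def fake_cadvisor(url):
    """Return the pods whose metrics the endpoint would expose for this URL."""
    query = parse_qs(urlsplit(url).query)
    node = query.get("nodeName", [None])[0]
    if node is None:
        # No nodeName: the pods of every virtual node behind this IP.
        return sorted(POD_TO_NODE)
    return sorted(pod for pod, owner in POD_TO_NODE.items() if owner == node)


def scrape_all(with_node_name):
    """Scrape the shared IP once per virtual node and count pod occurrences."""
    seen = Counter()
    for node in sorted(set(POD_TO_NODE.values())):
        url = f"https://{SHARED_IP}:10250/metrics/cadvisor"
        if with_node_name:
            url += "?" + urlencode({"nodeName": node})
        seen.update(fake_cadvisor(url))
    return seen


if __name__ == "__main__":
    print("without nodeName:", dict(scrape_all(False)))  # every pod counted twice
    print("with nodeName:   ", dict(scrape_all(True)))   # every pod counted once
```

In this model, the count of each pod equals the number of virtual nodes sharing the IP address when nodeName is omitted, and drops to one when it is set.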
Prerequisites
The ACK virtual node component is installed, and its version is v2.11.0 or later. For more information, see Manage components.
Modify the configuration of Prometheus
The following sections describe how to modify the configurations of Managed Service for Prometheus, open source Prometheus Operator, and open source Prometheus so that they collect the metrics of a specified virtual node.
Managed Service for Prometheus
To use Managed Service for Prometheus to collect the metrics of a specified virtual node, submit a ticket to apply to be added to the whitelist.
Open source Prometheus Operator
If you use open source Prometheus Operator or ack-prometheus-operator from the marketplace of Container Service for Kubernetes, add the following ServiceMonitor custom resource (CR):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: virtual-kubelet
  namespace: monitoring
  labels:
    k8s-app: kubelet
    # Prometheus Operator selects ServiceMonitors by label. This label must match
    # the serviceMonitorSelector of the Prometheus instance managed by prometheus-operator.
    release: prometheus-operator
spec:
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: kubelet
  namespaceSelector:
    matchNames:
    - kube-system
  endpoints:
  - port: https-metrics
    interval: 15s
    scheme: https
    path: /metrics/cadvisor
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
    relabelings:
    # Keep only the endpoints of virtual nodes.
    - sourceLabels: [__meta_kubernetes_endpoint_address_target_name]
      regex: (^virtual-kubelet.*)
      action: keep
    # Pass the virtual node name as the nodeName query parameter.
    - sourceLabels: [__meta_kubernetes_endpoint_address_target_name]
      regex: (^virtual-kubelet.*)
      targetLabel: __param_nodeName
      replacement: ${1}
      action: replace
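To see what the two relabelings do, the following Python sketch re-implements only the keep and replace actions and applies them to three discovered endpoints. It follows the documented relabeling semantics (source label values joined with ";", regex matched against the whole value) but is an illustration, not Prometheus code; the IP addresses and node names are hypothetical. Only the virtual node endpoints survive, and each one receives a __param_nodeName label, which Prometheus sends as the nodeName query parameter.

```python
"""Illustrative re-implementation of the keep and replace relabel actions."""
import re
from typing import Optional

TARGET_NAME = "__meta_kubernetes_endpoint_address_target_name"

# The two relabelings from the ServiceMonitor, written as plain dicts.
RULES = [
    {"source_labels": [TARGET_NAME], "regex": r"(^virtual-kubelet.*)",
     "action": "keep"},
    {"source_labels": [TARGET_NAME], "regex": r"(^virtual-kubelet.*)",
     "target_label": "__param_nodeName", "replacement": "${1}",
     "action": "replace"},
]


def expand(template, match):
    """Replace ${n} and $n references with the regex capture groups."""
    return re.sub(r"\$\{(\d+)\}|\$(\d+)",
                  lambda ref: match.group(int(ref.group(1) or ref.group(2))) or "",
                  template)


def relabel(labels: dict, rules: list) -> Optional[dict]:
    """Apply rules in order; return None when a keep rule discards the target."""
    labels = dict(labels)
    for rule in rules:
        # Join the source label values with ";" and match the whole string.
        value = ";".join(labels.get(name, "") for name in rule["source_labels"])
        match = re.fullmatch(rule["regex"], value)
        if rule["action"] == "keep" and match is None:
            return None
        if rule["action"] == "replace" and match is not None:
            labels[rule["target_label"]] = expand(rule["replacement"], match)
    return labels


# Hypothetical endpoints discovered from the kubelet Service.
TARGETS = [
    {"__address__": "192.168.0.10:10250", TARGET_NAME: "virtual-kubelet-cn-hangzhou-h"},
    {"__address__": "192.168.0.10:10250", TARGET_NAME: "virtual-kubelet-cn-hangzhou-i"},
    {"__address__": "192.168.0.21:10250", TARGET_NAME: "cn-hangzhou.192.168.0.21"},
]

for target in TARGETS:
    result = relabel(target, RULES)
    outcome = "dropped" if result is None else result["__param_nodeName"]
    print(target[TARGET_NAME], "->", outcome)
```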
If the cluster already uses service discovery on the kubelet Service to collect cAdvisor metrics, add the following relabeling rule to the existing ServiceMonitor so that it no longer scrapes the <Virtual Node IP>:10250/metrics/cadvisor endpoint. Otherwise, duplicate data is collected.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
...
spec:
  endpoints:
  - path: /metrics/cadvisor
    port: https-metrics
    ...
    relabelings:
    # The relabeling rule discards the endpoints of all targets whose names start with virtual-kubelet.
    - action: drop
      regex: (^virtual-kubelet.*)
      sourceLabels:
      - __meta_kubernetes_endpoint_address_target_name
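The keep rule in the ServiceMonitor added earlier and the drop rule above use the same regex, so together they split the kubelet Service endpoints into two disjoint sets: virtual node endpoints are scraped only with nodeName, and all other nodes are scraped as before. The following Python sketch checks that property on a hypothetical list of endpoint target names. Regex matching mimics Prometheus, which requires the regex to match the entire label value.

```python
"""Check that the keep and drop rules partition the kubelet endpoints."""
import re

VIRTUAL_NODE_REGEX = r"(^virtual-kubelet.*)"

# Hypothetical values of __meta_kubernetes_endpoint_address_target_name.
ENDPOINTS = [
    "cn-hangzhou.192.168.0.21",
    "cn-hangzhou.192.168.0.22",
    "virtual-kubelet-cn-hangzhou-h",
    "virtual-kubelet-cn-hangzhou-i",
]


def kept_by(action, names):
    """Return the names that survive a single keep or drop rule."""
    if action not in ("keep", "drop"):
        raise ValueError(f"unsupported action: {action}")
    want_match = action == "keep"
    return [n for n in names
            if (re.fullmatch(VIRTUAL_NODE_REGEX, n) is not None) == want_match]


virtual_job = kept_by("keep", ENDPOINTS)  # scraped with ?nodeName=<name>
kubelet_job = kept_by("drop", ENDPOINTS)  # scraped at /metrics/cadvisor as before

# Every endpoint is scraped by exactly one of the two jobs.
assert set(virtual_job).isdisjoint(kubelet_job)
assert sorted(virtual_job + kubelet_job) == sorted(ENDPOINTS)
print("virtual-kubelet job:", virtual_job)
print("kubelet job:        ", kubelet_job)
```

Without the drop rule, the virtual node endpoints would appear in both lists, which is the duplication this step prevents.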
Open source Prometheus
Open the configuration file of open source Prometheus, which is typically /etc/prometheus/prometheus.yml or a file in your custom configuration directory. Then, add the following scrape configuration to the file:
scrape_configs:
# ... other job configurations
- job_name: monitoring/virtual-kubelet/0
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kubelet
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: https-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: https-metrics
    action: replace
  # Keep only the endpoints of virtual nodes.
  - source_labels: [__meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: (^virtual-kubelet.*)
    replacement: $1
    action: keep
  # Pass the virtual node name as the nodeName query parameter.
  - source_labels: [__meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: (^virtual-kubelet.*)
    target_label: __param_nodeName
    replacement: ${1}
    action: replace
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - kube-system
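After relabeling, Prometheus assembles the scrape URL from the __scheme__, __address__, and __metrics_path__ labels and adds every remaining __param_<name> label as a query parameter. The following Python sketch mimics only that final step for one hypothetical virtual node target, to show that the job above requests the nodeName endpoint rather than the plain /metrics/cadvisor path. It is an illustration, not Prometheus code, and the address and node name are made up.

```python
"""Illustration: turn a relabeled target's labels into its scrape URL."""
from urllib.parse import urlencode, urlunsplit


def scrape_url(labels):
    """Build scheme://address/path?params from the reserved target labels."""
    params = {key[len("__param_"):]: value
              for key, value in labels.items() if key.startswith("__param_")}
    return urlunsplit((labels["__scheme__"], labels["__address__"],
                       labels["__metrics_path__"], urlencode(params), ""))


target = {
    "__scheme__": "https",                                # from scheme: https
    "__address__": "192.168.0.10:10250",                  # hypothetical virtual node IP
    "__metrics_path__": "/metrics/cadvisor",              # from metrics_path
    "__param_nodeName": "virtual-kubelet-cn-hangzhou-h",  # set by the last relabel rule
}
print(scrape_url(target))
```

For this target the sketch prints https://192.168.0.10:10250/metrics/cadvisor?nodeName=virtual-kubelet-cn-hangzhou-h, which matches the <nodeIP>:10250/metrics/cadvisor?nodeName=<nodeName> form described in the introduction.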
If the cluster already uses service discovery on the kubelet Service to collect cAdvisor metrics, add the following rule to the relabel_configs of the existing kubelet job so that it no longer scrapes the <Virtual Node IP>:10250/metrics/cadvisor endpoint. Otherwise, duplicate data is collected.
scrape_configs:
# ... other job configurations
- job_name: monitoring/ack-prometheus-operator-kubelet/0
  honor_labels: true
  honor_timestamps: true
  ...
  relabel_configs:
  ...
  # Discard the endpoints that collect the /metrics/cadvisor metrics of virtual nodes.
  - source_labels: [__meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: (^virtual-kubelet.*)
    replacement: $1
    action: drop