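To collect metrics from a specific virtual node, ACK keeps the original scrape endpoint <nodeIP>:10250/metrics/cadvisor and additionally provides an endpoint that accepts a virtual node name: <nodeIP>:10250/metrics/cadvisor?nodeName=<nodeName>. You can modify your Prometheus monitoring configuration to collect the metrics of a specific virtual node.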
Feature description
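By design, multiple virtual nodes in the same cluster share a single node IP. As a result, a scrape that targets one virtual node returns the full data of all virtual nodes. A common Prometheus scrape configuration collects the metrics of every node through the kubelet Service, so each virtual node is scraped once and each scrape returns the same full data set. Therefore, when a cluster contains multiple virtual nodes, the collected metrics are duplicated.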
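To solve this problem, ACK provides the ability to collect the metrics of a specific virtual node. In addition to the original scrape endpoint <nodeIP>:10250/metrics/cadvisor, ACK provides an endpoint that accepts a virtual node name: <nodeIP>:10250/metrics/cadvisor?nodeName=<nodeName>. When a virtual node name is specified, the endpoint returns only the monitoring data of the Pods managed by that virtual node.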
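The duplication described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the node names, Pod names, and the `scrape` and `collect` helpers are hypothetical and do not call any real kubelet endpoint.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical toy data: two virtual nodes that share one node IP.
PODS_BY_VIRTUAL_NODE: Dict[str, List[str]] = {
    "virtual-kubelet-a": ["pod-1", "pod-2"],
    "virtual-kubelet-b": ["pod-3"],
}


def scrape(node_name: Optional[str] = None) -> List[str]:
    """Toy stand-in for <nodeIP>:10250/metrics/cadvisor[?nodeName=<nodeName>]."""
    if node_name is None:
        # Original endpoint: data of all virtual nodes behind the shared IP.
        return [pod for pods in PODS_BY_VIRTUAL_NODE.values() for pod in pods]
    # Endpoint with nodeName: only the Pods managed by that virtual node.
    return list(PODS_BY_VIRTUAL_NODE.get(node_name, []))


def collect(use_node_name: bool) -> Counter:
    """Scrape once per virtual node and count how often each Pod is seen."""
    seen: Counter = Counter()
    for name in PODS_BY_VIRTUAL_NODE:
        seen.update(scrape(name if use_node_name else None))
    return seen


if __name__ == "__main__":
    print("without nodeName:", dict(collect(False)))
    print("with nodeName:   ", dict(collect(True)))
```

In this model, every Pod is counted once per virtual node when the plain endpoint is used, and exactly once when the nodeName parameter is passed.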
Prerequisites
The ACK Virtual Node component is installed, and its version is v2.11.0 or later. For more information, see Manage components.
Modify the Prometheus monitoring configuration
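You can modify your Prometheus monitoring configuration to collect the metrics of a specific virtual node. This topic describes the configuration for three scenarios: Alibaba Cloud Managed Service for Prometheus, the community Prometheus Operator, and open source Prometheus.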
Alibaba Cloud Managed Service for Prometheus
In Alibaba Cloud Managed Service for Prometheus, this feature is available only to whitelisted users. To request access, submit a ticket.
Community Prometheus Operator
If you use the community Prometheus Operator, or ack-prometheus-operator from the Container Service for Kubernetes application marketplace, add the following ServiceMonitor CR.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: virutal-kubelet
  namespace: monitoring
  labels:
    k8s-app: kubelet
    # Add this label so that prometheus-operator manages this resource automatically.
    release: prometheus-operator
spec:
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: kubelet
  namespaceSelector:
    matchNames:
    - kube-system
  endpoints:
  - port: https-metrics
    interval: 15s
    scheme: https
    path: /metrics/cadvisor
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
    relabelings:
    # Keep only the Virtual Node endpoints.
    - sourceLabels: [__meta_kubernetes_endpoint_address_target_name]
      regex: (^virtual-kubelet.*)
      action: keep
    # Add the nodeName query parameter.
    - sourceLabels: [__meta_kubernetes_endpoint_address_target_name]
      regex: (^virtual-kubelet.*)
      targetLabel: __param_nodeName
      replacement: ${1}
      action: replace
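To make the effect of the two relabeling rules concrete, the following Python sketch mimics them for a single target. It relies on two facts about Prometheus relabeling: the regex is matched against the whole source value, and a label named `__param_<name>` is sent as the URL query parameter `<name>`. The `relabel` helper and the sample target names are hypothetical, and this is a simplified model rather than the actual Prometheus implementation.

```python
import re
from typing import Dict, Optional

TARGET_NAME = "__meta_kubernetes_endpoint_address_target_name"
# Same pattern as in the relabeling rules; Prometheus anchors it on both ends,
# which fullmatch() reproduces here.
PATTERN = re.compile(r"(^virtual-kubelet.*)")


def relabel(labels: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Apply the keep rule, then the replace rule, to one target's labels."""
    match = PATTERN.fullmatch(labels.get(TARGET_NAME, ""))
    if match is None:
        # action: keep -> targets that do not match are discarded.
        return None
    relabeled = dict(labels)
    # action: replace -> copy the captured node name into __param_nodeName,
    # which ends up as ?nodeName=<nodeName> in the scrape URL.
    relabeled["__param_nodeName"] = match.group(1)
    return relabeled


if __name__ == "__main__":
    # Hypothetical target names, for illustration only.
    print(relabel({TARGET_NAME: "virtual-kubelet-example-a"}))
    print(relabel({TARGET_NAME: "regular-node-1"}))
```

A virtual node target keeps its labels and gains `__param_nodeName`, while any other target is dropped from this job.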
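If the cluster already uses kubelet Service based service discovery to collect cAdvisor metrics, add the following configuration to stop scraping <Virtual Node IP>:10250/metrics/cadvisor and avoid collecting duplicate data.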
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
# ...
spec:
  endpoints:
  - path: /metrics/cadvisor
    port: https-metrics
    # ...
    relabelings:
    # This relabeling rule drops all endpoints whose target name starts with virtual-kubelet.
    - action: drop
      regex: (^virtual-kubelet.*)
      sourceLabels:
      - __meta_kubernetes_endpoint_address_target_name
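The keep rule in the virtual node job and the drop rule in the existing kubelet job use the same pattern, so together they split the discovered targets into two disjoint groups. The sketch below checks this property on a hypothetical list of target names; `split_targets` is an illustrative helper, not part of Prometheus or Prometheus Operator.

```python
import re
from typing import Iterable, List, Tuple

# Same pattern as in the keep and drop rules, matched against the full name.
PATTERN = re.compile(r"(^virtual-kubelet.*)")


def split_targets(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (targets of the virtual node job, targets of the kubelet job)."""
    virtual_job: List[str] = []
    kubelet_job: List[str] = []
    for name in names:
        if PATTERN.fullmatch(name):
            virtual_job.append(name)   # kept by the virtual node job
        else:
            kubelet_job.append(name)   # not dropped by the kubelet job
    return virtual_job, kubelet_job


if __name__ == "__main__":
    # Hypothetical target names, for illustration only.
    discovered = ["virtual-kubelet-example-a", "regular-node-1", "virtual-kubelet-example-b"]
    virtual_job, kubelet_job = split_targets(discovered)
    print("virtual node job:", virtual_job)
    print("kubelet job:     ", kubelet_job)
```

Because every target lands in exactly one group, each /metrics/cadvisor endpoint is scraped by only one job.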
Open source Prometheus
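For open source Prometheus, locate the Prometheus configuration file, which is usually /etc/prometheus/prometheus.yml or a file in your custom configuration directory, and add the following scrape configuration.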
scrape_configs:
  # ... other job configurations.
  - job_name: monitoring/virutal-kubelet/0
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics/cadvisor
    scheme: https
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_k8s_app]
        separator: ;
        regex: kubelet
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        separator: ;
        regex: https-metrics
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Node;(.*)
        target_label: node
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: Pod;(.*)
        target_label: pod
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_namespace]
        separator: ;
        regex: (.*)
        target_label: namespace
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: service
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_name]
        separator: ;
        regex: (.*)
        target_label: pod
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_pod_container_name]
        separator: ;
        regex: (.*)
        target_label: container
        replacement: $1
        action: replace
      - source_labels: [__meta_kubernetes_service_name]
        separator: ;
        regex: (.*)
        target_label: job
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_service_label_k8s_app]
        separator: ;
        regex: (.+)
        target_label: job
        replacement: ${1}
        action: replace
      - separator: ;
        regex: (.*)
        target_label: endpoint
        replacement: https-metrics
        action: replace
      # Keep only the Virtual Node endpoints.
      - source_labels: [__meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: (^virtual-kubelet.*)
        replacement: $1
        action: keep
      # Add the nodeName query parameter.
      - source_labels: [__meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: (^virtual-kubelet.*)
        target_label: __param_nodeName
        replacement: ${1}
        action: replace
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - kube-system
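If the cluster already uses kubelet Service based service discovery to collect cAdvisor metrics, add the following configuration to stop scraping <Virtual Node IP>:10250/metrics/cadvisor and avoid collecting duplicate data.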
scrape_configs:
  # ... other job configurations.
  - job_name: monitoring/ack-prometheus-operator-kubelet/0
    honor_labels: true
    honor_timestamps: true
    # ...
    relabel_configs:
      # ...
      # Stop scraping the /metrics/cadvisor endpoint of virtual nodes.
      - source_labels: [__meta_kubernetes_endpoint_address_target_name]
        separator: ;
        regex: (^virtual-kubelet.*)
        replacement: $1
        action: drop
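One possible way to check the result is to inspect the active targets that Prometheus reports through its HTTP API (`GET /api/v1/targets`), where each target carries a `scrapeUrl` field. The sketch below only parses such a response; the helper name `virtual_node_targets` and the sample response are hypothetical, and fetching the JSON from your Prometheus server is left out.

```python
from typing import Dict
from urllib.parse import parse_qs, urlparse


def virtual_node_targets(targets_response: dict) -> Dict[str, str]:
    """Map each nodeName query parameter to its cAdvisor scrape URL."""
    result: Dict[str, str] = {}
    active = targets_response.get("data", {}).get("activeTargets", [])
    for target in active:
        scrape_url = target.get("scrapeUrl", "")
        parsed = urlparse(scrape_url)
        if parsed.path != "/metrics/cadvisor":
            continue
        node_names = parse_qs(parsed.query).get("nodeName")
        if node_names:
            result[node_names[0]] = scrape_url
    return result


if __name__ == "__main__":
    # Hand-written sample shaped like a /api/v1/targets response (hypothetical values).
    sample = {
        "status": "success",
        "data": {
            "activeTargets": [
                {"scrapeUrl": "https://10.0.0.1:10250/metrics/cadvisor?nodeName=virtual-kubelet-example-a"},
                {"scrapeUrl": "https://10.0.0.1:10250/metrics/cadvisor?nodeName=virtual-kubelet-example-b"},
                {"scrapeUrl": "https://10.0.0.2:10250/metrics/cadvisor"},
            ]
        },
    }
    for node, url in virtual_node_targets(sample).items():
        print(node, "->", url)
```

If the configuration is applied as intended, each virtual node should appear once in the result, with its own nodeName parameter in the scrape URL.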