Container Service for Kubernetes: Collect metrics from a specified virtual node

Last updated: Jun 19, 2024

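To collect metrics from a specified virtual node, ACK keeps the original scrape endpoint <nodeIP>:10250/metrics/cadvisor and additionally provides an endpoint that takes a virtual node name: <nodeIP>:10250/metrics/cadvisor?nodeName=<nodeName>. You can modify your Prometheus monitoring configuration to collect metrics from a specified virtual node.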

Overview

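By design, multiple virtual nodes in the same cluster share one node IP. As a result, a scrape aimed at a single virtual node returns the full data of all virtual nodes. A common Prometheus configuration scrapes the metrics of every node through the kubelet Service, so when a cluster contains multiple virtual nodes, the same metrics are collected more than once.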
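
One way to check whether a cluster is affected is a Prometheus rule that counts how many series exist for the same container. The rule file below is only a sketch: the group name, alert name, chosen metric, and the `namespace`, `pod`, and `container` label names are assumptions based on typical cAdvisor output and may differ in your environment.

```yaml
groups:
# Hypothetical group and alert names, chosen for illustration only.
- name: virtual-node-duplicate-check
  rules:
  - alert: DuplicateCadvisorSeries
    # More than one series for the same container suggests that the same
    # cAdvisor data is being scraped through several targets.
    expr: count by (namespace, pod, container) (container_memory_working_set_bytes{container!=""}) > 1
    for: 10m
    annotations:
      summary: cAdvisor metrics for one container are reported by more than one target
```

A result greater than 1 is a hint rather than proof of duplication, because short overlaps can also occur when a Pod is recreated.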

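To solve this problem, ACK lets you collect the metrics of a specified virtual node. In addition to the original endpoint <nodeIP>:10250/metrics/cadvisor, ACK provides an endpoint that takes a virtual node name: <nodeIP>:10250/metrics/cadvisor?nodeName=<nodeName>. When a virtual node name is specified, the endpoint returns monitoring data only for the Pods managed by that virtual node.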
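
To illustrate the endpoint, the following scrape job targets one virtual node with a fixed `nodeName` query parameter. It is a minimal sketch, not a recommended production setup: the job name is hypothetical, `<nodeIP>` and `<nodeName>` are placeholders that you must replace, and the token and CA paths assume that Prometheus runs inside the cluster with a service account.

```yaml
scrape_configs:
# Hypothetical job name; any name that is unique in your configuration works.
- job_name: single-virtual-node-cadvisor
  scheme: https
  metrics_path: /metrics/cadvisor
  # Prometheus sends this as the URL query string ?nodeName=<nodeName>.
  params:
    nodeName: ["<nodeName>"]
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  static_configs:
  # Placeholder address; replace it with the node IP shared by the virtual nodes.
  - targets: ["<nodeIP>:10250"]
```

A static job like this does not follow virtual nodes as they are added or removed, which is why the configurations later in this topic derive `nodeName` from service discovery instead.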

Prerequisites

The ACK Virtual Node component is installed, and its version is v2.11.0 or later. For more information, see Manage components.

Modify the Prometheus monitoring configuration

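You can modify your Prometheus monitoring configuration to collect metrics from a specified virtual node. This topic describes the configuration for three scenarios: Alibaba Cloud Managed Service for Prometheus, the community Prometheus Operator, and open source Prometheus.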

Alibaba Cloud Managed Service for Prometheus

In Alibaba Cloud Managed Service for Prometheus, this feature is available only to users on the whitelist. To apply, submit a ticket.

Community Prometheus Operator

If you use the community Prometheus Operator, or ack-prometheus-operator from the Container Service for Kubernetes marketplace, add the following ServiceMonitor CR.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: virtual-kubelet
  namespace: monitoring
  labels:
    k8s-app: kubelet
    # Add this label so that prometheus-operator manages this ServiceMonitor automatically.
    release: prometheus-operator
spec:
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: kubelet
  namespaceSelector:
    matchNames:
    - kube-system
  endpoints:
  - port: https-metrics
    interval: 15s
    scheme: https
    path: /metrics/cadvisor
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecureSkipVerify: true
    relabelings:
    # Keep only the endpoints of virtual nodes.
    - sourceLabels: [__meta_kubernetes_endpoint_address_target_name]
      regex: (^virtual-kubelet.*)
      action: keep
    # Add the query parameter that specifies nodeName.
    - sourceLabels: [__meta_kubernetes_endpoint_address_target_name]
      regex: (^virtual-kubelet.*)
      targetLabel: __param_nodeName
      replacement: ${1}
      action: replace

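If the cluster already uses kubelet Service-based service discovery to collect cAdvisor metrics, add the following configuration to stop scraping <Virtual Node IP>:10250/metrics/cadvisor and avoid collecting duplicate data.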

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
...
spec:
  endpoints:
  - path: /metrics/cadvisor
    port: https-metrics
    ...
    relabelings:
    # This relabeling rule drops every endpoint whose target name starts with virtual-kubelet.
    - action: drop
      regex: (^virtual-kubelet.*)
      sourceLabels:
      - __meta_kubernetes_endpoint_address_target_name

Open source Prometheus

Locate the Prometheus configuration file, typically /etc/prometheus/prometheus.yml or a custom configuration directory, and add the following scrape configuration.

scrape_configs:

# ... other job configurations.

- job_name: monitoring/virtual-kubelet/0
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: kubelet
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: https-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_container_name]
    separator: ;
    regex: (.*)
    target_label: container
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: https-metrics
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: (^virtual-kubelet.*)
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: (^virtual-kubelet.*)
    target_label: __param_nodeName
    replacement: ${1}
    action: replace
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - kube-system

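If the cluster already uses kubelet Service-based service discovery to collect cAdvisor metrics, add the following configuration to stop scraping <Virtual Node IP>:10250/metrics/cadvisor and avoid collecting duplicate data.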

scrape_configs:

# ... other job configurations.

- job_name: monitoring/ack-prometheus-operator-kubelet/0
  honor_labels: true
  honor_timestamps: true
  ...
  relabel_configs:
  ...
  # Stop scraping the /metrics/cadvisor endpoint of virtual nodes.
  - source_labels: [__meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: (^virtual-kubelet.*)
    replacement: $1
    action: drop