Container Service for Kubernetes: ack-node-problem-detector

Last Updated: Sep 19, 2024

The ack-node-problem-detector component is an optimized and enhanced version of the open source Node Problem Detector (NPD) provided by the Kubernetes community. It monitors nodes in Container Service for Kubernetes (ACK) clusters, detects node anomalies, and supports the event center feature. You can also integrate third-party or custom monitoring plug-ins with ack-node-problem-detector to extend node monitoring and detect more types of node anomalies. This topic introduces ack-node-problem-detector and provides its usage notes and release notes.
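In the open source NPD, a custom plug-in is an executable that the custom plugin monitor runs periodically. The monitor reads the plug-in's exit status (0 for OK, 1 for NonOK, 2 for Unknown) and uses its standard output as the status message. The following minimal sketch assumes that the same convention applies to custom plug-ins integrated with ack-node-problem-detector. The check `check_required_file` and the file path it tests are hypothetical examples.

```python
import os
import sys

# Exit codes used by the open source NPD custom plugin monitor.
OK, NON_OK, UNKNOWN = 0, 1, 2


def check_required_file(path):
    """Hypothetical check: report NonOK if a required file is missing."""
    if os.path.exists(path):
        return OK, f"{path} exists"
    return NON_OK, f"{path} is missing"


def run_plugin(check, *args):
    """Run a check, print its message to stdout, and return an NPD exit code.

    If the check itself raises, the result is reported as Unknown instead of
    crashing the plug-in.
    """
    try:
        status, message = check(*args)
    except Exception as exc:
        status, message = UNKNOWN, f"check failed: {exc}"
    print(message)
    return status


def main():
    # Hypothetical path; replace it with whatever your plug-in must verify.
    return run_plugin(check_required_file, "/etc/hypothetical-agent.conf")
```

To use this as a plug-in script, end the file with `if __name__ == "__main__": sys.exit(main())` so that the monitor receives the status as the process exit code.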

Introduction

The ack-node-problem-detector component is a node diagnostic tool provided by ACK clusters to monitor and report node anomalies. The component consists of three modules:

  • kube-event-init: When you install ack-node-problem-detector, kube-event-init initializes the resources of the event center in Simple Log Service. ack-node-problem-detector-daemonset and kube-eventer then use these resources to store and analyze event data.

  • ack-node-problem-detector-daemonset: This module runs a pod on each node that meets the specified conditions. The pod monitors the health status of its node and reports cluster status and events. In the release notes below, the image address listed for ack-node-problem-detector is the image address of ack-node-problem-detector-daemonset.

    Note

    For more information about the open source node-problem-detector, see node-problem-detector.

  • kube-eventer: By default, kube-eventer reports all events in the cluster to the event center of Simple Log Service. The event center retains event data for 90 days and provides dashboards, alerts, and event search and analysis. You can also configure kube-eventer to report cluster events to other systems, such as DingTalk and EventBridge, for data integration. For more information, see kube-eventer.
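kube-eventer forwards Kubernetes Event objects to its configured sinks. As far as I know, sinks such as DingTalk accept a level setting so that only Warning events are forwarded. The sketch below illustrates level-based filtering over plain dictionaries shaped like core/v1 Event objects. It is not kube-eventer's code, and the function name `filter_events` is hypothetical.

```python
# Kubernetes core/v1 Events carry a "type" of either "Normal" or "Warning".
LEVEL_ORDER = {"Normal": 0, "Warning": 1}


def filter_events(events, min_level="Warning"):
    """Return the events whose type is at or above min_level.

    Events with an unrecognized type are kept, so nothing is silently lost.
    """
    threshold = LEVEL_ORDER[min_level]
    return [
        e for e in events
        if LEVEL_ORDER.get(e.get("type"), threshold) >= threshold
    ]
```

With `min_level="Warning"`, routine events such as image pulls are dropped while node problems such as OOMKilling are forwarded.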

Usage notes

For information about how to install ack-node-problem-detector, its usage notes, and its new features, see Event monitoring.

Release notes

August 2024

Version number: 1.2.20

Image addresses:

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.14-3c6002c-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.11-0620284-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.8-e43647f-aliyun

Release date: 2024-08-20

Description:

  • The GPU fault inspection feature is available on Elastic Compute Service (ECS) nodes.

  • kube-eventer updates:

    • Performance is optimized for reporting large numbers of events in a cluster.

    • The V4 signature algorithm can be used to transmit data to Simple Log Service.

  • ack-node-problem-detector update: The local port of the DaemonSet pod can be set to 20256 or 20257. By default, this port is disabled.
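In upstream NPD, port 20256 is the default for the problem detector's HTTP server (`--port`) and port 20257 is the default for the Prometheus scrape endpoint (`--prometheus-port`), both bound to 127.0.0.1 by default. This release note does not say which purpose ACK assigns to each port. After you enable the port, a simple TCP check run on the node, such as the sketch below, can confirm that something is listening. The helpers `is_port_open` and `check_npd_ports` are hypothetical, and 127.0.0.1 is assumed as the bind address.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_npd_ports(host="127.0.0.1", ports=(20256, 20257)):
    """Map each candidate port to whether it currently accepts connections."""
    return {port: is_port_open(host, port) for port in ports}
```

Because the port is disabled by default, `check_npd_ports()` is expected to report `False` for both ports until you enable one.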

December 2023

Version number: v1.2.18

Image addresses:

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.13-003ac31-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-27a468a-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

Release date: 2023-12-18

Description:

  • The following issue is fixed: cached kernel logs cause the system to report pod OOMKilling errors.

  • Custom component configurations are retained when you update ack-node-problem-detector.
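The cached-kernel-log fix in v1.2.18 concerns old kernel log entries being reported again as new OOMKilling events. One common way to avoid this, shown here as a sketch rather than ACK's actual implementation, is to parse the kill line and ignore entries whose timestamp predates the detector's start time. The regular expression targets the `Killed process <pid> (<name>)` line that Linux writes on an OOM kill. The exact format varies by kernel version, and the `UID:` field appears only on newer kernels. The function names and sample lines are illustrative.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Matches kernel lines such as:
#   "Out of memory: Killed process 4242 (java) total-vm:... UID:1000 ..."
#   "Memory cgroup out of memory: Killed process 4242 (java) ..."
# The UID field is optional because older kernels do not print it.
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]*)\)"
    r"(?:.*?\bUID:(?P<uid>\d+))?"
)


def parse_oom(message: str) -> Optional[dict]:
    """Extract the PID, process name, and UID (if present) from an OOM kill line."""
    m = OOM_RE.search(message)
    if m is None:
        return None
    uid = m.group("uid")
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "uid": int(uid) if uid is not None else None,
    }


def new_oom_events(entries: Iterable[Tuple[float, str]], start_time: float) -> List[dict]:
    """Return OOM kills logged at or after start_time.

    Entries older than start_time were already in the kernel log buffer when
    the detector started, so they are skipped instead of being re-reported.
    """
    events = []
    for timestamp, message in entries:
        if timestamp < start_time:
            continue
        parsed = parse_oom(message)
        if parsed is not None:
            events.append(parsed)
    return events
```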

August 2023

Version number: v1.2.17

Image addresses:

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-27a468a-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

Release date: 2023-08-24

Description:

  • The parameters of ack-node-problem-detector can be modified on the Add-ons page of the ACK console. This allows you to update the project and Logstore configurations in Simple Log Service.

  • Labels, such as cluster names, can be sent to Simple Log Service together with log data. By default, these labels are displayed in the Simple Log Service data of the ACK event center.

June 2023

Version number: v1.2.16

Image addresses:

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-019546c-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

Release date: 2023-06-27

Description: The resource specifications of ack-node-problem-detector can be configured on the Add-ons page of the ACK console.

Version number: v1.2.15

Image addresses:

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-019546c-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

Release date: 2023-06-06

Description: The following issue is fixed: ack-node-problem-detector degrades the performance of the API server and etcd when OOMKilling errors frequently occur in large numbers of pods.
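The v1.2.15 fix addresses API server and etcd load when many pods frequently hit OOMKilling errors. A general technique for this kind of problem, not necessarily the one ACK uses, is to aggregate repeated events per object within a time window and write one event with a count instead of one write per occurrence. The core/v1 Event object has a `count` field for this purpose. The sketch below shows a windowed aggregator. The class name `EventAggregator`, the key shape, and the 60-second window are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical aggregation key: (namespace, pod name, event reason).
Key = Tuple[str, str, str]


@dataclass
class EventAggregator:
    """Collapse repeated events per key into one count per time window."""

    window_seconds: float = 60.0  # assumed window length
    _counts: Dict[Key, int] = field(default_factory=dict, init=False)
    _window_start: Optional[float] = field(default=None, init=False)

    def record(self, key: Key, now: float) -> List[Tuple[Key, int]]:
        """Count one occurrence of key at time now.

        Returns the (key, count) pairs of the previous window when this
        occurrence falls outside it; otherwise returns an empty list.
        """
        flushed: List[Tuple[Key, int]] = []
        if self._window_start is None:
            self._window_start = now
        elif now - self._window_start >= self.window_seconds:
            flushed = self.flush()
            self._window_start = now
        self._counts[key] = self._counts.get(key, 0) + 1
        return flushed

    def flush(self) -> List[Tuple[Key, int]]:
        """Return one (key, count) pair per key, sorted, and reset the counters."""
        out = sorted(self._counts.items())
        self._counts.clear()
        return out
```

Each flushed pair would be written as a single event whose count reflects the number of occurrences, so N OOM kills for one pod in a window cost one API write instead of N.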

February 2023

Version number: v1.2.14

Image addresses:

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.11-edc7907-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.6-bbf76f7-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

Release date: 2023-02-03

Description:

  • Image pulling is accelerated.

  • ACK Edge clusters support ack-node-problem-detector.

September 2022

Version number: v1.2.11

Image addresses:

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.11-edc7907-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.6-bbf76f7-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

Release date: 2022-09-30

Description:

  • The inspection logic of ack-node-problem-detector is optimized to reduce the load on key components in ACK clusters.

  • Image security hardening is supported.

February 2022

Version number: v1.2.9

Image addresses:

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.6-f0efecf-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun

Release date: 2022-02-22

Description:

  • Kernel inspection is supported.

  • Security is enhanced.

January 2022

Version number: v1.2.8

Image addresses:

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.5-cc7ec54-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun

Release date: 2022-01-20

Description:

  • Different containerd modes are supported.

  • The resource quality of service (QoS) limits of ack-node-problem-detector are optimized to improve stability.
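The QoS note in v1.2.8 refers to the resource settings of the component's pods. Kubernetes assigns each pod a QoS class based on its containers' CPU and memory requests and limits. A pod is Guaranteed when every container sets CPU and memory limits and its requests equal those limits. A pod is BestEffort when no container sets any CPU or memory request or limit. All other pods are Burstable. The sketch below is a simplified reimplementation for illustration. It ignores init containers and compares quantities as strings, and the release note does not say which class ACK targets.

```python
from typing import Dict, List

# Per-container resources, e.g. {"requests": {"cpu": "100m"}, "limits": {...}}.
Resources = Dict[str, Dict[str, str]]
QOS_RESOURCES = ("cpu", "memory")


def qos_class(containers: List[Resources]) -> str:
    """Return "Guaranteed", "Burstable", or "BestEffort" for a pod (simplified)."""
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # A missing request defaults to the limit, as the API server does.
        requests = {**limits, **c.get("requests", {})}
        for res in QOS_RESOURCES:
            if res in requests:  # after defaulting, this includes limits
                any_set = True
            if res not in limits or requests[res] != limits[res]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

Under node memory pressure, the kubelet evicts BestEffort pods first and Guaranteed pods last, which is why setting requests equal to limits is a common way to make a node-level agent more stable.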

November 2021

Version number: v1.2.7

Image addresses:

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.5-cc7ec54-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun

Release date: 2021-11-25

Description:

  • This version is compatible with Alibaba Cloud Linux 3 and CentOS 8.

  • ARM architecture environments are supported.

April 2021

Version number: v1.2.5

Image addresses:

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.4-0f5aaee-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:1.5-5e0e7c1-aliyun

Release date: 2021-04-25

Description:

  • The following issue is fixed: kube-event-init in the kube-system namespace returns the "414 Request Too Large" error when the event center feature is enabled.

  • The list-watch mechanism of kube-eventer is optimized to prevent etcd from receiving a large number of requests. For more information, see eventer list-watch.

  • The following issue is fixed: kube-eventer fails to parse the timestamps of some system events. For more information, see fix FailedScheduling event write to sls with wrong timestamp.
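The timestamp fix in v1.2.5 reflects the fact that a core/v1 Event can carry its time in several fields. Events emitted through the newer events API, such as kube-scheduler's FailedScheduling events, may leave `lastTimestamp` empty and set `eventTime` instead. A consumer that reads only `lastTimestamp` then gets no usable time. The sketch below picks the first populated field and falls back to the object's creation time. The fallback order is an assumption for illustration, not kube-eventer's exact logic.

```python
from datetime import datetime
from typing import Optional

# Fields checked in order; the first non-empty one wins (assumed order).
TIMESTAMP_FIELDS = ("lastTimestamp", "eventTime", "firstTimestamp")


def parse_k8s_time(value: str) -> datetime:
    """Parse RFC 3339 times such as "2021-04-25T08:00:00Z" and MicroTime
    values such as "2021-04-25T08:00:00.123456Z" into aware datetimes."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def event_time(event: dict) -> Optional[datetime]:
    """Return the first populated timestamp on an Event dict, or None."""
    for name in TIMESTAMP_FIELDS:
        value = event.get(name)
        if value:
            return parse_k8s_time(value)
    created = event.get("metadata", {}).get("creationTimestamp")
    return parse_k8s_time(created) if created else None
```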

July 2020

Version number

Image address

Release date

Description

v0.6.3-28-160499f

registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f

2020-07-27

  • The following information can be added to OOMKilling events: the name of the relevant pod, the namespace to which the pod belongs, and the user IDs (UIDs) of the killed processes.

  • The efficiency of the check_fd plug-in is improved.

  • Node events are optimized to allow you to receive alerts when the process ID (PID) usage of cluster nodes exceeds the threshold.

  • Plug-ins that detect network connections are upgraded.

  • Alert plug-ins are added to allow you to receive alerts when the inode usage in the system disks of cluster nodes exceeds the threshold.
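The inode, PID, and file descriptor checks in this release all reduce to comparing a usage ratio against a threshold. The sketch below shows where such numbers can be read on Linux: `os.statvfs` for inode counts, `/proc/sys/kernel/pid_max` plus the numeric entries in `/proc` for PIDs, and `/proc/sys/fs/file-nr` for file handles. It is not the implementation of the ack-node-problem-detector plug-ins (including check_fd), and the 80% threshold is an assumption.

```python
import os

THRESHOLD = 0.8  # assumed alert threshold (80%)


def usage_ratio(used: int, total: int) -> float:
    """Return used/total, treating a zero total as zero usage."""
    return used / total if total else 0.0


def is_over_threshold(used: int, total: int, threshold: float = THRESHOLD) -> bool:
    """Return True when usage strictly exceeds the threshold."""
    return usage_ratio(used, total) > threshold


def inode_usage(path: str = "/") -> tuple:
    """Return (used, total) inodes for the file system that contains path."""
    st = os.statvfs(path)
    return st.f_files - st.f_ffree, st.f_files


def pid_usage(proc: str = "/proc") -> tuple:
    """Return (process count, pid_max).

    Counting numeric /proc entries counts processes, not threads, so this
    underestimates usage on nodes where threads consume many PIDs.
    """
    with open(os.path.join(proc, "sys/kernel/pid_max")) as f:
        pid_max = int(f.read().strip())
    count = sum(1 for name in os.listdir(proc) if name.isdigit())
    return count, pid_max


def parse_file_nr(text: str) -> tuple:
    """Parse file-nr ("<allocated> <allocated but unused> <max>") into (used, max)."""
    allocated, unused, maximum = (int(x) for x in text.split())
    return allocated - unused, maximum


def fd_usage(proc: str = "/proc") -> tuple:
    """Return (used, max) system-wide file handles."""
    with open(os.path.join(proc, "sys/fs/file-nr")) as f:
        return parse_file_nr(f.read())
```

For example, `is_over_threshold(*inode_usage("/"))` returns True when more than 80% of the inodes on the root file system are in use.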