The ack-node-problem-detector component is an enhanced version of the open source Node Problem Detector (NPD) provided by the Kubernetes community. It detects node anomalies in Container Service for Kubernetes (ACK) clusters and supports the event center feature. You can also use it to integrate custom or third-party monitoring plug-ins, which extends node monitoring to more types of node anomalies. This topic introduces ack-node-problem-detector and provides its usage notes and release notes.
Introduction
The ack-node-problem-detector component is a node diagnostic tool that ACK provides to monitor and report node anomalies. The component consists of three modules:
kube-event-init: When you install ack-node-problem-detector, kube-event-init initializes the resources in the event center of Simple Log Service. This way, ack-node-problem-detector-daemonset and kube-eventer can use these resources to store and analyze event data.
ack-node-problem-detector-daemonset: This module runs a pod on each node that meets the specified conditions to monitor the health status of each node and report cluster status and events. In the following tables, the image address of ack-node-problem-detector is the image address of ack-node-problem-detector-daemonset.
kube-eventer: By default, kube-eventer reports all events in the cluster to the event center of Simple Log Service. The event center retains event data for 90 days and provides features such as dashboards, alerts, and event search and analysis. You can configure kube-eventer to report cluster events to other systems such as DingTalk and EventBridge for data integration. For more information, see kube-eventer.
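As a rough illustration of how the node status reported by ack-node-problem-detector-daemonset might be consumed, the following Python sketch scans a node object (in the JSON shape returned by `kubectl get node <name> -o json`) for abnormal conditions. It assumes that problems surface as node conditions whose status is `True`, as in the open source NPD. The `KernelDeadlock` condition and the sample node are illustrative assumptions, not a list of what this component actually reports.

```python
from typing import Any, Dict, List


def abnormal_conditions(node: Dict[str, Any]) -> List[str]:
    """Return the types of node conditions that indicate a problem.

    Assumption: every condition except Ready signals a problem when its
    status is "True"; Ready signals a problem when its status is not "True".
    """
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, status = cond.get("type"), cond.get("status")
        if ctype == "Ready":
            if status != "True":
                problems.append(ctype)
        elif status == "True":
            problems.append(ctype)
    return problems


# Hypothetical node object, trimmed to the fields used above.
sample_node = {
    "metadata": {"name": "node-1"},
    "status": {
        "conditions": [
            {"type": "KernelDeadlock", "status": "True"},
            {"type": "MemoryPressure", "status": "False"},
            {"type": "Ready", "status": "True"},
        ]
    },
}

if __name__ == "__main__":
    print(abnormal_conditions(sample_node))
```

The function works on plain dictionaries, so it can be fed from saved `kubectl` output without a live cluster connection.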
Usage notes
For information about how to install ack-node-problem-detector, its usage notes, and its new features, see Event monitoring.
Release notes
August 2024
Version number | Image address | Release date | Description |
1.2.20 | ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.14-3c6002c-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.11-0620284-aliyun; kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.8-e43647f-aliyun | 2024-08-20 | The GPU fault inspection feature is available on ECS nodes. kube-eventer update: ack-node-problem-detector update: The local port for the DaemonSet pod can be set to 20256 or 20257. By default, this port is disabled. |
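The image addresses in these tables contain the `__ACK_REGION_ID__` placeholder. Assuming that it stands for the region ID of the cluster, a minimal Python sketch of resolving an address could look like the following. The helper name and the `cn-hangzhou` region ID are chosen here only for illustration.

```python
PLACEHOLDER = "__ACK_REGION_ID__"


def resolve_image_address(template: str, region_id: str) -> str:
    """Substitute the region placeholder in an image address.

    Assumption: the placeholder is replaced verbatim by the region ID.
    Addresses without the placeholder are returned unchanged.
    """
    if not region_id:
        raise ValueError("region_id must not be empty")
    return template.replace(PLACEHOLDER, region_id)


if __name__ == "__main__":
    template = (
        "registry-vpc.__ACK_REGION_ID__.aliyuncs.com"
        "/acs/node-problem-detector:v0.8.14-3c6002c-aliyun"
    )
    print(resolve_image_address(template, "cn-hangzhou"))
```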
December 2023
Version number | Image address | Release date | Description |
v1.2.18 | ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.13-003ac31-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-27a468a-aliyun; kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun | 2023-12-18 | |
August 2023
Version number | Image address | Release date | Description |
v1.2.17 | ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-27a468a-aliyun; kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun | 2023-08-24 | The parameters of ack-node-problem-detector can be modified on the Add-ons page of the ACK console. This way, the configurations of the project and Logstore in Simple Log Service can be updated. Labels, such as cluster names, can be sent to Simple Log Service together with log data. By default, these labels are displayed in the Simple Log Service data in the event center of ACK. |
June 2023
Version number | Image address | Release date | Description |
v1.2.16 | ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-019546c-aliyun; kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun | 2023-06-27 | The resource specifications of ack-node-problem-detector can be configured on the Add-ons page of the ACK console. |
v1.2.15 | ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-019546c-aliyun; kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun | 2023-06-06 | The following issue is fixed: ack-node-problem-detector affects the performance of the API server and etcd when OOMKilling errors occur frequently in a large number of pods. |
February 2023
Version number | Image address | Release date | Description |
v1.2.14 | ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.11-edc7907-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.6-bbf76f7-aliyun; kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun | 2023-02-03 | |
September 2022
Version number | Image address | Release date | Description |
v1.2.11 | ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.11-edc7907-aliyun; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.6-bbf76f7-aliyun; kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun | 2022-09-30 | |
February 2022
Version number | Image address | Release date | Description |
v1.2.9 | ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.6-f0efecf-aliyun; kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun | 2022-02-22 | |
January 2022
Version number | Image address | Release date | Description |
v1.2.8 | ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.5-cc7ec54-aliyun; kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun | 2022-01-20 | |
November 2021
Version number | Image address | Release date | Description |
v1.2.7 | ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.5-cc7ec54-aliyun; kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun | 2021-11-25 | |
April 2021
Version number | Image address | Release date | Description |
v1.2.5 | ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f; kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.4-0f5aaee-aliyun; kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:1.5-5e0e7c1-aliyun | 2021-04-25 | The following issue is fixed: kube-event-init in the kube-system namespace returns the "414 Request Too Large" error when the event center feature is enabled. The eventer list-watch mechanism is optimized. This prevents etcd from receiving a large number of requests. For more information, see eventer list-watch. The following issue is fixed: kube-eventer fails to parse the timestamps of some system events. For more information, see fix FailedScheduling event write to sls with wrong timestamp. |
July 2020
Version number | Image address | Release date | Description |
v0.6.3-28-160499f | registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f | 2020-07-27 | The following information can be added to OOMKilling events: the name of the relevant pod, the namespace to which the pod belongs, and the user IDs (UIDs) of the killed processes. The efficiency of the check_fd plug-in is improved. Node events are optimized to allow you to receive alerts when the process ID (PID) usage of cluster nodes exceeds the threshold. Plug-ins that detect network connections are upgraded. Alert plug-ins are added to allow you to receive alerts when the inode usage in the system disks of cluster nodes exceeds the threshold. |
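To make the inode usage alert mentioned in the July 2020 release concrete, the following Python sketch shows one common way to compute inode usage on a Unix-like system and compare it with a threshold. It is not the implementation of the plug-in. The function names and the 0.8 threshold are hypothetical, and the check relies on `os.statvfs`, which is available only on Unix-like systems.

```python
import os


def inode_usage(total: int, free: int) -> float:
    """Return the fraction of inodes in use, between 0.0 and 1.0."""
    if total <= 0:
        # Some file systems do not report inode counts.
        return 0.0
    return (total - free) / total


def inode_usage_exceeds(path: str = "/", threshold: float = 0.8) -> bool:
    """Check whether inode usage of the file system at path exceeds threshold.

    Assumption: the threshold is a fraction; 0.8 is an arbitrary example.
    """
    stats = os.statvfs(path)
    return inode_usage(stats.f_files, stats.f_ffree) > threshold


if __name__ == "__main__":
    print(inode_usage_exceeds("/"))
```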