Monitor basic resources of ACK clusters by using Kubernetes monitoring of CloudMonitor

Resource monitoring is one of the most commonly used monitoring methods in Kubernetes. You can use the Kubernetes monitoring feature of CloudMonitor to check the usage and health status of the basic resources, such as CPU, memory, and network resources, that are used by workloads in your Container Service for Kubernetes (ACK) clusters. This helps you keep your ACK clusters running stably.

Feature description

CloudMonitor automatically collects the metrics of all ACK clusters within your Alibaba Cloud account. This way, you can monitor ACK clusters that are deployed across multiple regions from a single place. For more information, see Overview.

Metrics from the cluster perspective
CloudMonitor provides cluster-level information such as alerts, the number of nodes, and the memory usage and CPU utilization of pods and nodes. This helps you quickly gain insights into cluster performance.
More professional monitoring and alerting
The container monitoring feature of CloudMonitor has been upgraded to Kubernetes monitoring, which provides more specialized basic monitoring capabilities for containers. CloudMonitor provides key metrics for native Kubernetes objects, such as namespaces, nodes, workloads, and pods. The alerting feature has also been updated so that you can configure alert rules from different perspectives.
Appropriate metrics for different container monitoring scenarios
CloudMonitor provides metrics that suit each layer of a containerized environment: the host infrastructure layer, the container layer in Platform as a Service (PaaS), and the Kubernetes scheduling layer. For example, the container memory metrics that affect Kubernetes scheduling report the working memory of containers. This helps distinguish container memory usage from host memory usage.
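
To illustrate why working memory differs from raw memory usage, the following sketch computes a container working set the way the open source cAdvisor collector does: memory charged to the container minus inactive file cache, floored at zero. Whether the working memory reported by CloudMonitor follows exactly this definition is an assumption. The function name and the sample numbers are hypothetical.

```python
MIB = 1024 * 1024


def working_set_bytes(usage_bytes: int, inactive_file_bytes: int) -> int:
    """Estimate container working memory (assumed cAdvisor-style definition)."""
    # Inactive file cache can be reclaimed under memory pressure,
    # so it is excluded from the working set.
    return max(usage_bytes - inactive_file_bytes, 0)


# Hypothetical sample: 512 MiB charged to the container,
# of which 200 MiB is inactive page cache.
sample = working_set_bytes(512 * MIB, 200 * MIB)
print(sample // MIB)  # 312
```

In this sample, the container appears to use 512 MiB, but only 312 MiB counts as working memory.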

Prerequisites

The version of the metrics-server component in your cluster is V0.3.8.5 or later. For more information about how to update metrics-server, see the following topics:
- ACK managed clusters: Manage components
- ACK dedicated clusters that run Kubernetes 1.12 or earlier: Update the metrics-server component before you update the Kubernetes version to 1.12
If the metrics-server component cannot be updated to V0.3.8.5 or later, use resource monitoring of the previous version. For more information, see the Use resource monitoring of the previous version section of this topic.
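
The version requirement above can be checked with a short script. The following sketch compares a metrics-server version tag against V0.3.8.5. It assumes that the tag starts with an optional `v` followed by dotted numbers and may carry a build suffix. The sample tags and the helper names `parse_version` and `meets_minimum` are hypothetical, and how you obtain the tag (for example, from the component list in the console) is up to you.

```python
import re

MINIMUM = "v0.3.8.5"


def parse_version(tag: str) -> tuple:
    """Return the leading dotted numeric part of a version tag as a tuple of ints."""
    match = re.match(r"[vV]?(\d+(?:\.\d+)*)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(tag: str, minimum: str = MINIMUM) -> bool:
    """Check whether a version tag is at least the required minimum version."""
    have, need = parse_version(tag), parse_version(minimum)
    # Pad the shorter tuple with zeros so that 0.3.8 compares as 0.3.8.0.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# Hypothetical tags; the suffix after the numeric part is ignored.
print(meets_minimum("v0.3.8.5-build1"))  # True
print(meets_minimum("v0.3.8.4"))         # False
print(meets_minimum("v0.2.1-build7"))    # False
```

If the check returns False and the component cannot be updated, use resource monitoring of the previous version as described in this topic.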

Enable the monitoring feature of CloudMonitor for an ACK cluster

For more information, see Enable the monitoring feature of CloudMonitor for an ACK cluster.

View resource monitoring data

Log on to the CloudMonitor console.
In the left-side navigation pane, choose Cloud Service Monitoring > Container Service ACK.
On the Container Service Monitoring page, find the cluster that you want to manage and click its name or click View Details in the Actions column.
Note
If this is the first time that you use CloudMonitor to monitor the cluster, a message prompts you to grant authorization. Click Authorize. You can go to the cluster details page only after the authorization is complete.
On the cluster details page, view the monitoring data of the cluster in the following sections: Cluster overview, Node, Namespace, Workload, and Alert Rules.
For more information, see View monitoring data.

Scenarios of metric-based alerting

Scenario	Description	Configuration method
Configure threshold-triggered alerting for the resource usage of a cluster or nodes in the cluster	If the resource usage of a cluster or nodes in the cluster exceeds the threshold, an alert is triggered to prevent service interruptions. We recommend that you configure threshold-triggered alert rules to monitor the resource usage of the entire cluster or all nodes in the cluster.	When you create an alert rule, set the Resource Range parameter to Cluster or Node. This way, you can receive alert notifications if abnormal metric values are detected in the cluster or on a node in the cluster. If you set the Resource Range parameter to Node, make sure that you select all nodes from the Node drop-down list. This way, an alert is triggered if an abnormal value of the metric specified by the Rule Description parameter is detected on any node in the cluster.
Configure threshold-triggered alerting for the resource usage of pods in a cluster	If the resource usage of a cluster exceeds the threshold, you need to find the pod that causes the issue. We recommend that you configure threshold-triggered alert rules to monitor the resource usage of all pods in the cluster.	When you create an alert rule, set the Resource Range parameter to Container Group (pod) and select All from both the Namespace and Container Group (pod) drop-down lists. This way, an alert is triggered if an abnormal value of the metric specified by the Rule Description parameter is detected on any pod in the cluster.
Configure threshold-triggered alerting for the resource usage of pods in the specified namespace of a cluster	In most cases, a cluster is shared among multiple applications. Kubernetes allows you to isolate applications by using namespaces in a multi-tenant environment. If the resource usage exceeds the threshold in the namespace in which an application resides, an alert is triggered. We recommend that you configure threshold-triggered alert rules to monitor the resource usage of all pods in the specified namespace in the cluster.	When you create an alert rule, set the Resource Range parameter to Container Group (pod), select the namespace in which your application resides from the Namespace drop-down list, and select All from the Container Group (pod) drop-down list. This way, an alert is triggered if an abnormal value of the metric specified by the Rule Description parameter is detected on any pod in the specified namespace.
Configure threshold-triggered alerting for the resource usage of pods that run the specified application in the specified namespace of a cluster	In most cases, a cluster is shared among multiple applications. Kubernetes allows you to isolate applications by using workloads in a multi-tenant environment. For example, an application may be run as a Deployment. If the resource usage of the Deployment exceeds the threshold, an alert is triggered. We recommend that you configure threshold-triggered alert rules to monitor the resource usage of all pods of the specified workload.	When you create an alert rule, set the Resource Range parameter to Container Group (pod), select the namespace in which your application resides from the Namespace drop-down list, and select the workload of your application. The following workloads are supported: Deployment, StatefulSet, DaemonSet, Job, and CronJob. Select All from the Container Group (pod) drop-down list. This way, an alert is triggered if an abnormal value of the metric specified in the Rule Description parameter is detected on any pod of the specified workload.
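
The pod-level scenarios in the table differ only in how far the Namespace, workload, and Container Group (pod) selections narrow the set of monitored pods. The following sketch models that narrowing. It is an illustration only, not CloudMonitor's implementation: the `Pod` and `PodScope` classes and the sample pods are hypothetical, and `None` stands for selecting All.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pod:
    namespace: str
    workload: str  # name of the workload (for example, a Deployment) that owns the pod
    name: str


@dataclass(frozen=True)
class PodScope:
    """Hypothetical model of a Container Group (pod) alert scope. None means All."""
    namespace: Optional[str] = None
    workload: Optional[str] = None
    pod: Optional[str] = None

    def matches(self, pod: Pod) -> bool:
        # Each selection that is set must match; unset selections match everything.
        return (
            (self.namespace is None or self.namespace == pod.namespace)
            and (self.workload is None or self.workload == pod.workload)
            and (self.pod is None or self.pod == pod.name)
        )


pods = [
    Pod("shop", "web", "web-1"),
    Pod("shop", "web", "web-2"),
    Pod("shop", "cart", "cart-1"),
    Pod("billing", "api", "api-1"),
]


def covered(scope: PodScope) -> list:
    """Return the names of the sample pods that fall inside the scope."""
    return [p.name for p in pods if scope.matches(p)]


print(covered(PodScope()))                                 # all pods in the cluster
print(covered(PodScope(namespace="shop")))                 # all pods in one namespace
print(covered(PodScope(namespace="shop", workload="web"))) # all pods of one workload
```

The three calls correspond to the second, third, and fourth scenarios in the table.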

Configure alert rules

Step 1: Create an alert contact and add it to an alert contact group

Log on to the CloudMonitor console.
In the left-side navigation pane, choose Alerts > Alert Contacts.
Create an alert contact and add it to an alert contact group.
For more information, see Create an alert contact or alert contact group.

Step 2: Create an alert rule

Log on to the CloudMonitor console.
In the left-side navigation pane, choose Container Service Monitoring > Container Service ACK.
On the Container Service Monitoring page, find the cluster that you want to manage and click View Alert Rules in the Actions column.
On the page that appears, click Create Alert Rule.

In the Create Alert Rule panel, configure the parameters. The following table describes the parameters.

Parameter	Description
Resource Range	The resources to which the alert rule applies. Valid values: (1) Cluster: The alert rule applies to the cluster. If you select this option, you must select a cluster name. (2) Node: The alert rule applies to all nodes or to specified nodes in the cluster. If you select this option, you must select a cluster and one or more nodes. (3) Container Group (pod): The alert rule applies to all pods or to specified pods of the specified application in the specified namespace of the cluster. If you select this option, you must select a cluster and a namespace, and then select an application and one or more pods on the Stateless, Stateful, Daemon set, Task, or Scheduled Tasks tab. Note: On the Container Group tab, you only need to select one or more pods.
Rule Description	The condition that triggers the alert rule. If the metric meets the specified condition, the alert rule is triggered. Configure the metric, threshold, and alert level. For more information about the metrics of pods, see ACK (new version).
Mute For	The interval at which CloudMonitor resends alert notifications before an alert is cleared. Valid values: 5 Minutes, 15 Minutes, 30 Minutes, 60 Minutes, 3 Hours, 6 Hours, 12 Hours, and 24 Hours. If a metric value reaches the threshold, CloudMonitor sends an alert notification. If the metric value reaches the threshold again within the mute period, CloudMonitor does not resend an alert notification. If the alert is not cleared after the mute period ends, CloudMonitor resends an alert notification.
Effective Period	The period during which the alert rule is effective. CloudMonitor monitors the specified resources based on the alert rule only within the specified period.
Alert Callback	The URL to which CloudMonitor sends HTTP POST requests to push alert notifications. We recommend that you specify a URL that can be accessed over the Internet. Only the HTTP protocol is supported. For more information about how to configure alert callback, see Use the alert callback feature to send notifications about threshold-triggered alerts.
Alert Contact Group	The alert contact groups to which alert notifications are sent. The alert notifications are sent to the alert contacts that belong to the selected alert contact groups. An alert contact group can contain one or more alert contacts. For more information about how to create an alert contact and an alert contact group, see Create an alert contact or alert contact group.

Click OK to create the alert rule.
You can view the created alert rule in the Alert Rules section. For more information about alert rules, see Manage alert rules.
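
The Mute For behavior described in the parameter table can be illustrated with a small simulation. The following sketch is not CloudMonitor's implementation. The function name and the sample data are hypothetical, and it assumes that a cleared alert resets the mute period, which the description above does not state.

```python
def notification_times(samples, mute_seconds):
    """Return the timestamps at which an alert notification would be sent.

    samples: (timestamp_seconds, breached) pairs in time order, where
    breached is True if the metric value reaches the threshold.
    """
    sent = []
    last_sent = None  # time of the most recent notification for the open alert
    for timestamp, breached in samples:
        if not breached:
            # Assumption: clearing the alert resets the mute period.
            last_sent = None
            continue
        # Send on the first breach, or once the mute period has elapsed.
        if last_sent is None or timestamp - last_sent >= mute_seconds:
            sent.append(timestamp)
            last_sent = timestamp
    return sent


# Hypothetical checks with Mute For set to 5 minutes (300 seconds).
samples = [(0, True), (60, True), (300, True), (360, False), (420, True)]
print(notification_times(samples, mute_seconds=300))  # [0, 300, 420]
```

In this run, the breach at 60 seconds falls inside the mute period and is suppressed, the breach at 300 seconds triggers a resend because the alert is still not cleared, and the breach at 420 seconds starts a new alert after the metric recovered at 360 seconds.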

Verification

In the left-side navigation pane, choose Alerts > Alert History.
On the Alert History page, view the alert trend and detailed alert history.

Use resource monitoring of the previous version

If the metrics-server component of your cluster is not updated to V0.3.8.5 or later, you can perform the following steps to use resource monitoring of the previous version:

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Workloads > Deployments.
On the Deployments page, find the Deployment that you want to manage and click Monitor in the Actions column to view the monitoring data.
View monitoring data on the Deployment Application, Container group list, and Container group hotspot tabs.
Optional. In the left-side navigation pane, choose Alerts > Alert Rules to configure alert rules.
The name of a group-based metric starts with group and the name of an instance-based metric starts with pod.

FAQ

What do I do if no monitoring data of CloudMonitor exists for an ACK cluster?

For more information about how to troubleshoot this issue, see What do I do if no data exists in an ACK cluster?