Configure alert rules in Managed Service for Prometheus and view performance metrics

You can enable Managed Service for Prometheus for a Container Service for Kubernetes (ACK) cluster to monitor the cluster and containers in the cluster in real time. After you enable Managed Service for Prometheus, you can view metrics displayed on Grafana dashboards. You can also specify custom contacts to receive alert notifications and configure custom metrics.

Introduction to Managed Service for Prometheus

Managed Service for Prometheus is a fully managed monitoring service that is compatible with the open source Prometheus ecosystem. It monitors a wide array of components, provides multiple predefined dashboards, and saves you the effort of managing underlying services, such as data storage, data display, and system maintenance.

Managed Service for Prometheus provides two container monitoring editions: Basic Edition and Pro Edition. Compared with Container Monitoring Basic Edition, Container Monitoring Pro Edition provides various Grafana monitoring dashboards, default alert rules for components of Container Service for Kubernetes (ACK), and Remote Write and EventBridge-based data delivery capabilities. For more information about the benefits of Container Monitoring Pro Edition, see Comparison of features and billing rules for the Basic Edition and Pro Edition.

Billing

Basic metrics

After you enable Managed Service for Prometheus, ACK collects metrics from containers. The default metrics that ACK collects are basic metrics. By default, basic metrics are free of charge. For more information about the basic metrics supported by Managed Service for Prometheus, see Metrics.

Basic metrics collected by default:

Metrics related to container resources (kubelet)
Metrics related to application status (kube-state-metrics)
Metrics related to node resources (node-exporter)
Metrics related to GPUs (ack-gpu-exporter)
Metrics related to control plane components in ACK managed clusters, such as the API server, etcd, kube-scheduler, kube-controller-manager, and cloud-controller-manager
Metrics related to CoreDNS
Metrics related to Ingress controllers

Basic metrics collected after you enable other features:

After you use csi-plugin to monitor storage resources on the node side, metrics related to csi-plugin are collected.
After you enable cost insights, metrics related to ack-cost-exporter are collected.
After you enable fine-grained scheduling features such as colocation monitoring and resource profiling, metrics related to ack-koordinator are collected.

Important

If you modify the retention period of collected metrics or collect custom metrics, additional fees are charged. For more information about how to modify the retention period of a metric, see How do I change the storage duration of the data samples for a metric? For more information about the billing rules of Managed Service for Prometheus, see Billing.

Managed Service for Grafana
After you enable Managed Service for Prometheus, Managed Service for Grafana Shared Edition is used to display the metrics collected by Managed Service for Prometheus. For more information about the billing rules of Managed Service for Grafana, see Billing rules.

Step 1: Enable Managed Service for Prometheus

Enable Managed Service for Prometheus when you create a cluster

ACK Pro clusters:
On the Component Configurations wizard page in the ACK console, select ACK Cluster Monitoring Pro Edition or ACK Cluster Monitoring Basic Edition. For more information, see Create an ACK managed cluster.
Other cluster types or specifications:
On the Component Configurations wizard page in the ACK console, select Enable Managed Service for Prometheus. For more information, see Create an ACK managed cluster.
By default, Enable Managed Service for Prometheus is selected when you create a cluster. After the cluster is created, the system automatically configures Managed Service for Prometheus.

Enable Managed Service for Prometheus for an existing cluster

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Operations > Prometheus Monitoring.
On the Prometheus Monitoring page, follow the on-screen instructions to install the required component and check the relevant dashboards.
The system automatically installs the component and checks the dashboards. After the installation is completed, you can click each tab to view metrics.
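
To confirm from the command line that the component is running after the installation, you can check its Deployment. The following is a minimal sketch, assuming that kubectl is already configured for the cluster. The namespace arms-prom and the Deployment name arms-prometheus-ack-arms-prometheus are the ones mentioned in the FAQ of this topic. The function name is hypothetical.

```shell
# Hypothetical helper: report whether the ack-arms-prometheus Deployment
# has at least one ready replica. Assumes kubectl can reach the cluster.
check_prometheus_agent() {
  ns="arms-prom"
  deploy="arms-prometheus-ack-arms-prometheus"
  # status.readyReplicas is omitted while no replica is ready, so an empty
  # result (or a failed lookup) is treated as 0.
  ready=$(kubectl get deployment "$deploy" -n "$ns" \
    -o jsonpath='{.status.readyReplicas}' 2>/dev/null) || ready=0
  if [ "${ready:-0}" -ge 1 ]; then
    echo "ready: $ready replica(s) of $deploy"
  else
    echo "not ready: $deploy in namespace $ns"
    return 1
  fi
}
```

Call check_prometheus_agent after the console reports that the installation is completed. A non-zero return value indicates that no replica is ready yet or that the Deployment was not found.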

If you use an ACK dedicated cluster, perform the following operations to complete authorization:

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, click Cluster Information.
On the Basic Information tab, click KubernetesWorkerRole-*** to the right of Worker RAM Role. On the Permissions tab, click k8sWorkerRole**** in the Policy column.
On the Policy Document tab of the policy details page, click Modify Policy Document.

Add the following authorization rule to the Statement field in the policy content editor, click Next to edit policy information, and then click OK.

{
    "Version": "1",
    "Statement": [
        {
            "Action": [
                "arms:Describe*",
                "arms:List*",
                "arms:Get*",
                "arms:Search*",
                "arms:Check*",
                "arms:Query*",
                "arms:ListEnvironments",
                "arms:DescribeAddonRelease",
                "arms:InstallAddon",
                "arms:DeleteAddonRelease",
                "arms:ListEnvironmentDashboards",
                "arms:ListAddonReleases",
                "arms:CreateEnvironment",
                "arms:UpdateEnvironment",
                "arms:InitEnvironment",
                "arms:DescribeEnvironment",
                "arms:InstallEnvironmentFeature",
                "arms:ListEnvironmentFeatures"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

Note

To specify multiple actions, add a comma (,) to the end of the content of each action before you enter the content of the next action.

Step 2: View Grafana dashboards provided by Managed Service for Prometheus

On the Prometheus Monitoring page in the ACK console, you can click different Grafana dashboards to view different monitoring data.

Step 3: (Optional) Configure alert rules in Managed Service for Prometheus

Managed Service for Prometheus allows you to create alert rules for monitoring jobs. When an alert rule is triggered, the system sends alert notifications to the specified contacts in real time by email, text message, or DingTalk. This helps you detect errors proactively.

1. Create a notification object

Log on to the ARMS console. In the left-side navigation pane, choose Alert Management > Notification Objects.
Follow the on-screen instructions to configure a notification object.
For more information, see Notification objects.

2. Configure alert rules

Log on to the ARMS console. In the left-side navigation pane, choose Managed Service for Prometheus > Instances.
In the upper part of the page that appears, select the region where your cluster is deployed. Click the name of the Prometheus instance used by your cluster to go to the instance details page.
In the left-side navigation pane, click Alert Rules. On the Prometheus Alert Rules page, configure alert rules for the notification object.
For more information, see Create an alert rule for a Prometheus instance.
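
Before you save an alert rule, it can help to evaluate its PromQL condition against the instance. The following is a minimal sketch, not the procedure of the console. The expression is an illustrative condition built from node-exporter metrics, and the 80% threshold is an example value. The helper name and the base URL argument are assumptions: use the HTTP API address of your own Prometheus instance, and add authentication if your instance requires it.

```shell
# Illustrative alert condition: node memory usage above 80%.
# The metric names come from node-exporter; the threshold is an example.
ALERT_EXPR='(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 80'

# Hypothetical helper: evaluate a PromQL expression through the standard
# Prometheus HTTP API (GET /api/v1/query).
#   $1: base URL of the HTTP API of the instance (assumption, not a real address)
#   $2: PromQL expression
prom_query() {
  curl -sS -G "$1/api/v1/query" --data-urlencode "query=$2"
}
```

For example, prom_query "$PROM_URL" "$ALERT_EXPR" returns a JSON document. Series listed in the result are the ones for which the condition currently holds.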

Step 4: (Optional) Create custom metrics and use Grafana to display the metrics

You can add pod annotations to specify custom metrics based on the default service discovery feature. You can also add Service labels to specify custom metrics based on ServiceMonitors. If you add pod annotations and Service labels at the same time, duplicate metrics may be collected from the same data source.

Pod annotations: After you add specific annotations to a pod, Managed Service for Prometheus can automatically discover the pod and collect metrics from the pod.
Service labels: Create a Service and add specific labels to the Service. Then, create a ServiceMonitor to specify how the backend pods of the Service are monitored.

Pod annotations

You can add annotations to the templates of Deployments to define custom metrics.

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of a cluster. In the left-side navigation pane of the cluster details page, choose Workloads > Deployments. Follow the on-screen instructions to create a workload.
The following example shows how to configure the parameters of a Deployment. For more information, see Create a stateless application by using a Deployment.
1. On the Container wizard page, specify a container image and the required resources, create a web application, expose port 5000, and then click Next.
2. On the Advanced wizard page, create a Service and add pod annotations. Then, click Create.
  - Create a Service.
    In the Services section, click Create and configure the Service. Set Service Type to SLB and configure Port Mapping.
  - In the Annotations section, add the following annotations:
    - Add the prometheus.io/scrape annotation and set the value to true. This enables Managed Service for Prometheus to scrape metrics.
    - Add the prometheus.io/port annotation and set the value to 5000. This specifies that the endpoint port 5000 is scraped by Managed Service for Prometheus.
    - Add the prometheus.io/path annotation and set the value to /access. This specifies that the endpoint path /access is scraped by Managed Service for Prometheus.
Configure custom metrics.
1. Log on to the ARMS console.
2. In the left-side navigation pane, click Integration Management.
3. On the Integrated Environments tab, view the environment list on the Container Service tab. Find the ACK environment instance and click Metric Scraping in the Actions column. The Metric Scraping tab appears.
4. On the Metric Scraping tab, add ServiceMonitor and PodMonitor settings to define Prometheus metric collection rules.
  For more information, see Manage the custom collection rules of ACK environments.
5. After you complete the preceding operations, click the Self-Monitoring tab on the Integration Management page and click Targets to check whether the custom metrics are configured. You can click the hyperlink in the Endpoint column to increase the metric value.
  For more information about how to configure metrics, see DATA MODEL.
View custom metrics.
On the Metrics Explorer tab of the Integration Management page, you can select the custom metrics that you want to view or specify PromQL statements to view and verify custom metrics. For more information, see Metric exploration.
On the Self-Monitoring tab of the Integration Management page, click Monitoring to view custom metrics on Grafana dashboards.
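
The annotations that the console procedure above adds can also be written in a manifest. The following is a minimal sketch, assuming that you deploy with kubectl instead of the console. The Deployment name, labels, and image are placeholders. Only the three prometheus.io annotations and port 5000 come from this topic.

```shell
# Write a Deployment manifest that carries the three annotations from the
# console procedure. The name, labels, and image are placeholders.
cat > custom-metrics-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-metrics-demo
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-metrics-demo
  template:
    metadata:
      labels:
        app: custom-metrics-demo
      annotations:
        # Annotation values must be strings, so they are quoted.
        prometheus.io/scrape: "true"
        prometheus.io/port: "5000"
        prometheus.io/path: "/access"
    spec:
      containers:
      - name: web
        image: registry.example.com/demo/web-app:latest  # placeholder image
        ports:
        - containerPort: 5000
EOF

# After you replace the placeholder image, apply the manifest:
#   kubectl apply -f custom-metrics-deployment.yaml
```

The annotations are placed on the pod template (spec.template.metadata), not on the Deployment itself, because service discovery reads them from the pods.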

Service labels

To use ServiceMonitors to create custom metrics, add Service labels instead of pod annotations.

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left-side navigation pane of the cluster details page, choose Workloads > Deployments. Follow the on-screen instructions to create a workload.
The following example shows how to configure the parameters of a Deployment. For more information, see Create a stateless application by using a Deployment.
1. On the Container wizard page, specify a container image and the required resources, create a web application, expose port 5000, and then click Next.
2. On the Advanced page, click Create in the Services section and configure the Service.
  Set Service Type to SLB and configure Port Mapping. Add a Service label. For example, set the label key to app and the label value to custom-metrics-pindex. This label is used by ServiceMonitors as a selector.
Configure custom metrics. Use the endpoints that Managed Service for Prometheus scrapes.
1. Log on to the ARMS console.
2. In the left-side navigation pane, click Integration Management.
3. On the Integrated Environments tab, view the environment list on the Container Service tab. Find the ACK environment instance and click Metric Scraping in the Actions column. The Metric Scraping tab appears.
4. On the Metric Scraping tab, click Service Monitor and then click Create. Follow the on-screen instructions to configure a ServiceMonitor and click Create.
  For more information about how to configure custom metrics, see ACK service discoveries.
  Example YAML content:
  apiVersion: monitoring.coreos.com/v1
  kind: ServiceMonitor
  metadata:
    # Specify a unique name.
    name: custom-metrics-pindex
    # Specify a namespace.
    namespace: default
  spec:
    endpoints:
    - interval: 30s
      # Specify the name of the port specified in the service.yaml file.
      port: web
      # Specify the path of the Service.
      path: /access
    namespaceSelector:
      any: true
      # The namespace to which the NGINX demo application belongs.
    selector:
      matchLabels:
        # Specify the label specified in the service.yaml file.
        app: custom-metrics-pindex
5. On the Self-Monitoring tab of the Integration Management page, click Targets to check whether the endpoints that Managed Service for Prometheus scrapes are displayed.
  Note
  Compared with the method of creating custom metrics by adding annotations, this method provides more information, which includes the namespace and name of the Service.
Select a metric and click the hyperlink in the Endpoint column to increase the metric value.
For more information about how to configure metrics, see DATA MODEL.
View custom metrics.
On the Metrics Explorer tab of the Integration Management page, you can select the custom metrics that you want to view or specify PromQL statements to view and verify custom metrics. For more information, see Metric exploration.
On the Self-Monitoring tab of the Integration Management page, click Monitoring to view custom metrics on Grafana dashboards.
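
The ServiceMonitor in the procedure above selects a Service by label and refers to a Service port by name. The following is a minimal sketch of a matching Service manifest, assuming that the SLB Service type in the console corresponds to a LoadBalancer Service. The pod selector is a placeholder. The label app: custom-metrics-pindex, the port name web, and port 5000 come from this topic.

```shell
# Write a Service manifest that the example ServiceMonitor can select.
cat > custom-metrics-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-pindex
  namespace: default
  labels:
    # Matched by spec.selector.matchLabels in the ServiceMonitor.
    app: custom-metrics-pindex
spec:
  type: LoadBalancer
  selector:
    app: custom-metrics-demo  # placeholder: label of the backend pods
  ports:
  - name: web                 # referenced as "port: web" in the ServiceMonitor
    port: 5000
    targetPort: 5000
EOF

# Apply the manifest after you adjust the pod selector:
#   kubectl apply -f custom-metrics-service.yaml
```

The port field of a ServiceMonitor endpoint refers to the name of the Service port, not to its number, so the Service port must be named.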

FAQ

How do I check the version of the ack-arms-prometheus component?

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Operations > Add-ons.
On the Add-ons page, click the Logs and Monitoring tab and find the ack-arms-prometheus component.
The version number is displayed in the lower part of the component. If a new version is available, click Upgrade on the right side to update the component.
Note
The Upgrade button is displayed only if the component is not updated to the latest version.

Why is Managed Service for Prometheus unable to monitor GPU-accelerated nodes?

Managed Service for Prometheus may be unable to monitor GPU-accelerated nodes that are configured with taints. You can perform the following steps to view the taints of a GPU-accelerated node.

Run the following command to view the taints of a GPU-accelerated node:
```
kubectl describe node cn-beijing.47.100.***.***
```
Expected output:
```
Taints:test-key=test-value:NoSchedule
```
If you added custom taints to the GPU-accelerated node, you can view information about the custom taints. In this example, a taint whose key is set to test-key, value is set to test-value, and effect is set to NoSchedule is added to the node.

Use one of the following methods to handle the taint:

Run the following command to delete the taint from the GPU-accelerated node:

kubectl taint node cn-beijing.47.100.***.*** test-key=test-value:NoSchedule-

Add a toleration rule that allows pods to be scheduled to the GPU-accelerated node with the taint.

# 1. Run the following command to modify ack-prometheus-gpu-exporter:
kubectl edit daemonset -n arms-prom ack-prometheus-gpu-exporter

# 2. Add the following fields to the YAML file to tolerate the taint:
# Other fields are omitted.
# The tolerations field must be added above the containers field, and both fields must be at the same level.
tolerations:
- key: "test-key"
  operator: "Equal"
  value: "test-value"
  effect: "NoSchedule"
containers:
 # Irrelevant fields are not shown.
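
If a node has several custom taints, writing each toleration by hand is error-prone. The following is a minimal sketch of a helper that converts a taint in the key=value:effect form shown in the Taints field into a toleration entry. The function name is hypothetical, and the helper does not handle taints that have no value.

```shell
# Hypothetical helper: turn a taint written as key=value:effect into a
# matching toleration entry for the tolerations field.
taint_to_toleration() {
  taint="$1"
  effect="${taint##*:}"   # text after the last colon
  pair="${taint%:*}"      # text before the last colon
  key="${pair%%=*}"       # text before the first equals sign
  value="${pair#*=}"      # text after the first equals sign
  printf '%s\n' \
    "- key: \"$key\"" \
    "  operator: \"Equal\"" \
    "  value: \"$value\"" \
    "  effect: \"$effect\""
}

# Prints the toleration that is used in the example above.
taint_to_toleration "test-key=test-value:NoSchedule"
```

Paste the printed entry under the tolerations field of the DaemonSet.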

What do I do if I fail to reinstall ack-arms-prometheus due to residual resource configurations of ack-arms-prometheus?

If you delete only the namespace of Managed Service for Prometheus, resource configurations are retained. In this case, you may fail to reinstall ack-arms-prometheus. You can perform the following operations to delete the residual resource configurations:

Run the following command to delete the arms-prom namespace:
```
kubectl delete namespace arms-prom
```

Run the following commands to delete the related ClusterRoles:

kubectl delete ClusterRole arms-kube-state-metrics
kubectl delete ClusterRole arms-node-exporter
kubectl delete ClusterRole arms-prom-ack-arms-prometheus-role
kubectl delete ClusterRole arms-prometheus-oper3
kubectl delete ClusterRole arms-prometheus-ack-arms-prometheus-role
kubectl delete ClusterRole arms-pilot-prom-k8s
kubectl delete ClusterRole gpu-prometheus-exporter

Run the following commands to delete the related ClusterRoleBindings:

kubectl delete ClusterRoleBinding arms-node-exporter
kubectl delete ClusterRoleBinding arms-prom-ack-arms-prometheus-role-binding
kubectl delete ClusterRoleBinding arms-prometheus-oper-bind2
kubectl delete ClusterRoleBinding arms-kube-state-metrics
kubectl delete ClusterRoleBinding arms-pilot-prom-k8s
kubectl delete ClusterRoleBinding arms-prometheus-ack-arms-prometheus-role-binding
kubectl delete ClusterRoleBinding gpu-prometheus-exporter

Run the following commands to delete the related Roles and RoleBindings:

kubectl delete Role arms-pilot-prom-spec-ns-k8s
kubectl delete Role arms-pilot-prom-spec-ns-k8s -n kube-system
kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s
kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s -n kube-system

After you delete the residual resource configurations, go to the ACK console, choose Operations > Add-ons, and reinstall the ack-arms-prometheus component.
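
The delete commands above can be collected into one script. The following is a minimal sketch that uses the same resource names. The function names and the DRY_RUN variable are hypothetical. By default, the script only prints the commands. The --ignore-not-found flag of kubectl delete keeps the script from failing on resources that are already gone.

```shell
# Print the command when DRY_RUN is 1 (the default); otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: delete the residual resources listed in this answer.
cleanup_arms_prom() {
  run kubectl delete namespace arms-prom --ignore-not-found
  for name in arms-kube-state-metrics arms-node-exporter \
      arms-prom-ack-arms-prometheus-role arms-prometheus-oper3 \
      arms-prometheus-ack-arms-prometheus-role arms-pilot-prom-k8s \
      gpu-prometheus-exporter; do
    run kubectl delete clusterrole "$name" --ignore-not-found
  done
  for name in arms-node-exporter arms-prom-ack-arms-prometheus-role-binding \
      arms-prometheus-oper-bind2 arms-kube-state-metrics arms-pilot-prom-k8s \
      arms-prometheus-ack-arms-prometheus-role-binding \
      gpu-prometheus-exporter; do
    run kubectl delete clusterrolebinding "$name" --ignore-not-found
  done
  # As in the commands above, the first pair uses the current namespace.
  run kubectl delete role arms-pilot-prom-spec-ns-k8s --ignore-not-found
  run kubectl delete role arms-pilot-prom-spec-ns-k8s -n kube-system --ignore-not-found
  run kubectl delete rolebinding arms-pilot-prom-spec-ns-k8s --ignore-not-found
  run kubectl delete rolebinding arms-pilot-prom-spec-ns-k8s -n kube-system --ignore-not-found
}
```

Run cleanup_arms_prom to review the commands, and then run it again with DRY_RUN set to 0 to perform the deletion.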

What do I do if the "xxx in use" error is prompted when I install ack-arms-prometheus?

Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage. Then, click the name of the cluster or click Details in the Actions column.
In the left-side navigation pane of the cluster details page, choose Applications > Helm. On the Helm page, check whether the ack-arms-prometheus application is displayed.
- If the ack-arms-prometheus application is displayed on the Helm page, delete the ack-arms-prometheus application and then install ack-arms-prometheus on the Add-ons page. For more information about how to install ack-arms-prometheus, see Manage components.
- If the ack-arms-prometheus application is not displayed on the Helm page, perform the following steps:
  1. Manually delete the residual data. In this case, residual data remains after the ack-arms-prometheus application was deleted. For more information about how to delete the residual data related to ack-arms-prometheus, see Managed Service for Prometheus FAQ.
  2. Install ack-arms-prometheus on the Add-ons page. For more information about how to install ack-arms-prometheus, see Manage components.
  3. If the issue persists, submit a ticket.
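
The check on the Helm page can also be made with the Helm CLI. The following is a minimal sketch, assuming that Helm 3 is installed, that it uses the same kubeconfig as kubectl, and that the application shown on the Helm page is visible to the CLI as a release named ack-arms-prometheus. The function name is hypothetical.

```shell
# Hypothetical helper: succeed if a Helm release named ack-arms-prometheus
# exists in any namespace and in any state.
has_arms_release() {
  helm list --all --all-namespaces --short \
    --filter '^ack-arms-prometheus$' | grep -q .
}
```

For example, has_arms_release && echo "release found" prints a message only if the release exists.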

What do I do if ack-arms-prometheus installation fails after the system prompts "Component Not Installed"?

Check whether ack-arms-prometheus is already installed.
1. Log on to the ACK console. In the left-side navigation pane, click Clusters.
2. On the Clusters page, find the cluster that you want to manage. Then, click the name of the cluster or click Details in the Actions column.
3. Go to the cluster details page in the ACK console and choose Applications > Helm in the left-side navigation pane.
  Check whether the ack-arms-prometheus application is displayed on the Helm page.
  - If the ack-arms-prometheus application is displayed on the Helm page, delete the ack-arms-prometheus application on the Helm page and then install ack-arms-prometheus from the Add-ons page. For more information about how to install ack-arms-prometheus, see Manage components.
  - If the ack-arms-prometheus application is not displayed on the Helm page, perform the following operations:
    1. Manually delete the residual data. In this case, residual data remains after the ack-arms-prometheus application was deleted. For more information about how to delete the residual data related to ack-arms-prometheus, see Managed Service for Prometheus FAQ.
    2. Install ack-arms-prometheus on the Add-ons page. For more information about how to install ack-arms-prometheus, see Manage components.
    3. If the issue persists, submit a ticket.
Check whether errors are reported in the log of ack-arms-prometheus.
1. Log on to the ACK console. In the left-side navigation pane, click Clusters.
2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side pane, choose Workloads > Deployments.
3. In the upper part of the Deployments page, set Namespace to arms-prom and then click arms-prometheus-ack-arms-prometheus.
4. Click the Logs tab and check whether errors are reported in the log.
  If errors are reported in the log, submit a ticket.
Check whether installation errors are reported by the Prometheus agent.
1. Log on to the ARMS console.
2. In the left-side navigation pane, click Integration Management.
3. On the Integrated Environments tab, view the environment list on the Container Service tab. Find the ACK environment instance and click Configure Agent in the Actions column. The Configure Agent page appears.
4. Check whether the installed agents run as normal. If an error is reported, submit a ticket.
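
The log check described above can also be performed with kubectl. The following is a minimal sketch, assuming that kubectl is configured for the cluster. The namespace and Deployment name are the ones used in the console steps. The function name and the default of 200 lines are assumptions.

```shell
# Hypothetical helper: print recent log lines of the component that mention
# an error. The optional first argument is the number of lines to inspect.
agent_error_lines() {
  kubectl logs -n arms-prom deployment/arms-prometheus-ack-arms-prometheus \
    --tail="${1:-200}" 2>&1 | grep -i 'error'
}
```

The function returns a non-zero status if no matching line is found. If it prints error lines, include them when you submit a ticket.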
