You can enable Managed Service for Prometheus for a Container Service for Kubernetes (ACK) cluster to monitor the cluster and containers in the cluster in real time. After you enable Managed Service for Prometheus, you can view metrics displayed on Grafana dashboards. You can also specify custom contacts to receive alert notifications and configure custom metrics.
Introduction to Managed Service for Prometheus
Managed Service for Prometheus is a fully managed monitoring service that is compatible with the open source Prometheus ecosystem. It monitors a wide array of components and provides multiple predefined dashboards. It is available in two container monitoring editions: basic edition and Pro edition.
Basic edition: This edition uses an unmanaged Prometheus agent to collect basic metrics from containers. A pod in which an unmanaged Prometheus agent is deployed requests 3 CPU cores and 4 GB of memory. The default retention period of metrics collected by this edition is seven days.
Pro edition: This edition uses a managed Prometheus agent to collect basic and custom metrics from containers. A managed Prometheus agent does not consume the resources of your cluster. The default retention period of metrics collected by this edition is 90 days.
Managed Service for Prometheus provides a managed Prometheus monitoring system, which saves you the effort of managing underlying services, such as data storage, data display, and system maintenance. For more information about Managed Service for Prometheus, see What is Managed Service for Prometheus?
Billing
Basic metrics
After you enable Managed Service for Prometheus, ACK collects metrics from containers. The default metrics that ACK collects are basic metrics. By default, basic metrics are free of charge. For more information about the basic metrics supported by Managed Service for Prometheus, see Metrics.
Basic metrics collected by default:
Metrics related to container resources (kubelet)
Metrics related to application status (kube-state-metrics)
Metrics related to node resources (node-exporter)
Metrics related to GPUs (ack-gpu-exporter)
Metrics related to control plane components in ACK managed clusters, such as the API server, etcd, kube-scheduler, kube-controller-manager, and cloud-controller-manager
Metrics related to CoreDNS
Metrics related to Ingress controllers
Basic metrics collected after you enable other features:
After you use csi-plugin to monitor storage resources on the node side, metrics related to csi-plugin are collected.
After you enable cost insights, metrics related to ack-cost-exporter are collected.
After you enable basic monitoring or resource profiling for fine-grained scheduling, metrics related to ack-koordinator are collected.
Important: If you modify the retention period of collected metrics or collect custom metrics, additional fees are charged. For more information about how to modify the retention period of a metric, see How do I change the retention period of metric data? For more information about the billing rules of Managed Service for Prometheus, see Billing overview.
Managed Service for Grafana
By default, Managed Service for Grafana Shared Edition is used to display the metrics collected by Managed Service for Prometheus. For more information about the billing rules of Managed Service for Grafana, see Billing rules.
Step 1: Enable Managed Service for Prometheus
In a new cluster
On the Component Configurations wizard page, select Enable Managed Service for Prometheus. For more information, see Create an ACK managed cluster.
By default, Enable Managed Service for Prometheus is selected when you create a cluster. After the cluster is created, the system automatically configures Managed Service for Prometheus.
In an existing cluster
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, go to the Prometheus Monitoring page.
On the Prometheus Monitoring page, follow the on-screen instructions to install the required component and check the relevant dashboards.
The system automatically installs the component and checks the dashboards. After the installation is completed, you can click each tab to view metrics.
Step 2: View Grafana dashboards provided by Managed Service for Prometheus
On the Prometheus Monitoring page in the ACK console, you can click different Grafana dashboards to view different monitoring data.
Step 3: (Optional) Configure alert rules in Managed Service for Prometheus
Managed Service for Prometheus allows you to create alert rules for monitoring jobs. When the conditions of an alert rule are met, notifications are sent in real time by email, text message, or DingTalk to the contact group that you specified. This helps you detect errors proactively. Before you can create a contact group, you must create a contact. When you create a contact, you can specify the mobile phone number and email address at which the contact receives notifications. You can also provide a DingTalk chatbot webhook URL that is used to automatically send alert notifications.
1: Create a contact
Log on to the Managed Service for Prometheus console. In the upper-left corner of the Managed Service for Prometheus page, select the region where your cluster is deployed.
In the left-side navigation pane, choose Alert Management > Notification Objects.
On the Contacts tab, click Create Contact.
Follow the on-screen instructions to specify the contact information.
For more information, see Contacts.
2: Configure alert rules
Log on to the Managed Service for Prometheus console. In the left-side navigation pane, click Instances.
In the upper part of the page that appears, select the region where your cluster is deployed. Click the name of the Prometheus instance used by your cluster to go to the instance details page.
In the left-side navigation pane of the instance details page, click Alert rules. On the Prometheus Alert Rules page, select an alert rule and click Edit in the Actions column to modify the alert rule. After you modify the rule, click Save to quickly create an alert rule for a metric.
For more information, see Create an alert rule for a Prometheus instance (for the new console version) or Create an alert rule (for the old console version).
Step 4: (Optional) Create custom metrics and use Grafana to display the metrics
Add annotations to create custom metrics and use the default service discovery feature to collect the metrics
You can add annotations to the templates of Deployments to define custom metrics. Managed Service for Prometheus uses the default service discovery feature to automatically collect custom metrics from pods.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
Create an application.
For more information about how to create a Deployment, see Create a Deployment.
On the Clusters page, click the name of your cluster. In the left-side navigation pane of the cluster details page, choose Workloads > Deployments.
On the Deployments page, click Create from Image. On the Basic Information wizard page, specify the basic information of the application and click Next.
On the Container wizard page, specify a container image and the required resources, create a web application, expose port 5000, and then click Next.
Name: In this example, yejianhonghong/pindex:latest is specified.
Ports: In this example, the port name is set to web, the port number is set to 5000, and the protocol is set to TCP.
On the Advanced page, create a Service and add pod annotations.
Create a Service.
In the Services section, click Create and configure the Service.
For more information about how to create a Service, see Create a Service.
Parameter: Description
Name: You can specify a custom name, such as custom-metrics-pindex.
Service Type: Select Server Load Balancer and Public Access.
Port Mapping: Set the Name, Service Port, and Container Port fields. For example, set Name to web, Service Port to 5000, and Container Port to 5000.
In the Pod Annotations section, click Add to add the following pod annotations.
Add the prometheus.io/scrape annotation and set the value to true. This enables Managed Service for Prometheus to scrape metrics.
Add the prometheus.io/port annotation and set the value to 5000. This specifies that the endpoint port 5000 is scraped by Managed Service for Prometheus.
Add the prometheus.io/path annotation and set the value to /access. This specifies that the endpoint path /access is scraped by Managed Service for Prometheus.
Click Create to create the application.
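The console steps above amount to a Deployment whose pod template carries three annotations. The following Python sketch builds an equivalent manifest as a dictionary and prints it as JSON. It is an illustration only: the Deployment name custom-metrics-pindex, the app label, and the replica count are assumptions that do not come from the console procedure, and the manifest that the console actually generates may contain additional fields.

```python
import json


def build_deployment(name, image, port, metrics_path):
    """Return a Deployment manifest (as a dict) whose pod template carries
    the three prometheus.io/* annotations used by default service discovery."""
    labels = {"app": name}  # hypothetical label, chosen for this sketch
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,  # assumption: a single replica is enough for a demo
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {
                    "labels": labels,
                    "annotations": {
                        # Annotation values are always strings in Kubernetes.
                        "prometheus.io/scrape": "true",
                        "prometheus.io/port": str(port),
                        "prometheus.io/path": metrics_path,
                    },
                },
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"name": "web",
                                   "containerPort": port,
                                   "protocol": "TCP"}],
                    }]
                },
            },
        },
    }


manifest = build_deployment(
    "custom-metrics-pindex",            # hypothetical Deployment name
    "yejianhonghong/pindex:latest",     # image used in this topic
    5000,
    "/access",
)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied with kubectl apply -f, provided that the assumed names suit your cluster.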
Configure custom metrics.
Log on to the Managed Service for Prometheus console.
In the upper part of the Instances page, select the region where your cluster is deployed. Click the name of the Prometheus instance used by your cluster to go to the instance details page.
In the left-side navigation pane of the instance details page, click Service Discovery. Then, click the Configure tab and add ServiceMonitor and PodMonitor settings to define Prometheus metric collection rules.
For more information about how to configure custom metrics, see ACK service discoveries.
After the preceding operations are completed, click the Targets tab to check whether the custom metrics are configured.
Select a metric and click the hyperlink in the Endpoint column to increase the metric value.
For more information about how to configure metrics, see DATA MODEL.
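To make the scraped data more concrete, the following Python sketch parses a response in the Prometheus text exposition format. The sample payload is hypothetical: this topic does not show what the /access endpoint of the example application returns, so the help text and the value 3 are made up for illustration. The parser is deliberately simplified and does not handle optional timestamps or label values that contain spaces.

```python
def parse_exposition(text):
    """Parse a simplified Prometheus text exposition payload.

    Returns a dict that maps each series (metric name plus any labels)
    to its value as a float. Comment lines (# HELP, # TYPE) are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field on the line.
        series, value = line.rsplit(" ", 1)
        samples[series.strip()] = float(value)
    return samples


# Hypothetical payload; the real endpoint output may differ.
sample_payload = """\
# HELP current_person_counts Hypothetical help text for the demo metric.
current_person_counts 3
"""

samples = parse_exposition(sample_payload)
print(samples)
```

Each time the endpoint is accessed, the example application is expected to report a larger value for the metric, which is what Managed Service for Prometheus picks up on the next scrape.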
Use Grafana to display custom metrics.
Go to the Instances page in the Managed Service for Prometheus console and click the name of the Prometheus instance used by your cluster to go to the instance details page.
In the left-side navigation pane, click Dashboards. Click any dashboard to log on to Grafana. Then, click the icon in the upper-right corner and click Add a new panel.
Select your ACK cluster as the data source and enter a PromQL statement. For example, select the Code mode and set Metrics to current_person_counts.
Save the configurations to view the Grafana chart of the custom metric.
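The same PromQL statement can also be sent outside Grafana through the standard Prometheus HTTP API, which exposes instant queries at /api/v1/query. The following Python sketch only builds the request URL and does not send it. The base URL is a placeholder: the actual HTTP API address of your Prometheus instance, and any authentication it requires, are not covered in this topic.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def instant_query_url(base_url, promql):
    """Build the URL of a Prometheus instant query (GET /api/v1/query)."""
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


# Placeholder base URL; replace it with the HTTP API address of your instance.
url = instant_query_url("https://prometheus.example.com", "current_person_counts")
print(url)

# Characters that are not URL-safe are percent-encoded automatically.
rate_url = instant_query_url(
    "https://prometheus.example.com/",
    "rate(current_person_counts[5m])",
)
print(parse_qs(urlsplit(rate_url).query)["query"][0])
```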
Use ServiceMonitors to create custom metrics and use Service labels to collect the metrics
To use ServiceMonitors to create custom metrics, you need to add Service labels instead of adding pod annotations.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
Create an application.
On the Clusters page, click the name of your cluster. In the left-side navigation pane of the cluster details page, choose Workloads > Deployments.
On the Deployments page, click Create from Image. On the Basic Information wizard page, specify the basic information of the application and click Next.
On the Container wizard page, specify a container image and the required resources, create a web application, expose port 5000, and then click Next.
Name: In this example, yejianhonghong/pindex:latest is specified.
Ports: In this example, the port name is set to web, the port number is set to 5000, and the protocol is set to TCP.
On the Advanced page, click Create in the Services section and configure the Service.
For more information about how to create a Service, see Create a Service.
Parameter: Description
Name: You can specify a custom name, such as custom-metrics-pindex.
Service Type: Select Server Load Balancer and Public Access.
Port Mapping: Set the Name, Service Port, and Container Port fields. For example, set Name to web, Service Port to 5000, and Container Port to 5000.
Label: Add a label. For example, set the key to app and the value to custom-metrics-pindex. This label is used by ServiceMonitors as a selector.
Click Create to create the application.
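As a rough equivalent of the Service configured above, the following Python sketch builds a Service manifest that carries the app label. The pod selector is an assumption: this topic does not state which labels the console assigns to the pods of the application, so the sketch reuses the same app value for illustration.

```python
import json


def build_service(name, port, label_value):
    """Return a LoadBalancer Service manifest (as a dict) that exposes one
    named port and carries the label that a ServiceMonitor can select."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            # Label that the ServiceMonitor selector matches.
            "labels": {"app": label_value},
        },
        "spec": {
            "type": "LoadBalancer",
            # Assumption: the pods carry the same app label.
            "selector": {"app": label_value},
            "ports": [{
                "name": "web",
                "port": port,          # Service Port
                "targetPort": port,    # Container Port
                "protocol": "TCP",
            }],
        },
    }


service = build_service("custom-metrics-pindex", 5000, "custom-metrics-pindex")
print(json.dumps(service, indent=2))
```

The port name web matters in this method, because a ServiceMonitor refers to the Service port by name rather than by number.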
Configure custom metrics. Use the endpoints that Managed Service for Prometheus scrapes.
Log on to the Managed Service for Prometheus console.
In the upper-left corner of the Managed Service for Prometheus page, select the region where your cluster is deployed and click the name of the corresponding Prometheus instance to go to the instance details page.
In the left-side navigation pane of the instance details page, click Service Discovery. Then, click the Configure tab and click ServiceMonitor on the Configure tab.
On the ServiceMonitor tab, click Add ServiceMonitor. In the Add ServiceMonitor dialog box, configure the ServiceMonitor and click OK.
For more information about how to configure custom metrics, see ACK service discoveries.
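The following Python sketch shows what a ServiceMonitor for this example might contain, using the field names of the open source Prometheus Operator API (monitoring.coreos.com/v1). Several values are assumptions made for illustration: the ServiceMonitor name, the default namespace, the 30s interval, and the /access path, which is carried over from the annotation-based method. The selects helper is a simplified stand-in for label matching and supports only matchLabels.

```python
import json


def build_service_monitor(name, namespace, match_labels, port_name, path):
    """Return a ServiceMonitor manifest (as a dict) that scrapes the named
    Service port of every Service matching the given labels."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": dict(match_labels)},
            "namespaceSelector": {"matchNames": [namespace]},
            "endpoints": [{
                "port": port_name,   # name of the Service port, not its number
                "path": path,
                "interval": "30s",   # hypothetical scrape interval
            }],
        },
    }


def selects(monitor, service_labels):
    """Return True if every matchLabels entry is present on the Service."""
    wanted = monitor["spec"]["selector"]["matchLabels"]
    return all(service_labels.get(key) == value for key, value in wanted.items())


monitor = build_service_monitor(
    "custom-metrics-pindex",             # hypothetical ServiceMonitor name
    "default",                           # hypothetical namespace
    {"app": "custom-metrics-pindex"},    # Service label from this topic
    "web",
    "/access",
)
print(json.dumps(monitor, indent=2))
print(selects(monitor, {"app": "custom-metrics-pindex"}))
```

If the selector does not match the labels of the Service, no endpoint for the application appears on the Targets tab, so checking the label key and value is a useful first troubleshooting step.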
On the Targets tab, the endpoints that Managed Service for Prometheus scrapes are displayed.
Note: Compared with the method of creating custom metrics by adding annotations, this method provides more information, which includes the namespace and name of the Service.
Select a metric and click the hyperlink in the Endpoint column to increase the metric value.
For more information about how to configure metrics, see DATA MODEL.
Use Grafana to display custom metrics.
Log on to the Managed Service for Prometheus console.
In the upper-left corner of the Managed Service for Prometheus page, select the region where your cluster is deployed and click the name of the Prometheus instance used by your cluster to go to the instance details page.
In the left-side navigation pane, click Dashboards. Click any dashboard to log on to Grafana. Then, click the icon in the upper-right corner and click Add a new panel.
Select your ACK cluster as the data source and enter a PromQL statement. For example, select the Code mode and set Metrics to current_person_counts.
Save the configurations to view the Grafana chart of the custom metric.
FAQ
How do I check the version of the ack-arms-prometheus component?
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Operations > Add-ons.
On the Add-ons page, click the Logs and Monitoring tab and find the ack-arms-prometheus component.
The version number is displayed in the lower part of the component. If a new version is available, click Upgrade on the right side to update the component.
Note: The Upgrade button is displayed only if the component is not updated to the latest version.
Why is Managed Service for Prometheus unable to monitor GPU-accelerated nodes?
Managed Service for Prometheus may be unable to monitor GPU-accelerated nodes that are configured with taints. You can perform the following steps to view the taints of a GPU-accelerated node.
Run the following command to view the taints of a GPU-accelerated node:
kubectl describe node cn-beijing.47.100.***.***
Expected output:
Taints: test-key=test-value:NoSchedule
If you added custom taints to the GPU-accelerated node, you can view information about the custom taints. In this example, a taint whose key is set to test-key, value is set to test-value, and effect is set to NoSchedule is added to the node.
Use one of the following methods to handle the taint:
Run the following command to delete the taint from the GPU-accelerated node:
kubectl taint node cn-beijing.47.100.***.*** test-key=test-value:NoSchedule-
Add a toleration rule that allows pods to be scheduled to the GPU-accelerated node with the taint.
# 1. Run the following command to modify ack-prometheus-gpu-exporter:
kubectl edit daemonset -n arms-prom ack-prometheus-gpu-exporter
# 2. Add the following fields to the YAML file to tolerate the taint:
# Other fields are omitted.
# The tolerations field must be added above the containers field and both fields must be of the same level.
tolerations:
- key: "test-key"
  operator: "Equal"
  value: "test-value"
  effect: "NoSchedule"
containers:
# Irrelevant fields are not shown.
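To check in advance whether a toleration covers a given taint, the matching rule can be sketched in Python. This is a simplified model of the Kubernetes taint and toleration semantics, not the scheduler's implementation: it handles the Equal and Exists operators and ignores fields such as tolerationSeconds. The function names are hypothetical.

```python
def parse_taint(text):
    """Split a taint in the form key=value:effect into its three parts."""
    key_value, effect = text.rsplit(":", 1)
    key, _, value = key_value.partition("=")
    return {"key": key, "value": value, "effect": effect}


def tolerates(toleration, taint):
    """Return True if the toleration matches the taint.

    Simplified rules: an empty effect in the toleration matches any effect,
    an empty key is valid only with Exists and matches any key, Exists
    ignores the value, and Equal (the default) requires equal values.
    """
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if not key:
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return operator == "Equal" and toleration.get("value", "") == taint["value"]


taint = parse_taint("test-key=test-value:NoSchedule")
toleration = {
    "key": "test-key",
    "operator": "Equal",
    "value": "test-value",
    "effect": "NoSchedule",
}
print(tolerates(toleration, taint))
```

With the Equal operator, the key, value, and effect in the toleration must all match the taint shown in the kubectl describe node output. A mismatch in any of them leaves the exporter pod unschedulable on that node.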
What do I do if I fail to reinstall ack-arms-prometheus due to residual resource configurations of ack-arms-prometheus?
If you delete only the namespace of Managed Service for Prometheus, resource configurations are retained. In this case, you may fail to reinstall ack-arms-prometheus. You can perform the following operations to delete the residual resource configurations:
Run the following command to delete the arms-prom namespace:
kubectl delete namespace arms-prom
Run the following commands to delete the related ClusterRoles:
kubectl delete ClusterRole arms-kube-state-metrics
kubectl delete ClusterRole arms-node-exporter
kubectl delete ClusterRole arms-prom-ack-arms-prometheus-role
kubectl delete ClusterRole arms-prometheus-oper3
kubectl delete ClusterRole arms-prometheus-ack-arms-prometheus-role
kubectl delete ClusterRole arms-pilot-prom-k8s
kubectl delete ClusterRole gpu-prometheus-exporter
Run the following commands to delete the related ClusterRoleBindings:
kubectl delete ClusterRoleBinding arms-node-exporter
kubectl delete ClusterRoleBinding arms-prom-ack-arms-prometheus-role-binding
kubectl delete ClusterRoleBinding arms-prometheus-oper-bind2
kubectl delete ClusterRoleBinding arms-kube-state-metrics
kubectl delete ClusterRoleBinding arms-pilot-prom-k8s
kubectl delete ClusterRoleBinding arms-prometheus-ack-arms-prometheus-role-binding
kubectl delete ClusterRoleBinding gpu-prometheus-exporter
Run the following commands to delete the related Roles and RoleBindings:
kubectl delete Role arms-pilot-prom-spec-ns-k8s
kubectl delete Role arms-pilot-prom-spec-ns-k8s -n kube-system
kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s
kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s -n kube-system
After you delete the residual resource configurations, go to the ACK console, choose Operations > Add-ons, and reinstall the ack-arms-prometheus component.
What do I do if the "xxx in use" error is prompted when I install ack-arms-prometheus?
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage. Then, click the name of the cluster or click Details in the Actions column.
In the left-side navigation pane of the cluster details page, choose Applications > Helm.
On the Helm page, check whether ack-arms-prometheus is displayed.
If ack-arms-prometheus is displayed on the Helm page, perform the following steps:
Delete ack-arms-prometheus on the Helm page and then install ack-arms-prometheus on the Add-ons page. For more information about how to install ack-arms-prometheus, see Manage components.
If ack-arms-prometheus is not displayed on the Helm page, perform the following steps:
In this case, residual data remains from an earlier deletion of ack-arms-prometheus. You must manually delete the residual data. For more information about how to delete the residual data related to ack-arms-prometheus, see Managed Service for Prometheus FAQ.
Install ack-arms-prometheus on the Add-ons page. For more information about how to install ack-arms-prometheus, see Manage components.
If the issue persists, submit a ticket.
What do I do if ack-arms-prometheus installation fails after the system prompts "Component Not Installed"?
Check whether ack-arms-prometheus is already installed.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage. Then, click the name of the cluster or click Details in the Actions column.
Go to the cluster details page in the ACK console and choose Applications > Helm in the left-side navigation pane.
Check whether ack-arms-prometheus is displayed on the Helm page.
ack-arms-prometheus is displayed on the Helm page
If ack-arms-prometheus is displayed on the Helm page, delete ack-arms-prometheus on the Helm page and then install ack-arms-prometheus from the Add-ons page. For more information about how to install ack-arms-prometheus, see Manage components.
ack-arms-prometheus is not displayed on the Helm page
If ack-arms-prometheus is not displayed on the Helm page, perform the following operations:
In this case, residual data remains from an earlier deletion of ack-arms-prometheus. You must manually delete the residual data. For more information about how to delete the residual data related to ack-arms-prometheus, see Managed Service for Prometheus FAQ.
Install ack-arms-prometheus on the Add-ons page. For more information about how to install ack-arms-prometheus, see Manage components.
If the issue persists, submit a ticket.
Check whether errors are reported in the log of ack-arms-prometheus.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Workloads > Deployments.
In the upper part of the Deployments page, set Namespace to arms-prom and then click arms-prometheus-ack-arms-prometheus.
Click the Logs tab and check whether errors are reported in the log.
If errors are reported in the log, submit a ticket.
Check whether installation errors are reported by the Prometheus agent.
Log on to the ARMS console.
In the top navigation bar, select the region where your cluster is deployed.
In the left-side navigation pane, click Managed Service for Prometheus. On the Managed Service for Prometheus page, click the name of your cluster.
In the left-side navigation pane, click Settings. On the page that appears, click the Settings tab.
On the Settings tab, you can view the health check result. If errors are reported in the health check result, submit a ticket.