You can view metrics of ACK Serverless clusters on predefined dashboards that are provided by Managed Service for Prometheus. This topic describes how to enable Managed Service for Prometheus for ACK Serverless clusters, configure alert rules in Managed Service for Prometheus, create custom metrics in Managed Service for Prometheus, and use Grafana to display custom metrics.
Introduction to Managed Service for Prometheus
Managed Service for Prometheus is a fully managed monitoring service that is compatible with the open source Prometheus ecosystem. It monitors a wide array of components and provides multiple predefined dashboards.
Cluster type | Supported Prometheus agents
ACK Serverless Pro cluster | You can install managed or unmanaged Prometheus agents. By default, managed Prometheus agents are installed.
ACK Serverless Basic cluster | You can install only unmanaged Prometheus agents. The pod in which an unmanaged Prometheus agent is deployed requires 3 CPU cores and 4 GB of memory. The default retention period of the collected data is seven days.
Managed Service for Prometheus provides a managed Prometheus monitoring system, which saves you the effort of managing underlying services, such as data storage, data display, and system maintenance. For more information about Managed Service for Prometheus, see What is Managed Service for Prometheus?
Step 1: Enable Managed Service for Prometheus
Enable Managed Service for Prometheus when you create a cluster
On the Component Configurations wizard page, select Enable Managed Service for Prometheus. For more information, see Create an ACK Serverless cluster.
By default, Enable Managed Service for Prometheus is selected when you create a cluster. After the cluster is created, the system automatically configures Managed Service for Prometheus.
A managed Prometheus agent is automatically installed in the cluster. If you want to use an unmanaged Prometheus agent, go to the cluster details page and choose Operations > Add-ons in the left-side navigation pane. On the Add-ons page, uninstall ack-arms-prometheus. Then, the unmanaged version of ack-arms-prometheus is displayed and available for installation.
If ack-arms-prometheus is not displayed, it means that the region where the ACK Serverless cluster is deployed does not support Managed Service for Prometheus.
Enable Managed Service for Prometheus for an existing cluster
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, go to the Prometheus Monitoring page.
On the Prometheus Monitoring page, follow the on-screen instructions to install the required component and check the relevant dashboards.
The system automatically installs the component and checks the dashboards. After the installation is completed, you can click each tab to view metrics.
Step 2: View Grafana dashboards provided by Managed Service for Prometheus
On the Prometheus Monitoring page in the ACK console, you can click different Grafana dashboards to view different monitoring data.
Step 3: (Optional) Configure alert rules in Managed Service for Prometheus
Managed Service for Prometheus allows you to create alert rules for monitoring jobs. When the conditions of an alert rule are met, the system sends alert notifications to the specified contacts in real time by email, text message, or DingTalk. This helps you detect errors proactively.
1. Create a notification object
Log on to the ARMS console. In the left-side navigation pane, choose .
Follow the on-screen instructions to configure a notification object.
For more information, see Notification objects.
2. Configure alert rules
Log on to the ARMS console. In the left-side navigation pane, choose .
In the upper part of the page that appears, select the region where your cluster is deployed. Click the name of the Prometheus instance used by your cluster to go to the instance details page.
In the left-side navigation pane, click Alert Rules. On the Prometheus Alert Rules page, configure alert rules for the notification object.
For more information, see Create an alert rule for a Prometheus instance.
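To give a rough idea of what an alert rule expresses, the following sketch uses the rule-file syntax of open source Prometheus. It is not the format of the console form, where the expression and duration are entered as fields. The group name, alert name, threshold, and durations are hypothetical, and the sketch assumes that your Prometheus instance collects the kube-state-metrics metric kube_pod_container_status_restarts_total.

```yaml
# Illustrative only: open source Prometheus rule syntax, not an ARMS console export.
groups:
- name: example-pod-alerts              # hypothetical group name
  rules:
  - alert: PodRestartingTooOften        # hypothetical alert name
    # Fires when a container restarts more than 3 times within 10 minutes.
    expr: increase(kube_pod_container_status_restarts_total[10m]) > 3
    # The condition must hold for 5 minutes before the alert fires.
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is restarting frequently"
```

The expr and for values correspond to the condition and duration that you specify when you configure the rule. The notification object that you created earlier determines who receives the alert.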
Step 4: Create custom metrics and use Grafana to display the metrics
You can add annotations to create custom metrics and use the default service discovery feature to collect the metrics. Alternatively, you can use ServiceMonitors to create custom metrics and use Service labels to collect the metrics.
Pod Annotation
You can add annotations to the templates of Deployments to define custom metrics.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of a cluster. In the left-side navigation pane of the cluster details page, choose Workloads > Deployments. Follow the on-screen instructions to create a workload.
The following example shows how to configure the parameters of a Deployment. For more information, see Create a stateless application from an image.
On the Container wizard page, specify a container image and the required resources, create a web application, expose port 5000, and then click Next.
On the Advanced wizard page, create a Service and add pod annotations. Then, click OK.
Create a Service.
In the Services section, click Create and configure the Service. Set Service Type to SLB and configure Port Mapping.
In the Annotations section, add the following annotations:
Add the prometheus.io/scrape annotation and set the value to true. This enables Managed Service for Prometheus to scrape metrics.
Add the prometheus.io/port annotation and set the value to 5000. This specifies that port 5000 of the endpoint is scraped by Managed Service for Prometheus.
Add the prometheus.io/path annotation and set the value to /access. This specifies that the /access path of the endpoint is scraped by Managed Service for Prometheus.
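If you prefer to review the result as a manifest, the console settings above roughly correspond to the following Deployment. This is a minimal sketch, not the exact output of the console: the Deployment name, the app label, the container name, and the image are hypothetical placeholders. Only the three annotations and port 5000 come from the steps above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-metrics-demo             # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-metrics-demo          # hypothetical label
  template:
    metadata:
      labels:
        app: custom-metrics-demo
      annotations:
        # Annotation values must be strings, so quote them.
        prometheus.io/scrape: "true"    # allow metrics to be scraped
        prometheus.io/port: "5000"      # port to scrape
        prometheus.io/path: "/access"   # path to scrape
    spec:
      containers:
      - name: web                       # hypothetical container name
        image: registry.example.com/demo/web-app:latest   # placeholder image
        ports:
        - containerPort: 5000
```

The annotations are placed under spec.template.metadata so that they are applied to the pods rather than to the Deployment object itself.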
Configure custom metrics.
Log on to the ARMS console.
In the left-side navigation pane, click Integration Management.
On the Integrated Environments tab, view the environment list on the Container Service tab. Find the ACK environment instance and click Metric Scraping in the Actions column. The Metric Scraping tab appears.
On the Metric Scraping tab, add ServiceMonitor and PodMonitor settings to define Prometheus metric collection rules.
For more information, see Manage the custom collection rules of ACK environments.
After you complete the preceding operations, click the Self-Monitoring tab on the Integration Management page and click Targets to check whether the custom metrics are configured. You can click the hyperlink in the Endpoint column to increase the metric value.
For more information about how to configure metrics, see DATA MODEL.
View custom metrics.
On the Metrics Explorer tab of the Integration Management page, you can select the custom metrics that you want to view or specify PromQL statements to view and verify custom metrics. For more information, see Metric exploration.
On the Self-Monitoring tab of the Integration Management page, click Monitoring to view custom metrics on Grafana dashboards.
Service labels
To use ServiceMonitors to create custom metrics, you need to add Service labels instead of adding pod annotations.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left-side navigation pane of the cluster details page, choose Workloads > Deployments. Follow the on-screen instructions to create a workload.
The following example shows how to configure the parameters of a Deployment. For more information, see Create a stateless application from an image.
On the Container wizard page, specify a container image and the required resources, create a web application, expose port 5000, and then click Next.
On the Advanced page, click Create in the Services section and configure the Service.
Set Service Type to SLB and configure Port Mapping. Add a Service label. For example, set the label key to app and the label value to custom-metrics-pindex. This label is used by ServiceMonitors as a selector.
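Expressed as a manifest, a Service that carries this label might look like the following sketch. It assumes that the SLB Service type in the console corresponds to a Kubernetes Service of type LoadBalancer. The Service name, the port name web, and the pod selector are hypothetical and must match your own workload.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-pindex           # hypothetical name
  labels:
    app: custom-metrics-pindex          # label that a ServiceMonitor can select
spec:
  type: LoadBalancer                    # assumed equivalent of "SLB" in the console
  selector:
    app: custom-metrics-demo            # hypothetical: must match the pod labels of the Deployment
  ports:
  - name: web                           # hypothetical port name
    port: 5000
    targetPort: 5000
```

Naming the port is useful because a ServiceMonitor typically refers to a Service port by name rather than by number.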
Configure custom metrics. Use the endpoints that Managed Service for Prometheus scrapes.
Log on to the ARMS console.
In the left-side navigation pane, click Integration Management.
On the Integrated Environments tab, view the environment list on the Container Service tab. Find the ACK environment instance and click Metric Scraping in the Actions column. The Metric Scraping tab appears.
On the Metric Scraping tab, click Service Monitor and then click Create. Follow the on-screen instructions to configure a ServiceMonitor and click Create.
For more information about how to configure custom metrics, see ACK service discoveries.
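For orientation, a ServiceMonitor that selects the Service label from the example above could be sketched as follows. It uses the monitoring.coreos.com/v1 API of the open source Prometheus Operator. The resource name, the default namespace, the port name web, the 30-second interval, and the reuse of the /access path from the pod annotation example are all assumptions to adapt to your application.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: custom-metrics-pindex           # hypothetical name
  namespace: default                    # assumed namespace
spec:
  selector:
    matchLabels:
      app: custom-metrics-pindex        # matches the Service label, not the pod label
  namespaceSelector:
    matchNames:
    - default                           # namespace in which the Service is assumed to run
  endpoints:
  - port: web                           # assumed name of the Service port
    path: /access                       # assumed metrics path; the default is /metrics
    interval: 30s                       # assumed scrape interval
```

If no targets appear after you create the ServiceMonitor, check first that the label selector and the port name match the Service exactly.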
On the Self-Monitoring tab of the Integration Management page, click Targets to check whether the endpoints that Managed Service for Prometheus scrapes are displayed.
Note: Compared with the method of creating custom metrics by adding annotations, this method provides more information, which includes the namespace and name of the Service.
Select a metric and click the hyperlink in the Endpoint column to increase the metric value.
For more information about how to configure metrics, see DATA MODEL.
View custom metrics.
On the Metrics Explorer tab of the Integration Management page, you can select the custom metrics that you want to view or specify PromQL statements to view and verify custom metrics. For more information, see Metric exploration.
On the Self-Monitoring tab of the Integration Management page, click Monitoring to view custom metrics on Grafana dashboards.
FAQ
How do I check the version of the ack-arms-prometheus component?
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Operations > Add-ons.
On the Add-ons page, click the Logs and Monitoring tab and find the ack-arms-prometheus component.
The version number is displayed in the lower part of the component. If a new version is available, click Upgrade on the right side to update the component.
Note: The Upgrade button is displayed only if the component is not updated to the latest version.
Why is Managed Service for Prometheus unable to monitor GPU-accelerated nodes?
This issue is related only to unmanaged Prometheus agents.
Managed Service for Prometheus may be unable to monitor GPU-accelerated nodes that are configured with taints. You can perform the following steps to view the taints of a GPU-accelerated node.
Run the following command to view the taints of a GPU-accelerated node:
kubectl describe node cn-beijing.47.100.***.***
Expected output:
Taints:test-key=test-value:NoSchedule
If you added custom taints to the GPU-accelerated node, you can view information about the custom taints. In this example, a taint whose key is set to test-key, value is set to test-value, and effect is set to NoSchedule is added to the node.
Use one of the following methods to handle the taint:
Run the following command to delete the taint from the GPU-accelerated node:
kubectl taint node cn-beijing.47.100.***.*** test-key=test-value:NoSchedule-
Add a toleration rule that allows pods to be scheduled to the GPU-accelerated node with the taint.
# 1. Run the following command to modify ack-prometheus-gpu-exporter:
kubectl edit daemonset -n arms-prom ack-prometheus-gpu-exporter
# 2. Add the following fields to the YAML file to tolerate the taint:
# Other fields are omitted.
# The tolerations field must be added above the containers field and both fields must be of the same level.
tolerations:
- key: "test-key"
  operator: "Equal"
  value: "test-value"
  effect: "NoSchedule"
containers:
# Irrelevant fields are not shown.
What do I do if I fail to reinstall ack-arms-prometheus due to residual resource configurations of ack-arms-prometheus?
This issue is related only to unmanaged Prometheus agents.
If you delete only the namespace of Managed Service for Prometheus, resource configurations are retained. In this case, you may fail to reinstall ack-arms-prometheus. You can perform the following operations to delete the residual resource configurations:
Run the following command to delete the arms-prom namespace:
kubectl delete namespace arms-prom
Run the following commands to delete the related ClusterRoles:
kubectl delete ClusterRole arms-kube-state-metrics
kubectl delete ClusterRole arms-node-exporter
kubectl delete ClusterRole arms-prom-ack-arms-prometheus-role
kubectl delete ClusterRole arms-prometheus-oper3
kubectl delete ClusterRole arms-prometheus-ack-arms-prometheus-role
kubectl delete ClusterRole arms-pilot-prom-k8s
kubectl delete ClusterRole gpu-prometheus-exporter
Run the following commands to delete the related ClusterRoleBindings:
kubectl delete ClusterRoleBinding arms-node-exporter
kubectl delete ClusterRoleBinding arms-prom-ack-arms-prometheus-role-binding
kubectl delete ClusterRoleBinding arms-prometheus-oper-bind2
kubectl delete ClusterRoleBinding arms-kube-state-metrics
kubectl delete ClusterRoleBinding arms-pilot-prom-k8s
kubectl delete ClusterRoleBinding arms-prometheus-ack-arms-prometheus-role-binding
kubectl delete ClusterRoleBinding gpu-prometheus-exporter
Run the following commands to delete the related Roles and RoleBindings:
kubectl delete Role arms-pilot-prom-spec-ns-k8s
kubectl delete Role arms-pilot-prom-spec-ns-k8s -n kube-system
kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s
kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s -n kube-system