Container Service for Kubernetes: Enable Managed Service for Prometheus

Last Updated: Jun 17, 2024

You can view metrics of ACK Serverless clusters on predefined dashboards that are provided by Managed Service for Prometheus. This topic describes how to enable Managed Service for Prometheus for ACK Serverless clusters, configure alert rules in Managed Service for Prometheus, create custom metrics in Managed Service for Prometheus, and use Grafana to display custom metrics.

Introduction to Managed Service for Prometheus

Managed Service for Prometheus is a fully managed monitoring service interfaced with the open source Prometheus ecosystem. Managed Service for Prometheus monitors a wide array of components and provides multiple predefined dashboards.

The Prometheus agents that you can install depend on the cluster type:

ACK Serverless Pro cluster

You can install managed or unmanaged Prometheus agents. By default, managed Prometheus agents are installed.

  • Managed Prometheus agents support basic and custom metrics and do not consume the resources of your cluster. The agents allow Managed Service for Prometheus to directly collect monitoring data from containers in ACK Serverless Pro clusters and allow you to use out-of-the-box features provided by Managed Service for Prometheus. The default retention period of the collected data is seven days.

  • Unmanaged Prometheus agents support basic metrics and require you to deploy a set of components, including metric collection components and Kube-State-Metrics. The pod in which an unmanaged Prometheus agent is deployed requires 3 CPU cores and 4 GB of memory. The default retention period of the collected data is seven days. You must launch at least two elastic container instances to run an unmanaged Prometheus agent. For more information about the pricing of elastic container instances, see Overview of elastic container instances.

ACK Serverless Basic cluster

You can install only unmanaged Prometheus agents. The pod in which an unmanaged Prometheus agent is deployed requires 3 CPU cores and 4 GB of memory. The default retention period of the collected data is seven days.

Managed Service for Prometheus provides a managed Prometheus monitoring system, which eliminates the need to manage underlying services, such as data storage, data display, and system maintenance. For more information about Managed Service for Prometheus, see What is Managed Service for Prometheus?

Step 1: Enable Managed Service for Prometheus

Enable Managed Service for Prometheus when you create a cluster

On the Component Configurations wizard page, select Enable Managed Service for Prometheus. For more information, see Create an ACK Serverless cluster.

    Note

    By default, Enable Managed Service for Prometheus is selected when you create an ACK Serverless cluster in the ACK console.

    After the cluster is created, the system automatically configures Managed Service for Prometheus.

A managed Prometheus agent is automatically installed in the cluster. If you want to use an unmanaged Prometheus agent, go to the cluster details page and choose Operations > Add-ons in the left-side navigation pane. On the Add-ons page, uninstall ack-arms-prometheus. Then, the unmanaged version of ack-arms-prometheus is displayed and available for installation.

Note

If ack-arms-prometheus is not displayed, the region where the ACK Serverless cluster is deployed does not support Managed Service for Prometheus.

Enable Managed Service for Prometheus for an existing cluster

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Operations > Prometheus Monitoring.

  3. On the Prometheus Monitoring page, click Install.

    The system automatically installs the component and checks the dashboards. After the installation is complete, you can click each tab to view metrics.

Step 2: View Grafana dashboards provided by Managed Service for Prometheus

On the Prometheus Monitoring page, click the name of a Grafana dashboard to view the monitoring data.

Step 3: Configure alert rules in Managed Service for Prometheus

Managed Service for Prometheus allows you to create alert rules for monitoring jobs. When an alert rule is triggered, notifications are sent in real time by email, text message, or DingTalk to the contact group that you specified, which helps you detect errors proactively. Before you can create a contact group, you must create a contact. When you create a contact, you can specify the mobile phone number and email address at which the contact receives notifications. You can also provide a DingTalk chatbot webhook URL that is used to automatically send alert notifications.

Step 1: Create a contact

  1. Log on to the Managed Service for Prometheus console. In the upper-left part of the Managed Service for Prometheus page, select the region where your cluster is deployed.

  2. In the left-side navigation pane, choose Alert Management > Notification Objects.

  3. On the Contacts tab, click Create Contact.

  4. In the Create Contact dialog box, set the following parameters and click OK.

    • Name: The name of the contact.

    • Phone Number: After you specify the mobile phone number of a contact, the contact can be notified by phone call and text message.

      Note: You can specify only verified mobile phone numbers in a notification policy. For more information about how to verify a mobile phone number, see Verify mobile phone numbers.

    • Email: After you specify the email address of a contact, the contact can be notified by email.

    • Contact Group: Select the contact group to which you want to add the contact. For more information, see Contact groups.

    • Method to Resend Notifications If Phone Notifications Fail: Select the method for resending notifications that applies if phone notifications fail. You can set global default values on the Contacts tab. For more information, see the Default settings for contacts section.

    • User ID: The ID of the contact in the instant messaging (IM) tool, such as DingTalk, Lark, or WeCom. A valid user ID can be used to mention the contact within a group. This field is required if you want to mention a contact within a Lark or WeCom group, and optional if you want to mention a contact within a DingTalk group.

    Important
    • You must specify either a mobile phone number or an email address. Each mobile phone number or email address can be used for only one contact.

    • A DingTalk chatbot can no longer be configured as a contact. To create a DingTalk chatbot, go to the DingTalk/Lark/WeCom tab. For more information, see DingTalk chatbots. Existing DingTalk chatbots remain unchanged.

Step 2: Configure alert rules

  1. Log on to the Managed Service for Prometheus console. In the left-side navigation pane, click Monitoring List.

  2. In the upper-left part of the Managed Service for Prometheus page, select the region where your cluster is deployed and click the name of the corresponding Prometheus instance.

  3. In the left-side navigation pane, click Alert Rules. On the Prometheus Alert Rules page, find the alert rule that you want to modify and click Edit in the Actions column. Modify the alert rule and click Save.

    For more information, see Create an alert rule for a Prometheus instance (for the new console version) or Create an alert rule (for the old console version).
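
For orientation, the following sketch shows what an alert rule looks like in the rule file format of open source Prometheus. It is only an illustration of the expression, duration, and label fields: the console may present these fields differently, and the group name, alert name, threshold, and severity label are hypothetical. The current_person_counts metric is the example metric used later in this topic.

```yaml
groups:
- name: custom-metrics-example          # Hypothetical group name.
  rules:
  - alert: HighPersonCount              # Hypothetical alert name.
    # PromQL condition. The threshold of 100 is a placeholder.
    expr: current_person_counts > 100
    # The condition must hold for this long before the alert fires.
    for: 1m
    labels:
      severity: warning                 # Hypothetical label used for routing.
    annotations:
      summary: "current_person_counts has been above 100 for 1 minute"
```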

Step 4: Create custom metrics and use Grafana to display the metrics

Create custom metrics by adding annotations

For more information, see ACK service discoveries.

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. Create an application.

    1. On the Clusters page, click the name of your cluster. In the left-side navigation pane, choose Workloads > Deployments.

    2. On the Deployments page, click Create from Image.

    3. On the Basic Information wizard page, specify the basic information of the application and click Next.

    4. On the Container wizard page, specify a container image and the required resources, create a web application, expose port 5000, and then click Next.

      In this example, the container image yejianhonghong/pindex is used.

    5. In the Pod Annotations section of the Advanced wizard page, add pod annotations.

      The prometheus.io/port annotation specifies the endpoint port that Managed Service for Prometheus scrapes. The prometheus.io/path annotation specifies the endpoint path that Managed Service for Prometheus scrapes.

    6. Click Create to create the application.

      For more information about how to create an application, see Create a stateless application by using a Deployment.

  3. Create custom metrics.

    1. Log on to the Managed Service for Prometheus console.

    2. In the upper-left part of the Managed Service for Prometheus page, select the region where your cluster is deployed and click the name of the corresponding Prometheus instance to go to the instance page.

    3. In the left-side navigation pane, click Service Discovery. Then, click the Configure tab and add ServiceMonitor and PodMonitor settings to define Prometheus metric collection rules.

      For more information about how to configure custom metrics, see Manage service discoveries.

    4. Click the Targets tab to view the custom metrics that you configured.

  4. In the ACK console, access the external endpoint of the Service that you created to increase the value of the custom metric.

    For more information about metrics, see Data model.

  5. View custom metrics in Grafana.

    1. Log on to the Managed Service for Prometheus console.

    2. In the upper-left part of the Managed Service for Prometheus page, select the region where your cluster is deployed and click the name of the corresponding Prometheus instance to go to the instance page.

    3. In the left-side navigation pane, click Dashboards and click a predefined dashboard to log on to Grafana. Then, click the icon for adding panels in the upper-right part of the page and click Add a new panel.

    4. Select your ACK cluster as the data source and enter a PromQL statement. For example, set Metrics to current_person_counts.

  6. Save the configurations to view custom metrics in the Grafana chart.
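
The pod annotations from step 2 can also be written directly in a Deployment manifest. The following is a minimal sketch, not the manifest that the console generates. It assumes a hypothetical application name (pindex), reuses the /access path from the ServiceMonitor example in this topic, and includes a prometheus.io/scrape annotation, which is a common convention for annotation-based discovery but is not confirmed by this topic.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pindex                      # Hypothetical name.
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pindex                   # Hypothetical pod label.
  template:
    metadata:
      labels:
        app: pindex
      annotations:
        # Assumption: conventional opt-in annotation for scraping.
        prometheus.io/scrape: "true"
        # The endpoint port that is scraped. Annotation values must be strings.
        prometheus.io/port: "5000"
        # The endpoint path that is scraped. Assumed to be /access.
        prometheus.io/path: "/access"
    spec:
      containers:
      - name: pindex
        image: yejianhonghong/pindex
        ports:
        - containerPort: 5000
```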

Create custom metrics by using ServiceMonitors

To use ServiceMonitors to create custom metrics, you need to add labels instead of annotations to your Services.

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. Create an application.

    1. On the Clusters page, click the name of a cluster and choose Workloads > Deployments in the left-side navigation pane.

    2. On the Deployments page, click Create from Image.

    3. On the Basic Information wizard page, configure the basic settings and click Next.

    4. On the Container page, specify an image that is used to create a web application, specify resource specifications for the web application, and open port 5000. Then, click Next.

      In this example, the yejianhonghong/pindex image is used.

    5. On the Advanced wizard page, click Create.

  3. Configure the endpoints that Managed Service for Prometheus scrapes for custom metrics.

    1. Log on to the Managed Service for Prometheus console.

    2. In the upper-left part of the Managed Service for Prometheus page, select the region where your cluster is deployed and click the name of the corresponding Prometheus instance to go to the instance page.

    3. In the left-side navigation pane, click Service Discovery. Then, click the Configure tab.

    4. On the Configure tab, click the ServiceMonitor tab.

    5. On the ServiceMonitor tab, click Add ServiceMonitor to create a ServiceMonitor.

      The following code block shows the YAML template:

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        # Enter a unique name. 
        name: custom-metrics-pindex
        # Specify a namespace. 
        namespace: default
      spec:
        endpoints:
        - interval: 30s
          # Enter the name of the port that you specified in the Port Mapping section when you created the Service.
          port: web
          # Enter the HTTP path from which metrics are scraped.
          path: /access
        # Match Services in all namespaces. To restrict discovery to specific namespaces, use matchNames instead of any.
        namespaceSelector:
          any: true
        selector:
          matchLabels:
            # Enter the label that you added to the Service. 
            app: custom-metrics-pindex

      Click OK to create the ServiceMonitor.

      For more information about how to configure custom metrics, see Manage service discovery.

    6. On the Targets tab, the endpoints that Managed Service for Prometheus scrapes are displayed.

      Note

      A ServiceMonitor definition provides more information than annotations do, including the namespace and name of the Service.

  4. In the ACK console, access the external endpoint of the Service to increase the value of the custom metric.

    For more information about metrics, see Data model.

  5. View custom metrics in Grafana.

    1. Log on to the Managed Service for Prometheus console.

    2. In the upper-left part of the Managed Service for Prometheus page, select the region where your cluster is deployed and click the name of the corresponding Prometheus instance to go to the instance page.

    3. In the left-side navigation pane, click Dashboards and click a predefined dashboard to log on to Grafana. Then, click the icon for adding panels in the upper-right part of the page and click Add a new panel.

    4. Select your ACK cluster as the data source and enter a PromQL statement. For example, set Metrics to current_person_counts.

  6. Save the configurations to view custom metrics in the Grafana chart.
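
The ServiceMonitor template selects Services by label and refers to a port by name, so the Service must carry a matching label and a named port. The following is a minimal sketch of such a Service. It assumes that the pods of the application are labeled app: pindex (hypothetical) and that the Service is of the LoadBalancer type so that it has an external endpoint. Adjust both to your environment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-pindex       # Hypothetical name.
  namespace: default
  labels:
    # Must match spec.selector.matchLabels in the ServiceMonitor.
    app: custom-metrics-pindex
spec:
  type: LoadBalancer                # Assumption: provides the external endpoint.
  selector:
    app: pindex                     # Hypothetical pod label.
  ports:
  # The port name must match endpoints[].port in the ServiceMonitor.
  - name: web
    port: 5000
    targetPort: 5000
```

Because the ServiceMonitor matches the label on the Service and not the labels on the pods, a Service without the app: custom-metrics-pindex label is not discovered even if its pods expose metrics.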

FAQ

How do I check the version of the ack-arms-prometheus component?

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Operations > Add-ons.

  3. On the Add-ons page, click the Logs and Monitoring tab and find the ack-arms-prometheus component.

    The version number is displayed in the lower part of the component. If a new version is available, click Upgrade on the right side to update the component.

    Note

    The Upgrade button is displayed only if the component is not updated to the latest version.

Why is Managed Service for Prometheus unable to monitor GPU-accelerated nodes?

Note

This issue is related only to unmanaged Prometheus agents.

Managed Service for Prometheus may be unable to monitor GPU-accelerated nodes that are configured with taints. Perform the following steps to check a GPU-accelerated node for taints and handle them.

  1. Run the following command to view the taints of a GPU-accelerated node:

    kubectl describe node cn-beijing.47.100.***.***

    If you added custom taints to the GPU-accelerated node, the output includes information about the custom taints. In this example, a taint whose key is test-key, value is test-value, and effect is NoSchedule is added to the node.

    Expected output:

    Taints:test-key=test-value:NoSchedule
  2. Use one of the following methods to handle the taint:

    • Run the following command to delete the taint from the GPU-accelerated node:

      kubectl taint node cn-beijing.47.100.***.*** test-key=test-value:NoSchedule-
    • Add a toleration that allows pods to be scheduled to the GPU-accelerated node with the taint.

      # 1. Run the following command to modify ack-prometheus-gpu-exporter:
      kubectl edit daemonset -n arms-prom ack-prometheus-gpu-exporter

      # 2. Add the following fields to the YAML file to tolerate the taint:
      # Other fields are omitted.
      # The tolerations field must be added above the containers field, at the same level.
      tolerations:
      - key: "test-key"
        operator: "Equal"
        value: "test-value"
        effect: "NoSchedule"
      containers:
       # Irrelevant fields are not shown.
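
If the value of the taint may change, a toleration can match on the key alone. The following sketch uses the Exists operator from the standard Kubernetes toleration syntax in place of Equal. It reuses the test-key example above and is a variation to consider, not a setting required by ack-prometheus-gpu-exporter.

```yaml
tolerations:
# Tolerates any taint whose key is test-key and effect is NoSchedule,
# regardless of the taint value. The value field must be omitted with Exists.
- key: "test-key"
  operator: "Exists"
  effect: "NoSchedule"
```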

What do I do if I fail to reinstall ack-arms-prometheus due to residual resource configurations of ack-arms-prometheus?

Note

This issue is related only to unmanaged Prometheus agents.

If you delete only the namespace of Managed Service for Prometheus, resource configurations are retained. In this case, you may fail to reinstall ack-arms-prometheus. You can perform the following operations to delete the residual resource configurations:

  • Run the following command to delete the arms-prom namespace:

    kubectl delete namespace arms-prom
  • Run the following commands to delete the related ClusterRoles:

    kubectl delete ClusterRole arms-kube-state-metrics
    kubectl delete ClusterRole arms-node-exporter
    kubectl delete ClusterRole arms-prom-ack-arms-prometheus-role
    kubectl delete ClusterRole arms-prometheus-oper3
    kubectl delete ClusterRole arms-prometheus-ack-arms-prometheus-role
    kubectl delete ClusterRole arms-pilot-prom-k8s
    kubectl delete ClusterRole gpu-prometheus-exporter
  • Run the following commands to delete the related ClusterRoleBindings:

    kubectl delete ClusterRoleBinding arms-node-exporter
    kubectl delete ClusterRoleBinding arms-prom-ack-arms-prometheus-role-binding
    kubectl delete ClusterRoleBinding arms-prometheus-oper-bind2
    kubectl delete ClusterRoleBinding arms-kube-state-metrics
    kubectl delete ClusterRoleBinding arms-pilot-prom-k8s
    kubectl delete ClusterRoleBinding arms-prometheus-ack-arms-prometheus-role-binding
    kubectl delete ClusterRoleBinding gpu-prometheus-exporter
  • Run the following commands to delete the related Roles and RoleBindings:

    kubectl delete Role arms-pilot-prom-spec-ns-k8s
    kubectl delete Role arms-pilot-prom-spec-ns-k8s -n kube-system
    kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s
    kubectl delete RoleBinding arms-pilot-prom-spec-ns-k8s -n kube-system