Use open source Prometheus to monitor an ACK cluster

Prometheus is an open source project that is used to monitor cloud-native applications. This topic describes how to deploy Prometheus in a Container Service for Kubernetes (ACK) cluster.

Background information

This topic describes how to efficiently monitor system components and resource entities in a Kubernetes cluster. A monitoring system monitors the following types of objects:

- Resource: resource utilization of nodes and applications. In a Kubernetes cluster, the monitoring system monitors the resource usage of nodes, pods, and the cluster.
- Application: internal metrics of applications. For example, the monitoring system dynamically counts the number of online users who are using an application, collects monitoring metrics from application ports, and enables alerting based on the collected metrics.

In a Kubernetes cluster, the monitoring system monitors the following objects:

- Cluster components: The components of the Kubernetes cluster, such as kube-apiserver, kube-controller-manager, and etcd. To monitor cluster components, specify the monitoring methods in configuration files.
- Static resource entities: The status of resources on nodes and kernel events. To monitor static resource entities, specify the monitoring methods in configuration files.
- Dynamic resource entities: Entities of abstract workloads in Kubernetes, such as Deployments, DaemonSets, and pods. To monitor dynamic resource entities in a Kubernetes cluster, you can deploy Prometheus in the Kubernetes cluster.
- Custom objects in applications: For applications that require customized monitoring of data and metrics, specific configurations need to be set to meet unique monitoring requirements. This can be achieved by combining port exposure with the Prometheus monitoring solution.
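As a rough illustration of combining port exposure with Prometheus, the following sketch writes a ServiceMonitor manifest for an application whose Service exposes a metrics port. The application label my-app, the port name metrics, the monitoring namespace, and the release label are assumptions made for this example and do not come from this topic. The label that your Prometheus instance actually selects on may differ.

```shell
#!/usr/bin/env bash
# Sketch: write a ServiceMonitor manifest for an application that exposes
# metrics on a named Service port. All names below are hypothetical.
set -euo pipefail

APP_NAME="${APP_NAME:-my-app}"            # hypothetical application label
METRICS_PORT="${METRICS_PORT:-metrics}"   # hypothetical Service port name
OUT_FILE="${OUT_FILE:-servicemonitor.yaml}"

# Unquoted heredoc delimiter: the shell substitutes the variables above.
cat > "$OUT_FILE" <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ${APP_NAME}
  namespace: monitoring
  labels:
    release: ack-prometheus-operator   # assumption: must match the selector used by your Prometheus
spec:
  namespaceSelector:
    any: true
  selector:
    matchLabels:
      app: ${APP_NAME}
  endpoints:
    - port: ${METRICS_PORT}
      path: /metrics
      interval: 30s
EOF

echo "Wrote ${OUT_FILE}; review it, then apply it with: kubectl apply -f ${OUT_FILE}"
```

The manifest is only written to a file so that it can be reviewed before it is applied. A ServiceMonitor requires the Prometheus Operator CRDs, which are installed by the procedure in this topic.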

Procedure

Deploy Prometheus on a Kubernetes cluster to monitor data.
1. Log on to the ACK console. In the left-side navigation pane, choose Marketplace > Marketplace.
2. On the Marketplace page, click the App Catalog tab. Then, find and click ack-prometheus-operator.
3. On the ack-prometheus-operator page, click Deploy.
4. In the Deploy wizard, select a cluster and a namespace, and then click Next.
5. On the Parameters wizard page, configure the parameters and click OK.
Check the deployment result.
1. Run the following command to map Prometheus in the cluster to local port 9090:
```
kubectl port-forward svc/ack-prometheus-operator-prometheus 9090:9090 -n monitoring
```
2. Enter localhost:9090 in the address bar of a browser to visit the Prometheus page.
3. In the top navigation bar, choose Status > Targets to view all data collection tasks. Tasks in the UP state are running as expected.
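The same check can be scripted against the Prometheus HTTP API instead of the web page. The following is a minimal sketch that counts targets by health state in the JSON returned by the /api/v1/targets endpoint. The helper name count_health is hypothetical, and the sketch assumes the compact JSON that Prometheus returns.

```shell
#!/usr/bin/env bash
# Sketch: summarize target health from the Prometheus HTTP API.
set -euo pipefail

# Count occurrences of a health value ("up" or "down") in the JSON
# returned by /api/v1/targets, read from standard input.
count_health() {
  local state="$1"
  # grep -o prints one line per match; "|| true" keeps a zero count from failing.
  { grep -o "\"health\":\"${state}\"" || true; } | wc -l | tr -d ' '
}

# Usage while the port-forward to local port 9090 is running:
#   targets_json="$(curl -s http://localhost:9090/api/v1/targets)"
#   echo "up:   $(printf '%s' "$targets_json" | count_health up)"
#   echo "down: $(printf '%s' "$targets_json" | count_health down)"
```

A non-zero count for down corresponds to tasks that are not in the UP state on the Targets page.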
View the aggregated data.
1. Run the following command to map Grafana in the cluster to local port 3000:
```
kubectl -n monitoring port-forward svc/ack-prometheus-operator-grafana 3000:80
```
2. To view the aggregated data, enter localhost:3000 in the address bar of a browser, and then select a dashboard.
View alert rules and set silent alerts.
- View alert rules
  To view alert rules, enter localhost:9090 in the address bar of a browser, and then click Alerts in the top navigation bar.
  - Red: Alerts are being triggered based on alert rules in red.
  - Green: No alerts are being triggered based on alert rules in green.
- Set silent alerts
  Run the following command to map Alertmanager in the cluster to local port 9093. Then, enter localhost:9093 in the address bar of a browser and click Silence to set silent alerts.
```
kubectl --namespace monitoring port-forward svc/alertmanager-operated 9093
```
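Silences can also be created without the web page by posting to the Alertmanager v2 API. The following sketch builds the request body for one alert name. The helper name build_silence, the createdBy and comment values, and the alert name in the usage comment are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: build the JSON body for an Alertmanager silence.
set -euo pipefail

# Print a silence body that matches one alert name.
# Arguments: alert name, start time, end time (RFC 3339, UTC).
build_silence() {
  local alertname="$1" starts_at="$2" ends_at="$3"
  cat <<EOF
{
  "matchers": [{"name": "alertname", "value": "${alertname}", "isRegex": false}],
  "startsAt": "${starts_at}",
  "endsAt": "${ends_at}",
  "createdBy": "ops",
  "comment": "planned maintenance"
}
EOF
}

# Usage while the port-forward to local port 9093 is running
# (GNU date is assumed for the -d option):
#   start="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
#   end="$(date -u -d '+2 hours' +%Y-%m-%dT%H:%M:%SZ)"
#   build_silence KubePodCrashLooping "$start" "$end" |
#     curl -s -X POST -H 'Content-Type: application/json' \
#       --data @- http://localhost:9093/api/v2/silences
```

Keeping the body in a function makes it easy to inspect the JSON before anything is sent to Alertmanager.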

You can follow the preceding steps to deploy Prometheus in a cluster. The following examples describe how to configure Prometheus in different scenarios.

Alert configurations

To configure alert notification methods or notification templates, perform the following steps to configure the config field in the alertmanager section:

Configure alert notification methods
You can set prometheus-operator to send alert notifications by using DingTalk messages or emails. You can perform the following steps to configure the alert notification method:
- Configure DingTalk notifications
  On the ack-prometheus-operator page, click Deploy. On the Parameters wizard page, set enabled to true in the dingtalk section, set the token field to the webhook URL of your DingTalk chatbot, and set the receiver field of the config parameter in the alertmanager section to the receiver name that is specified in the receivers field. The default receiver name in the receivers field is webhook.
  If you have two DingTalk chatbots, perform the following steps:
  1. Replace the parameter values in the token field with the webhook URLs of your DingTalk chatbots.
    Copy the webhook URLs of your DingTalk chatbots and replace the parameter values of dingtalk1 and dingtalk2 in the token field with the copied URLs. In this example, https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxx is replaced by the webhook URLs.
  2. Modify the value of the receiver parameter.
    In the alertmanager section, set the receiver fields in the config parameter to the receiver names that are specified in the receivers field. In this example, webhook1 and webhook2 are used.
  3. Modify the value of the url parameter.
    Replace the value of the url parameter with the names of your DingTalk chatbots. In this example, dingtalk1 and dingtalk2 are used.
  Note
  To add more DingTalk chatbots, add more webhook URLs.
- Configure email notifications
  On the ack-prometheus-operator page, click Deploy. On the Parameters wizard page, specify the details about your email address as shown in the red box of the following figure, and set the receiver field of the config parameter in the alertmanager section to the receiver name that is specified in the receivers field. The default receiver name in the receivers field is mail.
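To make the fields described above more concrete, the following sketch writes one possible parameter fragment to a file for review. The field names under alertmanager.config follow the standard Alertmanager configuration format. The nesting of the dingtalk section, the DINGTALK_SERVICE host and URL path, the SMTP host, the email addresses, and the severity-based routing are all assumptions for illustration, so compare the fragment with the defaults on the Parameters wizard page before you reuse any of it.

```shell
#!/usr/bin/env bash
# Sketch: write an alert notification fragment to a file for review.
set -euo pipefail

OUT_FILE="${OUT_FILE:-alert-values.yaml}"

# Quoted heredoc delimiter: nothing inside is expanded by the shell.
cat > "$OUT_FILE" <<'EOF'
dingtalk:
  enabled: true
  token:
    dingtalk1: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxx
    dingtalk2: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxx
alertmanager:
  config:
    global:
      smtp_smarthost: 'smtp.example.com:465'          # hypothetical host; include the port number
      smtp_from: 'alerts@example.com'
      smtp_auth_username: 'alerts@example.com'
      smtp_auth_password: 'SMTP_AUTHORIZATION_CODE'   # authorization code, not the logon password
    route:
      receiver: webhook1          # default receiver
      routes:
        - receiver: webhook2      # hypothetical routing rule
          match:
            severity: critical
        - receiver: mail
          match:
            severity: warning
    receivers:
      - name: webhook1
        webhook_configs:
          - url: http://DINGTALK_SERVICE/dingtalk/dingtalk1/send   # placeholder host and path
      - name: webhook2
        webhook_configs:
          - url: http://DINGTALK_SERVICE/dingtalk/dingtalk2/send   # placeholder host and path
      - name: mail
        email_configs:
          - to: 'oncall@example.com'
EOF

echo "Wrote ${OUT_FILE}"
```

Each receiver named under route must also appear under receivers. Otherwise Alertmanager rejects the configuration.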
Configure alert notification templates
You can customize the alert notification template in the templateFiles field of the alertmanager section on the Parameters wizard page, as shown in the following figure.

Mount a ConfigMap to Prometheus

This section describes how to mount a ConfigMap to the /etc/prometheus/configmaps/ path of a pod.

Method 1: Deploy prometheus-operator for the first time

If you are deploying prometheus-operator for the first time, follow Step 1 in the Procedure section to deploy the Prometheus monitoring solution. On the Parameters wizard page, set the configMaps field in the prometheus section to the name of the ConfigMap that you want to mount.

Method 2: prometheus-operator is deployed

If prometheus-operator has been deployed in your cluster, perform the following steps:

1. Log on to the ACK console. In the left-side navigation pane, click Clusters.
2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Applications > Helm.
3. Find the ack-prometheus-operator release and click Update in the Actions column.
4. In the Update Release panel, set the configMaps fields in the prometheus and alertmanager sections to the name of the ConfigMap that you want to mount. Then, click OK.
For example, you want to mount a ConfigMap named special-config, which contains the configuration of Prometheus. To configure the special-config ConfigMap as a configuration file of the Prometheus pod, add the following configuration to the configMaps field in the prometheus section to specify the application monitoring method and mount the ConfigMap to the /etc/prometheus/configmaps/ path.
The following figure shows an example of the special-config ConfigMap.
The following figure shows how to set the configMaps field in the prometheus section.
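The special-config example might look roughly like the following sketch, which writes a ConfigMap manifest and a matching parameter fragment to local files. The data key extra-config.yaml, its contents, the monitoring namespace, and the prometheusSpec nesting are assumptions for illustration. Only the ConfigMap name and the mount path come from this topic.

```shell
#!/usr/bin/env bash
# Sketch: write the special-config ConfigMap and a configMaps fragment.
set -euo pipefail

# ConfigMap with one hypothetical file; replace the data with your own.
cat > special-config.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: special-config
  namespace: monitoring
data:
  extra-config.yaml: |
    # hypothetical content
    example_key: example_value
EOF

# Fragment for the prometheus section. The prometheusSpec level is an
# assumption; use the level shown on the Parameters wizard page.
cat > configmaps-values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    configMaps:
      - special-config
EOF

# To create the ConfigMap and check the mount after the release is updated
# (PROMETHEUS_POD is a placeholder for the name of your Prometheus pod):
#   kubectl apply -f special-config.yaml
#   kubectl -n monitoring exec PROMETHEUS_POD -c prometheus -- \
#     ls /etc/prometheus/configmaps/special-config
```

Each mounted ConfigMap appears as a subdirectory of /etc/prometheus/configmaps/ that is named after the ConfigMap.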

Configure Grafana

Mount the dashboard configuration to Grafana
You can perform the following steps to mount a ConfigMap that contains the dashboard configuration to the Grafana pod. On the ack-prometheus-operator page, click Deploy. On the Parameters wizard page, add the following configurations to the extraConfigmapMounts section, as shown in the following figure.
Note
- Make sure that you have a ConfigMap that contains the dashboard configuration in your cluster.
  The labels that are added to the ConfigMap must be the same as those added to other ConfigMaps.
- In the extraConfigmapMounts section of the Grafana configuration, specify the name of the ConfigMap and how to mount the ConfigMap.
- Set mountPath to /tmp/dashboards/.
- Set configMap to the name of the ConfigMap.
- Set name to the name of the JSON file that stores the dashboard configuration.
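The notes above could translate into a fragment like the one in the following sketch. The dashboard name node-overview, the ConfigMap name node-overview-dashboard, the readOnly setting, and the placement under a grafana key are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: write an extraConfigmapMounts fragment to a file for review.
set -euo pipefail

OUT_FILE="${OUT_FILE:-grafana-dashboard-values.yaml}"

cat > "$OUT_FILE" <<'EOF'
grafana:
  extraConfigmapMounts:
    - name: node-overview                  # hypothetical dashboard file name
      mountPath: /tmp/dashboards/
      configMap: node-overview-dashboard   # hypothetical ConfigMap name
      readOnly: true
EOF

# To create the ConfigMap from an exported dashboard file and compare its
# labels with those of the existing ConfigMaps:
#   kubectl -n monitoring create configmap node-overview-dashboard \
#     --from-file=node-overview.json
#   kubectl -n monitoring get configmap --show-labels

echo "Wrote ${OUT_FILE}"
```

Listing the labels of the existing ConfigMaps first is a simple way to find out which labels the new ConfigMap needs.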
Enable data persistence for dashboards
You can perform the following steps to enable data persistence for Grafana dashboards:
1. Log on to the ACK console. In the left-side navigation pane, click Clusters.
2. On the Clusters page, find the cluster that you want to manage and click its name. In the left-side navigation pane, choose Applications > Helm.
3. Find ack-prometheus-operator and click Update in the Actions column.
4. In the Update Release panel, configure the persistence field in the grafana section as shown in the following figure.
You can export data on Grafana dashboards in JSON format to your on-premises machine. For more information, see Export a Grafana dashboard.
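For the persistence field mentioned in the steps above, a fragment along the following lines is one possibility. The field names follow the Grafana Helm chart. The storage class placeholder STORAGE_CLASS_NAME and the 20Gi size are assumptions that you need to replace with values that are valid in your cluster.

```shell
#!/usr/bin/env bash
# Sketch: write a Grafana persistence fragment to a file for review.
set -euo pipefail

OUT_FILE="${OUT_FILE:-grafana-persistence-values.yaml}"

cat > "$OUT_FILE" <<'EOF'
grafana:
  persistence:
    enabled: true
    storageClassName: STORAGE_CLASS_NAME   # placeholder; list candidates with: kubectl get storageclass
    accessModes:
      - ReadWriteOnce
    size: 20Gi                             # hypothetical size
EOF

# After the release is updated, the claim should reach the Bound state:
#   kubectl -n monitoring get pvc

echo "Wrote ${OUT_FILE}"
```

If the claim stays in the Pending state, the storage class name or the requested size is the first thing to check.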

FAQ

What do I do if I fail to receive DingTalk alert notifications?
1. Obtain the webhook URL of your DingTalk chatbot. For more information, see Scenario 3: Implement Kubernetes monitoring and alerting with DingTalk chatbot.
2. On the Parameters wizard page, find the dingtalk section, set enabled to true, and then specify the webhook URL of your DingTalk chatbot in the token field. For more information, see Configure DingTalk alert notifications in Alert configurations.

What do I do if an error message appears when I deploy prometheus-operator in a cluster?

The following error message appears:

```
Can't install release with errors: rpc error: code = Unknown desc = object is being deleted: customresourcedefinitions.apiextensions.k8s.io "xxxxxxxx.monitoring.coreos.com" already exists
```

The error message indicates that the cluster fails to clear custom resource definition (CRD) objects of the previous deployment. Run the following commands to delete the CRD objects. Then, deploy prometheus-operator again:

```
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
```
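Before you redeploy, it can help to confirm that no CRD of the previous deployment is left. The following sketch filters the output of kubectl get crd -o name down to the monitoring.coreos.com API group. The helper name filter_monitoring_crds is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find leftover Prometheus Operator CRDs.
set -euo pipefail

# Keep only the lines from standard input that name a CRD in the
# monitoring.coreos.com API group.
filter_monitoring_crds() {
  # "|| true" keeps an empty result from being treated as an error.
  grep -E '\.monitoring\.coreos\.com$' || true
}

# Usage:
#   kubectl get crd -o name | filter_monitoring_crds
# An empty result means that no CRD of the previous deployment remains.
```

The filter only lists names. Deletion stays a separate, explicit step.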

What do I do if I fail to receive email alert notifications?

Make sure that the value of smtp_auth_password is the SMTP authorization code instead of the logon password of the email account. Also make sure that the SMTP server endpoint includes a port number.

What do I do if the console reports the error "The current cluster is temporarily unavailable. Try again later or submit a ticket" after I click Update to update YAML templates?

If the configuration file of Tiller is too large, the cluster cannot be accessed. To solve this issue, you can delete some annotations in the configuration file and mount the file to a pod as a ConfigMap. You can specify the name of the ConfigMap in the configMaps fields of the prometheus and alertmanager sections. For more information, see the second method in Mount a ConfigMap to Prometheus.

How do I enable the features of prometheus-operator after I deploy it in a cluster?

After prometheus-operator is deployed, go to the cluster details page and choose Applications > Helm in the left-side navigation pane. On the Helm page, find ack-prometheus-operator and click Update in the Actions column. In the Update Release panel, configure the code block to enable the features. Then, click OK.

How do I select data storage: TSDB or disks?

TSDB storage is available only in some regions, whereas disk storage is supported in all regions. The following figure shows how to configure the data retention policy.

What do I do if a Grafana dashboard fails to display data properly?

Go to the cluster details page and choose Applications > Helm in the left-side navigation pane. On the Helm page, find ack-prometheus-operator and click Update in the Actions column. In the Update Release panel, check whether the value of the clusterVersion field is correct. If the Kubernetes version of your cluster is earlier than 1.16, set clusterVersion to 1.14.8-aliyun.1. If the Kubernetes version of your cluster is 1.16 or later, set clusterVersion to 1.16.6-aliyun.1.
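The version rule above can be written as a small helper. The following sketch maps a Kubernetes version string to the clusterVersion value to set. The function name pick_cluster_version is hypothetical, and the input is assumed to look like 1.14.8 or v1.18.8-aliyun.1.

```shell
#!/usr/bin/env bash
# Sketch: choose the clusterVersion value from a Kubernetes version string.
set -euo pipefail

pick_cluster_version() {
  local version="${1#v}"        # strip a leading "v", for example v1.18.8 -> 1.18.8
  local major="${version%%.*}"  # text before the first dot
  local rest="${version#*.}"    # text after the first dot
  local minor="${rest%%.*}"     # text before the next dot
  minor="${minor%%[!0-9]*}"     # drop a non-numeric suffix such as "+"
  if [ "$major" -gt 1 ] || [ "$minor" -ge 16 ]; then
    echo "1.16.6-aliyun.1"
  else
    echo "1.14.8-aliyun.1"
  fi
}

# Usage (jq is assumed to be installed for this one-liner):
#   pick_cluster_version "$(kubectl version -o json | jq -r .serverVersion.gitVersion)"
```

The helper only prints the value. You still enter it in the clusterVersion field in the Update Release panel.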

What do I do if I fail to install ack-prometheus after I delete the ack-prometheus namespace?

After you delete the ack-prometheus namespace, the related resource configurations may be retained. In this case, you may fail to install ack-prometheus again. You can perform the following operations to delete the related resource configurations:

Delete role-based access control (RBAC)-related resource configurations.

Run the following commands to delete the related ClusterRoles:

```
kubectl delete ClusterRole ack-prometheus-operator-grafana-clusterrole
kubectl delete ClusterRole ack-prometheus-operator-kube-state-metrics
kubectl delete ClusterRole psp-ack-prometheus-operator-kube-state-metrics
kubectl delete ClusterRole psp-ack-prometheus-operator-prometheus-node-exporter
kubectl delete ClusterRole ack-prometheus-operator-operator
kubectl delete ClusterRole ack-prometheus-operator-operator-psp
kubectl delete ClusterRole ack-prometheus-operator-prometheus
kubectl delete ClusterRole ack-prometheus-operator-prometheus-psp
```

Run the following commands to delete the related ClusterRoleBindings:

```
kubectl delete ClusterRoleBinding ack-prometheus-operator-grafana-clusterrolebinding
kubectl delete ClusterRoleBinding ack-prometheus-operator-kube-state-metrics
kubectl delete ClusterRoleBinding psp-ack-prometheus-operator-kube-state-metrics
kubectl delete ClusterRoleBinding psp-ack-prometheus-operator-prometheus-node-exporter
kubectl delete ClusterRoleBinding ack-prometheus-operator-operator
kubectl delete ClusterRoleBinding ack-prometheus-operator-operator-psp
kubectl delete ClusterRoleBinding ack-prometheus-operator-prometheus
kubectl delete ClusterRoleBinding ack-prometheus-operator-prometheus-psp
```

Run the following commands to delete the related CRD objects:

```
kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd probes.monitoring.coreos.com
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
```
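Some of the listed resources may already be gone, in which case a plain delete command reports an error. The following sketch generates the delete commands with the --ignore-not-found option so that the cleanup can be rerun safely. The helper name emit_delete_commands is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: generate idempotent delete commands from a list of names.
set -euo pipefail

# Read resource names from standard input, one per line, and print a
# kubectl delete command for each. Argument: the resource kind.
emit_delete_commands() {
  local kind="$1" name
  while IFS= read -r name; do
    [ -n "$name" ] || continue   # skip empty lines
    printf 'kubectl delete %s %s --ignore-not-found\n' "$kind" "$name"
  done
}

# Usage: review the generated commands first, then pipe them to sh to run them.
#   printf '%s\n' \
#     podmonitors.monitoring.coreos.com \
#     probes.monitoring.coreos.com |
#     emit_delete_commands crd
```

Printing the commands instead of running them directly gives you a chance to review the list before anything is deleted.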