Integrate ECS instances into Managed Service for Prometheus

Alibaba Cloud Managed Service for Prometheus allows you to use the Node Exporter to collect metrics from an Elastic Compute Service (ECS) instance that runs Linux or Windows, and to use the Process Exporter to collect monitoring data of the processes that run on the instance. You can also collect custom metrics that are defined in text files.

Prerequisites

Managed Service for Prometheus is activated. For more information, see Instance billing.
An ECS instance is created. For more information, see Create and manage an ECS instance by using the ECS console (express version).
Alibaba Cloud Resource Center is activated. For more information, see Activate Resource Center.

Benefits

Managed Service for Prometheus provides an efficient and easy-to-manage monitoring solution for ECS instances. This solution meets the needs for observability and automated management in modern cloud computing environments.

The host monitoring solution provided by Managed Service for Prometheus allows you to integrate ECS instances, servers in self-managed data centers, and servers from third-party cloud service providers into Managed Service for Prometheus for monitoring. For ECS instances, the host monitoring feature can automatically install different types of open source exporters and deliver collection configurations. Managed Prometheus agents then collect the data automatically. The collected data is stored and displayed, and alert rules are managed, in a centralized manner. The automatic service discovery mechanism does not apply to third-party hosts. To integrate a third-party host, you must manually install an Alibaba Cloud data collection agent on the host so that monitoring data can be sent to Managed Service for Prometheus for storage.

Benefit	Description
Host discovery in seconds	Adaptability: The automatic service discovery mechanism allows the monitoring system to quickly adapt to changes in cloud resources. This ensures that all running instances are monitored in a timely manner. Diversity: Multiple service discovery methods are supported to meet monitoring needs in different scenarios. For example, automatic discovery of services in Kubernetes clusters and the integration of other types of cloud services are supported.
Exporter installation in seconds	Ready to use: Exporters can be automatically installed. This way, the monitoring system can immediately identify newly started compute nodes and collect the metrics of the compute nodes without any manual operations. Comprehensive monitoring: Multiple exporters, including the Node Exporter, the Process Exporter, the GPU Exporter, and exporters for middleware, are provided for comprehensive performance tracking.
Metric collection in seconds	Simplified configurations: The automated configuration generation feature frees O&M engineers from manual configurations and ensures that the metrics of all nodes and services are accurately collected. Flexibility: You can modify the configurations based on your monitoring requirements to keep up with complex and changing monitoring environments. After you create an ECS instance, the ECS instance can be included in the monitoring system within 30 to 60 seconds. The interval at which all metrics of ECS instances are monitored can be flexibly adjusted from 1 to 60 seconds. This helps implement all-around monitoring of ECS instances within seconds.
Serverless agents	Centralized management: Managed Prometheus agents are used to manage data collection in a centralized manner. This simplifies the monitoring architecture and improves O&M efficiency. During data collection, your business is not affected. High performance: Prometheus agents free you from configuring complex monitoring algorithms. This reduces the possibility of invalid configurations and improves the accuracy and timeliness of data monitoring.
Smart metric tags	The tags, resource groups, and regions of ECS instances are automatically extracted and injected into the entire metric system. You can add custom tags, such as business tags, environment tags, and data source tags, to ECS instances. This improves the flexibility of tag management.
Ultra-large-scale data collection and storage	Multiple ECS instances can be integrated at a time, including dedicated and shared ECS instances. Hosted resources can be dynamically adjusted based on the number of ECS instances to be integrated. This ensures the timeliness and accuracy of the integration. Ultra-large-scale metric storage and high-performance query and display are supported.
Comprehensive monitoring data from upstream and downstream applications	Systematic and comprehensive observability requires the integration of monitoring data from different dimensions and the implementation of end-to-end monitoring. This way, the monitoring system can reflect the health status and performance of the whole application and service ecosystem. The monitoring solution provided by Managed Service for Prometheus includes monitoring policies that cover underlying hardware, applications, and external services, such as remote direct memory access (RDMA) networks, Object Storage Service (OSS) buckets, and Redis. This monitoring solution applies to hosts, networks, and dependencies.
Process-level monitoring	Process-level monitoring allows you to track and analyze running processes on operating systems. This helps you understand the performance and resource utilization of the processes. Process-level monitoring plays a key role in implementing system-level monitoring and is designed to visualize the health status and performance of the applications that run on servers. During process-level monitoring, key performance metrics, such as the CPU utilization, memory usage, and disk reads and writes of a process, are captured. Other information, such as the startup time of a process, the number of open file handles, and the number of threads initiated in a process, is also collected. Near real-time monitoring capabilities are provided for immediate feedback. This allows system administrators to identify and resolve issues at the earliest opportunity. Process-level monitoring provides administrators with diverse fault diagnosis methods. It helps identify the processes that cause system performance degradation or faults, such as the processes in which memory leaks or resource contention issues occur and the processes that cause high CPU utilization or resource consumption.
Grafana dashboards	By default, Managed Service for Prometheus is integrated with Grafana dashboards, including the ECS Overview dashboard, ECS Detail dashboard, GPU Overview dashboard, GPU Detail dashboard, and Node Process dashboard. ECS instances can be integrated into Managed Service for Prometheus for monitoring within a few clicks, and the monitoring feature can be used out of the box.

Step 1: Integrate the ECS instances

Log on to the ARMS console. In the left-side navigation pane, click Integration Center.
In the left-side navigation pane of the Integration Center page, click Infrastructure and then click Host Monitor.
Note
- Managed Service for Prometheus relies on Resource Center to obtain the data of resources such as virtual private clouds (VPCs) and ECS instances within the current Alibaba Cloud account. If you have not activated Resource Center, you must activate it as prompted. For more information, see Activate Resource Center.
- The activation of Resource Center is an asynchronous operation. If the ARMS console still reports that Resource Center is not activated, wait for 10 to 20 seconds and then click Redetect.

In the panel that appears, select a VPC and specify the parameters in the Configuration Information section. The following table describes the parameters.

Parameter	Description
NodeExporter Installation Mode	Automatic Installation (recommended): Managed Service for Prometheus automatically installs the Node Exporter for the ECS instance that you select. Self Installation: You must manually install the Node Exporter.
Host Service Discovery Mode	Stain Label Selection: the blacklist mode. Specify the ECS instances that you do not want to integrate into Managed Service for Prometheus. The data of ECS instances that are not matched is collected. Unconditional: Collect metrics from all ECS instances in the VPC. Tag Label Selection: the whitelist mode. Specify the ECS instances that you want to integrate into Managed Service for Prometheus. The data of ECS instances that are not matched is not collected. IP CIDR: Specify a CIDR block. If the IP address of an ECS instance is matched, the data of the instance is collected. If you specify the CIDR block of the VPC, the data of all ECS instances in the current VPC is collected. Instance ID: Specify the IDs of ECS instances that you want to integrate. Separate multiple instance IDs with commas (,).
ECS Stain Label	Each taint consists of a key and a value. You can configure multiple taints.
Collect TextFile	Specify a text file from which metrics are collected.
Collect Process Status Metrics	By default, the process data of each ECS instance is collected.
Node-Exporter Service Port	Default value: 9100.
Metric Scrape interval (unit/second)	The interval at which data is collected. Unit: seconds. Default value: 15.
Security Groups Are Automatically Configured	By default, the switch is turned on.
Custom ECS Tag Injection	Specify an ECS tag key. The corresponding key-value pair is automatically injected into the Managed Service for Prometheus metrics.
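The selection semantics of the Host Service Discovery Mode options can be sketched as follows. This is only an illustration of the blacklist, whitelist, CIDR, and instance ID logic described in the table, not the service's implementation. The function name, the mode strings, and the instance dictionary shape are hypothetical, and the sketch assumes that an instance counts as matched when any one of the configured key-value pairs matches.

```python
from ipaddress import ip_address, ip_network


def is_selected(instance, mode, *, labels=None, cidr=None, instance_ids=""):
    """Return True if metrics would be collected from `instance` under `mode`.

    `instance` is a hypothetical dict: {"id": str, "ip": str, "tags": dict}.
    Assumption: an instance matches if ANY configured key-value pair matches.
    """
    tags = instance.get("tags", {})
    matched = any(tags.get(key) == value for key, value in (labels or {}).items())

    if mode == "unconditional":
        return True                     # every instance in the VPC
    if mode == "stain":
        return not matched              # blacklist: matched instances are skipped
    if mode == "tag":
        return matched                  # whitelist: only matched instances
    if mode == "cidr":
        return ip_address(instance["ip"]) in ip_network(cidr)
    if mode == "instance_id":
        # Multiple instance IDs are separated with commas.
        wanted = {part.strip() for part in instance_ids.split(",") if part.strip()}
        return instance["id"] in wanted
    raise ValueError(f"unknown discovery mode: {mode}")
```

For example, an instance tagged env=test is skipped in the blacklist mode when env=test is configured as a taint, and is the only kind of instance collected in the whitelist mode when env=test is configured as a tag.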

Click OK. It takes about 1 to 2 minutes for the ECS instance metrics to be integrated.
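If you use the Collect TextFile parameter, the text file must contain metrics that the Node Exporter can parse. The following minimal sketch assumes that the file uses the Prometheus text exposition format, as the open source Node Exporter textfile collector expects. The helper names, the metric name, and the file path are hypothetical, and label values are assumed not to need escaping.

```python
import os
import tempfile


def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. Label values are
    assumed to contain no quotes, backslashes, or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def write_atomically(path, content):
    """Write to a temporary file, then rename it over `path`,
    so that a scrape never reads a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        os.chmod(tmp_path, 0o644)       # readable by the exporter's user
        os.replace(tmp_path, path)      # replaces the target in one step
    except BaseException:
        os.unlink(tmp_path)
        raise
```

A scheduled job could call `write_atomically("/path/to/custom.prom", render_metric(...))` after each run. The path here is a placeholder for the file that you specify in the console.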

Note

If no data is displayed in the dashboards after the integration is complete, make sure that the security groups of the ECS instances allow inbound access to ports 9100 and 9256 from the 100.64.0.0/10 and 192.168.0.0/18 CIDR blocks. Port 9100 is the default port of the Node Exporter, and port 9256 is the default port of the Process Exporter. If you changed the exporter ports, allow access to the ports that you configured instead. For more information, see Search for security groups.
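The check described in this note can be sketched as a small script. This is a simplified illustration, not a tool provided by Managed Service for Prometheus. The rule shape `(source_cidr, first_port, last_port)` is a hypothetical representation of inbound allow rules that you would copy from the security group, and the sketch assumes IPv4 rules and requires that a single rule covers each required CIDR block.

```python
from ipaddress import ip_network

# CIDR blocks and default exporter ports named in the note above.
REQUIRED_CIDRS = ("100.64.0.0/10", "192.168.0.0/18")
REQUIRED_PORTS = (9100, 9256)  # Node Exporter, Process Exporter


def missing_access(rules, cidrs=REQUIRED_CIDRS, ports=REQUIRED_PORTS):
    """Return the (cidr, port) pairs that no inbound allow rule covers.

    `rules` is an iterable of (source_cidr, first_port, last_port) tuples,
    a hypothetical shape for inbound allow rules.
    """
    missing = []
    for cidr in cidrs:
        needed = ip_network(cidr)
        for port in ports:
            covered = any(
                needed.subnet_of(ip_network(source)) and first <= port <= last
                for source, first, last in rules
            )
            if not covered:
                missing.append((cidr, port))
    return missing
```

An empty result means that every required CIDR block can reach both ports. Any returned pair identifies a rule that still needs to be added, for example `("192.168.0.0/18", 9256)`.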

Step 2: View the dashboards

In the left-side navigation pane, click Integration Management.
On the Integrated Environments tab of the Integration Management page, click ECS Instance.
On the ECS Instance tab, click the name of the environment instance.
In the Addon Type section of the Component Management tab, click Dashboards to view the built-in Grafana dashboard.

Step 3: Configure alerting

Log on to the Managed Service for Prometheus console. In the left-side navigation pane, click Integration Management.
On the Integrated Environments tab of the Integration Management page, click ECS Instance.
On the ECS Instance tab, click the name of the environment instance.
In the Addon Type section of the Component Management tab, click Alert Rule to view the built-in alert rules.

Note

The built-in alert rules generate alert events but do not send alert notifications. To send alert notifications by email or through other channels, click Edit to configure the notification methods. On the Edit Prometheus Alert Rules page, you can specify custom alert thresholds, the duration, and the alert content. For more information, see Create an alert rule for a Prometheus instance.
If you set the Alert Notification parameter to Simple Mode, you can configure the contacts that receive alert notifications and the notification period, and specify whether alert notifications are sent repeatedly.