CloudMonitor provides quota monitoring and alerting. Quota monitoring gives you real-time insight into your resource quota usage through a comprehensive set of metrics. Quota alerting lets you configure alert rules for quotas and notifies you when a metric, such as CPU utilization, exceeds the specified threshold. This topic describes how to view metric data, configure quota alerts, and subscribe to a metric to receive relevant data by using CloudMonitor or Application Real-Time Monitoring Service (ARMS).
Prerequisites
A resource quota is created. For information about how to create a resource quota, see Overview.
Limits
Feature | Supported resources | Regions |
Quota | Lingjun resources | |
Quota | General computing resources | |
Metrics
Quota monitoring provides metrics on the performance of CPU, memory, disk, network, and GPU. The following table describes specific key metrics. For information about all supported metrics, visit the PAI-Quota TimeSeries Metrics page.
Metric | Description |
QUOTA_CPU_REQUEST | The number of scheduled CPU cores of the specified quota. |
QUOTA_CPU_TOTAL | The total number of CPU cores of the specified quota. |
QUOTA_CPU_UTIL | The CPU utilization of the specified quota. |
QUOTA_GPU_ACCELERATOR_DUTTY_UTIL | The GPU computing power usage of the specified quota. |
QUOTA_GPU_ACCELERATOR_MEMORY_UTIL | The GPU memory usage of the specified quota. |
QUOTA_GPU_ACCELERATOR_REQUEST | The number of scheduled GPUs of the specified quota. |
QUOTA_GPU_ACCELERATOR_TOTAL | The total number of GPUs of the specified quota. |
QUOTA_GPU_POWER_USAGE | The GPU power consumption of the specified quota. |
QUOTA_MEMORY_UTIL | The memory usage of the specified quota. |
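As a rough illustration of how the REQUEST and TOTAL metrics relate, the following Python sketch computes the share of a quota's capacity that is currently scheduled. The function name and the sample values are hypothetical, and this ratio is not necessarily how the UTIL metrics are derived; those metrics are reported separately.

```python
def allocation_ratio(requested: float, total: float) -> float:
    """Return the fraction of a quota's capacity that is currently scheduled."""
    if total <= 0:
        raise ValueError("total must be positive")
    return requested / total


# Hypothetical datapoints for a single quota.
# CPU: QUOTA_CPU_REQUEST / QUOTA_CPU_TOTAL
cpu_ratio = allocation_ratio(requested=48, total=64)
# GPU: QUOTA_GPU_ACCELERATOR_REQUEST / QUOTA_GPU_ACCELERATOR_TOTAL
gpu_ratio = allocation_ratio(requested=6, total=8)

print(f"CPU scheduled: {cpu_ratio:.0%}, GPU scheduled: {gpu_ratio:.0%}")
```

A ratio close to 1 suggests that the quota is almost fully scheduled, even if the utilization metrics are still low.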
Use CloudMonitor
CloudMonitor is a service that monitors Alibaba Cloud resources and Internet applications. CloudMonitor provides a one-stop, out-of-the-box, and enterprise-class monitoring solution. You can log on to the CloudMonitor console to view metric data about PAI-Quota and configure alerts. CloudMonitor also provides API operations that you can use to subscribe to metrics and create a custom monitoring dashboard. For more information, see What is CloudMonitor?.
Billing
CloudMonitor provides a specific amount of free quota. For more information, see Pay-as-you-go.
View metric data
Log on to the CloudMonitor console.
In the left-side navigation pane, go to the Cloud Service Monitoring Dashboard page.
On the Cloud Service Monitoring Dashboard page, select PAI-Quota from the drop-down list. Enter the resource quota name in the search box or select a resource quota name from the drop-down list. The charts for the quota usage are displayed on the dashboard.
You can perform the following operations on the dashboard:
Switch dimensions: Filter metric data by using the quota and node dimensions.
Modify the time range of statistics.
Expand the chart: In the upper-right corner of the chart, click the icon to view the details.
Configure quota alerts
You can proactively monitor the quota usage and configure alert rules based on your business requirements. An alert notification is sent when a metric breaches the threshold specified in the alert rule. The following section describes how to configure quota alerts in the CloudMonitor console.
Step 1: Configure alert contacts
Log on to the CloudMonitor console.
In the left-side navigation pane, go to the page that contains the Alert Contacts tab.
On the Alert Contacts tab, click Create Alert Contact.
In the Set Alert Contact panel, enter the name, email address, and webhook URL of the alert contact.
Click OK.
On the Alert Contact Group tab, click Create Alert Contact Group.
In the Create Alert Contact Group panel, enter a name for the alert contact group and add alert contacts to the alert contact group.
Click Confirm.
Step 2: Configure alert rules
In the left-side navigation pane of the CloudMonitor console, go to the Cloud Service Monitoring page.
On the Cloud Service Monitoring page, search for PAI-Quota.
Go to the PAI-Quota page, select the region where the service is deployed, and then click Create Alert Rule.
In the Create Alert Rule panel, configure the parameters and click Confirm. The following table describes the parameters.
Parameter | Description |
Product | The service that you want to monitor by using CloudMonitor. In this example, select PAI-Quota from the drop-down list. |
Resource Range | The resources to which you want to apply the alert rule. Valid values: All Resources (an alert notification is sent when a resource quota meets the condition specified by the alert rule) and Instances (click Add Instance and add the resource quotas that you want to monitor; an alert notification is sent only if the selected resource quotas meet the condition specified by the alert rule). |
Rule Description | The condition that triggers the alert. For more information about how to configure this parameter, see Create an alert rule. |
Mute For | The interval at which another alert notification is sent when the alert is not cleared. |
Effective Period | The period of time during which the alert rule takes effect. CloudMonitor monitors the specified resource quotas and generates alerts only within the specified effective period. |
Alert Contact Group | The contact group to which alert notifications are sent. Select a contact group that has alert contacts. |
Tag | The tag of the custom alert rule. A tag consists of a name and a value. |
On the PAI-Quota page, click View Alert Rules to view the details of the rules that you created. Click Alert History in the Actions column to view the alert history. You can also modify alert rules.
You can call API operations to configure and manage quota alerts, such as viewing the alert history, managing alert templates, creating alert rules, and adding alert contacts. For information about how to call CloudMonitor API operations to configure and manage quota alerts, see Alert service.
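To make the interaction between the threshold, Mute For, and Effective Period parameters more concrete, the following Python sketch simulates when notifications would be sent for a series of samples. It is a simplified model under stated assumptions, not CloudMonitor's actual evaluation logic: the function name, the sample values, the "greater than" comparison, and the rule that a cleared alert resets the mute interval are all hypothetical.

```python
from datetime import datetime, time, timedelta


def notification_times(samples, threshold, mute_for,
                       window_start=time(0, 0), window_end=time(23, 59, 59)):
    """Return the timestamps at which a notification would be sent.

    samples: list of (datetime, value) tuples sorted by time.
    mute_for: minimum interval between repeated notifications
              while the alert stays uncleared.
    """
    sent = []
    last_sent = None
    for ts, value in samples:
        if value <= threshold:
            # Assumption: the alert is cleared, so the next breach notifies at once.
            last_sent = None
            continue
        if not (window_start <= ts.time() <= window_end):
            # Outside the effective period: no alert is generated.
            continue
        if last_sent is None or ts - last_sent >= mute_for:
            sent.append(ts)
            last_sent = ts
    return sent


# Hypothetical CPU utilization samples (percent), one every 5 minutes.
base = datetime(2024, 5, 15, 9, 0)
values = [70, 85, 90, 92, 60, 88]
samples = [(base + timedelta(minutes=5 * i), v) for i, v in enumerate(values)]

sent = notification_times(samples, threshold=80, mute_for=timedelta(minutes=10))
print([ts.strftime("%H:%M") for ts in sent])  # ['09:05', '09:15', '09:25']
```

In this model, the breach at 09:10 is suppressed because it falls within the mute interval of the notification sent at 09:05, and the drop to 60 at 09:20 clears the alert.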
Subscribe to a metric
CloudMonitor provides a comprehensive set of API operations that you can use to subscribe to metrics and create a custom resource monitoring dashboard. For more information, see List of operations by function.
API operation | Description |
 | Queries the latest monitoring data of a metric. |
 | Queries the monitoring data of a metric for a cloud service. |
 | Queries the monitoring data of a metric for a cloud service. |
 | Queries the details of metrics that are supported in CloudMonitor. |
 | Queries the information about monitored services in CloudMonitor. |
 | Queries the latest monitoring data of a metric for a cloud service. The data can be sorted in a specified order. |
In this example, the DescribeMetricList operation is used to show how to query the data of a specific metric of Deep Learning Containers (DLC) of PAI.
Go to the PAI-Quota TimeSeries Metrics page.
Find the metric to which you want to subscribe and click Obtain Metric Data in the Actions column.
In OpenAPI Portal, configure the key parameters. Use the default values for other parameters. The following table describes the key parameters. For information about all parameters, see DescribeMetricList.
Parameter | Description |
Namespace | The namespace of the cloud service. Example: acs_pai_quota. |
MetricName | The name of the metric that you want to monitor. Example: QUOTA_CPU_REQUEST. |
StartTime | The start of the time range for the query. Example: 2024-05-15 00:00:00. |
EndTime | The end of the time range for the query. Example: 2024-05-28 00:00:00. |
Note: The time range must be less than or equal to 31 days.
After you configure the parameters, click Initiate Call to view the metric data in the specified time range.
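If you plan to issue the same query from a script rather than from OpenAPI Portal, the following Python sketch shows one way to assemble the key parameters and check the 31-day limit before a request is sent. The helper name `build_metric_query` is hypothetical, the sketch covers only the four parameters listed above, and it does not send a request; pass the resulting dictionary to whichever CloudMonitor SDK or HTTP client you use.

```python
from datetime import datetime, timedelta

MAX_RANGE = timedelta(days=31)          # limit stated in the note above
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"       # format used in the examples above


def build_metric_query(metric_name, start_time, end_time,
                       namespace="acs_pai_quota"):
    """Validate the time range and return DescribeMetricList key parameters."""
    start = datetime.strptime(start_time, TIME_FORMAT)
    end = datetime.strptime(end_time, TIME_FORMAT)
    if end <= start:
        raise ValueError("EndTime must be later than StartTime")
    if end - start > MAX_RANGE:
        raise ValueError("the time range must not exceed 31 days")
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        "StartTime": start_time,
        "EndTime": end_time,
    }


params = build_metric_query(
    "QUOTA_CPU_REQUEST", "2024-05-15 00:00:00", "2024-05-28 00:00:00"
)
print(params)
```

Validating the range locally avoids a round trip for requests that would be rejected because they span more than 31 days.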
Use ARMS
Application Real-Time Monitoring Service (ARMS) is a cloud-native observability platform. Based on the capabilities of ARMS, you can build custom Grafana dashboards for PAI-Quota or configure flexible Prometheus alerts. For more information, see What is ARMS?.
Billing
For billing information, see Billing overview.
Integrate monitoring data
Perform the following steps:
Log on to the ARMS console.
In the left-side navigation pane, click Integration Center.
In the left-side navigation panel of the Integration Center page, click AI. Then, click the Aliyun PAI-Quota tab in the AI section.
Optional. In the Aliyun PAI-Quota configuration panel, you can preview the monitoring dashboard and view the collection metrics and all alert rule templates.
Preview: Click the Preview tab to preview the monitoring dashboard.
Collect Metrics: Click the Collect Metrics tab to view the collection metrics.
Alert Rule Template: Click the Alert Rule Template tab to view the alert rule template.
Click the Start Integration tab to start the integration of monitoring data. Then, configure the relevant parameters and click OK. The following table describes the parameters.
Parameter | Description |
Select a Region | Select a region in which you want to store data. |
Name | Follow the on-screen instructions in the CloudMonitor console to configure the access name. |
The integration process requires approximately 1 to 2 minutes.
In the left-side navigation panel, click Integration Management to view information about the integrated environments.
View the Grafana dashboards
Log on to the ARMS console.
In the left-side navigation pane, click Integration Management.
On the Integrated Environments tab of the Integration Management page, click Cloud Service Region.
On the Cloud Service Region tab, click the name of the environment instance that you want to manage.
On the Component Management page, find the Addon Type section and click Dashboards to view the built-in dashboards.
Click the dashboard name to view quota information.
Configure Prometheus alerts
Perform the following steps to configure Prometheus alerts:
Log on to the ARMS console.
In the left-side navigation pane, click Integration Management.
On the Integrated Environments tab of the Integration Management page, click Cloud Service Region.
On the Cloud Service Region tab, click the name of the environment instance that you want to manage.
On the Component Management page, click Alert Rule in the Addon Type section to view the built-in alert rules.
The built-in alert rules generate alert events but do not send alert notifications. You can use one of the following methods to send alert notifications by email or through other channels:
Create a notification policy and specify matching rules for alert events. If a matching rule is triggered, alert notifications are sent to the contacts by using the specified notification methods. For more information, see Create and manage a notification policy.
Click Edit in the Actions column and configure a notification method.
On the page for editing the Prometheus alert rule, you can specify the alert condition, duration, alert message, and alert notification. For more information, see Create an alert rule for a Prometheus instance.
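The alert condition and duration work together: in Prometheus-style alerting, a rule generally fires only after its condition has held continuously for the configured duration. The following Python sketch models that behavior under stated assumptions. The function name, the sample values, and the threshold are hypothetical, and the exact evaluation in ARMS may differ.

```python
def first_firing_time(samples, threshold, duration):
    """Return the timestamp at which the alert would fire, or None.

    samples: list of (seconds, value) tuples sorted by time.
    duration: number of seconds the condition must hold without interruption.
    """
    pending_since = None
    for ts, value in samples:
        if value > threshold:
            if pending_since is None:
                pending_since = ts          # condition starts to hold
            if ts - pending_since >= duration:
                return ts                   # held long enough: alert fires
        else:
            pending_since = None            # condition broken: start over
    return None


# Hypothetical utilization samples (fraction of 1.0), one every 60 seconds.
samples = [(0, 0.95), (60, 0.70), (120, 0.92), (180, 0.93), (240, 0.96), (300, 0.97)]
print(first_firing_time(samples, threshold=0.9, duration=120))  # 240
```

The brief spike at second 0 does not trigger the alert because the value drops below the threshold at second 60. A longer duration therefore filters out short-lived spikes at the cost of a later notification.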