You can use the service monitoring and alerting feature to monitor the status of your services. If the threshold that is specified in an alert rule is exceeded, an alert notification is sent.
Background information
The following table describes the metrics that you can collect for services that are deployed in Elastic Algorithm Service (EAS).
| Metric | Description |
| --- | --- |
| CPUConsumption | The number of CPU cores that are consumed by the service. |
| GPUUtilization | The ratio of GPU usage to total GPU capacity. |
| GPUMemory | The amount of GPU memory that is consumed by the service. |
| MemoryConsumption | The amount of memory that is consumed by the service. Unit: MB. |
| QueryPerSecondTotal | The total number of calls per second. |
| ResponsePerSecondWithStatusCode2xx | The number of responses with a 2xx status code per second. |
| 2xxResponseRatio | The ratio of responses with a 2xx status code to the total number of responses. |
| ResponsePerSecondWithStatusCode4xx | The number of responses with a 4xx status code per second. |
| 4xxResponseRatio | The ratio of responses with a 4xx status code to the total number of responses. |
| ResponsePerSecondWithStatusCode5xx | The number of responses with a 5xx status code per second. |
| 5xxResponseRatio | The ratio of responses with a 5xx status code to the total number of responses. |
| TP5ResponseTime | The 5th percentile response time: 5% of requests complete within this time. |
| TP80ResponseTime | The 80th percentile response time: 80% of requests complete within this time. |
| TP90ResponseTime | The 90th percentile response time: 90% of requests complete within this time. |
| TP95ResponseTime | The 95th percentile response time: 95% of requests complete within this time. |
| TP99ResponseTime | The 99th percentile response time: 99% of requests complete within this time. |
| TP100ResponseTime | The maximum response time among all requests. |
| IngressTraffic | The rate of inbound data. Unit: KB/s. |
| EgressTraffic | The rate of outbound data. Unit: KB/s. |
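To make the rate, ratio, and percentile metrics concrete, the following Python sketch computes similar values from one window of request records. It is an illustration under stated assumptions, not how EAS actually collects metrics: the `Request` record and the `summarize` and `tp` functions are hypothetical, percentiles use the nearest-rank method, each request is assumed to receive exactly one response, and latency is assumed to be in milliseconds because the table does not state a unit.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One served request. Hypothetical record format, not an EAS API."""
    status: int        # HTTP status code returned by the service
    latency_ms: float  # response time in milliseconds (assumed unit)


def tp(latencies, p):
    """Nearest-rank p-th percentile: the smallest latency within which at
    least p percent of requests completed. tp(..., 100) is the maximum."""
    ordered = sorted(latencies)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests, window_s):
    """Approximate several EAS-style metrics over a window of window_s seconds.

    Assumes exactly one response per request, so the response ratios use the
    request count as the denominator.
    """
    total = len(requests)
    counts = {"2xx": 0, "4xx": 0, "5xx": 0}
    for r in requests:
        key = f"{r.status // 100}xx"  # e.g. 503 -> "5xx"
        if key in counts:
            counts[key] += 1
    metrics = {"QueryPerSecondTotal": total / window_s}
    for key, count in counts.items():
        metrics[f"ResponsePerSecondWithStatusCode{key}"] = count / window_s
        metrics[f"{key}ResponseRatio"] = count / total if total else 0.0
    latencies = [r.latency_ms for r in requests]
    for p in (5, 80, 90, 95, 99, 100):
        metrics[f"TP{p}ResponseTime"] = tp(latencies, p) if latencies else None
    return metrics


# Usage: four requests observed over a 2-second window.
sample = [Request(200, 12.0), Request(200, 15.0), Request(404, 3.0), Request(503, 40.0)]
print(summarize(sample, window_s=2))
```

With only four samples, every percentile from TP80 upward collapses to the slowest request, which is why high percentiles are most meaningful over windows that contain many requests.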
Step 1: Configure alert contacts
Create an alert contact
Log on to the CloudMonitor console.
In the left-side navigation pane, choose .
On the Alert Contacts tab, click Create Alert Contact.
In the Set Alert Contact panel, specify the alert contact name, email address, and DingTalk chatbot.
Configure the Language of Alert Notifications parameter. Valid values:
Automatic: Notifications are sent in the language of the Alibaba Cloud site on which your account is registered. If your account is registered on the China site (aliyun.com), notifications are sent in Chinese. If your account is registered on the International site (alibabacloud.com) or the Japan site (jp.alibabacloud.com), notifications are sent in English.
Chinese
English
Click and hold the slider in the lower part of the panel and drag it to the right.
Click OK.
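The Automatic language setting amounts to a lookup on the site where the account is registered. The following Python helper, `notification_language`, is a hypothetical model of that documented rule for illustration only; it is not part of any CloudMonitor SDK.

```python
def notification_language(setting, account_site=None):
    """Resolve the Language of Alert Notifications setting (illustrative model).

    setting: "Automatic", "Chinese", or "English", as shown in the console.
    account_site: domain of the site where the account is registered; only
    consulted when setting is "Automatic".
    """
    if setting in ("Chinese", "English"):
        return setting  # an explicit choice always wins
    if setting != "Automatic":
        raise ValueError(f"unsupported setting: {setting!r}")
    site_language = {
        "aliyun.com": "Chinese",           # China site
        "alibabacloud.com": "English",     # International site
        "jp.alibabacloud.com": "English",  # Japan site
    }
    try:
        return site_language[account_site]
    except KeyError:
        raise ValueError(f"unknown site: {account_site!r}") from None
```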
Create an alert contact group
In the left-side navigation pane, choose .
Click the Alert Contact Group tab.
On the Alert Contact Group tab, click Create Alert Contact Group.
In the Create Alert Contact Group panel, enter a name for the alert contact group and add alert contacts to the alert contact group.
Click Confirm.
Step 2: Configure alert rules
In the left-side navigation pane of the CloudMonitor console, click .
On the Cloud Service Monitoring page, enter PAI-EAS inference service in the search box and click PAI-EAS inference service.
On the PAI-EAS inference service page, select the region where the service resides and click Monitoring Charts in the Actions column.
Click Create Alert Rule.
In the Create Alert Rule panel, configure the parameters that are described in the following table and click Confirm.
| Parameter | Description |
| --- | --- |
| Product | The cloud service that is monitored by CloudMonitor. Set this parameter to PAI-EAS inference service. |
| Resource Range | The resources on which the alert rule takes effect. Valid values: All Resources, Application Groups, and Instances. All Resources: an alert notification is sent when the monitoring data of any EAS service triggers the alert rule. Instances: an alert notification is sent only when the monitoring data of one or more selected services triggers the alert rule. |
| Rule Description | The content of the alert rule. An alert is triggered if the metric value meets the specified alert condition. To configure this parameter: 1. Click Add Rule. 2. In the Add Rule Description panel, specify the rule name, metric type, metric, threshold, alert level, and notification method. 3. Click OK. |
| Mute For | The interval at which the alert notification is sent again if the alert is not cleared. |
| Effective Period | The period of time during which the alert rule takes effect. CloudMonitor monitors the specified resources and generates alerts only within this period. |
| Alert Contact Group | The contact group to which alert notifications are sent. Select a contact group that contains alert contacts. |
| Alert Callback | A callback URL that is accessible over the Internet. When an alert is triggered, CloudMonitor sends the alert notification to this URL in an HTTP POST request. Only HTTP is supported. |
| Auto Scaling | Alert rules that are configured for EAS services cannot trigger auto scaling. You do not need to enable this feature. |
| Log Service | If you turn on Log Service, alert information is written to the specified Logstore in Simple Log Service when an alert is triggered. In this case, you must set the Region, ProjectName, and Logstore parameters. For more information about how to create a project and a Logstore, see Getting started. |
| Simple Message Queue (formerly MNS) | If you turn on this feature, alert information is written to the specified topic in Simple Message Queue (formerly MNS) when an alert is triggered. In this case, you must set the Region and topicName parameters. For more information about how to create a topic, see Create a topic. |
| Method for handling alerts when no monitoring data is found | The action to take when no monitoring data is found. Valid values: Do not do anything (default), Send alert notifications, and Treated as normal. |
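The interplay of the threshold condition, Effective Period, Mute For, and the no-data handling method can be hard to picture from the parameter descriptions alone. The following Python sketch models one plausible reading of them; it is not CloudMonitor's implementation. The `AlertRule` and `RuleState` classes, the comparator set, and the mode names "ignore", "alert", and "normal" (standing for Do not do anything, Send alert notifications, and Treated as normal) are all hypothetical, and the assumption that a cleared alert resets the mute timer is illustrative.

```python
import operator
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional

# Comparison operators a threshold rule might use (illustrative set).
COMPARATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class AlertRule:
    """Hypothetical model of one alert rule; field names are not CloudMonitor's."""
    metric: str              # e.g. "5xxResponseRatio"
    comparator: str          # key of COMPARATORS
    threshold: float
    mute_for: timedelta      # Mute For: minimum gap between repeated notifications
    active_from: time        # Effective Period start
    active_to: time          # Effective Period end (earlier than start = wraps midnight)
    no_data: str = "ignore"  # "ignore", "alert", or "normal"


def in_period(t: time, start: time, end: time) -> bool:
    """True if t falls in [start, end]; handles periods that cross midnight."""
    if start <= end:
        return start <= t <= end
    return t >= start or t <= end


class RuleState:
    """Tracks when the last notification was sent for one rule."""

    def __init__(self, rule: AlertRule):
        self.rule = rule
        self.last_sent: Optional[datetime] = None

    def evaluate(self, now: datetime, value: Optional[float]) -> bool:
        """Return True if a notification should be sent for this sample.

        value is None when no monitoring data was found.
        """
        r = self.rule
        if not in_period(now.time(), r.active_from, r.active_to):
            return False  # outside the Effective Period: no alerts
        if value is None:
            if r.no_data == "ignore":
                return False  # "Do not do anything": leave state unchanged
            firing = r.no_data == "alert"  # "normal" clears the alert
        else:
            firing = COMPARATORS[r.comparator](value, r.threshold)
        if not firing:
            self.last_sent = None  # alert cleared: the next breach notifies at once
            return False
        if self.last_sent is not None and now - self.last_sent < r.mute_for:
            return False  # still inside the Mute For interval
        self.last_sent = now
        return True
```

For example, with a 5-minute Mute For value, a 5xx ratio that stays above its threshold produces one notification at the first breach and the next one no sooner than 5 minutes later.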
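The Alert Callback parameter requires an Internet-accessible endpoint that accepts HTTP POST requests. The following Python sketch receives such requests with only the standard library. The request body format is an assumption here: the handler accepts a JSON or form-encoded body and logs whatever fields arrive instead of relying on specific field names. The names `parse_alert_body`, `AlertCallbackHandler`, and `serve`, and the default port 8080, are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def parse_alert_body(body: bytes, content_type: str) -> dict:
    """Decode a callback body. JSON is parsed as-is; any other content type
    is assumed to be form-encoded (an assumption, not a documented format)."""
    text = body.decode("utf-8")
    if content_type.startswith("application/json"):
        return json.loads(text)
    # parse_qs maps each field to a list of values; keep the first one.
    return {key: values[0] for key, values in parse_qs(text).items()}


class AlertCallbackHandler(BaseHTTPRequestHandler):
    """Accepts alert notifications that are pushed by HTTP POST."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            fields = parse_alert_body(body, self.headers.get("Content-Type", ""))
        except ValueError:  # covers bad JSON and non-UTF-8 bodies
            self.send_response(400)
            self.end_headers()
            return
        print("alert received:", fields)  # replace with your own handling
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen on all interfaces over plain HTTP until interrupted."""
    HTTPServer(("0.0.0.0", port), AlertCallbackHandler).serve_forever()
```

To try it, call `serve()` on a host that is reachable over the Internet and enter a URL such as `http://<host>:8080/` as the Alert Callback value. Plain HTTP is used because the Alert Callback parameter supports only the HTTP protocol.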