
Platform For AI: Enable service monitoring and alerting

Last Updated: Dec 03, 2024

You can use the service monitoring and alerting feature to monitor the status of your services. When a monitored metric exceeds the threshold that is specified in an alert rule, an alert notification is sent.

Background information

The following table describes the metrics that you can collect for services that are deployed in Elastic Algorithm Service (EAS).

Metric

Description

CPUConsumption

The number of CPU cores that are consumed by the service.

GPUUtilization

The ratio of the GPU usage to the total GPU capacity.

GPUMemory

The amount of GPU memory that is consumed by the service.

MemoryConsumption

The amount of memory that is consumed by the service. Unit: MB.

QueryPerSecondTotal

The total number of requests that the service receives per second (QPS).

ResponsePerSecondWithStatusCode2xx

The number of responses with the status code 2xx per second.

2xxResponseRatio

The ratio of the responses with the status code 2xx to the total number of responses.

ResponsePerSecondWithStatusCode4xx

The number of responses with the status code 4xx per second.

4xxResponseRatio

The ratio of the responses with the status code 4xx to the total number of responses.

ResponsePerSecondWithStatusCode5xx

The number of responses with the status code 5xx per second.

5xxResponseRatio

The ratio of the responses with the status code 5xx to the total number of responses.

TP5ResponseTime

The 5th-percentile response time: 5% of all requests are completed within this time.

TP80ResponseTime

The 80th-percentile response time: 80% of all requests are completed within this time.

TP90ResponseTime

The 90th-percentile response time: 90% of all requests are completed within this time.

TP95ResponseTime

The 95th-percentile response time: 95% of all requests are completed within this time.

TP99ResponseTime

The 99th-percentile response time: 99% of all requests are completed within this time.

TP100ResponseTime

The maximum response time across all requests.

IngressTraffic

The rate of inbound data traffic. Unit: KB/s.

EgressTraffic

The rate of outbound data traffic. Unit: KB/s.
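The TPn response times and the status-code ratios can be illustrated by computing them from a sample of requests. The following Python sketch uses a made-up request log, assumes response times in milliseconds, and uses the nearest-rank percentile method. EAS may collect and aggregate these metrics differently (for example, over different time windows), so treat the sketch as an illustration of the metric definitions, not of how the service computes them. The helper names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical request log for one reporting window:
# (response time in milliseconds, HTTP status code).
request_log = [(12, 200), (15, 200), (9, 404), (30, 200), (250, 500),
               (18, 200), (22, 201), (11, 200), (40, 503), (14, 200)]

def tp_response_time(times, percent):
    """Nearest-rank percentile: the smallest response time within which
    at least `percent`% of the requests were completed."""
    ordered = sorted(times)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]

def status_ratios(codes):
    """Share of responses in each status class, as in 2xxResponseRatio,
    4xxResponseRatio, and 5xxResponseRatio."""
    counts = Counter(f"{code // 100}xx" for code in codes)
    return {cls: counts[cls] / len(codes) for cls in ("2xx", "4xx", "5xx")}

times = [t for t, _ in request_log]
codes = [c for _, c in request_log]
for p in (5, 80, 90, 95, 99, 100):
    print(f"TP{p}ResponseTime = {tp_response_time(times, p)} ms")
print(status_ratios(codes))  # {'2xx': 0.7, '4xx': 0.1, '5xx': 0.2}
```

In this sample, a single slow 500 response pushes TP95ResponseTime and above to 250 ms while TP80ResponseTime stays at 30 ms, which is why high-percentile metrics are often a better alert target than averages.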

Step 1: Configure alert contacts

  1. Create an alert contact.

    1. Log on to the CloudMonitor console.

    2. In the left-side navigation pane, choose Alerts > Alert Contacts.

    3. On the Alert Contacts tab, click Create Alert Contact.

    4. In the Set Alert Contact panel, specify the alert contact name, email address, and DingTalk chatbot.

    5. Configure the Language of Alert Notifications parameter.

      The following valid values are supported:

      • Automatic: Notifications are sent in the language of the Alibaba Cloud site on which your account is registered. If your account is registered on the China site (aliyun.com), notifications are sent in Chinese. If your account is registered on the International site (alibabacloud.com) or Japan site (jp.alibabacloud.com), notifications are sent in English.

      • Chinese

      • English

      Then, click and hold the slider in the lower part of the panel and drag it to the right.

    6. Click OK.

  2. Create an alert contact group.

    1. In the left-side navigation pane, choose Alerts > Alert Contacts.

    2. Click the Alert Contact Group tab.

    3. On the Alert Contact Group tab, click Create Alert Contact Group.

    4. In the Create Alert Contact Group panel, enter a name for the alert contact group and add alert contacts to the alert contact group.

    5. Click Confirm.
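The Automatic option of the Language of Alert Notifications parameter follows a simple site-to-language rule. The following Python sketch models an alert contact, an alert contact group, and that rule locally. It does not call any CloudMonitor API; the class names, field names, and the `notification_language` helper are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class Site(Enum):
    """Alibaba Cloud site on which the account is registered."""
    CHINA = "aliyun.com"
    INTERNATIONAL = "alibabacloud.com"
    JAPAN = "jp.alibabacloud.com"

@dataclass
class AlertContact:
    # Hypothetical local model of the fields in the Set Alert Contact panel.
    name: str
    email: str
    dingtalk_chatbot: Optional[str] = None
    language: str = "Automatic"  # "Automatic", "Chinese", or "English"

@dataclass
class AlertContactGroup:
    name: str
    contacts: List[AlertContact] = field(default_factory=list)

def notification_language(contact: AlertContact, site: Site) -> str:
    """Return the language in which this contact receives notifications."""
    if contact.language in ("Chinese", "English"):
        return contact.language
    if contact.language != "Automatic":
        raise ValueError(f"unsupported language: {contact.language!r}")
    # Automatic: Chinese on the China site; English on the International and Japan sites.
    return "Chinese" if site is Site.CHINA else "English"

# Usage with hypothetical contact details.
ops = AlertContact(name="ops-oncall", email="oncall@example.com")
group = AlertContactGroup(name="eas-alerts", contacts=[ops])
print(notification_language(ops, Site.JAPAN))  # English
```

An explicit Chinese or English setting always wins; only Automatic depends on the registration site.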

Step 2: Configure alert rules

  1. In the left-side navigation pane of the CloudMonitor console, choose Cloud Service Monitoring > Cloud Service Monitoring.

  2. On the Cloud Service Monitoring page, enter PAI-EAS inference service in the search box and click PAI-EAS inference service.

  3. On the PAI-EAS inference service page, select the region where the service resides, find the service, and click Monitoring Charts in the Actions column.


  4. Click Create Alert Rule.

  5. In the Create Alert Rule panel, configure the parameters that are described in the following table and click Confirm.

    Parameter

    Description

    Product

    The cloud service that is monitored by CloudMonitor. Set this parameter to PAI-EAS inference service.

    Resource Range

    Select the resources on which the alert rule takes effect. Valid values: All Resources, Application Groups, and Instances.

    • All Resources: An alert notification is sent when the monitoring data of any EAS service triggers the alert rule.

    • Instances: An alert notification is sent only when the monitoring data of one or more selected services triggers the alert rule.

    Rule Description

    This parameter specifies the content of the alert rule. If the metric value meets the specified alert condition, an alert is triggered. To configure this parameter, perform the following steps:

    1. Click Add Rule.

    2. In the Add Rule Description panel, specify the rule name, metric type, metric, threshold, alert level, and notification method.

    3. Click OK.

    Mute For

    The interval after which the alert notification is sent again if the alert is not cleared.

    Effective Period

    The period of time during which the alert rule takes effect. CloudMonitor monitors the specified instances and generates alerts only within the specified period.

    Alert Contact Group

    The contact group to which alert notifications are sent. Select a contact group that contains at least one alert contact.

    Alert Callback

    A callback URL that is accessible over the Internet. CloudMonitor pushes alert notifications to this URL by sending HTTP POST requests. Only the HTTP protocol is supported.

    Auto Scaling

    This feature is not available for alert rules that are configured for EAS services. You do not need to enable it.

    Log Service

    If you turn on Log Service, the alert information is written to the specified Logstore in Simple Log Service when an alert is triggered. In this case, you must also set the Region, ProjectName, and Logstore parameters.

    For more information about how to create a project and a Logstore, see Getting started.

    Simple Message Queue (formerly MNS)

    If you turn on Simple Message Queue (formerly MNS), the alert information is written to the specified topic in Simple Message Queue (formerly MNS) when an alert is triggered. In this case, you must also set the Region and topicName parameters. For more information about how to create a topic, see Create a topic.

    Method for handling alerts when no monitoring data is found

    Specifies how CloudMonitor handles the alert rule when no monitoring data is found. Valid values:

    • Do not do anything (default value)

    • Send alert notifications

    • Treated as normal
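The way Resource Range, Rule Description, Mute For, Effective Period, and the no-data setting interact can be sketched as a small evaluation function. This is a simplified local model, not CloudMonitor's implementation. It assumes a single "greater than" threshold per rule, an effective period that does not cross midnight, and that "Treated as normal" clears an active alert. The `AlertRule`, `RuleState`, and `evaluate` names and the service name are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta
from typing import Dict, Optional, Set

@dataclass
class AlertRule:
    """Hypothetical, simplified model of one CloudMonitor alert rule."""
    metric: str                        # metric from Rule Description, e.g. "5xxResponseRatio"
    threshold: float                   # assumption: alert when value > threshold
    mute_for: timedelta                # Mute For
    effective_start: time = time.min   # Effective Period start (same day as end)
    effective_end: time = time.max     # Effective Period end
    no_data_action: str = "Do not do anything"
    instances: Optional[Set[str]] = None  # None models All Resources

@dataclass
class RuleState:
    """Time of the last notification for each service whose alert is active."""
    last_notified: Dict[str, datetime] = field(default_factory=dict)

def evaluate(rule: AlertRule, state: RuleState, service: str,
             value: Optional[float], now: datetime) -> Optional[str]:
    """Return a notification message, or None if nothing is sent."""
    # Resource Range: with Instances, only the selected services are checked.
    if rule.instances is not None and service not in rule.instances:
        return None
    # Effective Period: no alerts are generated outside the window.
    if not (rule.effective_start <= now.time() <= rule.effective_end):
        return None
    if value is None:  # no monitoring data was found
        if rule.no_data_action == "Treated as normal":
            state.last_notified.pop(service, None)  # assumption: clears the alert
            return None
        if rule.no_data_action != "Send alert notifications":
            return None  # "Do not do anything"
        message = f"{service}: no data for {rule.metric}"
    elif value > rule.threshold:
        message = f"{service}: {rule.metric}={value} exceeds {rule.threshold}"
    else:
        state.last_notified.pop(service, None)  # the alert is cleared
        return None
    # Mute For: while the alert is not cleared, notify again only after the interval.
    last = state.last_notified.get(service)
    if last is not None and now - last < rule.mute_for:
        return None
    state.last_notified[service] = now
    return message

# Usage with a hypothetical service name.
rule = AlertRule(metric="5xxResponseRatio", threshold=0.05, mute_for=timedelta(hours=1))
state = RuleState()
t0 = datetime(2024, 12, 3, 10, 0)
print(evaluate(rule, state, "my_service", 0.12, t0))                         # notification sent
print(evaluate(rule, state, "my_service", 0.15, t0 + timedelta(minutes=5)))  # None: muted
print(evaluate(rule, state, "my_service", 0.20, t0 + timedelta(hours=1)))    # notification sent again
```

Keeping the per-service notification time in a separate state object makes the Mute For behavior explicit: a value that drops back below the threshold clears the alert, so the next breach notifies immediately instead of waiting for the mute interval.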
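The Alert Callback parameter expects an HTTP endpoint that accepts POST requests. The following Python sketch receives such requests by using only the standard library. The payload format is an assumption: the handler parses a form-encoded body if there is one and otherwise keeps the raw text, so inspect what CloudMonitor actually sends before you rely on specific field names. The `AlertCallbackHandler` class, the `make_server` helper, and port 8080 are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

class AlertCallbackHandler(BaseHTTPRequestHandler):
    """Accepts alert notifications that are pushed to the callback URL via HTTP POST."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace")
        # Assumption: the body may be form-encoded; otherwise keep the raw text.
        fields = parse_qs(body)
        self.server.received.append(fields if fields else body)
        reply = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging

def make_server(host: str = "0.0.0.0", port: int = 8080) -> HTTPServer:
    """Create a callback server; `received` collects the parsed notifications."""
    server = HTTPServer((host, port), AlertCallbackHandler)
    server.received = []
    return server

# To run it: make_server().serve_forever()
```

The URL that you enter for Alert Callback must reach this server over the Internet with plain HTTP, for example http://<public-address>:8080/.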