
Platform For AI: Service monitoring

Last Updated: Mar 11, 2026

You can view EAS service metrics on the Monitoring page to understand how the service is invoked and how it is running.

View service monitoring information

  1. Log on to the PAI console. Select a region on the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).

  2. Click the target service name to enter the details page. Switch to the Monitoring tab.

  3. View service monitoring information.

    Switch Dashboards

    Dashboards are divided into service and instance dimensions. Switch between them as follows:


    • Service: Service dimension. Default service monitoring dashboard name format is Service-<service_name>, where <service_name> is the EAS service name.

    • Instance: Instance dimension, divided into single instance and multiple instances.

      • Single Instance: Displays monitoring data for a single instance. Switch between different instances to view their data.


      • Multiple Instance: Displays monitoring data for multiple instances. Select multiple instances to compare and view their data.


    Switch Time Range

    Click the time picker icon on the right side of the Monitoring area to switch the time range displayed on the dashboard.

    Important

    Minute-level monitoring metrics are retained for a maximum of 1 month. Second-level monitoring metrics are retained for a maximum of 1 hour.

    Important

    LLM-related monitoring items are displayed only when the service tags contain "ServiceEngineType": "vllm" or "ServiceEngineType": "sglang".

Monitoring metrics

Service Monitoring Dashboard (Minute-Level)

Monitor the following metrics on this dashboard:

Metric

Description

QPS

Requests per second for the service. Requests with different return codes are calculated separately. For services with multiple instances, this metric is the sum across all instances. The 1d offset series shows QPS at the same time on the previous day, for day-over-day comparison.

Response

Total responses received by the service within the selected time range. Responses with different return codes are calculated separately. For services with multiple instances, this metric is the sum across all instances.

RT

Request response time.

  • Avg: Average response time of all requests at that time.

  • TPXX: Maximum response time of the top XX percent of requests after sorting all request times from lowest to highest at that time.

    For example, TP5 indicates maximum response time of the top 5% of requests. TP100 indicates maximum response time of all requests.

    For services with multiple instances, TP100 indicates maximum request response time across all instances. Other TPXX values are the average of TPXX across all instances. For example, TP5 indicates the average of TP5 across all instances.
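Under the definition above, a TPXX value can be sketched as follows. This is an illustrative helper, not EAS's actual implementation; the sample response times are made up.

```python
def tpxx(response_times, xx):
    """Maximum response time among the fastest XX percent of requests.

    Per the definition above: sort all request times from lowest to
    highest, take the top XX percent, and return the largest value.
    """
    if not response_times:
        return None
    times = sorted(response_times)
    # Number of requests in the fastest XX percent (at least one).
    k = max(1, int(len(times) * xx / 100))
    return times[k - 1]

rts = [12, 15, 18, 20, 25, 30, 45, 60, 80, 200]  # milliseconds
print(tpxx(rts, 50))   # 25 (max of the fastest 50% of requests)
print(tpxx(rts, 100))  # 200 (TP100 is the overall maximum)
```

For a multi-instance service, TP100 would be the maximum of the per-instance TP100 values, while other TPXX values are averaged across instances.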

Daily Invoke

Daily service calls. Calls with different return codes are calculated separately. For services with multiple instances, this metric is the sum across all instances.

More Metrics (CPU | Memory | GPU | Network | Resources)

Metric

Description

CPU

CPU

Average CPU usage of the service at that time. Unit: CPU cores. For services with multiple instances, this metric is the average across all instances.

CPU Utilization

Average CPU utilization of the service at that time. Calculation: Average CPU usage ÷ Maximum available CPU cores. For services with multiple instances, this metric is the average across all instances.

CPU Total

Total available CPU cores for the service at that time. Calculation: Available CPU cores per single instance × Number of service instances.
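The CPU calculations above can be checked with a short sketch. All numbers here are illustrative, not taken from a real service.

```python
# Illustrative values, not taken from a real service.
instances = 4
cores_per_instance = 8  # available CPU cores per instance
avg_usage_per_instance = [2.0, 3.5, 1.5, 3.0]  # cores in use, per instance

# CPU: average usage across all instances.
cpu = sum(avg_usage_per_instance) / instances  # 2.5 cores

# CPU Total: available cores per instance x number of instances.
cpu_total = cores_per_instance * instances  # 32 cores

# CPU Utilization: average CPU usage / maximum available CPU cores.
cpu_utilization = cpu / cores_per_instance  # 0.3125, i.e. 31.25%

print(cpu, cpu_total, cpu_utilization)
```

The same usage-over-capacity pattern applies to Memory Utilization (Memory RSS divided by Memory Total).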

Memory

Memory

Average memory usage of the service at that time. For services with multiple instances, this metric is the average across all instances.

  • RSS: Resident physical memory size.

  • Cache: Cache size.

  • Total: Maximum available physical memory size for a single instance.

Memory Utilization

Average memory utilization of the service at that time. Calculation: Memory RSS ÷ Memory Total. For services with multiple instances, this metric is the average across all instances.

GPU

GPU Utilization

For GPU-enabled services, average GPU utilization at that time. For services with multiple instances, this metric is the average across all instances.

GPU Memory

For GPU-enabled services, GPU memory usage at that time. For services with multiple instances, this metric is the average across all instances.

GPU Total

For GPU-enabled services, total number of GPUs at that time. For services with multiple instances, this metric is the sum of GPUs across all instances.

GPU Memory Utilization

For GPU-enabled services, GPU memory utilization at that time. For services with multiple instances, this metric is the average across all instances.

Network

Traffic

Traffic received and sent by the service, in bits per second. For services with multiple instances, this metric is the average across all instances.

Where:

  • In: Traffic received.

  • Out: Traffic sent.

TCP Connections

Number of TCP connections.

Resources

Replicas

Number of service instances in different states at that time: Total, Pending, Available.

Replicas By Resource

Number of service instances by resource type at that time: Total, Dedicated (dedicated resources), Public (public resources).

Single Instance Monitoring Dashboard (Minute-Level)

Monitor the following metrics on this dashboard:

Metric

Description

QPS

Requests per second received by this instance. Requests with different return codes are calculated separately.

RT

Response time of requests for this instance.

Response

Total responses received by this instance within the selected time range. Responses with different return codes are calculated separately.

More Metrics (CPU | Memory | GPU | Network | Resources)

Metric

Description

CPU

CPU

CPU usage of this instance, in CPU cores.

CPU Utilization

Average CPU utilization of this instance at that time. Calculation: Average CPU usage ÷ Maximum available CPU cores.

Memory

Memory

Memory usage of this instance.

  • RSS: Resident physical memory size.

  • Cache: Cache size.

  • Total: Maximum available physical memory size for a single instance.

Memory Utilization

Average memory utilization of this instance at that time. Calculation: Memory RSS ÷ Memory Total.

GPU

GPU Utilization

GPU utilization of this instance.

GPU Memory

GPU memory usage of this instance.

GPU Memory Utilization

GPU memory utilization of this instance.

Network

Traffic

Traffic received and sent by this instance, in bits per second.

Where:

  • In: Traffic received.

  • Out: Traffic sent.

TCP Connections

Number of TCP connections.

Multiple Instance Monitoring Dashboard

Minute-level and second-level monitoring metrics are detailed below.

  • Minute-Level

    Metric

    Description

    Instance QPS

    Requests per second for each instance. Requests with different return codes are calculated separately.

    Instance RT

    Average response time for each instance.

    Instance CPU

    CPU usage for each instance, in CPU cores.

    Instance Memory -- RSS

    Resident physical memory size for each instance.

    Instance Memory -- Cache

    Cache size for each instance.

    Instance GPU

    GPU utilization for each instance.

    Instance GPU Memory

    GPU memory usage for each instance.

    Instance TCP Connections

    Number of TCP connections for each instance.

  • Second-Level

    Important

    Data is collected at 5-second granularity. Only the last 1 hour of data is retained.

    Metric

    Description

    Instance QPS Fine

    Requests per second received by each instance. Requests with different return codes are calculated separately.

    Instance RT Fine

    Average response time for requests received by each instance.

GPU Monitoring Dashboard

Monitor the following GPU metrics at service and instance levels. Service-level metrics represent the average across all instances.

Metric

Description

GPU Utilization

GPU utilization of the service at that time.

GPU Memory

GPU memory usage and total GPU memory of the service at that time.

  • Used: GPU memory usage at that time.

  • Total: Total GPU memory at that time.

Memory Copy Utilization

GPU memory copy utilization of the service at that time.

GPU Memory Utilization

GPU memory utilization of the service at that time. Calculation: Memory usage ÷ Total memory.

PCIe

PCIe (Peripheral Component Interconnect Express) rate of the service at that time, measured by DCGM. PCIe is a high-speed serial computer expansion bus standard.

  • PCIe Transmit: PCIe transmission rate at that time.

  • PCIe Receive: PCIe reception rate at that time.

Memory Bandwidth

GPU memory bandwidth metric of the service at that time.

SM Utilization and Occupancy

SM (Streaming Multiprocessor) related metrics of the service at that time. SMs are core components of a GPU, responsible for executing and scheduling parallel computing tasks.

  • SM Utilization: SM utilization at that time.

  • SM Occupancy: Ratio of active warps resident on the SM at that time.

Graphics Engine Utilization

GPU graphics engine utilization of the service at that time.

Pipe Active Ratio

Activity rate of the GPU compute pipelines of the service at that time.

  • Pipe Fp32 Active Ratio: FP32 pipeline activity rate at that time.

  • Pipe Fp16 Active Ratio: FP16 pipeline activity rate at that time.

  • Pipe Tensor Active Ratio: Tensor pipeline activity rate at that time.

Tflops Usage

Tflops (tera floating-point operations per second) throughput of the GPU compute pipelines of the service at that time.

  • FP32 Tflops Used: FP32 pipeline Tflops throughput at that time.

  • FP16 Tflops Used: FP16 pipeline Tflops throughput at that time.

  • Tensor Tflops Used: Tensor pipeline Tflops throughput at that time.

DRAM Active Ratio

Activity rate of the GPU device memory interface sending or receiving data at that time.

SM Clock

SM clock frequency of the service at that time.

GPU Temperature

GPU temperature related metrics of the service at that time.

  • GPU Temperature: GPU temperature at that time.

  • GPU Slowdown Temperature: GPU throttling temperature threshold at that time. When the GPU temperature reaches this value, the GPU automatically reduces its operating frequency to prevent overheating.

  • GPU Shutdown Temperature: GPU shutdown temperature threshold at that time. When the GPU temperature reaches this value, the system forces the GPU device to shut down. This prevents hardware damage or more severe system failures due to GPU overheating.

Power Usage

GPU power consumption of the service at that time.

The following are GPU health status and anomaly information metrics:

Metric

Description

GPU Health Count

Number of healthy GPU cards for the service at that time.

GPU Lost Card Num

Number of lost GPU cards for the service at that time.

ECC Error Count

Number of ECC errors for the service at that time. ECC (Error Correction Code) detects and corrects errors during GPU memory data transmission or storage.

  • Volatile SBE ECC Error: Number of single-bit volatile ECC errors for the service at that time.

  • Volatile DBE ECC Error: Number of double-bit volatile ECC errors for the service at that time.

  • Aggregate SBE ECC Error: Number of single-bit persistent ECC errors for the service at that time.

  • Aggregate DBE ECC Error: Number of double-bit persistent ECC errors for the service at that time.

  • Uncorrectable ECC Error: Number of uncorrectable ECC errors for the service at that time.

NVSwitch Error Count

Number of NVSwitch errors for the service at that time. NVSwitch provides high-bandwidth, low-latency communication channels for high-speed communication between multiple GPUs.

  • NVSwitch Fatal Error: Number of fatal NVSwitch errors for the service at that time.

  • NVSwitch Non-Fatal Error: Number of non-fatal NVSwitch errors for the service at that time.

Xid Error Count

Number of Xid errors for the service at that time. Xid errors are error codes reported by the GPU driver. They indicate issues encountered by the GPU during operation. These errors are typically recorded in system logs (such as Linux dmesg or Windows Event Viewer) and represented as Xid codes.

  • Xid Error: Number of non-fatal Xid errors for the service at that time.

  • Fatal Xid Error: Number of fatal Xid errors for the service at that time.

Kernel Error Count

Number of non-Xid errors for the service at that time. Non-Xid errors refer to other types of errors reported in kernel logs, excluding Xid errors.

Driver Hang

Number of GPU driver hangs for the service at that time.

Remap Status

Row-remapping status of the service's GPUs at that time, reported when a GPU attempts to remap faulty GPU memory rows.

VLLM Monitoring Dashboard

If the service has multiple instances, throughput-related metrics are summed across instances, and latency-related metrics are averaged across instances.

Metric

Description

Requests Status

Total requests for the service at that time.

  • Running: Number of requests running on the GPU at that time.

  • Waiting: Number of requests waiting for processing at that time.

  • Swapped: Number of requests swapped to the CPU at that time.

Token Throughput

Input and output token throughput for all requests of the service at that time.

  • TPS_IN: Input tokens per second at that time.

  • TPS_OUT: Output tokens per second at that time.

Request Completion Status

Completion status statistics for all requests of the service at that time.

  • preemptions: Requests preempted.

  • stop: Requests successfully completed due to natural termination (the model output a stop token, such as <EOS>).

  • length: Requests reached the maximum output token length.

  • abort: Requests forcibly terminated.

Time To First Token

Time to first token latency for all requests of the service at that time (time from receiving a request to generating the first token).

  • Avg: Average time to first token latency for all requests at that time.

  • TPXX: Percentile values for time to first token latency for all requests at that time.

Time Per Output Token

Time per output token latency for all requests of the service at that time (average time required for each output token after the first token is generated).

  • Avg: Average time per token latency for all requests at that time.

  • TPXX: Percentile values for time per token latency for all requests at that time.

E2E Request Latency

End-to-end latency for all requests of the service at that time (time from receiving a request to returning all tokens).

  • Avg: Average end-to-end latency for all requests at that time.

  • TPXX: Percentile values for end-to-end latency for all requests at that time.
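For a streamed response, the three latency metrics above are roughly related: end-to-end latency is approximately the time to first token plus the time per output token for each remaining token. The sketch below uses made-up timings and ignores queueing and network overhead.

```python
# Hypothetical per-request timings, in seconds (illustrative only).
ttft = 0.8   # time to first token
tpot = 0.05  # average time per output token after the first
output_tokens = 101

# Back-of-envelope end-to-end latency: TTFT plus TPOT for each
# of the remaining (output_tokens - 1) tokens.
e2e = ttft + tpot * (output_tokens - 1)
print(e2e)  # 5.8
```

This rule of thumb helps attribute a high E2E Request Latency to either a slow prefill (high TTFT) or a slow decode (high TPOT).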

Queue Time

Queue waiting latency for all requests of the service at that time (time requests wait in queue for engine processing).

  • Avg: Average queue waiting latency for all requests at that time.

  • TPXX: Percentile values for queue waiting latency for all requests at that time.

Inference Time

Inference latency for all requests of the service at that time (time requests are processed by the engine).

  • Avg: Average inference latency for all requests at that time.

  • TPXX: Percentile values for inference latency for all requests at that time.

Prefill Time

Prefill stage latency for all requests of the service at that time (time the engine processes request input tokens).

  • Avg: Average prefill latency for all requests at that time.

  • TPXX: Percentile values for prefill latency for all requests at that time.

Decode Time

Decode stage latency for all requests of the service at that time (time the engine generates output tokens).

  • Avg: Average decode latency for all requests at that time.

  • TPXX: Percentile values for decode latency for all requests at that time.

Input Token Length

Number of input tokens processed by the service at that time.

  • Avg: Average input token length for all requests at that time.

  • TPXX: Percentile values for input token length for all requests at that time.

Output Token Length

Number of output tokens generated by the service at that time.

  • Avg: Average output token length for all requests at that time.

  • TPXX: Percentile values for output token length for all requests at that time.

Request Parameters (params_n & max_tokens)

Parameter N and parameter max_tokens for all requests of the service at that time.

  • Params_n: Average value of parameter N for all requests at that time.

  • Params_max_tokens: Average value of parameter max_tokens for all requests at that time.

GPU KV Cache Usage

Average GPU KV cache utilization of the service at that time.

CPU KV Cache Usage

Average CPU KV cache utilization of the service at that time.

Prefix Cache Hit Rate

Average prefix cache hit rate for all requests of the service at that time.

  • GPU: Average GPU prefix cache hit rate for all requests at that time.

  • CPU: Average CPU prefix cache hit rate for all requests at that time.

HTTP Requests by Endpoint

Number of requests for the service at that time, grouped by request method, path, and response status code.

HTTP Request Latency

Average latency for different request paths of the service at that time.

Speculative Decoding Throughput

Speculative decoding count for the service at that time. For services with multiple instances, this metric is the average across all instances.

  • Drafts: Number of draft generations at that time.

  • Draft Tokens: Number of draft tokens proposed at that time.

  • Accepted Tokens: Number of draft tokens accepted at that time.

  • Emitted Tokens: Number of tokens emitted at that time.

Speculative Decoding Efficiency

Speculative decoding performance of the service at that time.

  • Draft Acceptance Rate: Average ratio of draft tokens accepted at that time.

  • Efficiency: Average efficiency of speculative decoding at that time.
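These two ratios can be illustrated with made-up counters for one interval. The exact efficiency formula varies across engine versions, so the one below is only an assumption for illustration.

```python
# Illustrative speculative-decoding counters for one interval.
drafts = 100            # number of draft generations
draft_tokens = 400      # draft tokens proposed
accepted_tokens = 280   # draft tokens accepted by the target model
emitted_tokens = 380    # tokens actually emitted to the output

# Draft Acceptance Rate: fraction of proposed draft tokens accepted.
acceptance_rate = accepted_tokens / draft_tokens  # 0.7

# Efficiency (assumed formula): emitted tokens relative to the
# theoretical maximum of one bonus token per draft on top of the
# proposed draft tokens.
efficiency = emitted_tokens / (draft_tokens + drafts)  # 0.76

print(acceptance_rate, efficiency)
```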

Token Acceptance by Position

Number of draft tokens accepted at different generation positions for the service at that time. For services with multiple instances, this metric is the average across all instances.

SGLang Monitoring Dashboard

If the service has multiple instances, throughput-related metrics are summed across instances, and latency-related metrics are averaged across instances.

Metric

Description

Requests Num

Total requests for the service at that time.

  • Running: Number of requests running on the GPU at that time.

  • Waiting: Number of requests waiting for processing at that time.

Token Throughput

Input and output token throughput for all requests of the service at that time.

  • TPS_IN: Input tokens per second at that time.

  • TPS_OUT: Output tokens per second at that time.

Time To First Token

Time to first token latency for all requests of the service at that time. Time to first token latency is the time from receiving a request to generating the first token.

  • Avg: Average time to first token latency for all requests at that time.

  • TPXX: Percentile values for time to first token latency for all requests at that time.

Time Per Output Token

Time per output token latency for all requests of the service at that time. Time per token latency is the average time required for each subsequent output token after the first token is generated.

  • Avg: Average time per token latency for all requests at that time.

  • TPXX: Percentile values for time per token latency for all requests at that time.

E2E Request Latency

End-to-end latency for all requests of the service at that time. End-to-end latency is the time from receiving a request to returning all tokens.

  • Avg: Average end-to-end latency for all requests at that time.

  • TPXX: Percentile values for end-to-end latency for all requests at that time.

Cache Hit Rate

Average prefix cache hit rate for all requests of the service at that time.

Used Tokens Num

Number of KV cache tokens used by the service at that time. For services with multiple instances, this metric is the average across all instances.

Token Usage

Average KV cache token utilization of the service at that time. For services with multiple instances, this metric is the average across all instances.

FAQ

Q: LLM Monitoring Dashboard Missing from Monitoring Page

Problem Description: After deploying a model using EAS custom deployment, the monitoring page only shows general Service and GPU monitoring, and LLM monitoring is missing.

Root Cause: The service configuration lacks the key tag ServiceEngineType. This tag explicitly declares the backend inference engine type.


Note

Other parameters provided by Model Gallery deployment do not affect LLM monitoring, except for the ServiceEngineType tag.

Solution: Update the service configuration and add the ServiceEngineType tag, setting its value to the inference engine in use (only vllm and sglang are supported).
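A hedged illustration of what the tag might look like in the service configuration JSON, assuming it is set through the service's labels field. The service name and surrounding fields are placeholders; only the ServiceEngineType entry matters here.

```json
{
  "metadata": {
    "name": "my_llm_service"
  },
  "labels": {
    "ServiceEngineType": "vllm"
  }
}
```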

Q: Why do /metrics 200 log entries appear frequently?

After the ServiceEngineType tag is correctly configured and takes effect, the EAS backend periodically calls the inference deployment framework's /metrics API operation. This occurs approximately every 10-15 seconds, including the collection interval and polling across all pods. This API operation provides real-time framework metrics in Prometheus format, which the frontend uses to render LLM monitoring data.
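To make the mechanism concrete, the sketch below parses Prometheus-format text such as a /metrics endpoint returns. The sample payload and metric names follow the vLLM convention but are assumptions here; real names depend on the engine version. A real poll would fetch the text over HTTP instead of using an inline string.

```python
# Sample Prometheus exposition-format payload (illustrative only).
sample = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running 3.0
vllm:num_requests_waiting 7.0
"""

def parse_metrics(text):
    """Return {metric_name: value} for simple unlabeled samples."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.partition(" ")
        metrics[name] = float(value)
    return metrics

print(parse_metrics(sample))
# {'vllm:num_requests_running': 3.0, 'vllm:num_requests_waiting': 7.0}
```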
