After you deploy a service by using Elastic Algorithm Service (EAS) of Platform for AI (PAI), you can view the service-related metrics on the Service Monitoring tab to learn about the calls to the service and the running status of the service. This topic describes the service monitoring metrics and how to view service monitoring information.
Prerequisites
A model service is deployed. For more information, see Model service deployment by using the PAI console.
View the service monitoring information
Go to the Elastic Algorithm Service (EAS) page.
Log on to the PAI console.
In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.
In the left-side navigation pane, choose Elastic Algorithm Service (EAS). The Elastic Algorithm Service (EAS) page appears.
Find the service that you want to manage and click the icon in the Service Monitoring column to go to the Service Monitoring tab.
View the service monitoring information.
Switch between dashboards.
After you deploy a service, the following dashboards are created:
Service name: the minute-level dashboard, which contains the most common metrics, accurate to the minute. By default, this dashboard is displayed.
Service name (fine): the second-level dashboard.
Service name (per): the minute-level per-instance dashboard.
Note: Service name is the name of the service that you want to manage.
Click the icon to the right of the service name to switch between the dashboards to view related metrics. For information about the metrics, see the Metrics section of this topic.
Switch between time ranges.
Click the time range selector in the upper-right corner of the Monitoring Information section to switch between time ranges.
Important: Minute-level metrics are retained for up to one month, and second-level metrics are retained for up to one hour.
Metrics
Minute-level dashboard
You can view the following metrics in this dashboard:
QPS: Queries per second (QPS) indicates the number of requests sent to the service per second. If the service contains multiple instances, this metric indicates the number of requests sent to all instances of the service. The number of requests is calculated separately by response code.
Response: Response indicates the number of requests sent to the service within a specific time range. The number of requests is calculated separately by response code. If the service contains multiple instances, this metric indicates the number of requests sent to all instances of the service.
CPU: CPU indicates the average number of CPU cores used by the service at a specific point in time. Unit: cores. If the service contains multiple instances, this metric indicates the average number of CPU cores used by all instances of the service.
CPU Utilization: CPU Utilization indicates the average CPU utilization of the service at a specific point in time. Calculation formula: CPU Utilization = Average number of used CPU cores / Maximum number of CPU cores. If the service contains multiple instances, this metric indicates the average CPU utilization of all instances of the service.
Memory Utilization: Memory Utilization indicates the average memory usage of the service at a specific point in time. Calculation formula: Memory Utilization = rss/total. If the service contains multiple instances, this metric indicates the average memory usage of all instances of the service.
GPU: If the service uses GPU resources, this metric indicates the average GPU utilization of the service at a specific point in time. If the service contains multiple instances, this metric indicates the average GPU utilization of all instances of the service.
GPU Memory: If the service uses GPU resources, this metric indicates the amount of GPU memory used by the service at a specific point in time. If the service contains multiple instances, this metric indicates the average amount of GPU memory used by all instances of the service.
Replicas: Replicas indicates the number of instances contained in the service at a specific point in time.
CPU Total: CPU Total indicates the total number of CPU cores available for the service at a specific point in time. Calculation formula: CPU Total = Number of CPU cores available for a single instance × Number of instances.
Daily Invoke: Daily Invoke indicates the number of calls to the service. The number of calls is calculated separately by response code.
RT: RT indicates the response time of requests. This metric is displayed as multiple sub-metrics.
Memory: Memory indicates the average amount of memory used by the service at a specific point in time. If the service contains multiple instances, this metric indicates the average amount of memory used by all instances of the service. This metric is displayed as multiple sub-metrics.
Traffic: Traffic indicates the amount of data received and sent by the service per second. Unit: bit/s. If the service contains multiple instances, this metric indicates the average amount of traffic received and sent by all instances of the service. This metric is displayed as multiple sub-metrics.
TCP Connections: TCP Connections indicates the number of TCP connections.
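The calculation formulas above can be expressed directly in code. The following Python sketch is illustrative only: the function names and sample values are hypothetical and are not part of any EAS API.

```python
# Illustrative sketch of the dashboard formulas above.
# Function names and sample values are hypothetical, not EAS APIs.

def cpu_utilization(used_cores: float, max_cores: float) -> float:
    # CPU Utilization = Average number of used CPU cores / Maximum number of CPU cores
    return used_cores / max_cores

def memory_utilization(rss_bytes: float, total_bytes: float) -> float:
    # Memory Utilization = rss / total
    return rss_bytes / total_bytes

def cpu_total(cores_per_instance: int, instance_count: int) -> int:
    # CPU Total = CPU cores available for a single instance x number of instances
    return cores_per_instance * instance_count

print(cpu_utilization(1.5, 4))                            # 0.375, i.e. 37.5%
print(memory_utilization(2_000_000_000, 8_000_000_000))   # 0.25, i.e. 25%
print(cpu_total(4, 3))                                    # 12 cores in total
```

For example, a service with 3 instances of 4 cores each reports CPU Total = 12, and if it uses 1.5 cores on average against a 4-core maximum, CPU Utilization is 37.5%.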
Second-level dashboard
You can view the following metrics in this dashboard:
Instance QPS Fine: Instance QPS Fine indicates the number of requests received by each instance of the service per second. The number of requests is calculated separately by response code. Important: Data is accurate to 5 seconds. Only data of the most recent hour is retained. The data of each instance can be identified based on the value of the ip:port field.
Instance RT Fine: Instance RT Fine indicates the average response time of requests received by each instance of the service. Important: Data is accurate to 5 seconds. Only data of the most recent hour is retained. The data of each instance can be identified based on the value of the ip:port field.
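To illustrate what "accurate to 5 seconds" and "identified by ip:port" mean in practice, the following sketch buckets hypothetical request records into 5-second windows per instance and response code. The record data is invented for illustration; EAS computes these values server-side.

```python
from collections import defaultdict

# Hypothetical access records: (unix_timestamp, "ip:port", response_code).
records = [
    (100.2, "10.0.0.1:8000", 200),
    (101.9, "10.0.0.1:8000", 200),
    (103.7, "10.0.0.2:8000", 500),
    (106.0, "10.0.0.1:8000", 200),
]

WINDOW = 5  # second-level dashboard data is accurate to 5 seconds

# Count requests per (5-second bucket, instance, response code).
buckets = defaultdict(int)
for ts, instance, code in records:
    bucket_start = int(ts // WINDOW) * WINDOW
    buckets[(bucket_start, instance, code)] += 1

for (start, instance, code), count in sorted(buckets.items()):
    qps = count / WINDOW  # per-instance QPS within the bucket
    print(f"[{start}s-{start + WINDOW}s) {instance} code={code}: {qps:.1f} req/s")
```

Each series in the dashboard corresponds to one (ip:port, response code) pair, which is why the ip:port field is what distinguishes instances.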
Minute-level per-instance dashboard
You can view the following metrics in this dashboard:
Instance QPS: Instance QPS indicates the number of requests received by each instance of the service per second. The number of requests is calculated separately by response code. The data of each instance can be identified based on the value of the ip:port field.
Instance RT: Instance RT indicates the average response time of each instance of the service. The data of each instance can be identified based on the value of the ip:port field.
Instance CPU: Instance CPU indicates the number of CPU cores used by each instance of the service. Unit: cores. The data of each instance can be identified based on the value of the ip:port field.
Instance Memory: Instance Memory indicates the amount of memory used by each instance of the service. The data of each instance can be identified based on the value of the ip:port field.
Instance GPU: Instance GPU indicates the GPU utilization of each instance of the service.
Instance GPU Memory: Instance GPU Memory indicates the amount of GPU memory used by each instance of the service.
Instance TCP Connections: Instance TCP Connections indicates the number of TCP connections on a single instance.
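Instance RT is an average over the requests that each instance served. As a minimal sketch with invented latency samples (the data and variable names are hypothetical, not produced by any EAS API):

```python
# Hypothetical per-request response times in milliseconds, keyed by "ip:port".
latencies = {
    "10.0.0.1:8000": [12.0, 18.0, 30.0],
    "10.0.0.2:8000": [25.0, 35.0],
}

# Instance RT: average response time of each instance of the service.
for instance, samples in latencies.items():
    avg_rt = sum(samples) / len(samples)
    print(f"{instance}: avg RT = {avg_rt:.1f} ms over {len(samples)} requests")
```

A busy instance with many fast requests can therefore show a lower Instance RT than a lightly loaded instance that served a few slow requests.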
Reference
You can use the service monitoring and alerting feature to monitor the status of your services. If the threshold that is specified in an alert rule is exceeded, an alert notification is sent. For more information, see Enable service monitoring and alerting.
You can view ServiceInstance events, perform O&M or audits on the events, and configure alert rules for the events in the CloudMonitor console or by calling API operations. For more information, see View ServiceInstance events in CloudMonitor.
You can configure a custom monitoring metric based on your business requirements and configure service auto scaling based on the custom metrics. For more information, see Configure a custom monitoring and scaling metric.