Platform for AI: Service monitoring

Last Updated: May 11, 2024

After you deploy a service by using Elastic Algorithm Service (EAS) of Platform for AI (PAI), you can view the service-related metrics on the Service Monitoring tab to learn about the calls to the service and the running status of the service. This topic describes the service monitoring metrics and how to view service monitoring information.

Prerequisites

A model service is deployed. For more information, see Model service deployment by using the PAI console.

View the service monitoring information

  1. Go to the Elastic Algorithm Service (EAS) page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS). The Elastic Algorithm Service (EAS) page appears.

  2. Find the service that you want to manage and click the icon in the Service Monitoring column to go to the Service Monitoring tab.

  3. View the service monitoring information.

    Switch between dashboards.

      After you deploy a service, the following dashboards are created:

      • Service name: the minute-level dashboard, which contains the most common metrics at one-minute granularity. This dashboard is displayed by default.

      • Service name (fine): second-level dashboard.

      • Service name (per): minute-level per-instance dashboard.

      Note

      Service name is the name of the service that you want to manage.

      Click the button to the right of the service name to switch between the dashboards and view the related metrics. For information about the metrics, see the Metrics section of this topic.

    Switch between time ranges.

    Click the button in the upper-right corner of the Monitoring Information section to switch between time ranges.

    Important

    Minute-level metrics can be retained for up to one month, and second-level metrics can be retained for up to one hour.
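
The retention limits above determine which dashboards can still show data for a given point in the past. The following is a minimal sketch of that check. The function name is hypothetical, and treating "one month" as 30 days is an assumption rather than a documented value.

```python
from datetime import timedelta

# Assumption: "one month" is approximated as 30 days.
MINUTE_LEVEL_RETENTION = timedelta(days=30)
SECOND_LEVEL_RETENTION = timedelta(hours=1)

def available_granularities(age):
    """Return the metric granularities that may still hold data of the given age."""
    levels = []
    if age <= SECOND_LEVEL_RETENTION:
        levels.append("second")
    if age <= MINUTE_LEVEL_RETENTION:
        levels.append("minute")
    return levels

recent = available_granularities(timedelta(minutes=30))
older = available_granularities(timedelta(days=2))
expired = available_granularities(timedelta(days=45))
```

In this sketch, data from 30 minutes ago is available at both granularities, data from two days ago only at minute level, and data from 45 days ago at neither.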

Metrics

Minute-level dashboard

You can view the following metrics in this dashboard:

QPS

Queries per second (QPS) indicates the number of requests sent to the service per second. If the service contains multiple instances, this metric indicates the number of requests sent to all instances of the service. The number of requests is calculated separately by response code.
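
To illustrate what "calculated separately by response code" means, the following sketch counts the requests observed in a time window per response code and divides each count by the window length. The function name and the input format are hypothetical and are not part of EAS.

```python
from collections import Counter

def qps_by_code(status_codes, window_seconds):
    """Return requests per second for each response code seen in the window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    counts = Counter(status_codes)
    return {code: n / window_seconds for code, n in counts.items()}

# Hypothetical 60-second window, aggregated over all instances of a service.
sample = [200] * 1200 + [500] * 30
result = qps_by_code(sample, 60)
```

With this sample, the service handles 20 requests per second with code 200 and 0.5 requests per second with code 500.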

Response

Response indicates the number of requests sent to the service within a specific time range. The number of requests is calculated separately by response code. If the service contains multiple instances, this metric indicates the number of requests sent to all instances of the service.

CPU

CPU indicates the average number of CPU cores used by the service at a specific point in time. Unit: cores. If the service contains multiple instances, this metric indicates the average number of CPU cores used by all instances of the service.

CPU Utilization

CPU Utilization indicates the average CPU utilization of the service at a specific point in time. Calculation formula: CPU Utilization = Average number of used CPU cores/Maximum number of CPU cores. If the service contains multiple instances, this metric indicates the average CPU utilization of all instances of the service.
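
The formula above can be sketched as follows. The function names and the list-of-pairs input are hypothetical, and averaging the per-instance ratios is one plausible reading of "average CPU utilization of all instances", not a confirmed description of how EAS aggregates the value.

```python
def cpu_utilization(used_cores, max_cores):
    """CPU Utilization = average number of used CPU cores / maximum number of CPU cores."""
    if max_cores <= 0:
        raise ValueError("max_cores must be positive")
    return used_cores / max_cores

def service_cpu_utilization(per_instance):
    """Average the utilization of all instances; per_instance holds (used, max) pairs."""
    values = [cpu_utilization(used, maximum) for used, maximum in per_instance]
    return sum(values) / len(values)

single = cpu_utilization(1.5, 4)                     # one instance using 1.5 of 4 cores
service = service_cpu_utilization([(1, 4), (3, 4)])  # two instances with 4 cores each
```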

Memory Utilization

Memory Utilization indicates the average memory usage of the service at a specific point in time. Calculation formula: Memory Utilization = rss/total. If the service contains multiple instances, this metric indicates the average memory usage of all instances of the service.

GPU

If the service uses GPU resources, this metric indicates the average GPU utilization of the service at a specific point in time. If the service contains multiple instances, this metric indicates the average GPU utilization of all instances of the service.

GPU Memory

If the service uses GPU resources, this metric indicates the amount of GPU memory used by the service at a specific point in time. If the service contains multiple instances, this metric indicates the average amount of GPU memory used by all instances of the service.

Replicas

Replicas indicates the number of instances contained in the service at a specific point in time.

CPU Total

CPU Total indicates the total number of CPU cores available for the service at a specific point in time. Calculation formula: CPU Total = Number of CPU cores available for a single instance × Number of instances.
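
As a quick worked example of the CPU Total formula, the following sketch multiplies the cores available to one instance by the number of instances. The function name and the sample values are hypothetical.

```python
def cpu_total(cores_per_instance, instance_count):
    """CPU Total = CPU cores available for a single instance x number of instances."""
    if cores_per_instance < 0 or instance_count < 0:
        raise ValueError("inputs must be non-negative")
    return cores_per_instance * instance_count

# A service with 3 instances, each with 4 cores available.
total = cpu_total(4, 3)
```

Because the instance count is the value that the Replicas metric reports, CPU Total changes whenever the service is scaled in or out.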

Daily Invoke

Daily Invoke indicates the number of calls to the service. The number of calls is calculated separately by response code.

RT

RT indicates the response time of requests.

Sub-metrics:

  • avg: the average response time of all requests sent at a specific point in time.

  • tpXX: the maximum response time among the fastest XX% of requests sent at a specific point in time, that is, the XXth percentile of the response time.

    Examples: tp5 (the maximum response time among the fastest 5% of requests) and tp100 (the maximum response time of all requests).

    If the service contains multiple instances, tp100 indicates the maximum response time of requests sent to all instances, and tp5 indicates the average of the tp5 values of all instances.
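
The tpXX sub-metrics can be sketched as follows, reading "top XX%" as the fastest XX% of requests. This sketch uses the nearest-rank percentile method, which is an assumption: the exact method that EAS uses is not stated here. The function names and sample data are hypothetical.

```python
import math

def tp(response_times_ms, xx):
    """Return the maximum response time among the fastest xx% of requests (nearest rank)."""
    if not response_times_ms:
        raise ValueError("no response times given")
    if not 0 < xx <= 100:
        raise ValueError("xx must be in (0, 100]")
    ordered = sorted(response_times_ms)
    rank = max(1, math.ceil(len(ordered) * xx / 100))
    return ordered[rank - 1]

def service_tp100(per_instance):
    """tp100 of a service: the maximum response time across all instances."""
    return max(tp(times, 100) for times in per_instance)

def service_tp5(per_instance):
    """tp5 of a service: the average of the per-instance tp5 values."""
    values = [tp(times, 5) for times in per_instance]
    return sum(values) / len(values)

latencies = [12, 15, 11, 250, 14, 13, 18, 16, 17, 19]
instances = [[10, 20, 30, 40], [50, 60, 70, 80]]
```

In the sample, a single slow request (250 ms) sets tp100, while tp50 and tp90 stay at 15 ms and 19 ms. This is why low and high percentiles are usually read together with avg.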

Memory

Memory indicates the average amount of memory used by the service at a specific point in time. If the service contains multiple instances, this metric indicates the average amount of memory used by all instances of the service.

Sub-metrics:

  • rss: the amount of resident physical memory.

  • cache: the cache size.

  • total: the maximum amount of physical memory available for a single instance.
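
The rss and total sub-metrics are the inputs to the Memory Utilization formula (Memory Utilization = rss/total). The following minimal sketch applies it. The function name and the byte values are hypothetical. Note that, as the formula is stated, cache is not part of the ratio.

```python
GIB = 1024 ** 3

def memory_utilization(rss_bytes, total_bytes):
    """Memory Utilization = rss / total, both for a single instance."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return rss_bytes / total_bytes

# An instance with 2 GiB resident out of 8 GiB available.
usage = memory_utilization(2 * GIB, 8 * GIB)
```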

Traffic

Traffic indicates the amount of data received and sent by the service per second. Unit: bit/s. If the service contains multiple instances, this metric indicates the average amount of traffic received and sent by all instances of the service.

Sub-metrics:

  • in: the amount of data received by the service.

  • out: the amount of data sent by the service.
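
Because Traffic is reported in bit/s rather than bytes per second, comparing it with byte counters from application logs requires a conversion. The following sketch shows that arithmetic. The function name and sample values are hypothetical.

```python
def bits_per_second(byte_count, interval_seconds):
    """Convert a byte count observed over an interval into a bit/s rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return byte_count * 8 / interval_seconds

# 7,500,000 bytes received over one minute.
rate_in = bits_per_second(7_500_000, 60)
```

Here the inbound rate is 1,000,000 bit/s, or 1 Mbit/s.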

TCP Connections

TCP Connections indicates the number of TCP connections.


Second-level dashboard

You can view the following metrics in this dashboard:

Instance QPS Fine

Instance QPS Fine indicates the number of requests received by each instance of the service per second. The number of requests is calculated separately by response code.

Important

Data is accurate to 5 seconds. Only data of the most recent hour is retained.

The data of each instance can be identified based on the value of the ip:port field.

Instance RT Fine

Instance RT Fine indicates the average response time of requests received by each instance of the service.

Important

Data is accurate to 5 seconds. Only data of the most recent hour is retained.

The data of each instance can be identified based on the value of the ip:port field.
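
If you process per-instance data outside the console, the ip:port value is a natural grouping key. The following sketch groups samples by that key and splits it into a host and a port. The tuple layout of the samples and both function names are hypothetical and do not describe an EAS export format.

```python
from collections import defaultdict

def group_by_instance(samples):
    """Group (ip_port, timestamp, value) samples into one series per instance."""
    series = defaultdict(list)
    for ip_port, timestamp, value in samples:
        series[ip_port].append((timestamp, value))
    return dict(series)

def split_ip_port(label):
    """Split an 'ip:port' label into (ip, port) at the last colon."""
    host, _, port = label.rpartition(":")
    return host, int(port)

samples = [
    ("10.0.0.12:8080", 0, 21.0),
    ("10.0.0.13:8080", 0, 19.5),
    ("10.0.0.12:8080", 5, 22.0),  # next sample, 5 seconds later
]
grouped = group_by_instance(samples)
```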

Minute-level per-instance dashboard

You can view the following metrics in this dashboard:

Instance QPS

Instance QPS indicates the number of requests received by each instance of the service per second. The number of requests is calculated separately by response code. The data of each instance can be identified based on the value of the ip:port field.

Instance RT

Instance RT indicates the average response time of each instance of the service. The data of each instance can be identified based on the value of the ip:port field.

Instance CPU

Instance CPU indicates the number of CPU cores used by each instance of the service. Unit: cores. The data of each instance can be identified based on the value of the ip:port field.

Instance Memory

Instance Memory indicates the amount of memory used by each instance of the service. The data of each instance can be identified based on the value of the ip:port field.

Instance GPU

Instance GPU indicates the GPU utilization of each instance of the service.

Instance GPU Memory

Instance GPU Memory indicates the amount of GPU memory used by each instance of the service.

Instance TCP Connections

Instance TCP Connections indicates the number of TCP connections on a single instance.

Reference

  • You can use the service monitoring and alerting feature to monitor the status of your services. If the threshold that is specified in an alert rule is exceeded, an alert notification is sent. For more information, see Enable service monitoring and alerting.

  • You can view ServiceInstance events, perform O&M or audits on the events, and configure alert rules for the events in the CloudMonitor console or by calling API operations. For more information, see View ServiceInstance events in CloudMonitor.

  • You can configure custom monitoring metrics based on your business requirements and configure service auto scaling based on those metrics. For more information, see Configure a custom monitoring and scaling metric.