Application Real-Time Monitoring Service: Application monitoring metrics

Last Updated: Dec 13, 2024

This topic describes the metrics that are used in the Application Monitoring sub-service of Application Real-Time Monitoring Service (ARMS). You can use these metrics to create custom Grafana dashboards.

Note

Applications that are connected through Managed Service for OpenTelemetry only support displaying and using business metrics. Other metrics such as JVM and system metrics are not supported.

Business metrics

Common dimensions

Dimension name

Dimension key

Service name

service

Service PID

pid

Server IP address

serverIp

Interface

rpc

Metrics

The following table describes the metrics that are available for all access types. When you run a query, replace $callType with a specific access type. For more information about access types, see the Service access types and available dimensions section.

For example, to query the number of HTTP requests, you only need to change arms_$callType_requests_count to arms_http_requests_count.
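The substitution can be sketched as a small helper. This is a minimal illustration, not part of ARMS: the function name `metric_for` and its default template are assumptions made for this example.

```python
def metric_for(call_type: str, template: str = "arms_$callType_requests_count") -> str:
    """Replace the $callType placeholder with a concrete access type."""
    if not call_type:
        raise ValueError("call_type must not be empty")
    return template.replace("$callType", call_type)


# HTTP server requests, as in the example above.
print(metric_for("http"))
# Failed requests issued by a Dubbo client.
print(metric_for("dubbo_client", "arms_$callType_requests_error_count"))
```

The first call prints arms_http_requests_count. The same helper works for any metric template in the table.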

Metric description

Metric

Measurement

Collection interval (seconds)

Unit

Dimension

Number of requests

arms_$callType_requests_count

Gauge

15

None

Different service access types have different dimensions. For more information, see Service access types and available dimensions.

Number of failed requests

arms_$callType_requests_error_count

Gauge

15

None

Request duration

arms_$callType_requests_seconds

Gauge

15

Seconds

Number of slow requests

arms_$callType_requests_slow_count

Gauge

15

None

Quantile of request duration

arms_$callType_requests_latency_seconds

Summary

15

Seconds

This metric is available only when the service access type is HTTP and quantile statistics are enabled. For more information, see Advanced settings.

Quantile values:

  • 0.5

  • 0.75

  • 0.90

  • 0.99

Note

All preceding metrics other than the arms_$callType_requests_latency_seconds metric are of the Gauge type. Unlike the Gauge metrics generated by open source application frameworks, the value of an ARMS Gauge metric at each point represents the cumulative total within the collection interval. For example, to calculate the average queries per second (QPS) over one minute, the Prometheus Query Language (PromQL) statement for an ARMS Gauge metric is sum_over_time(arms_$callType_requests_count[1m])/60, whereas the statement for a metric in an open source application framework is rate(http_server_requests_count[1m]).
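The arithmetic behind the ARMS-style query can be worked through with a short sketch. The sample values below are made up, and the helper names `average_qps` and `qps_query` are hypothetical. The only assumption taken from the text is that each point holds the number of requests counted in its own 15-second collection interval.

```python
def average_qps(interval_counts: list[float], window_seconds: int = 60) -> float:
    """Average QPS over a window, given per-interval request counts.

    Mirrors sum_over_time(metric[1m])/60: every sample already holds the
    requests counted in its own collection interval, so the samples are
    summed and divided by the window length in seconds.
    """
    return sum(interval_counts) / window_seconds


def qps_query(call_type: str, window: str = "1m", window_seconds: int = 60) -> str:
    """Build the PromQL expression for an ARMS Gauge request counter."""
    return f"sum_over_time(arms_{call_type}_requests_count[{window}])/{window_seconds}"


# Four 15-second samples cover one minute (illustrative values).
samples = [30, 45, 15, 30]
print(average_qps(samples))  # 2.0
print(qps_query("http"))
```

With these samples, 120 requests over 60 seconds give an average of 2.0 QPS. Applying rate() to such a metric would not give a meaningful result, because the samples are per-interval totals and not a monotonically increasing counter.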

Aggregation metrics

  • Different business metrics are provided for various call types. However, an application related to multiple call types, such as HTTP and Dubbo, leads to lengthy PromQL statements.

  • Business metrics are used to monitor all dimensions of an application, which is not necessary in some scenarios and may degrade the performance of metric queries.

To solve the preceding problems, ARMS provides aggregation metrics.

Categories

Aggregation metrics are classified into the following categories:

  • General-purpose

    Monitors the number of requests, number of errors, number of slow requests, and average request duration of all call types.

  • Database

    Monitors the number of requests, number of errors, number of slow requests, and average request duration for database calls.

  • SQL

    Monitors the number of requests, number of errors, number of slow requests, and average request duration for database calls, broken down by SQL statement.

  • Exception

    Monitors the number of requests and the average request duration for exceptions of all call types.

  • Status code

    Monitors the number of requests with different HTTP status codes.

  • Quantile

    Monitors the quantiles of request duration for all call types.

Aggregation metrics other than quantile-specific metrics are further classified into basic aggregation metrics named in the xxx_raw format, and dimensionality reduction metrics named in the xxx_ign_x_y format. In the name of a dimensionality reduction metric, x and y are aggregated dimensions.
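The naming convention can be read mechanically. The following sketch uses a hypothetical helper, `excluded_dimensions`, to extract the aggregated dimensions from a metric name. It assumes that dimension names never contain an underscore, which holds for the metric names listed in this topic.

```python
def excluded_dimensions(metric: str) -> list[str]:
    """Return the dimensions named after '_ign_', or [] for a '_raw' metric."""
    if metric.endswith("_raw"):
        # Basic aggregation metric: no dimension is aggregated away.
        return []
    base, sep, suffix = metric.partition("_ign_")
    if not sep or not suffix:
        raise ValueError(f"not an aggregation metric name: {metric}")
    return suffix.split("_")


print(excluded_dimensions("arms_app_requests_count_raw"))
print(excluded_dimensions("arms_app_requests_count_ign_destid_endpoint_rpc"))
```

Dimension names appear in lowercase inside a metric name (for example, destid), whereas the corresponding dimension key keeps its original case (destId).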

Measurement and data collection interval

Unless otherwise specified, all aggregation metrics are of the Gauge type, and data is collected every 15 seconds.

Common dimensions

The following dimensions are applicable to all aggregation metrics.

Dimension

Description

pid

The application PID.

service

The application name.

serverIp

The IP address of the server.

source

The source of the metric. The following sources are supported:

  • apm

    Indicates that the application is monitored through an ARMS agent.

  • xtrace

    Indicates that the application is monitored in Managed Service for OpenTelemetry.

  • ebpf

    Indicates that the application is monitored through an Application Monitoring eBPF agent.

Metrics

Category

Metric description

Metric

Unit

Dimension

General-purpose

Number of requests

arms_app_requests_count_raw

None

  • callType: the call type.

  • rpc: the interface name.

  • prpc: the upstream interface name.

  • ppid: the upstream application PID.

  • parent: the upstream application name.

  • endpoint: If the value of callType is http_client, a peer interface endpoint is displayed. Otherwise, a remote address is displayed.

  • destId: If the value of callType is a database, a database name is displayed. Otherwise, a remote address is displayed.

arms_app_requests_count_ign_destid_endpoint_rpc

None

destId, endpoint, and rpc are excluded.

arms_app_requests_count_ign_destid_endpoint_ppid_prpc

None

destId, endpoint, ppid, and prpc are excluded.

arms_app_requests_count_ign_destid_endpoint_ppid_prpc_rpc

None

destId, endpoint, ppid, prpc, and rpc are excluded.

arms_app_requests_count_ign_parent_ppid_prpc_rpc

None

parent, ppid, prpc, and rpc are excluded.

arms_app_requests_count_ign_endpoint_parent_ppid_prpc_rpc

None

endpoint, parent, ppid, prpc, and rpc are excluded.

Number of request errors

arms_app_requests_error_count_raw

None

  • callType: the call type.

  • rpc: the interface name.

  • prpc: the upstream interface name.

  • ppid: the upstream application PID.

  • parent: the upstream application name.

  • endpoint: If the value of callType is http_client, a peer interface endpoint is displayed. Otherwise, a remote address is displayed.

  • destId: If the value of callType is a database, a database name is displayed. Otherwise, a remote address is displayed.

arms_app_requests_error_count_ign_destid_endpoint_rpc

None

destId, endpoint, and rpc are excluded.

arms_app_requests_error_count_ign_destid_endpoint_ppid_prpc

None

destId, endpoint, ppid, and prpc are excluded.

arms_app_requests_error_count_ign_destid_endpoint_ppid_prpc_rpc

None

destId, endpoint, ppid, prpc, and rpc are excluded.

arms_app_requests_error_count_ign_parent_ppid_prpc_rpc

None

parent, ppid, prpc, and rpc are excluded.

arms_app_requests_error_count_ign_endpoint_parent_ppid_prpc_rpc

None

endpoint, parent, ppid, prpc, and rpc are excluded.

Number of slow requests

arms_app_requests_slow_count_raw

None

  • callType: the call type.

  • rpc: the interface name.

  • prpc: the upstream interface name.

  • ppid: the upstream application PID.

  • parent: the upstream application name.

  • endpoint: If the value of callType is http_client, a peer interface endpoint is displayed. Otherwise, a remote address is displayed.

  • destId: If the value of callType is a database, a database name is displayed. Otherwise, a remote address is displayed.

arms_app_requests_slow_count_ign_destid_endpoint_rpc

None

destId, endpoint, and rpc are excluded.

arms_app_requests_slow_count_ign_destid_endpoint_ppid_prpc

None

destId, endpoint, ppid, and prpc are excluded.

arms_app_requests_slow_count_ign_destid_endpoint_ppid_prpc_rpc

None

destId, endpoint, ppid, prpc, and rpc are excluded.

arms_app_requests_slow_count_ign_parent_ppid_prpc_rpc

None

parent, ppid, prpc, and rpc are excluded.

arms_app_requests_slow_count_ign_endpoint_parent_ppid_prpc_rpc

None

endpoint, parent, ppid, prpc, and rpc are excluded.

Request duration

arms_app_requests_seconds_raw

Seconds

  • callType: the call type.

  • rpc: the interface name.

  • prpc: the upstream interface name.

  • ppid: the upstream application PID.

  • parent: the upstream application name.

  • endpoint: If the value of callType is http_client, a peer interface endpoint is displayed. Otherwise, a remote address is displayed.

  • destId: If the value of callType is a database, a database name is displayed. Otherwise, a remote address is displayed.

arms_app_requests_seconds_ign_destid_endpoint_rpc

Seconds

destId, endpoint, and rpc are excluded.

arms_app_requests_seconds_ign_destid_endpoint_ppid_prpc

Seconds

destId, endpoint, ppid, and prpc are excluded.

arms_app_requests_seconds_ign_destid_endpoint_ppid_prpc_rpc

Seconds

destId, endpoint, ppid, prpc, and rpc are excluded.

arms_app_requests_seconds_ign_parent_ppid_prpc_rpc

Seconds

parent, ppid, prpc, and rpc are excluded.

arms_app_requests_seconds_ign_endpoint_parent_ppid_prpc_rpc

Seconds

endpoint, parent, ppid, prpc, and rpc are excluded.

Database

Number of database requests

arms_db_requests_count_raw

None

  • callType: the call type.

  • rpc: the interface name.

  • endpoint: the address of the database instance.

  • destId: the name of the database.

arms_db_requests_count_ign_rpc

None

The interface dimension is excluded.

Number of database request errors

arms_db_requests_error_count_raw

None

  • callType: the call type.

  • rpc: the interface name.

  • endpoint: the address of the database instance.

  • destId: the name of the database.

arms_db_requests_error_count_ign_rpc

None

The interface dimension is excluded.

Number of slow database requests

arms_db_requests_slow_count_raw

None

  • callType: the call type.

  • rpc: the interface name.

  • endpoint: the address of the database instance.

  • destId: the name of the database.

arms_db_requests_slow_count_ign_rpc

None

The interface dimension is excluded.

Database request duration

arms_db_requests_seconds_raw

Seconds

  • callType: the call type.

  • rpc: the interface name.

  • endpoint: the address of the database instance.

  • destId: the name of the database.

arms_db_requests_seconds_ign_rpc

Seconds

The interface dimension is excluded.

SQL

Number of SQL requests

arms_sql_requests_count_raw

None

  • callType: the call type.

  • rpc: the interface name.

  • endpoint: the address of the database instance.

  • destId: the name of the database.

  • sqlId: the ID of the SQL statement.

arms_sql_requests_count_ign_rpc

None

The interface dimension is excluded.

Number of SQL request errors

arms_sql_requests_error_count_raw

None

  • callType: the call type.

  • rpc: the interface name.

  • endpoint: the address of the database instance.

  • destId: the name of the database.

  • sqlId: the ID of the SQL statement.

arms_sql_requests_error_count_ign_rpc

None

The interface dimension is excluded.

Number of slow SQL requests

arms_sql_requests_slow_count_raw

None

  • callType: the call type.

  • rpc: the interface name.

  • endpoint: the address of the database instance.

  • destId: the name of the database.

  • sqlId: the ID of the SQL statement.

arms_sql_requests_slow_count_ign_rpc

None

The interface dimension is excluded.

SQL request duration

arms_sql_requests_seconds_raw

Seconds

  • callType: the call type.

  • rpc: the interface name.

  • endpoint: the address of the database instance.

  • destId: the name of the database.

  • sqlId: the ID of the SQL statement.

arms_sql_requests_seconds_ign_rpc

Seconds

The interface dimension is excluded.

Exception

Number of requests with exceptions

arms_exception_requests_count_raw

None

  • callType: the call type.

  • rpc: the interface name.

  • endpoint: If the value of callType is http_client, a peer interface endpoint is displayed. Otherwise, a remote address is displayed.

  • destId: If the value of callType is a database, a database name is displayed. Otherwise, a remote address is displayed.

  • excepInfo: the encoding method of the exception.

  • excepType: the encoding ID of the exception.

  • excepName: the name of the exception.

arms_exception_requests_count_ign_rpc

None

The interface dimension is excluded.

Duration of requests with exceptions

arms_exception_requests_seconds_raw

Seconds

  • callType: the call type.

  • rpc: the interface name.

  • endpoint: If the value of callType is http_client, a peer interface endpoint is displayed. Otherwise, a remote address is displayed.

  • destId: If the value of callType is a database, a database name is displayed. Otherwise, a remote address is displayed.

  • excepInfo: the encoding method of the exception.

  • excepType: the encoding ID of the exception.

  • excepName: the name of the exception.

arms_exception_requests_seconds_ign_rpc

Seconds

The interface dimension is excluded.

Status code

Number of requests by status code

arms_requests_by_status_count_raw

None

  • rpc: the interface name.

  • status: the status code. Values:

    • 200

    • 2xx

    • 3xx

    • 4xx

    • 5xx

arms_requests_by_status_count_ign_rpc

None

The interface dimension is excluded.

Quantile

Quantile of request duration

Note

This metric is supported by ARMS agent V4.x and later.

arms_uni_requests_latency_seconds

  • callType: the call type.

  • rpc: the interface name.

  • quantile: the quantile. Values:

    • 0.5: 50th percentile

    • 0.75: 75th percentile

    • 0.90: 90th percentile

    • 0.99: 99th percentile

  • endpoint: If the value of callType is http_client, a peer interface endpoint is displayed. Otherwise, a remote address is displayed.

  • destId: If the value of callType is a database, a database name is displayed. Otherwise, a remote address is displayed.

  • excepInfo: the encoding method of the exception.

  • excepType: the encoding ID of the exception.

  • excepName: the name of the exception.

  • status: the status code. Values:

    • 200

    • 2xx

    • 3xx

    • 4xx

    • 5xx

Examples

How do I select a metric if I want to use PromQL to query the number of requests of all interfaces for an application?

  1. First, narrow down the metrics to general-purpose metrics because only general-purpose metrics can be used to monitor the number of interface requests.

  2. Second, select metrics that include the interface dimension rather than other dimensions, such as upstream interface, upstream application, or remote address.

In summary, the optimal metric is arms_app_requests_count_ign_destid_endpoint_ppid_prpc.
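The two selection steps can be sketched as a small search over the general-purpose request count metrics. This is one possible way to express the reasoning, not an ARMS feature: the helper `pick_metric`, the lowercase dimension names, and the service name demo-app are assumptions made for this example.

```python
# Request count variants from the general-purpose category, mapped to the
# dimensions that each variant excludes (lowercase, as in the metric names).
VARIANTS = {
    "arms_app_requests_count_raw": set(),
    "arms_app_requests_count_ign_destid_endpoint_rpc": {"destid", "endpoint", "rpc"},
    "arms_app_requests_count_ign_destid_endpoint_ppid_prpc": {"destid", "endpoint", "ppid", "prpc"},
    "arms_app_requests_count_ign_destid_endpoint_ppid_prpc_rpc": {"destid", "endpoint", "ppid", "prpc", "rpc"},
    "arms_app_requests_count_ign_parent_ppid_prpc_rpc": {"parent", "ppid", "prpc", "rpc"},
    "arms_app_requests_count_ign_endpoint_parent_ppid_prpc_rpc": {"endpoint", "parent", "ppid", "prpc", "rpc"},
}


def pick_metric(required: set[str]) -> str:
    """Pick the variant that keeps every required dimension and excludes the most others."""
    # The _raw variant excludes nothing, so at least one variant is always usable.
    usable = {name: excluded for name, excluded in VARIANTS.items() if not (excluded & required)}
    return max(usable, key=lambda name: len(usable[name]))


metric = pick_metric({"rpc"})
print(metric)
# Requests per interface over one minute for a hypothetical service.
print(f'sum by (rpc) (sum_over_time({metric}{{service="demo-app"}}[1m]))')
```

For the interface dimension (rpc), only the _raw variant and arms_app_requests_count_ign_destid_endpoint_ppid_prpc retain it, and the latter excludes more dimensions, so fewer time series need to be scanned.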

JVM metrics

Common dimensions

Dimension name

Dimension key

Service name

service

Service PID

pid

Server IP address

serverIp

Metrics

Metric description

Metric

Measurement

Collection interval (seconds)

Unit

Dimension

Cumulative GC occurrences

arms_jvm_gc_total

Counter

15

None

GC generation:

  • Young: Young Generation

  • Old: Old Generation

Cumulative GC duration

arms_jvm_gc_seconds_total

Counter

15

Seconds

Occurrences of GC between two collection intervals

arms_jvm_gc_delta

Gauge

15

None

Duration of GC between two collection intervals

arms_jvm_gc_seconds_delta

Gauge

15

Seconds

Number of JVM threads

arms_jvm_threads_count

Gauge

15

None

Thread status:

  • Blocked

  • Live

  • Daemon

  • New

  • Dead-lock

  • Runnable

  • Terminated

  • Timed-wait

  • Wait

Initial size of JVM memory area

arms_jvm_mem_init_bytes

Gauge

15

Bytes

Area:

  • Heap memory

  • Non-heap memory

  • Total

ID space:

  • Eden Space

  • Old Generation

  • Survivor Space

  • Metaspace

  • Code Cache

  • Compressed Class Space

  • Total

Maximum size of JVM memory area

arms_jvm_mem_max_bytes

Gauge

15

Bytes

Used size of JVM memory area

arms_jvm_mem_used_bytes

Gauge

15

Bytes

Committed size of JVM memory area

arms_jvm_mem_committed_bytes

Gauge

15

Bytes

Usage ratio of JVM memory area

arms_jvm_mem_usage_ratio

Gauge

15

Ratio (0 to 1)

Loaded JVM classes

arms_class_load_loaded

Counter

15

None

None

Unloaded JVM classes

arms_class_load_un_loaded

Counter

15

None

None

JVM buffer pool size

arms_jvm_buffer_pool_total_bytes

Gauge

15

Bytes

ID space:

  • Direct

  • Mapped

Used size of JVM buffer pool

arms_jvm_buffer_pool_used_bytes

Gauge

15

Bytes

Number of JVM buffer pools

arms_jvm_buffer_pool_count

Gauge

15

None

Number of opened file descriptors

arms_file_desc_open_count

Gauge

15

None

None

File descriptor opening ratio (Number of opened file descriptors/Maximum number allowed)

arms_file_desc_open_ratio

Gauge

15

Ratio (0 to 1)

None
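The JVM table contains both cumulative Counter metrics (such as arms_jvm_gc_total) and per-interval Gauge metrics (such as arms_jvm_gc_delta). The following sketch shows how per-interval values relate to a cumulative series. It assumes that a delta equals the difference between two consecutive cumulative samples and that a drop in the series means the counter restarted from zero. The helper name and sample values are hypothetical.

```python
def interval_deltas(cumulative: list[float]) -> list[float]:
    """Derive per-interval increments from consecutive cumulative samples."""
    deltas = []
    for prev, curr in zip(cumulative, cumulative[1:]):
        # A drop means the counter restarted from zero (for example, after a
        # JVM restart), so the current value is the growth since the reset.
        deltas.append(curr - prev if curr >= prev else curr)
    return deltas


# Cumulative GC occurrences sampled every 15 seconds (illustrative values).
samples = [10, 12, 12, 15, 2, 5]
print(interval_deltas(samples))  # [2, 0, 3, 2, 3]
```

In PromQL, a similar result for a Counter metric is usually obtained with increase() or rate(), whereas a per-interval Gauge metric can be summed directly with sum_over_time().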

System metrics

Common dimensions

Dimension name

Dimension key

Service name

service

Service PID

pid

Server IP address

serverIp

Metrics

Metric description

Metric

Measurement

Collection interval (seconds)

Unit

Idle CPU percentage

arms_system_cpu_idle

Gauge

15

Percentage

I/O wait CPU percentage

arms_system_cpu_io_wait

Gauge

15

Percentage

System CPU percentage

arms_system_cpu_system

Gauge

15

Percentage

User CPU percentage

arms_system_cpu_user

Gauge

15

Percentage

System load (1 minute)

arms_system_load

Gauge

15

None

Free disk space

arms_system_disk_free_bytes

Gauge

15

Bytes

Total disk size

arms_system_disk_total_bytes

Gauge

15

Bytes

Disk usage

arms_system_disk_used_ratio

Gauge

15

Ratio (0 to 1)

Memory buffer size

arms_system_mem_buffers_bytes

Gauge

15

Bytes

Memory cache size

arms_system_mem_cached_bytes

Gauge

15

Bytes

Free memory size

arms_system_mem_free_bytes

Gauge

15

Bytes

Free swap size

arms_system_mem_swap_free_bytes

Gauge

15

Bytes

Memory swap size

arms_system_mem_swap_total_bytes

Gauge

15

Bytes

Memory size

arms_system_mem_total_bytes

Gauge

15

Bytes

Used memory size

arms_system_mem_used_bytes

Gauge

15

Bytes

Inbound network traffic

arms_system_net_in_bytes

Gauge

15

Bytes

Outbound network traffic

arms_system_net_out_bytes

Gauge

15

Bytes

Number of network ingress errors

arms_system_net_in_err

Gauge

15

None

Number of network egress errors

arms_system_net_out_err

Gauge

15

None

Thread pool and connection pool metrics

Common dimensions

Dimension name

Dimension key

Service name

service

Service PID

pid

Server IP address

serverIp

Thread pool name (ARMS agent earlier than V4.1.x)

name

Thread pool type (ARMS agent earlier than V4.1.x)

type

Metrics

ARMS agent V4.1.x and later

Thread pool metrics

Metric description

Metric

Measurement

Collection interval (seconds)

Dimension

Number of core threads

arms_thread_pool_core_pool_size

Gauge

15

  • thread_name_pattern: the thread name pattern. Example: http-nio-8080-*.

  • thread_pool_usage: the purpose of the thread pool. Examples: Tomcat, Dubbo, and Undertow.

Maximum number of threads

arms_thread_pool_max_pool_size

Gauge

15

  • thread_name_pattern: the thread name pattern. Example: http-nio-8080-*.

  • thread_pool_usage: the purpose of the thread pool. Examples: Tomcat, Dubbo, and Undertow.

Number of active threads

arms_thread_pool_active_thread_count

Gauge

15

  • thread_name_pattern: the thread name pattern. Example: http-nio-8080-*.

  • thread_pool_usage: the purpose of the thread pool. Examples: Tomcat, Dubbo, and Undertow.

Current number of threads

arms_thread_pool_current_thread_count

Gauge

15

  • thread_name_pattern: the thread name pattern. Example: http-nio-8080-*.

  • thread_pool_usage: the purpose of the thread pool. Examples: Tomcat, Dubbo, and Undertow.

Maximum historical number of threads

arms_thread_pool_max_thread_count

Gauge

15

  • thread_name_pattern: the thread name pattern. Example: http-nio-8080-*.

  • thread_pool_usage: the purpose of the thread pool. Examples: Tomcat, Dubbo, and Undertow.

Number of scheduled tasks

arms_thread_pool_scheduled_task_count

Counter

15

  • thread_name_pattern: the thread name pattern. Example: http-nio-8080-*.

  • thread_pool_usage: the purpose of the thread pool. Examples: Tomcat, Dubbo, and Undertow.

Number of completed tasks

arms_thread_pool_completed_task_count

Counter

15

  • thread_name_pattern: the thread name pattern. Example: http-nio-8080-*.

  • thread_pool_usage: the purpose of the thread pool. Examples: Tomcat, Dubbo, and Undertow.

Number of rejected tasks

arms_thread_pool_rejected_task_count

Counter

15

  • thread_name_pattern: the thread name pattern. Example: http-nio-8080-*.

  • thread_pool_usage: the purpose of the thread pool. Examples: Tomcat, Dubbo, and Undertow.

Task queue size

arms_thread_pool_queue_size

Gauge

15

  • thread_name_pattern: the thread name pattern. Example: http-nio-8080-*.

  • thread_pool_usage: the purpose of the thread pool. Examples: Tomcat, Dubbo, and Undertow.

Connection pool metrics

Metric description

Metric

Measurement

Collection interval (seconds)

Dimension

Number of connections

arms_connection_pool_connection_count

Gauge

15

  • state: the status of the connection. Valid values:

    • active

    • idle

  • pool_type: the type of the connection pool. Examples: Druid and c3p0.

  • url: the connection string of the database.

Minimum number of idle connections

arms_connection_pool_connection_min_idle_count

Gauge

15

  • pool_type: the type of the connection pool. Examples: Druid and c3p0.

  • url: the connection string of the database.

Maximum number of idle connections

arms_connection_pool_connection_max_idle_count

Gauge

15

  • pool_type: the type of the connection pool. Examples: Druid and c3p0.

  • url: the connection string of the database.

Maximum number of connections

arms_connection_pool_connection_max_count

Gauge

15

  • pool_type: the type of the connection pool. Examples: Druid and c3p0.

  • url: the connection string of the database.

Number of blocked connection requests

arms_connection_pool_pending_request_count

Counter

15

  • pool_type: the type of the connection pool. Examples: Druid and c3p0.

  • url: the connection string of the database.

ARMS agent earlier than V4.1.x

Metric description

Metric

Measurement

Collection interval (seconds)

Dimension

Number of core threads

arms_threadpool_core_size

Gauge

15

None

Maximum number of threads

arms_threadpool_max_size

Gauge

15

None

Number of active threads

arms_threadpool_active_size

Gauge

15

None

Thread pool queue size

arms_threadpool_queue_size

Gauge

15

None

Current size of the thread pool

arms_threadpool_current_size

Gauge

15

None

Number of tasks in different states in the thread pool

arms_threadpool_task_total

Gauge

15

The status of the task. Valid values:

  • Scheduled: The task is scheduled.

  • Completed: The task is completed.

  • Rejected: The task is rejected.
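A common use of the thread pool metrics is to compare active threads with the maximum pool size. The following sketch builds such a PromQL expression for either agent generation. The helper name and the service name demo-app are hypothetical, and the expression assumes that both series carry identical label sets so that PromQL can match them one to one.

```python
def thread_pool_utilization_query(service: str, agent_v41_or_later: bool = True) -> str:
    """Build a PromQL ratio of active threads to the maximum pool size."""
    if agent_v41_or_later:
        active = "arms_thread_pool_active_thread_count"
        maximum = "arms_thread_pool_max_pool_size"
    else:
        # Metric names used by ARMS agents earlier than V4.1.x.
        active = "arms_threadpool_active_size"
        maximum = "arms_threadpool_max_size"
    selector = f'{{service="{service}"}}'
    return f"{active}{selector} / {maximum}{selector}"


print(thread_pool_utilization_query("demo-app"))
print(thread_pool_utilization_query("demo-app", agent_v41_or_later=False))
```

A value close to 1 suggests that the pool is nearly saturated. For agents V4.1.x and later, the result can be narrowed further by the thread_pool_usage or thread_name_pattern dimension.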

Scheduled task metrics

The following metrics are available only for scheduled tasks.

Common dimensions

Dimension name

Dimension key

Service name

service

Service PID

pid

Server IP address

serverIp

Task ID

rpc

Metrics

Metric description

Metric

Measurement

Collection interval (seconds)

Unit

Scheduling delay

arms_$callType_delay_milliseconds

Gauge

15

Milliseconds

Service access types and available dimensions

Clients

  • Access types

    • http_client

    • dubbo_client

    • hsf_client

    • dsf_client

    • notify_client

    • grpc_client

    • thrift_client

    • sofa_client

    • mq_client

    • kafka_client

  • Dimensions

    • parent: the name of the upstream service

    • ppid: the PID of the upstream service

    • destId: the extension information of the request peer

    • endpoint: the endpoint of the request peer

    • excepType: the ID of the exception

    • excepInfo: the ID encoding rule of the exception

    • excepName: the name of the exception

    • stackTraceId: the ID of the exception stack

Databases

  • Access types

    • mysql

    • oracle

    • mariadb

    • postgresql

    • ppas

    • sqlserver

    • mongodb

    • dmdb

  • Dimensions

    • parent: the name of the upstream service

    • ppid: the PID of the upstream service

    • destId: the name of the database

    • endpoint: the endpoint of the database

    • excepType: the ID of the exception

    • excepInfo: the ID encoding rule of the exception

    • excepName: the name of the exception

    • stackTraceId: the ID of the exception stack

    • sqlId: the ID of the SQL statement

Servers

  • Access types

    • http

    • dubbo

    • hsf

    • dsf

    • user_method

    • mq

    • kafka

    • grpc

    • thrift

    • sofa

  • Dimensions

    • prpc: the upstream interface

    • parent: the name of the upstream service

    • ppid: the PID of the upstream service

    • endpoint: the endpoint of the service

    • excepType: the ID of the exception

    • excepInfo: the ID encoding rule of the exception

    • excepName: the name of the exception

    • stackTraceId: the ID of the exception stack

Scheduled tasks

  • Access types

    • xxl_job

    • spring_scheduled

    • quartz

    • elasticjob

    • jdk_timer

    • schedulerx

  • Dimensions

    • prpc: the upstream interface

    • parent: the name of the upstream service

    • ppid: the PID of the upstream service

    • excepType: the ID of the exception

    • excepInfo: the ID encoding rule of the exception

    • excepName: the name of the exception

    • stackTraceId: the ID of the exception stack
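The access types above can be kept in a lookup table so that a script can check a value before it substitutes $callType into a metric name. This is a minimal sketch: the helpers `category_of` and `delay_metric` are hypothetical, and the lists copy the access types from this section.

```python
ACCESS_TYPES = {
    "Clients": ["http_client", "dubbo_client", "hsf_client", "dsf_client", "notify_client",
                "grpc_client", "thrift_client", "sofa_client", "mq_client", "kafka_client"],
    "Databases": ["mysql", "oracle", "mariadb", "postgresql", "ppas", "sqlserver",
                  "mongodb", "dmdb"],
    "Servers": ["http", "dubbo", "hsf", "dsf", "user_method", "mq", "kafka", "grpc",
                "thrift", "sofa"],
    "Scheduled tasks": ["xxl_job", "spring_scheduled", "quartz", "elasticjob",
                        "jdk_timer", "schedulerx"],
}


def category_of(call_type: str) -> str:
    """Return the category that an access type belongs to."""
    for category, call_types in ACCESS_TYPES.items():
        if call_type in call_types:
            return category
    raise KeyError(f"unknown access type: {call_type}")


def delay_metric(call_type: str) -> str:
    """Build the scheduling delay metric, which applies only to scheduled tasks."""
    if category_of(call_type) != "Scheduled tasks":
        raise ValueError(f"{call_type} is not a scheduled task access type")
    return f"arms_{call_type}_delay_milliseconds"


print(category_of("mysql"))
print(delay_metric("quartz"))
```

Checking the category first avoids querying a metric that cannot exist, such as a scheduling delay metric for an HTTP server.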