This topic describes the trace sampling modes that are supported by Application Real-Time Monitoring Service (ARMS). You can select an appropriate mode based on your scenarios so that you can obtain the trace data that you want at a low cost.
In a distributed system, many traces are duplicates or carry little useful information. Sampling records only the traces that you need and reduces monitoring costs.
The basic principle of trace sampling is to preferentially record the traces that you are most concerned about and most likely to query. ARMS provides the following trace sampling modes:
Fixed-rate sampling
Fixed-rate sampling records a fixed proportion of traces, selected based on the ordinal number of the TraceId. For example, if the fixed rate is 10%, one out of every 10 traces is recorded. Because the decision is made for a trace as a whole, the data of an entire trace is either retained or discarded, which avoids incomplete trace data.
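The following Python sketch illustrates why a decision derived from the TraceId keeps or discards a trace as a whole. It is not the ARMS implementation: the `keep_trace` function and the SHA-256 bucket mapping are assumptions made for illustration, and ARMS may derive the decision from the TraceId differently.

```python
import hashlib


def keep_trace(trace_id: str, rate_percent: int) -> bool:
    """Return True if the whole trace identified by trace_id should be kept.

    Assumption: the trace ID is mapped to one of 100 buckets with a stable
    hash. ARMS may derive the bucket differently.
    """
    if not 0 <= rate_percent <= 100:
        raise ValueError("rate_percent must be between 0 and 100")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rate_percent


# Every service that sees the same trace ID reaches the same decision,
# so a trace is never recorded only in part.
trace_ids = [f"trace-{i:06d}" for i in range(10_000)]
kept = sum(keep_trace(t, 10) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces at a 10% rate")
```

Because the decision depends only on the trace ID and the rate, no coordination between services is needed to keep a trace consistent.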
You can perform the following steps to configure fixed-rate sampling:
Log on to the ARMS console. In the left-side navigation pane, choose .
On the Application List page, select a region in the top navigation bar and click the name of the application that you want to manage.
Note: If the icon is displayed in the Language column, the application is connected to Application Monitoring. If a hyphen (-) is displayed, the application is connected to Managed Service for OpenTelemetry.
In the left-side navigation pane, click Application Settings. On the page that appears, click the Custom Configuration tab.
In the Sampling rate setting section, set the Sample rate model parameter to Fixed sampling rate. In the Sampling Rate Settings field, enter a percentage. For example, if you enter 10, the sampling rate is 10%.
Note: The modifications take effect immediately. You do not need to restart the application. The default value is 10. A higher sampling rate consumes more system resources. We recommend that you keep the default value.
Adaptive sampling
To further reduce monitoring costs and improve the trace query experience, Application Monitoring provides the adaptive sampling mode. Adaptive sampling dynamically determines whether to sample a trace based on multiple sampling policies. This strikes a balance between a low sampling rate, which keeps costs down, and a high sampling rate, which provides comprehensive monitoring. We recommend that you use the adaptive sampling mode in scenarios where business traffic is heavy or fluctuates greatly.
The following sampling policies are supported: full sampling for specific interfaces, sampling for top N requests, and minimum sampling for all interfaces.
Full sampling for specific interfaces: You can enter names, prefixes, or suffixes to specify the interfaces whose traces you want to sample in full. All traces of requests to a specified interface are sampled. Full sampling increases the amount of data collected. Enable full sampling only for key interfaces or when necessary.
Sampling for top N requests: Based on the Least Frequently Used (LFU) algorithm, sampling is performed only for certain entries of an interface. This ensures that the amount of data collected does not increase linearly with the traffic of the interface.
Minimum sampling for all interfaces: The traces of each interface are sampled at least once in a period of time. This ensures that valuable information is recorded for each interface when the traffic is low.
Only the full sampling settings for specific interfaces can be customized. The settings of sampling for top N requests and minimum sampling for all interfaces cannot be modified.
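The following Python sketch shows one way in which the three policies could be combined into a single decision. It is an illustration under stated assumptions, not the ARMS agent logic. The `AdaptiveSampler` class, the `top_n` and `min_interval_s` values, and the rule for the top N policy (an LFU-bounded counter table in which a trace is kept only when the request count of an interface reaches a power of two) are all hypothetical. The source states only that the top N policy is based on LFU and that the data volume does not grow linearly with traffic.

```python
from collections import Counter


class AdaptiveSampler:
    def __init__(self, full_names=(), full_prefixes=(), full_suffixes=(),
                 top_n=100, min_interval_s=60.0):
        self.full_names = set(full_names)
        self.full_prefixes = tuple(full_prefixes)
        self.full_suffixes = tuple(full_suffixes)
        self.top_n = top_n                    # hypothetical size of the LFU table
        self.min_interval_s = min_interval_s  # hypothetical minimum-sampling period
        self.counts = Counter()               # request count per tracked interface
        self.last_kept = {}                   # interface -> time of the last kept trace

    def _is_full(self, interface):
        # Match by exact name, prefix, or suffix.
        return (interface in self.full_names
                or interface.startswith(self.full_prefixes)
                or interface.endswith(self.full_suffixes))

    def _count(self, interface):
        # LFU-bounded table: when it is full, evict the least frequently seen interface.
        if interface not in self.counts and len(self.counts) >= self.top_n:
            coldest = min(self.counts, key=self.counts.get)
            del self.counts[coldest]
        self.counts[interface] += 1
        return self.counts[interface]

    def should_sample(self, interface, now):
        count = self._count(interface)
        last = self.last_kept.get(interface, float("-inf"))
        if self._is_full(interface):
            decision = True                          # full sampling for specific interfaces
        elif now - last >= self.min_interval_s:
            decision = True                          # minimum sampling for all interfaces
        else:
            decision = (count & (count - 1)) == 0    # assumed top N rule: count is a power of two
        if decision:
            self.last_kept[interface] = now
        return decision


sampler = AdaptiveSampler(full_prefixes=("/pay/",))
busy = sum(sampler.should_sample("/search", now=0.0) for _ in range(1000))
paid = sum(sampler.should_sample("/pay/create", now=0.0) for _ in range(1000))
print(f"/search kept {busy} of 1000, /pay/create kept {paid} of 1000")
```

In this sketch, a busy interface that is not fully sampled contributes a number of traces that grows logarithmically with its traffic, whereas a fully sampled interface contributes every trace.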
To enable adaptive sampling, perform the following steps:
Log on to the ARMS console. In the left-side navigation pane, choose .
On the Application List page, select a region in the top navigation bar and click the name of the application that you want to manage.
Note: If the icon is displayed in the Language column, the application is connected to Application Monitoring. If a hyphen (-) is displayed, the application is connected to Managed Service for OpenTelemetry.
In the left-side navigation pane, click Application Settings. On the page that appears, click the Custom Configuration tab.
In the Sampling rate setting section, set the Sample rate model parameter to Adaptive sampling. You can specify the interface names, prefixes, and suffixes for full sampling.
Note: The modifications take effect immediately. You do not need to restart the application. The adaptive sampling mode is supported only by the ARMS agent V2.8.3 and later.
Basic Edition sampling
Basic Edition sampling is available only to users who have activated Application Monitoring Basic Edition. A free sampling policy and multiple custom sampling policies are provided.
Free sampling policy: By default, ARMS collects one trace per minute for each agent of all interfaces within your account for free.
Custom sampling policy: You can configure a custom sampling policy based on your needs. Each custom sampling policy allows you to sample traces based on a fixed proportion or fixed traffic, and can apply to all or specific interfaces.
To configure a custom sampling policy, perform the following steps:
Log on to the ARMS console. In the left-side navigation pane, choose .
On the Application List page, select a region in the top navigation bar and click the name of the application that you want to manage.
Note: If the icon is displayed in the Language column, the application is connected to Application Monitoring. If a hyphen (-) is displayed, the application is connected to Managed Service for OpenTelemetry.
In the left-side navigation pane, click Application Settings. On the page that appears, click the Custom Configuration tab.
In the Sampling rate setting section, click Add Client Sampling Policy. Set the following parameters and click OK.
Policy Name: The name of the sampling policy.
Sampling type and Sampling value: Valid values:
Fixed proportion sampling: Traces are sampled based on a specified fixed ratio. After you select this option, you must enter a ratio in the Sampling value field, such as 10%.
Flow limit: A specified number of traces are sampled within a specified time interval. After you select this option, you must set the number of traces to be collected by each agent and the time interval. For example, 5 traces are collected by each agent every 1 second.
Applicable interface: The range of interfaces to which the sampling policy applies. Valid values: Each interface or Specify Interface. If you select Specify Interface, you must enter an interface name.
Note: If you select Specify Interface, you can enter only one interface name. If you want to specify multiple interfaces, you must set a sampling policy for each interface.
Examples:
For the /elastic/update interface, 20 traces are sampled per minute.
For all interfaces, 20% of traffic is sampled.
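The two sampling types can be sketched in Python as follows, using the values from the preceding examples. This is a minimal illustration, not the ARMS agent code. The class names `FlowLimit` and `FixedProportion`, the fixed-window counting, and the rule that a policy for a specific interface takes precedence over the policy for all interfaces are assumptions.

```python
import random


class FlowLimit:
    """Keep at most `limit` traces in each window of `interval_s` seconds."""

    def __init__(self, limit, interval_s):
        self.limit = limit
        self.interval_s = interval_s
        self.window_start = None
        self.kept = 0

    def should_sample(self, now):
        if self.window_start is None or now - self.window_start >= self.interval_s:
            self.window_start = now  # start a new window
            self.kept = 0
        if self.kept < self.limit:
            self.kept += 1
            return True
        return False


class FixedProportion:
    """Keep roughly `percent` percent of traces, chosen at random."""

    def __init__(self, percent, rng=None):
        self.percent = percent
        self.rng = rng or random.Random()

    def should_sample(self, now):
        # `now` is unused; it is accepted so that both policies share one interface.
        return self.rng.random() * 100 < self.percent


# /elastic/update: 20 traces per minute. All other interfaces: 20% of traffic.
policies = {"/elastic/update": FlowLimit(limit=20, interval_s=60)}
default_policy = FixedProportion(percent=20, rng=random.Random(0))


def should_sample(interface, now):
    # Assumption: a policy for a specific interface wins over the all-interfaces policy.
    return policies.get(interface, default_policy).should_sample(now)


# 600 requests spread over one minute: only the first 20 are kept.
kept = sum(should_sample("/elastic/update", now=t * 0.1) for t in range(600))
print(f"/elastic/update kept {kept} of 600 requests in one minute")
```

A flow limit bounds the number of traces regardless of traffic, whereas a fixed proportion scales with traffic. This difference is the main consideration when you choose between the two sampling types.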
Sampling for failed or slow requests
If a request meets one of the following conditions, the relevant spans are sampled.
The request has errors reported. Scenarios:
Status codes other than 2xx or 3xx are returned for the HTTP interface.
A service exception is thrown to the framework and caught by LocalRootSpan.
The duration of the request is longer than the 99th percentile of the historical request duration of the same interface. The 99th percentile of the historical request duration may deviate from the exact value because it is calculated by using bucket aggregations.
Exceptions are thrown by a method of the request. This condition is supported only by the ARMS agent V4.1.x.
Note that this sampling policy cannot guarantee the integrity of the entire trace. When sampling is triggered, only the spans within the application are saved.
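The error and slow-request conditions can be expressed as a short Python check. This is a sketch under assumptions, not the ARMS agent logic. The `Request` type and the function names are hypothetical, and the 99th percentile is computed exactly with the nearest-rank method rather than from bucket aggregations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    interface: str
    duration_ms: float
    http_status: Optional[int] = None   # None for non-HTTP interfaces
    uncaught_exception: bool = False     # an exception reached the framework


def is_error(req: Request) -> bool:
    if req.uncaught_exception:
        return True
    # Any HTTP status code other than 2xx or 3xx counts as an error.
    return req.http_status is not None and not 200 <= req.http_status < 400


def p99(durations):
    """Nearest-rank 99th percentile of a non-empty list of durations."""
    ordered = sorted(durations)
    rank = max(1, (99 * len(ordered) + 99) // 100)  # ceil(0.99 * n) in integer arithmetic
    return ordered[rank - 1]


def should_sample(req: Request, history: dict) -> bool:
    if is_error(req):
        return True
    past = history.get(req.interface)
    # Without history for the interface, no slow-request threshold exists yet.
    return bool(past) and req.duration_ms > p99(past)


history = {"/orders": list(range(1, 101))}  # historical durations of 1 ms to 100 ms
print(should_sample(Request("/orders", 50, http_status=200), history))
print(should_sample(Request("/orders", 50, http_status=500), history))
print(should_sample(Request("/orders", 100, http_status=200), history))
```

In this sketch, only the second and third requests are sampled: the second because of its status code, and the third because its duration exceeds the 99th percentile of the historical durations.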
Usage notes
To prevent trace collection from affecting your business when traffic is heavy, ARMS limits the maximum number of traces collected by the ARMS agent per second to 100. The value takes effect for both fixed-rate sampling and adaptive sampling. To modify the value, configure the Throttling Threshold parameter on the Custom Configuration tab.
If you specify a larger value, additional system resources are consumed. We recommend that you keep the default value.
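The effect of the threshold on a fixed sampling rate can be estimated with simple arithmetic, as the following Python sketch shows. The function name and the assumption that the threshold is applied as a cap on the number of sampled traces per agent per second are illustrative. The sketch does not describe how the agent enforces the limit.

```python
def expected_traces_per_second(requests_per_second, rate_percent, throttle=100):
    """Estimate the traces collected per second by one agent.

    Assumption: the sampling rate is applied first, and the result is then
    capped at the throttling threshold.
    """
    sampled = requests_per_second * rate_percent / 100
    return min(sampled, float(throttle))


for rps in (200, 1_000, 5_000):
    print(rps, expected_traces_per_second(rps, rate_percent=10))
```

With the default rate of 10% and the default threshold of 100, the cap is reached at 1,000 requests per second per agent. Above that traffic level, increasing the sampling rate alone does not increase the number of collected traces.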
References
After traces are sampled, you can configure filter conditions and aggregation dimensions to analyze the trace data in real time. For more information, see Trace Explorer.