Application Real-Time Monitoring Service (ARMS) provides the Trace Explorer feature to analyze stored full trace data. Trace Explorer allows you to combine filter conditions and aggregation dimensions for real-time analysis based on the stored full trace data. This can meet the custom diagnostics requirements in various scenarios.
Prerequisites
Application Monitoring provides a new application details page for users who have enabled the new billing mode. For more information, see Billing (new).
If you have not enabled the new billing mode, you can click Switch to New Version on the Application List page to view the new application details page.
An ARMS agent is installed for the application. For more information, see Application Monitoring overview.
Procedure
Log on to the ARMS console. In the left-side navigation pane, choose .
On the Application List page, select a region in the top navigation bar and click the name of the application that you want to manage.
Note: If the icon is displayed in the Language column, the application is connected to Application Monitoring. If a hyphen (-) is displayed, the application is connected to Managed Service for OpenTelemetry.
In the top navigation bar, click Trace Explorer.
In the upper-right corner of the Trace Explorer page, select a time range that you want to query.
Specify filter conditions.
In the Quick Filter section, query traces by status, duration, application name, span name, or host address.
The filter conditions that you specify are displayed in the search box.
Click the search box at the top. In the drop-down dialog box, configure the existing filter conditions or add custom filter conditions.
In the search box, enter a query statement. For more information about the syntax, see Usage methods of Trace Explorer.
Note: You can click the icon next to the search box to save the current filter conditions.
You can click Saved View to view the saved filter conditions and click a filter condition to view the corresponding trace data.
You can aggregate the queried data based on specific dimensions.
Trace list
After you specify filter conditions, trace data is displayed on the Trace Explorer tab. The trace data includes the number of calls and the number of HTTP errors in column charts, the call duration in a time series curve, and a trace list.
In the trace list, you can perform the following operations:
Click the ID of the trace that you want to view, or click Details in the Actions column, to view the trace details and the topology view of the trace. For more information, see the Analysis of failed and slow traces section.
Click Logs in the Actions column to view the logs of a trace. For more information, see Use the log analysis feature.
Click the icon in the upper-right corner to add or hide the fields of the list.
Move the pointer over a trace and click the icon to add the current parameter values as a filter condition.
Scatter chart
On the Scatter plot tab, time points are distributed along the X axis and the duration is distributed along the Y axis. You can move the pointer over a point to view the basic information of the trace, and click a point to view the details of the trace. For more information, see the Trace details section.
Trace aggregation details
Trace Explorer allows you to analyze a queried span based on various dimensions. However, you may need to analyze traces that consist of a large number of spans. The trace aggregation feature allows you to query up to 5,000 distributed traces by using specified conditions and query the corresponding spans based on the trace IDs. Then, you can aggregate the queried spans to obtain the results. The integrity of the aggregated traces is guaranteed in this process.
When you use the trace aggregation feature, note that aggregate queries are performed on the trace data based on the specified conditions. If you specify multiple query conditions, the calculation may not be completed in real time, and you may need to wait for the results to be returned.
Parameter | Description |
spanName | The name of the span. |
serviceName | The name of the application that corresponds to the span. |
Number of requests/percentage of requests | The request ratio indicates the ratio of the requests that call the current span to the total number of requests. For example, 10% indicates that 10% of requests call the current span. Calculation formula: Request ratio = Number of requests that call the current span/Total number of requests × 100% |
span/Request Multiple | The request multiple indicates the average number of times that the current span is called by each request. For example, 1.5 indicates that the current span is called 1.5 times by each request. Calculation formula: Request multiple = Number of spans/Number of requests |
Average self-consumption/proportion | The average self time of a span, which is the duration of the span excluding the duration of its child spans. For example, if Span A takes 10 milliseconds and its child span (Span B) takes 8 milliseconds, the self time of Span A is 2 milliseconds. Calculation formula: Self time of a span = Duration of the span - Duration of all child spans. Important: For asynchronous calls, the self time of a span includes the duration of its child spans. |
Average Duration | The average duration of the span. |
Number of exceptions/percentage of exceptions | The exception ratio indicates the ratio of requests with exceptions to the total number of requests. For example, 3% indicates that exceptions occur in 3% of requests. Calculation formula: Exception ratio = Number of requests with exceptions/Total number of requests × 100%. Important: The number of requests with exceptions is not equal to the number of exceptions. If the request multiple is greater than 1, a request may have multiple exceptions. |
Example: Span A calls Span B and Span C. The following table shows the parameters.
spanName | serviceName | Number of requests/percentage of requests | span/Request Multiple | Average self-consumption/proportion | Average Duration | Number of exceptions/percentage of exceptions |
A | demo | 10/100.00% | 10/1.00 | 5.00ms/25.00% | 20ms | 2/20.00% |
B (child of A) | demo | 4/40.00% | 8/2.00 | 16.00ms/100.00% | 16ms | 2/50.00% |
C (child of A) | demo | 1/10.00% | 1/1.00 | 4.00ms/100.00% | 4ms | 1/100.00% |
The Number of requests/percentage of requests parameter of Span A indicates that the total number of requests is 10 and the request ratio is 100%. The Number of requests/percentage of requests parameter of Span B indicates that only 4 requests call Span B. Similarly, only one request calls Span C. The request ratio of Span B is 40% and request ratio of Span C is 10%. Other requests do not call Span B and Span C due to logical judgments or exceptions. This reflects the distribution of requests.
The span/Request Multiple parameter of Span A is 10/1.00, which indicates that Span A is called only once by each request. However, for Span B, eight spans are generated by four requests. Therefore, Span B is called twice on average by each request that calls it. This reflects the distribution of spans in each request.
The Average self-consumption/proportion parameter of Span A is 5.00 ms/25.00%, which indicates that the average self time of Span A (excluding Span B and Span C) is 5 milliseconds. The self time of Span A accounts for only 25% of its average duration of 20 milliseconds. However, the average self time of Span B and Span C is equal to their average duration because Span B and Span C do not have child spans. This reflects the distribution of average duration.
The Number of exceptions/percentage of exceptions parameter of Span A is 2/20.00%, which indicates that two requests have exceptions in Span A, accounting for 20% of the total number of requests. The Number of exceptions/percentage of exceptions parameter of Span B is 2/50.00%. Each request that calls Span B calls it twice, four requests call Span B, and the exception ratio is 50%. Therefore, two requests have exceptions. One possible distribution of exceptions in Span B is as follows: Among the four requests, two requests are successful. In each of the remaining two requests, an exception occurs in the first call whereas the second call is successful.
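The ratios in the example table can be recomputed from the raw counts. The following Python snippet is a minimal sketch and is not ARMS code: the SpanStats class and the summarize function are hypothetical names introduced for illustration. It assumes that the exception ratio of a span is calculated against the number of requests that call the span, which is what the rows for Span B and Span C in the example suggest.

```python
from dataclasses import dataclass


@dataclass
class SpanStats:
    """Raw counts for one row of the aggregation table (hypothetical structure)."""
    span_name: str
    requests: int           # requests that call this span at least once
    spans: int              # total number of spans with this name
    error_requests: int     # requests in which this span has an exception
    avg_self_ms: float      # average self time of the span
    avg_duration_ms: float  # average duration of the span


def summarize(stats: SpanStats, total_requests: int) -> dict:
    """Derive the ratios shown in the table from the raw counts."""
    return {
        "request_ratio_pct": 100.0 * stats.requests / total_requests,
        "request_multiple": stats.spans / stats.requests,
        "self_time_pct": 100.0 * stats.avg_self_ms / stats.avg_duration_ms,
        # Assumption: the denominator is the number of requests that call this span.
        "exception_ratio_pct": 100.0 * stats.error_requests / stats.requests,
    }


# Counts taken from the example table: Span A calls Span B and Span C.
rows = [
    SpanStats("A", requests=10, spans=10, error_requests=2, avg_self_ms=5.0, avg_duration_ms=20.0),
    SpanStats("B", requests=4, spans=8, error_requests=2, avg_self_ms=16.0, avg_duration_ms=16.0),
    SpanStats("C", requests=1, spans=1, error_requests=1, avg_self_ms=4.0, avg_duration_ms=4.0),
]
total_requests = 10  # every request calls Span A

results = {row.span_name: summarize(row, total_requests) for row in rows}
for name, metrics in results.items():
    print(name, metrics)
```

For Span B, the sketch yields a request ratio of 40%, a request multiple of 2, and an exception ratio of 50%, which matches the values in the table.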
To view the details of a specific trace, move the pointer over the blue span name. You can click the recommended traceId to view the details.
Trace topology
The Full Link Topology tab displays the inter-application topology of aggregated traces. The following figure shows that the two applications have call relationships. The following information is displayed for each application: the number of requests, the number of errors, and the response time.
Analysis of failed and slow traces
Analysis of failed and slow traces helps you analyze the common dimensions of multiple failed and slow traces. Traces may be concentrated in one host, or belong to one interface. You can query traces by host or interface, or combine multiple filter conditions to query traces and locate problems. Example: serviceName="arms-demo" AND ip="192.168.1.1". Analysis of failed and slow traces can also help you identify slow interfaces and optimize the system in a targeted manner.
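If you generate such query statements in a script, the conditions can be joined programmatically. The following Python snippet is a minimal sketch: build_query is a hypothetical helper, not part of any ARMS SDK, and it covers only the field="value" AND ... form shown in the preceding example. For other operators, see the syntax reference of Trace Explorer.

```python
def build_query(conditions: dict) -> str:
    """Join field/value pairs into a statement of the form field="value" AND ..."""
    parts = []
    for field_name, value in conditions.items():
        # Assumption: values are plain strings. How Trace Explorer escapes
        # embedded double quotes is not covered here, so such values are rejected.
        if '"' in str(value):
            raise ValueError(f"unsupported character in value for {field_name}")
        parts.append(f'{field_name}="{value}"')
    return " AND ".join(parts)


# Reproduces the example statement from the text.
query = build_query({"serviceName": "arms-demo", "ip": "192.168.1.1"})
print(query)
```

You can paste the resulting statement into the search box on the Trace Explorer page.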
Analysis of slow traces
ARMS analyzes 1,000 traces with the longest duration and displays five dimensions that are most related to slow traces.
Slow trace details
ARMS selects 1,000 traces with the longest duration from traces whose duration is greater than the threshold, samples 1,000 traces whose duration is less than the threshold, compares these traces, and then discovers three characteristics that are most related to slow calls.
You can configure the threshold based on your business requirements. For example, to discover the characteristics of traces that take more than 1 minute, set the threshold to 60000 milliseconds.
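The comparison described above can be illustrated with a small sketch. The following Python snippet is not the algorithm that ARMS uses, because the scoring method is not documented here. It assumes that each trace is a dictionary with a duration_ms key plus dimension keys such as ip, and it scores each dimension value by how much more often the value appears in slow traces than in the sampled baseline. All function and field names are hypothetical.

```python
import random
from collections import Counter


def split_traces(traces, threshold_ms, limit=1000, seed=0):
    """Pick the slowest traces above the threshold and sample traces below it."""
    slow = sorted(
        (t for t in traces if t["duration_ms"] > threshold_ms),
        key=lambda t: t["duration_ms"],
        reverse=True,
    )[:limit]
    fast = [t for t in traces if t["duration_ms"] < threshold_ms]
    baseline = random.Random(seed).sample(fast, min(limit, len(fast)))
    return slow, baseline


def top_characteristics(slow, baseline, dimensions, top_n=3):
    """Rank (dimension, value) pairs by their share in slow traces minus their share in the baseline."""
    def share(group):
        if not group:
            return {}
        counts = Counter((d, t.get(d)) for t in group for d in dimensions)
        return {key: count / len(group) for key, count in counts.items()}

    slow_share, base_share = share(slow), share(baseline)
    scored = {key: s - base_share.get(key, 0.0) for key, s in slow_share.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Illustrative data: all slow traces come from one host.
traces = (
    [{"duration_ms": 70000 + i, "ip": "10.0.0.1", "serviceName": "demo"} for i in range(4)]
    + [{"duration_ms": 100 + i, "ip": "10.0.0.2", "serviceName": "demo"} for i in range(4)]
)
slow, baseline = split_traces(traces, threshold_ms=60000)
top = top_characteristics(slow, baseline, dimensions=["ip", "serviceName"])
print(top)
```

In this data set, the host 10.0.0.1 ranks first because it appears in every slow trace and in none of the baseline traces, whereas serviceName is the same in both groups and therefore scores 0.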
Analysis of failed traces
ARMS randomly selects and analyzes 1,000 failed traces, and displays five dimensions that are most related to the traces.
Failed trace details
ARMS compares failed traces with normal traces and discovers three characteristics that are most related to failed calls.
Trace details
Component tags (section marked with the number 1 in the preceding figure)
The tags show the call types and the number of spans.
The call types are defined by the attributes.component.name field.
Click a tag to hide or show the spans related to the call type.
Horizontal bar chart of the trace (section marked with the number 2 in the preceding figure)
The bar chart shows the entire trace and the distribution of spans.
Each bar represents a span. Only spans whose duration is greater than 1% of the total duration are displayed.
Different colors represent different applications. As shown in the preceding figure, the color blue represents the opentelemetry-demo-adservice application.
The length of a black line in the chart represents the self-time of a span, which is the total time of the span minus the time spent in its child spans. Assume Span A calls Span B. Span A takes 10 milliseconds and Span B takes 8 milliseconds. In this case, the self-time of Span A is 2 milliseconds.
The timeline represents the time range of the trace.
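The self-time calculation and the 1% display rule described in the preceding list can be sketched as follows. This Python snippet is a simplified illustration, not the implementation used by the console. It assumes synchronous child spans and treats the duration of the root span as the total duration of the trace. The Span class and both functions are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: float
    children: list = field(default_factory=list)


def self_time_ms(span: Span) -> float:
    """Total time of the span minus the time spent in its direct child spans."""
    return span.duration_ms - sum(child.duration_ms for child in span.children)


def visible_spans(root: Span, min_share: float = 0.01) -> list:
    """Names of spans whose duration is greater than min_share of the total duration."""
    total = root.duration_ms  # assumption: the root span covers the entire trace
    shown, stack = [], [root]
    while stack:
        span = stack.pop()
        if span.duration_ms > min_share * total:
            shown.append(span.name)
        stack.extend(span.children)
    return shown


# The example from the text: Span A (10 ms) calls Span B (8 ms).
span_a = Span("A", 10, [Span("B", 8)])
print(self_time_ms(span_a))

# A 1,000 ms trace: the 5 ms child is below the 1% cutoff and is not displayed.
root = Span("root", 1000, [Span("big", 500), Span("tiny", 5)])
print(visible_spans(root))
```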
Trace focus and filtering (section marked with the number 3 in the preceding figure)
Each row in this section represents a span and shows the hierarchical relationship between the parent span and the child span. Each parent span is preceded by a number, which indicates the number of child spans owned by the parent span. In this section, you can perform the following operations:
Collapse: Click the icon to collapse or expand a span.
Focus: Select a span and click the icon. The system displays only the data of the span and the downstream data.
Defocus: Click the icon to defocus a span.
Filter: Enter the information of a span in the search box, such as the span name, application name, or attributes to view the trace data ranging from the span to the entry span. To cancel the filtering, delete the input information from the search box and click the Search icon.
Zoom in and out: Click the icon to zoom in on the trace and hide the bar chart. Click the icon again to show the bar chart.
Span details (section marked with the number 4 in the preceding figure)
The Span Details section provides the details of the current span, and the relevant metric data, logs, and exception information. You can also manage custom interaction events and configure triggering for interaction events.
Additional Information: displays the attributes, resources, details, and events of the span. The additional information is grouped by type. For information about the fields in the span details, see Trace Explorer parameters.
Metrics: displays the metrics related to the span. For traces of Java applications monitored in ARMS, metrics about JVMs and hosts are displayed. For traces reported by an open source agent, metrics defined by the RED Method, including rate, errors, and duration, are displayed.
Logs: displays the business logs related to the trace. If you have configured a Simple Log Service (SLS) Logstore for the application, you can go to the Logstore and query the business logs based on the trace ID.
Exceptions: displays exception information related to the span, if any.
Event Config: allows you to configure interaction events for one or more attributes of the trace. This way, you can query more details about the trace or view the logs and metrics related to the trace. For information about how to configure custom interaction events, see Configure a custom interaction event for a trace.
Custom development
Trace data is stored in SLS. The project name is proj-xtrace-<encode>-<region-id>. The Logstore name is logstore-tracing. The region-id parameter is the region where you use Trace Explorer, for example, cn-hangzhou. For information about data formats, see Trace Explorer parameters. You can perform custom development on the stored full trace data. You can analyze the stored full trace data based on filter conditions or aggregation dimensions. This way, the requirements of custom diagnostics in various scenarios can be met. For more information, see Analyze trace data in real time by using Trace Explorer.
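If you access the stored data from your own tooling, the project and Logstore names can be derived from the naming pattern above. The following Python snippet is a minimal sketch: trace_project_name is a hypothetical helper, and the value passed for the encode segment is a placeholder because this topic does not describe how that segment is generated. Look up the actual project name in the SLS console.

```python
TRACE_LOGSTORE = "logstore-tracing"  # Logstore name stated in this topic


def trace_project_name(encode: str, region_id: str) -> str:
    """Build the SLS project name from the proj-xtrace-<encode>-<region-id> pattern."""
    if not encode or not region_id:
        raise ValueError("encode and region_id must be non-empty")
    return f"proj-xtrace-{encode}-{region_id}"


# "example-encode" is a placeholder, not a real value.
project = trace_project_name("example-encode", "cn-hangzhou")
print(project, TRACE_LOGSTORE)
```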