After the trace data of an application is reported to Managed Service for OpenTelemetry, Managed Service for OpenTelemetry starts to monitor the application. Managed Service for OpenTelemetry provides the Trace Explorer feature that allows you to combine filter conditions and aggregation dimensions based on your business requirements to analyze the stored full trace data in real time. This meets the custom diagnostics requirements in various scenarios.
Prerequisites
The data of the application that you want to monitor is reported to Managed Service for OpenTelemetry. For more information, see Connection Description.
Filter traces
Log on to the Managed Service for OpenTelemetry console.
In the left-side navigation pane, click Trace Explorer. In the top navigation bar, select the region from which the traces are reported.
In the upper-right corner of the Trace Explorer page, select a time range that you want to query.
Specify filter conditions.
In the Quick Filter section, query traces by status, duration, application name, span name, or host address.
The filter conditions that you specify are displayed in the search box.
Click the search box and modify the specified filter conditions in the dialog box that appears. You can also add custom filter conditions.
Enter a query statement in the search box. For information about the syntax of a query statement, see Use Trace Explorer to query traces.
Note: You can click the icon next to the search box to save the current filter conditions.
You can click Saved View to view the saved filter conditions and click a filter condition to view the corresponding trace data.
You can aggregate the queried data based on the specified dimensions.
Trace list
The trace data that meets the specified filter conditions is displayed on the Trace Explorer page. The trace data includes the column charts of spans and HTTP errors, the time series curve of duration, and a span list.
In the span list, you can perform the following operations:
Click Details in the Actions column to view the complete trace information of a trace. For more information, see the Trace details section of this topic.
Click Logs in the Actions column to view the logs of a trace.
Click the icon in the upper-right corner of the trace list to add or hide the fields of the list.
Move the pointer over a span and click the icon to add the current parameter values as a filter condition.
Scatter chart
On the Scatter plot tab, time points are distributed along the X axis and the response time is distributed along the Y axis. To view the basic information about a trace, move the pointer over the corresponding point. You can click a point to view the call details of the corresponding trace. For more information, see the Trace details section of this topic.
Trace aggregation
Trace Explorer allows you to analyze a single span based on various dimensions. However, you may need to analyze traces that consist of a large number of spans. The trace aggregation feature allows you to query up to 5,000 distributed traces by using the specified conditions and query the corresponding spans based on the trace IDs. Then, you can aggregate the queried spans to obtain the results. During this process, the integrity of the aggregated traces is guaranteed.
When you use the trace aggregation feature, the aggregate queries are performed on the trace data based on the specified conditions. Multiple query conditions may lead to a calculation delay. Wait until the calculation is complete.
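The two-step process described above (select traces by condition, then fetch every span of the selected traces by trace ID) can be sketched as follows. This is only an illustration over in-memory data, not the service's implementation. The record layout and the function name select_complete_traces are assumptions; the 5,000 limit is taken from the text above.

```python
MAX_TRACES = 5000  # upper bound stated in this topic


def select_complete_traces(spans, matches, max_traces=MAX_TRACES):
    """Return {trace_id: [all spans of that trace]} for traces that contain
    at least one span accepted by `matches` (hypothetical helper)."""
    # Step 1: collect the trace IDs of spans that meet the filter condition.
    trace_ids = []
    seen = set()
    for span in spans:
        tid = span["traceId"]
        if tid not in seen and matches(span):
            seen.add(tid)
            trace_ids.append(tid)
            if len(trace_ids) == max_traces:
                break

    # Step 2: fetch every span of the selected traces, whether or not the
    # span itself matches, so that each aggregated trace stays complete.
    selected = {tid: [] for tid in trace_ids}
    for span in spans:
        if span["traceId"] in selected:
            selected[span["traceId"]].append(span)
    return selected


spans = [
    {"traceId": "t1", "spanName": "A", "serviceName": "demo"},
    {"traceId": "t1", "spanName": "B", "serviceName": "demo"},
    {"traceId": "t2", "spanName": "A", "serviceName": "demo"},
]
result = select_complete_traces(spans, lambda s: s["spanName"] == "B")
```

In this sample, only trace t1 contains a span named B, so the result holds both spans of t1 and nothing from t2.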
Parameter | Description |
spanName | The name of the span. |
serviceName | The name of the application that corresponds to the span. |
Number of requests/percentage of requests | The request ratio indicates the ratio of the requests that call the current span to the total number of requests. For example, a value of 10% indicates that 10% of requests call the current span. The request ratio is calculated based on the following formula: Request ratio = Number of requests that call the current span/Total number of requests × 100% |
span/Request Multiple | The request multiple indicates the average number of times that the current span is called by each request that calls it. For example, a value of 1.5 indicates that the current span is called 1.5 times on average by each request. The request multiple is calculated based on the following formula: Request multiple = Number of spans/Number of requests |
Average self-consumption/proportion | The average exclusive duration of the span and its proportion in the average duration of the span. The exclusive duration of a span excludes the duration of its child spans. For example, if Span A takes 10 milliseconds and its child span (Span B) takes 8 milliseconds, the exclusive duration of Span A is 2 milliseconds. The exclusive duration is calculated based on the following formula: Exclusive duration of a span = Duration of the span - Duration of all child spans. Important: For asynchronous calls, the exclusive duration of a span includes the duration of its child spans. |
Average Duration | The average duration of the span. |
Number of exceptions/percentage of exceptions | The exception ratio indicates the ratio of requests with exceptions to the total number of requests. For example, a value of 3% indicates that exceptions occur in 3% of requests. The exception ratio is calculated based on the following formula: Exception ratio = Number of requests with exceptions/Total number of requests × 100%. Important: The number of requests with exceptions is not equal to the number of exceptions. If the request multiple is greater than 1, a request may have multiple exceptions. |
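The formulas in the table above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the service's implementation: the field names traceId, spanId, parentId, durationMs, and error are hypothetical, all calls are treated as synchronous, and the exception ratio is taken relative to the requests that call the span, which is how the example later in this topic reads.

```python
from collections import defaultdict


def aggregate(spans):
    """Aggregate spans by (spanName, serviceName) using the formulas above."""
    total_requests = len({s["traceId"] for s in spans})

    # Total duration of the direct children of each span.
    child_ms = defaultdict(float)
    for s in spans:
        if s["parentId"] is not None:
            child_ms[(s["traceId"], s["parentId"])] += s["durationMs"]

    groups = defaultdict(list)
    for s in spans:
        groups[(s["spanName"], s["serviceName"])].append(s)

    rows = {}
    for key, group in groups.items():
        requests = {s["traceId"] for s in group}
        failed = {s["traceId"] for s in group if s["error"]}
        exclusive = [
            s["durationMs"] - child_ms[(s["traceId"], s["spanId"])] for s in group
        ]
        rows[key] = {
            "requests": len(requests),
            "requestRatio": len(requests) / total_requests,
            "requestMultiple": len(group) / len(requests),
            "avgExclusiveMs": sum(exclusive) / len(group),
            "avgDurationMs": sum(s["durationMs"] for s in group) / len(group),
            "failedRequests": len(failed),
            "exceptionRatio": len(failed) / len(requests),
        }
    return rows


spans = [
    {"traceId": "t1", "spanId": "a1", "parentId": None, "spanName": "A",
     "serviceName": "demo", "durationMs": 10.0, "error": False},
    {"traceId": "t1", "spanId": "b1", "parentId": "a1", "spanName": "B",
     "serviceName": "demo", "durationMs": 8.0, "error": True},
    {"traceId": "t2", "spanId": "a2", "parentId": None, "spanName": "A",
     "serviceName": "demo", "durationMs": 6.0, "error": False},
]
rows = aggregate(spans)
```

In trace t1, Span A takes 10 milliseconds and its child Span B takes 8 milliseconds, so the exclusive duration of that Span A is 2 milliseconds, as in the description above.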
In this example, Span A calls its child spans Span B and Span C. The following section describes the parameters.
spanName | serviceName | Number of requests/percentage of requests | span/Request Multiple | Average self-consumption/proportion | Average Duration | Number of exceptions/percentage of exceptions |
A | demo | 10/100.00% | 10/1.00 | 5.00ms/25.00% | 20ms | 2/20.00% |
B (child span of A) | demo | 4/40.00% | 8/2.00 | 16.00ms/100.00% | 16ms | 2/50.00% |
C (child span of A) | demo | 1/10.00% | 1/1.00 | 4.00ms/100.00% | 4ms | 1/100.00% |
The Number of requests/percentage of requests parameter of Span A indicates that the total number of requests is 10 and the request ratio is 100%. The Number of requests/percentage of requests parameter of Span B indicates that only four requests call Span B. Similarly, only one request calls Span C. The request ratio of Span B is 40% and the request ratio of Span C is 10%. Other requests do not call Span B and Span C due to logical judgments or exceptions. This reflects the distribution of requests.
The span/Request Multiple parameter of Span A is 10/1.00, which indicates that Span A is called only once by each request. For Span B, however, eight spans are generated by four requests. Therefore, Span B is called twice on average by each request that calls it. This reflects the distribution of spans in each request.
The Average self-consumption/proportion parameter of Span A is 5.00 ms/25.00%, which indicates that the average exclusive duration of Span A is 5 milliseconds. This duration excludes the duration of Span B and Span C. The average exclusive duration of Span A accounts for only 25% of its average duration of 20 milliseconds. The average exclusive duration of Span B and Span C is equal to their average duration because Span B and Span C do not have child spans, so their proportion is 100%. This reflects the distribution of average duration.
The Number of exceptions/percentage of exceptions parameter of Span A is 2/20.00%, which indicates that two requests that call Span A have exceptions, accounting for 20% of the total number of requests. The Number of exceptions/percentage of exceptions parameter of Span B is 2/50.00%. Four requests call Span B, each request calls Span B twice on average, and the exception ratio of Span B is 50%. Therefore, two of the four requests have exceptions in Span B. The exceptions in Span B may be distributed as follows: two of the four requests are successful, and in each of the other two requests, an exception occurs in the first call whereas the second call is successful.
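The percentages and multiples in the example can be reproduced from the raw counts in the table. The following sketch only rechecks that arithmetic. The key names and the helper functions pct and derive are illustrative, not part of the product.

```python
def pct(part, whole):
    """Format part/whole as a percentage with two decimal places."""
    return f"{part / whole * 100:.2f}%"


def derive(row, total_requests):
    """Derive the ratio columns of one table row from its raw counts."""
    return {
        "requestRatio": pct(row["requests"], total_requests),
        "requestMultiple": f"{row['spans'] / row['requests']:.2f}",
        "exclusiveProportion": pct(row["exclusive_ms"], row["duration_ms"]),
        # Relative to the requests that call this span, as in the example.
        "exceptionRatio": pct(row["failed"], row["requests"]),
    }


TOTAL_REQUESTS = 10  # every request calls Span A
table = {
    "A": {"requests": 10, "spans": 10, "exclusive_ms": 5.0, "duration_ms": 20.0, "failed": 2},
    "B": {"requests": 4, "spans": 8, "exclusive_ms": 16.0, "duration_ms": 16.0, "failed": 2},
    "C": {"requests": 1, "spans": 1, "exclusive_ms": 4.0, "duration_ms": 4.0, "failed": 1},
}
derived = {name: derive(row, TOTAL_REQUESTS) for name, row in table.items()}
```

For Span B, this yields a request ratio of 40.00%, a request multiple of 2.00, an exclusive proportion of 100.00%, and an exception ratio of 50.00%, which matches the table.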
To view the details of a specific trace, move the pointer over the span name and click the recommended trace ID that appears.
Trace topology
The Full Link Topology tab displays the inter-application topology of aggregated traces. The following figure shows that the two applications have call relationships. The following information is displayed for each application: the number of requests, number of errors, and response time.
Analysis of failed and slow traces
Analysis of failed and slow traces helps you identify the dimensions that multiple failed or slow traces have in common. For example, the traces may be concentrated on one host or belong to one interface. You can query traces by host or interface, or combine multiple filter conditions to query traces and locate problems. Example: serviceName="arms-demo" AND ip="192.168.1.1". Analysis of failed and slow traces can also help you identify slow interfaces and optimize the system in a targeted manner.
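If you generate filter statements programmatically, a combined condition such as the example above can be assembled from field/value pairs. The following sketch covers only the field="value" AND ... form shown in this topic. The helper name build_filter is hypothetical, and the sketch does not handle other operators or the escaping of special characters.

```python
def build_filter(conditions):
    """Join field/value pairs into a statement of the form a="x" AND b="y"."""
    return " AND ".join(f'{field}="{value}"' for field, value in conditions.items())


query = build_filter({"serviceName": "arms-demo", "ip": "192.168.1.1"})
print(query)  # serviceName="arms-demo" AND ip="192.168.1.1"
```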
Analysis of slow traces
ARMS analyzes 1,000 traces with the longest duration and displays five dimensions that are most related to slow traces.
Slow trace details
ARMS selects 1,000 traces with the longest duration from traces whose duration is greater than the threshold, samples 1,000 traces whose duration is less than the threshold, compares these traces, and then discovers three characteristics that are most related to slow calls.
You can configure the threshold based on your business requirements. For example, to discover the characteristics of traces that take more than 1 minute, set the threshold to 60000 milliseconds.
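The comparison described above can be illustrated with a simple frequency-difference heuristic. This topic does not describe the algorithm that the service actually uses, so the scoring rule below is an assumption chosen for illustration, as are the function names and the durationMs field. Only the threshold, the limit of 1,000 traces per group, and the three reported characteristics come from the text.

```python
import random
from collections import Counter


def split_by_threshold(traces, threshold_ms, limit=1000):
    """Take the slowest traces above the threshold and sample traces below it."""
    above = sorted(
        (t for t in traces if t["durationMs"] > threshold_ms),
        key=lambda t: t["durationMs"],
        reverse=True,
    )[:limit]
    below = [t for t in traces if t["durationMs"] <= threshold_ms]
    return above, random.sample(below, min(limit, len(below)))


def top_characteristics(slow, normal, dimensions, k=3):
    """Rank (dimension, value) pairs by how much more often they occur
    in slow traces than in normal traces (assumed scoring rule)."""
    def share(traces):
        counts = Counter((d, t[d]) for t in traces for d in dimensions if d in t)
        return {pair: n / len(traces) for pair, n in counts.items()}

    slow_share, normal_share = share(slow), share(normal)
    scores = {p: s - normal_share.get(p, 0.0) for p, s in slow_share.items()}
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


traces = [
    {"durationMs": 70000, "ip": "192.168.1.1", "spanName": "/pay"},
    {"durationMs": 65000, "ip": "192.168.1.1", "spanName": "/order"},
    {"durationMs": 120, "ip": "192.168.1.2", "spanName": "/pay"},
    {"durationMs": 90, "ip": "192.168.1.1", "spanName": "/order"},
]
slow, normal = split_by_threshold(traces, threshold_ms=60000)
top = top_characteristics(slow, normal, ["ip", "spanName"], k=1)
```

In this sample, both slow traces come from the host 192.168.1.1 but only half of the normal traces do, so that host ranks first.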
Analysis of failed traces
ARMS randomly selects and analyzes 1,000 failed traces, and displays five dimensions that are most related to the traces.
Failed trace details
ARMS compares failed traces with normal traces and discovers three characteristics that are most related to failed calls.
Trace details
In the details panel of a trace, you can view all the spans, start time, errors, total duration of the trace, and the duration of each span.
In the trace details panel, you can perform the following operations:
Click the icon next to a span to view the method stacks and analysis overview.
Method stacks
Analysis overview
Click the name of a span to view the additional information, metric details, and logs on the right of the panel.
References
To detect errors when they occur instead of diagnosing them afterward, you can use the alerting feature to create alert rules for one or all operations. This way, the system sends notifications to the O&M team when the errors occur. For more information about how to create an alert rule, see Create an alert rule.