Alibaba Cloud Service Mesh: Diagnose high response latency by using access logs

Last Updated: Mar 11, 2026

After sidecar proxies are injected into workloads, they intercept and route traffic according to the configured policies, which adds a small amount of processing overhead to each request. When nodes have adequate resources, this overhead remains negligible even when many requests are processed concurrently. When response latency exceeds expectations, use the timing fields in the Envoy access logs to isolate the source.

This guide walks through a two-step diagnostic process:

  1. Compare duration values across components to identify which one introduces the delay.

  2. Examine detailed timing fields to determine whether slow network transmission or slow upstream processing is the root cause.

Access log timing fields

The following Envoy access log fields are used throughout this guide:

| Field | Description |
| --- | --- |
| duration | Total time consumed by a data plane component to process a request, from receiving the request through sending the complete response |
| request_duration | Time to receive the request from the downstream node |
| request_tx_duration | Time to forward the request to the upstream service |
| response_duration | Time from sending the request to receiving the first byte of the response |
| response_tx_duration | Time to forward the response to the downstream node |
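
As an illustration, the timing fields above can be extracted from a single log entry with a few lines of Python. This is a minimal sketch, assuming that access logs are emitted as one JSON object per line, that the keys match the field names in the table, and that values are in milliseconds. The helper name parse_timings and the sample entry are hypothetical; adjust them to the log format configured in your mesh.

```python
import json

# Field names taken from the table above.
TIMING_FIELDS = (
    "duration",
    "request_duration",
    "request_tx_duration",
    "response_duration",
    "response_tx_duration",
)


def parse_timings(log_line):
    """Return the timing fields of one JSON access log entry.

    Values are converted to int (assumed to be milliseconds).
    Missing or non-numeric values are returned as None.
    """
    entry = json.loads(log_line)
    timings = {}
    for name in TIMING_FIELDS:
        try:
            timings[name] = int(entry.get(name))
        except (TypeError, ValueError):
            timings[name] = None
    return timings


# Hypothetical log entry used only for illustration.
sample = (
    '{"duration": 120, "request_duration": 3, "request_tx_duration": 4, '
    '"response_duration": 110, "response_tx_duration": "-"}'
)
print(parse_timings(sample))
```

Returning None for unavailable values keeps later comparisons explicit: a field that was not recorded is not silently treated as zero.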

Step 1: Identify the component that causes high latency

The duration field represents the total time a data plane component spends on a single request, including:

  • Receiving and forwarding the request to the upstream service

  • Waiting for the upstream service to return a response

  • Receiving and forwarding the response to the downstream node

To isolate the problematic component, trace the request path from the entry point upstream:

  1. Check the duration value at the entry point of the request path.

  2. If the value is higher than expected, move to the next upstream component and check its duration.

  3. If the upstream component shows a normal duration, the previous (downstream) component is the source of the delay.

  4. If the upstream component also shows high duration, continue upstream until you find the first component with a normal value.

The component immediately downstream of the first normal-duration component is the one causing the latency.
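
The walk described above can be sketched in Python. This is only an illustration under stated assumptions: the duration values have already been collected from each component's access log for the same request, each component has a known expected duration, and the function name find_slow_component, the component names, and the numbers are all hypothetical.

```python
def find_slow_component(path):
    """Locate the component that introduces the delay.

    path: list of (name, duration_ms, expected_ms) tuples, ordered from
    the entry point of the request path to the most upstream component.

    Returns the name of the last component whose duration is higher than
    expected before the first normal one, or None if the entry point is
    already normal. If every component is high, the most upstream one is
    returned, which suggests the delay lies at or beyond that component.
    """
    culprit = None
    for name, duration_ms, expected_ms in path:
        if duration_ms <= expected_ms:
            # First component with a normal duration: stop here.
            break
        culprit = name
    return culprit


# Hypothetical request path: gateway -> frontend sidecar -> backend sidecar.
path = [
    ("ingress-gateway", 950, 100),
    ("frontend-sidecar", 930, 100),
    ("backend-sidecar", 40, 100),
]
print(find_slow_component(path))  # frontend-sidecar
```

In this example the backend sidecar is the first component with a normal duration, so the component immediately downstream of it, the frontend sidecar, is reported.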

Step 2: Determine the root cause

After you identify the problematic component, examine its detailed timing fields to determine whether the root cause is slow network transmission or slow upstream processing.

Slow network transmission

Compare request_duration and request_tx_duration:

  • High request_duration: The data plane component (sidecar proxy or gateway) is slow to receive the request from the downstream node.

  • High request_tx_duration: The component is slow to forward the request to the upstream service.

For HTTP requests with a body, receiving and forwarding happen simultaneously: the body is streamed to the upstream service as it is received rather than buffered first. A high request_duration can therefore cause a correspondingly high request_tx_duration.

Interpret the results based on the pattern:

| Pattern | Likely cause | Action |
| --- | --- | --- |
| Only request_tx_duration is high | The request is received quickly but forwarded slowly | Investigate the network path between the component and its upstream service |
| Both request_duration and response_tx_duration are high | The request is received slowly from the downstream node, and the response is received slowly from the upstream service or forwarded slowly to the downstream node | Investigate overall network conditions between the component and its peers |
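
The two patterns can be expressed as a small classifier. The sketch below is hypothetical: the function name classify_network_pattern, the returned labels, and the single threshold_ms cut-off for "high" are assumptions made for illustration, and "only request_tx_duration is high" is interpreted here as request_tx_duration being high while request_duration is not.

```python
def classify_network_pattern(request_duration, request_tx_duration,
                             response_tx_duration, threshold_ms):
    """Map timing values (ms) to one of the patterns in the table.

    Returns:
      "upstream-path"   - request received quickly but forwarded slowly;
                          investigate the path to the upstream service.
      "overall-network" - request_duration and response_tx_duration are
                          both high; investigate overall network conditions.
      "no-match"        - neither pattern applies.
    """
    high_rx = request_duration > threshold_ms
    high_tx = request_tx_duration > threshold_ms
    high_resp_tx = response_tx_duration > threshold_ms

    if high_tx and not high_rx:
        return "upstream-path"
    if high_rx and high_resp_tx:
        return "overall-network"
    return "no-match"


# Hypothetical values in milliseconds, with 100 ms treated as "high".
print(classify_network_pattern(2, 300, 5, threshold_ms=100))
print(classify_network_pattern(400, 410, 350, threshold_ms=100))
```

A fixed threshold is a simplification. In practice, what counts as high depends on the payload size and the normal baseline of each component.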

Slow upstream processing

Calculate the upstream processing time:

upstream processing time = response_duration - request_tx_duration

  • response_duration: Time from sending the request to receiving the first byte of the response.

  • request_tx_duration: Time spent forwarding the request.

The difference approximates the time the upstream service spends processing the request, plus any network latency on the upstream path. A large value therefore indicates slow upstream processing or high network latency on that path. Investigate the upstream service's performance and the network between the component and the upstream service.
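
As a worked example of the calculation, the following sketch computes the upstream processing time for a few log entries and picks the slowest one. The function name upstream_processing_ms and the sample values are hypothetical, and the values are assumed to be in milliseconds.

```python
def upstream_processing_ms(response_duration, request_tx_duration):
    """Upstream processing time = response_duration - request_tx_duration."""
    return response_duration - request_tx_duration


# Hypothetical (response_duration, request_tx_duration) pairs in ms.
entries = [(850, 12), (95, 10), (40, 8)]

processing_times = [upstream_processing_ms(resp, tx) for resp, tx in entries]
print(processing_times)       # [838, 85, 32]
print(max(processing_times))  # 838
```

Here the first entry spends 838 ms between forwarding the request and receiving the first byte of the response, which points to the upstream service or the network path to it rather than to the proxy itself.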