After sidecar proxies are injected into workloads, they intercept and route traffic based on the specified policy, adding a small amount of processing overhead per request. With adequate node performance, this overhead is negligible for concurrent processing. When response latency exceeds expectations, use the Envoy access log timing fields to isolate the source.
This guide walks through a two-step diagnostic process:
1. Compare duration values across components to identify which one introduces the delay.
2. Examine detailed timing fields to determine whether slow network transmission or slow upstream processing is the root cause.
Access log timing fields
The following Envoy access log fields are used throughout this guide:
| Field | Description |
|---|---|
| duration | Total time consumed by a data plane component to process a request, from receiving the request through sending the complete response |
| request_duration | Time to receive the request from the downstream node |
| request_tx_duration | Time to forward the request to the upstream service |
| response_duration | Time from sending the request to receiving the first byte of the response |
| response_tx_duration | Time to forward the response to the downstream node |
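As a rough illustration of how these fields might be pulled out of a log entry, the sketch below assumes the access log is emitted as one JSON object per line, with keys named exactly like the fields in the table and values in milliseconds. The log format is configurable, so both the key names and the sample line are assumptions, and `extract_timings` is a hypothetical helper.

```python
import json

# Timing fields discussed in this guide. Key names and millisecond units are
# assumptions about how the access log format is configured.
TIMING_FIELDS = (
    "duration",
    "request_duration",
    "request_tx_duration",
    "response_duration",
    "response_tx_duration",
)


def extract_timings(log_line):
    """Return the timing fields from one JSON access log line.

    Missing or non-numeric values (for example "-") map to None.
    """
    record = json.loads(log_line)
    timings = {}
    for field in TIMING_FIELDS:
        raw = record.get(field)
        try:
            timings[field] = int(raw)
        except (TypeError, ValueError):
            timings[field] = None
    return timings


# Made-up sample entry, for illustration only.
sample = (
    '{"duration": "120", "request_duration": "5", "request_tx_duration": "6", '
    '"response_duration": "110", "response_tx_duration": "-", "path": "/api"}'
)
timings = extract_timings(sample)
print(timings)
```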
Step 1: Identify the component that causes high latency
The duration field represents the total time a data plane component spends on a single request, including:
- Receiving and forwarding the request to the upstream service
- Waiting for the upstream service to return a response
- Receiving and forwarding the response to the downstream node
To isolate the problematic component, trace the request path from the entry point upstream:
1. Check the duration value at the entry point of the request path.
2. If the value is higher than expected, move to the next upstream component and check its duration.
3. If the upstream component shows a normal duration, the previous (downstream) component is the source of the delay.
4. If the upstream component also shows high duration, continue upstream until you find the first component with a normal value.
The component immediately downstream of the first normal-duration component is the one causing the latency.
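The walk described above can be sketched as a small function. This is only an illustration: `find_slow_component`, the component names, and the single shared threshold are hypothetical, and in practice "higher than expected" may differ per component.

```python
def find_slow_component(path, threshold_ms):
    """Locate the component that likely introduces the delay.

    path: list of (component_name, duration_ms) tuples, ordered from the
          entry point of the request path toward the upstream end.
    threshold_ms: duration above which a value counts as "high" (assumed
          to be the same for every component in this sketch).

    Returns the component immediately downstream of the first component
    with a normal duration, or None if the entry point itself is normal.
    """
    suspect = None
    for name, duration_ms in path:
        if duration_ms <= threshold_ms:
            # First normal value: the previous component is the suspect.
            return suspect
        suspect = name
    # No component showed a normal duration. Assumption: the delay sits at
    # or behind the last component that was checked.
    return suspect


# Made-up durations for illustration.
request_path = [
    ("ingress-gateway", 950),
    ("frontend-sidecar", 940),
    ("backend-sidecar", 12),
]
print(find_slow_component(request_path, threshold_ms=100))
```

With these sample values, the gateway and the frontend sidecar are both slow while the backend sidecar is normal, so the function points at the frontend sidecar.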
Step 2: Determine the root cause
After you identify the problematic component, examine its detailed timing fields to determine whether the root cause is slow network transmission or slow upstream processing.
Slow network transmission
Compare request_duration and request_tx_duration:
- High request_duration: The data plane component (sidecar proxy or gateway) is slow to receive the request from the downstream node.
- High request_tx_duration: The component is slow to forward the request to the upstream service.
For HTTP requests with a body, receiving and forwarding happen simultaneously: the body is streamed to the upstream service as it is received, rather than buffered first. A high request_duration can therefore cause a correspondingly high request_tx_duration.
Interpret the results based on the pattern:
| Pattern | Likely cause | Action |
|---|---|---|
| Only request_tx_duration is high | The request is received quickly but forwarded slowly | Investigate the network path between the component and its upstream service |
| Both request_duration and response_tx_duration are high | The response is received slowly from the upstream service or forwarded slowly to the downstream node | Investigate overall network conditions between the component and its peers |
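The two rows of the table can be encoded as a simple lookup, shown here as a minimal sketch. The function name and the threshold are hypothetical. "Only request_tx_duration is high" is interpreted here as request_tx_duration being high while request_duration is not, which is one possible reading of the table.

```python
def classify_transmission(timings, threshold_ms):
    """Map timing values to the suggested action from the table.

    timings: dict of field name -> milliseconds (None if unavailable).
    threshold_ms: hypothetical cutoff above which a field counts as high.
    """
    high = {
        name
        for name, value in timings.items()
        if value is not None and value > threshold_ms
    }
    if "request_duration" in high and "response_tx_duration" in high:
        return "Investigate overall network conditions between the component and its peers"
    if "request_tx_duration" in high and "request_duration" not in high:
        return "Investigate the network path between the component and its upstream service"
    return "No transmission pattern from the table matched"


# Made-up values: the request arrives quickly but is forwarded slowly.
print(classify_transmission(
    {"request_duration": 3, "request_tx_duration": 400, "response_tx_duration": 2},
    threshold_ms=100,
))
```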
Slow upstream processing
Calculate the upstream processing time:
upstream processing time = response_duration - request_tx_duration
- response_duration: Time from sending the request to receiving the first byte of the response.
- request_tx_duration: Time spent forwarding the request.
The difference represents the time the upstream service spends processing the request. A large value indicates slow upstream processing or high network latency on the upstream path. Investigate the upstream service's performance and the network between the component and the upstream service.
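A worked example of the subtraction, assuming both fields are reported in milliseconds. The helper name and the sample values are made up for illustration.

```python
def upstream_processing_time(response_duration, request_tx_duration):
    """Return response_duration - request_tx_duration, or None if either
    value is unavailable."""
    if response_duration is None or request_tx_duration is None:
        return None
    return response_duration - request_tx_duration


# Made-up values: the first response byte arrives after 110 ms, of which
# 6 ms were spent forwarding the request.
processing_ms = upstream_processing_time(110, 6)
print(processing_ms)  # 104
```

Here most of the 110 ms is spent waiting on the upstream side, so the upstream service and the network path to it would be the next things to examine.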