Connection timeouts caused by slow requests are a common cause of degraded service performance. The slow log feature lets you identify the IP addresses of the clients that send these requests so that you can troubleshoot the issue.
Background information
Slow logs record requests that take longer than a specified threshold to execute in Tair. Slow logs are classified into slow logs from data nodes and slow logs from proxy nodes.
- If the Tair instance uses the standard architecture, only slow logs from data nodes are collected.
- For more information, see Modify parameters of an instance.
Slow log type | Description | Parameter |
---|---|---|
Slow logs from data nodes | | |
Slow logs from proxy nodes | | rt_threshold_ms: the threshold of the command execution duration for slow logs from proxy nodes. Default value: 500. Unit: milliseconds. We recommend that you set the threshold to a value close to the client timeout period, which is from 200 milliseconds to 500 milliseconds. |
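
The following minimal sketch illustrates why the threshold is best kept close to the client timeout period. It is not Tair code: the `ProxyRequest` type, the `recorded_as_slow` function, and the sample durations are hypothetical, and the strict "longer than" comparison is an assumption based on the description above. With a client timeout of about 300 milliseconds, a 320-millisecond request times out on the client but is not recorded under the default threshold of 500.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyRequest:
    """A request as seen by a proxy node (hypothetical type for illustration)."""
    command: str
    duration_ms: float


def recorded_as_slow(requests, rt_threshold_ms=500):
    """Return the requests that take longer than the threshold.

    The default of 500 mirrors the documented default of rt_threshold_ms.
    """
    return [r for r in requests if r.duration_ms > rt_threshold_ms]


# Illustrative durations only; these are not measured values.
requests = [
    ProxyRequest("GET", 3.0),
    ProxyRequest("HGETALL", 320.0),
    ProxyRequest("KEYS", 1800.0),
]

# Default threshold: the 320 ms request is missing from the slow log.
with_default = recorded_as_slow(requests)

# Threshold close to a 300 ms client timeout: the 320 ms request is recorded.
with_lower = recorded_as_slow(requests, rt_threshold_ms=300)
```

Lowering the threshold records more entries, so the value is a trade-off between log volume and visibility into requests that already exceed the client timeout.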
Methods used to query slow logs
Slow log type | Method |
---|---|
Slow logs from data nodes | |
Slow logs from proxy nodes | Log on to the Tair console or call an API operation: |
Procedure
In most cases, service timeouts are caused by slow requests. We recommend that you perform the following steps to troubleshoot the timeout issues:
1. If a service timeout issue occurs, first check the slow logs generated on proxy nodes. For more information, see View slow logs.
   Note
   - If the instance uses the standard architecture, go to Step 3 to analyze slow logs from data nodes.
   - If no slow logs from proxy nodes exist, check the network between the client and the instance.
2. Find the command that caused the earliest slow log from proxy nodes.
   Note: If slow requests accumulate on data nodes, these requests are recorded in slow logs from proxy nodes.
   In this example, the earliest recorded slow log is caused by the KEYS command. The IP address on the right of the log entry is the IP address of the client that sent the command.
3. Check the slow logs from data nodes to determine which of the slow logs from proxy nodes caused the timeout issue.
   Note: Typically, the command that first generates slow logs on proxy nodes also generates slow logs on data nodes. A data node usually has fewer slow logs than a proxy node because the two slow log types define execution time differently and use different thresholds.
   In this example, the slow log caused by the KEYS command appears in both the slow logs from proxy nodes and the slow logs from data nodes. None of the other slow logs displayed on the Proxy tab appear on the Data nodes tab. This shows that the KEYS command causes the timeout.
4. In the slow logs from proxy nodes, search for the command found in Step 2 to obtain the IP address of the client that sent it, and then optimize that client's requests.
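
The troubleshooting steps above can be sketched in code as follows. This is a minimal illustration that assumes the slow logs have already been exported into simple records. The `SlowLogEntry` and `Diagnosis` types, their field names, the `diagnose_timeout` function, and the sample entries (including the documentation-range IP addresses) are all hypothetical and do not reflect the Tair console or API schema.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class SlowLogEntry:
    """One slow log record (hypothetical fields, not the Tair schema)."""
    timestamp: float      # when the entry was recorded, in seconds
    command: str          # command name, for example "KEYS"
    duration_ms: float    # recorded execution duration
    client_ip: str = ""   # client address, as shown on proxy node entries


@dataclass(frozen=True)
class Diagnosis:
    command: str                   # command behind the earliest proxy slow log
    confirmed_on_data_nodes: bool  # True if data nodes also logged the command
    client_ips: Tuple[str, ...]    # clients that sent the command


def diagnose_timeout(
    proxy_logs: Iterable[SlowLogEntry],
    data_node_logs: Iterable[SlowLogEntry],
) -> Optional[Diagnosis]:
    proxy_entries = list(proxy_logs)
    if not proxy_entries:
        # No proxy slow logs: check the network between client and instance,
        # or analyze data node slow logs directly on the standard architecture.
        return None

    # Find the command that caused the earliest slow log from proxy nodes.
    earliest = min(proxy_entries, key=lambda e: e.timestamp)
    suspect = earliest.command.upper()

    # Check whether the same command also appears in data node slow logs.
    confirmed = any(e.command.upper() == suspect for e in data_node_logs)

    # Collect the client IP addresses that sent the command.
    ips = sorted(
        {e.client_ip for e in proxy_entries
         if e.command.upper() == suspect and e.client_ip}
    )
    return Diagnosis(suspect, confirmed, tuple(ips))


# Illustrative entries only, modeled on the KEYS example in the text.
proxy = [
    SlowLogEntry(100.0, "KEYS", 1800.0, "192.0.2.10"),
    SlowLogEntry(100.4, "GET", 650.0, "192.0.2.11"),
    SlowLogEntry(100.9, "HGETALL", 700.0, "192.0.2.12"),
]
data_nodes = [SlowLogEntry(100.0, "KEYS", 1750.0)]

result = diagnose_timeout(proxy, data_nodes)
```

In this sample, the later GET and HGETALL entries appear only in the proxy logs, which matches the case described above where requests queue behind one slow command on the data node.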