Tair (Redis OSS-compatible) instances run in the data layer, close to the application layer, so applications frequently read data from and write data to them. This traffic can consume a large amount of bandwidth. The maximum bandwidth available to an instance depends on its instance type. If an instance exceeds its maximum bandwidth, applications may be unable to access the data stored on it.
Step 1: Analyze traffic usage
Check the traffic usage of an instance within a specific period of time. For more information, see View performance monitoring data.
In this example, the traffic usage in both the inbound and outbound directions stays at 100%, as shown in the following figure.
In most cases, we recommend that you start troubleshooting when the average traffic usage stays at around 80%, before bandwidth resources are exhausted.
Check both the Intranet In Ratio and Intranet Out Ratio metrics, which indicate the inbound and outbound traffic usage, respectively.
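The 80% guideline can be applied to exported metric samples with a small check like the one below. This is a minimal sketch, assuming you already have percentage samples for Intranet In Ratio and Intranet Out Ratio over the same period; the function name and the choice to treat "around 80%" as ">= 80" are illustrative assumptions, not part of the console or any API.

```python
from statistics import mean

# Hypothetical cutoff: the guidance is to troubleshoot when average usage
# stays around 80%; this sketch treats "around" as ">= 80".
WARN_RATIO = 80.0


def check_traffic(ratio_in, ratio_out, warn=WARN_RATIO):
    """Flag each direction whose average usage reaches the warning ratio.

    ratio_in / ratio_out are percentage samples (0-100) taken from the
    Intranet In Ratio and Intranet Out Ratio metrics. How the samples are
    exported is outside the scope of this sketch.
    """
    report = {}
    for direction, samples in (("in", ratio_in), ("out", ratio_out)):
        avg = mean(samples)
        report[direction] = {"avg": avg, "troubleshoot": avg >= warn}
    return report


if __name__ == "__main__":
    # Inbound hovers near 80%; outbound is pinned at 100%.
    print(check_traffic([78, 82, 85, 80], [100, 100, 100, 100]))
```

Checking both directions separately matters because a read-heavy workload can saturate outbound bandwidth while inbound usage stays low.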
Step 2: Optimize traffic usage
Increase the bandwidth of the instance to reduce the impact on your business and gain more time to troubleshoot the issue. For more information, see Manually increase the bandwidth of an instance.
Bandwidth consumption may not match the amount of user traffic that you expect. For example, traffic usage and queries per second (QPS) may follow inconsistent trends, such as traffic usage growing much faster than QPS. In this case, use the offline key analysis feature to identify large keys on the instance. For more information, see Use the offline key analysis feature.
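One way to quantify "inconsistent trends" is to compare the average number of bytes per request across two time windows. The sketch below assumes you can obtain absolute traffic (bytes per second) and QPS for each window from your monitoring data; the `tolerance` factor of 1.5 is an arbitrary, hypothetical threshold rather than a documented value.

```python
def bytes_per_request(traffic_bytes_per_sec, qps):
    """Average bytes transferred per request in one time window."""
    if qps <= 0:
        raise ValueError("qps must be positive")
    return traffic_bytes_per_sec / qps


def trends_diverge(before, after, tolerance=1.5):
    """Return True if traffic grew noticeably faster than QPS.

    before / after are (traffic_bytes_per_sec, qps) tuples for two windows.
    If the average bytes per request rises beyond `tolerance` times the
    earlier value, larger values are likely being read or written, which is
    the situation where running offline key analysis is worthwhile.
    """
    return bytes_per_request(*after) > tolerance * bytes_per_request(*before)


# Traffic quadrupled while QPS grew by only 10%.
print(trends_diverge((2_000_000, 10_000), (8_000_000, 11_000)))  # True
```

If traffic and QPS grow in proportion, bytes per request stays flat and the check returns False, which points toward a genuine load increase rather than large keys.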
Optimize large keys. Keys are typically classified as large keys when their size exceeds 10 KB. For example, you can split large keys, reduce access to large keys, or delete large keys that you no longer need.
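The sketch below illustrates why large keys strain bandwidth and one common way to split them. It assumes 1 KB = 1,024 bytes, reuses the 96 Mbit/s per-shard figure from the direct connection example purely for arithmetic, and ignores protocol overhead. The sub-key naming scheme (`key:N`) and the bucket count of 16 are hypothetical choices, not a Tair convention.

```python
import zlib

# Large-key threshold from the text; assumes 1 KB = 1,024 bytes.
LARGE_KEY_BYTES = 10 * 1024


def is_large_key(size_bytes, threshold=LARGE_KEY_BYTES):
    """True if a key's serialized size exceeds the large-key threshold."""
    return size_bytes > threshold


def max_reads_per_sec(bandwidth_mbps, value_bytes):
    """Upper bound on full reads of one value per second at a given bandwidth.

    Ignores protocol overhead and all other traffic, so real numbers are lower.
    """
    return bandwidth_mbps * 1_000_000 / (value_bytes * 8)


def bucket_key(key, field, buckets=16):
    """Map a hash field to one of `buckets` smaller sub-keys.

    For example, fields of "user:profile" are spread over "user:profile:0"
    through "user:profile:15". zlib.crc32 is used instead of hash() because
    it is stable across processes, so every client picks the same sub-key.
    """
    return f"{key}:{zlib.crc32(field.encode('utf-8')) % buckets}"


print(is_large_key(50 * 1024))           # True
print(max_reads_per_sec(96, 10 * 1024))  # 1171.875
print(bucket_key("user:profile", "alice"))
```

At 96 Mbit/s, fewer than about 1,200 full reads per second of a single 10 KB value are enough to saturate the link. After splitting, the application reads and writes `bucket_key(key, field)` instead of the original key. In a cluster instance, the sub-keys usually map to different hash slots, so their traffic tends to spread across shards instead of concentrating on one.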
For Tair (Enterprise Edition) DRAM-based instances that use the cluster architecture, enable proxy query cache to address heavy traffic or skewed requests that are caused by hotkeys. For more information, see Use the real-time key statistics feature and Use proxy query cache to address issues caused by hotkeys.
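To show what "skewed requests caused by hotkeys" looks like in data, the sketch below counts key accesses in a sample and reports keys that take a disproportionate share. This is an illustrative, swapped-in technique and not how the real-time key statistics feature works; the sampling source (for example, application-side logging) and the 10% `min_share` cutoff are hypothetical.

```python
from collections import Counter


def find_hotkeys(sampled_keys, top_n=3, min_share=0.1):
    """Return (key, share) pairs for the most-accessed keys.

    sampled_keys: key names from sampled requests (hypothetical source).
    A key is reported only if it accounts for at least `min_share` of all
    sampled accesses.
    """
    counts = Counter(sampled_keys)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (key, n / total)
        for key, n in counts.most_common(top_n)
        if n / total >= min_share
    ]


sample = ["item:42"] * 70 + ["item:7"] * 20 + ["item:1"] * 5 + ["item:9"] * 5
print(find_hotkeys(sample))  # [('item:42', 0.7), ('item:7', 0.2)]
```

A key that receives 70% of requests sends all of that traffic to a single shard, which is the kind of skew that a cache in front of the data shards, such as proxy query cache, is meant to absorb.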
Optional: For cluster instances, connect to the instances in direct connection mode to deal with heavy network traffic. For more information, see Enable the direct connection mode.
Note: In direct connection mode, the bandwidth limit of the instance equals the bandwidth limit of each data shard multiplied by the number of data shards. For example, if a cluster instance contains 128 data shards and the bandwidth limit of each data shard is 96 Mbit/s, the bandwidth limit of the cluster instance is 12,288 Mbit/s (128 × 96 Mbit/s) after you enable the direct connection mode.
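The rule in the preceding note is a single multiplication. The sketch below reproduces the 128-shard example; the function name is illustrative, and the MB/s conversion uses decimal megabytes.

```python
def direct_connection_bandwidth_mbps(per_shard_mbps, shard_count):
    """Aggregate bandwidth limit in direct connection mode:
    per-shard limit multiplied by the number of data shards."""
    return per_shard_mbps * shard_count


total = direct_connection_bandwidth_mbps(96, 128)
print(total)      # 12288 (Mbit/s)
print(total / 8)  # 1536.0 (MB/s, decimal megabytes)
```

The aggregate figure assumes traffic is spread evenly. Each key still lives on one shard, so requests to a single large key or hotkey remain bound by that shard's 96 Mbit/s limit.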
If the traffic usage is still high after you perform the preceding optimizations, upgrade your instance to an instance type that has more memory. An upgrade improves instance performance and allows the instance to handle more traffic. For more information, see Change the configurations of an instance.
Note: Before you upgrade your instance, you can purchase a pay-as-you-go instance to test whether the instance type to which you want to upgrade meets the requirements of your workloads. You can release the pay-as-you-go instance after you complete the test. For more information, see Release pay-as-you-go instances.