When you use Tair (Redis OSS-compatible), large keys and hotkeys that are not identified and handled in a timely manner can cause performance degradation, a poor user experience, and even large-scale failures. This topic describes the causes of large keys and hotkeys, the issues that they may cause, and how to identify them, optimize them, and prevent them from affecting your business.
Identify large keys and hotkeys
Alibaba Cloud self-developed tools
Tair (Redis OSS-compatible) provides the top key statistics and offline key analysis features in the console to help you quickly identify large keys and hotkeys.
Method | Limits | Description |
Use the top key statistics feature (recommended) | This feature is available only for Redis Open-Source Edition instances that run Redis 5.0 or later and Tair (Enterprise Edition) DRAM-based and persistent memory-optimized instances. | |
Use the offline key analysis feature | The offline key analysis feature is unavailable for ESSD/SSD-based instances. | |
If your instance cannot use the preceding features, use the following methods.
Other methods to identify large keys and hotkeys
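One approach that is commonly used with open-source Redis commands is to walk the keyspace with SCAN and measure each key with MEMORY USAGE (available in Redis 4.0 and later). The following Python sketch assumes a client object that exposes `scan_iter` and `memory_usage`, as the redis-py client does. The function name `find_large_keys` and the 10 KB default threshold are illustrative and are not defined by Tair. A full scan adds load to the instance, so consider running it during off-peak hours.

```python
def find_large_keys(client, threshold_bytes=10 * 1024, scan_count=100):
    """Return (key, size_in_bytes) pairs at or above threshold_bytes, largest first.

    Assumption: `client` behaves like a redis-py client, for example
    redis.Redis(host="<your-endpoint>", port=6379).
    """
    large = []
    # SCAN iterates incrementally, so it does not block the instance the way KEYS can.
    for key in client.scan_iter(count=scan_count):
        # MEMORY USAGE returns None if the key no longer exists (for example, it expired).
        size = client.memory_usage(key)
        if size is not None and size >= threshold_bytes:
            large.append((key, size))
    large.sort(key=lambda item: item[1], reverse=True)
    return large
```

The threshold that counts as "large" depends on your workload, so treat the default here as a placeholder.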
Optimize large keys and hotkeys
Category | Optimization method | Description |
Large keys | Compress large keys | Before you write data to the cache, you can use a serialization or compression algorithm to shrink the values of large keys, which helps reduce memory usage. If the value of a large key is still excessively large after compression, you can split the large key. |
Large keys | Split large keys | For example, you can split a HASH key that contains tens of thousands of members into multiple HASH keys that each have an appropriate number of members. For cluster instances, you can split large keys to balance the memory usage across multiple data shards. |
Large keys | Delete large keys | You can store data that is not suited to the instance in other storage engines and delete the data from the instance. |
Large keys | Delete expired data on a regular basis | The accumulation of expired data leads to large keys. For example, if you incrementally write a large amount of data to a HASH key and ignore the TTL of the data, the HASH key may end up as a large key. You can use scheduled tasks to delete invalid data. Note: To prevent the instance from being blocked when you delete invalid hash data, we recommend that you use the HSCAN and HDEL commands to delete the data in batches. |
Hotkeys | Replicate hotkeys for cluster instances | Requests made for a hotkey in a data shard cannot be redistributed to other data shards in the instance because the smallest unit at which a hotkey can be migrated in a Tair cluster instance is the key. This results in a constant high workload for a single data shard. In this case, you can replicate the hotkey in the data shard to generate identical keys and migrate these new keys to other data shards. For example, you can replicate a hotkey named foo in a data shard to generate three identical hotkeys named foo2, foo3, and foo4. Then, you can migrate foo2, foo3, and foo4 to other data shards to reduce the pressure on the data shard that contains foo. Note: This method requires you to modify your code, and data inconsistency may occur because you must update multiple keys instead of one key. For this reason, we recommend that you consider this method only as a temporary solution. |
Hotkeys | Enable read/write splitting | If hotkeys are generated from read requests, you can enable read/write splitting to reduce the read load on each data shard. If the read load remains high after you enable this feature, you can further alleviate the load by increasing the number of read replicas. Note: The read/write splitting feature also has its disadvantages. If a large number of requests are sent to a read/write splitting instance, some amount of latency is unavoidable, and dirty data may be read from the instance. Therefore, read/write splitting is not the optimal solution for scenarios that have high requirements for read and write capabilities and data consistency. |
Hotkeys | Enable the proxy query cache feature | After you enable the proxy query cache feature, Tair (Redis OSS-compatible) uses effective sorting and statistical algorithms to identify hotkeys. Hotkeys are keys that receive more than 5,000 queries per second (QPS). Proxy nodes cache only the request and response data of a hotkey, instead of the entire key. If a proxy node receives a duplicate request within the validity period of the cached data, the proxy node directly returns the response of the request to the client without the need to interact with backend data shards. For more information, see Use proxy query cache to address issues caused by hotkeys. |
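The "Split large keys" method in the preceding table can be sketched as follows. This Python example assumes a client object that exposes `hset` and `hget` with redis-py style signatures. The helper names (`bucket_key`, `hset_split`, `hget_split`), the `base:index` naming scheme, and the default of 16 buckets are hypothetical choices made for illustration, not a scheme prescribed by Tair.

```python
import zlib


def bucket_key(base_key, field, num_buckets=16):
    """Map a field to one of num_buckets smaller HASH keys derived from base_key."""
    # CRC32 is stable across processes, unlike Python's built-in hash() for strings.
    index = zlib.crc32(field.encode("utf-8")) % num_buckets
    return f"{base_key}:{index}"


def hset_split(client, base_key, field, value, num_buckets=16):
    """Write a field to the bucket that owns it."""
    return client.hset(bucket_key(base_key, field, num_buckets), field, value)


def hget_split(client, base_key, field, num_buckets=16):
    """Read a field from the bucket that owns it."""
    return client.hget(bucket_key(base_key, field, num_buckets), field)
```

Because each bucket is a separate key, the buckets can be placed on different data shards of a cluster instance. Keep `num_buckets` fixed after data is written, because changing it remaps fields to different buckets.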
Causes of large keys and hotkeys
In Tair (Redis OSS-compatible), keys serve as the smallest unit of data distribution. Each key is stored in a specific data shard and cannot be split. Large keys and hotkeys may occur due to a variety of reasons, such as incorrect use of Tair (Redis OSS-compatible), insufficient workload planning, accumulation of invalid data, and traffic spikes.
Large keys
Incorrect use of Tair (Redis OSS-compatible): If Tair (Redis OSS-compatible) is used in an improper scenario, the size of a key may be larger than necessary. For example, a STRING key that is used to store a large binary file becomes a large key.
Insufficient workload planning: Before a feature is released, a failure to sufficiently plan for workloads can result in problems. For example, members may not be properly split between keys and some keys may have more members than required.
Accumulation of invalid data: This occurs when invalid data is not deleted on a regular basis. For example, the number of members of a HASH key constantly increases when invalid data is not cleared in a timely manner.
Code failures: If a consumer application that reads from a LIST key fails, members are added to the key but never removed, and the key keeps growing.
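For a HASH key that has accumulated invalid data, a scheduled cleanup that combines HSCAN and HDEL can remove fields in small batches instead of in one blocking operation. The following Python sketch assumes a client object with redis-py style `hscan` and `hdel` methods. The function name and the `is_invalid` callback are hypothetical, because the rule for what counts as invalid data is specific to your application.

```python
def delete_hash_fields_in_batches(client, key, is_invalid, batch_size=100):
    """Delete the fields of a HASH for which is_invalid(field, value) is true.

    Returns the number of fields that were deleted.
    """
    deleted = 0
    cursor = 0
    while True:
        # HSCAN returns the next cursor and a dict of field -> value for this batch.
        # COUNT is only a hint, so a batch may contain more or fewer fields.
        cursor, fields = client.hscan(key, cursor=cursor, count=batch_size)
        stale = [field for field, value in fields.items() if is_invalid(field, value)]
        if stale:
            deleted += client.hdel(key, *stale)
        # A cursor of 0 means that the iteration is complete.
        if int(cursor) == 0:
            break
    return deleted
```

A scheduled task could call this function for each HASH key that is known to grow over time, for example with a callback that compares a timestamp stored in the value against the current time.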
Hotkeys
Unexpected traffic spikes: Unexpected traffic spikes may occur for a variety of reasons, such as high product popularity, hot news, a large number of "likes" flooding in from the viewers of a livestream, or battles between multiple large teams in a game.
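When a traffic spike concentrates reads on a single key, the hotkey replication method described in this topic (foo copied to foo2, foo3, and foo4) can be sketched on the client side as follows. This Python example assumes a client object with `set` and `get` methods, as in redis-py. The helper names are hypothetical, and the code does not control which data shard each copy is stored on.

```python
import random


def replica_names(key, copies=4):
    """Return the original key followed by its copies, e.g. foo, foo2, foo3, foo4."""
    return [key] + [f"{key}{i}" for i in range(2, copies + 1)]


def write_hotkey(client, key, value, copies=4):
    """Update every copy of the hotkey."""
    # The copies are written one by one, so a failure partway through
    # leaves them inconsistent. This is the drawback noted in this topic.
    for name in replica_names(key, copies):
        client.set(name, value)


def read_hotkey(client, key, copies=4, rng=random):
    """Read from one randomly chosen copy to spread the read load."""
    return client.get(rng.choice(replica_names(key, copies)))
```

Reads are spread roughly evenly across the copies, while each write costs one command per copy, which is one reason to treat this approach as temporary.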
Issues that may be caused by large keys and hotkeys
Category | Description |
Large keys |
|
Hotkeys |
|
Methods to prevent large keys and hotkeys from affecting business
Method | Description |
Configure an alert rule | You can specify appropriate alert thresholds in the monitoring system for metrics such as the CPU utilization, memory usage, and connection usage of an instance. For example, you can specify 70% as the alert threshold for the memory usage of an instance and 20% as the alert threshold for the memory usage increase of the instance over a 1-hour period. When an alert is triggered, you can identify and optimize large keys and hotkeys as described earlier in this topic before they affect your business. For more information, see Alert settings. |
Use Tair (Enterprise Edition) | Tair (Enterprise Edition) provides the TairHash data structure for scenarios involving large keys of the HASH type. TairHash is a HASH data type that, similar to Redis HASH, provides a variety of data interfaces and high processing performance. Unlike Redis HASH, which allows a TTL to be specified only for keys, TairHash allows a TTL and a version number to be specified for each field. TairHash uses the Active Expire algorithm to check the TTL of fields and delete expired fields. This process does not increase the database response time. The appropriate use of these advanced features can significantly reduce the O&M and troubleshooting workloads associated with Redis Open-Source Edition and simplify business code. For more information, see exHash. |
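The alert thresholds in the preceding example (70% memory usage, and a 20% increase in memory usage over one hour) can be expressed as a small check. This Python sketch is not the alerting feature of the console. The function name, the sample format, and the alert labels are hypothetical, and the 20% increase is interpreted here as 20 percentage points, which is an assumption about how the threshold is measured.

```python
def memory_alerts(samples, usage_threshold=70.0, growth_threshold=20.0, window_seconds=3600):
    """Evaluate memory usage samples against the two example thresholds.

    samples: list of (unix_timestamp, memory_usage_percent), oldest first.
    Returns a list that may contain "memory_usage" and "memory_growth".
    """
    if not samples:
        return []
    alerts = []
    latest_time, latest_usage = samples[-1]
    if latest_usage >= usage_threshold:
        alerts.append("memory_usage")
    # Baseline: the oldest sample that still falls inside the window.
    in_window = [usage for ts, usage in samples if latest_time - ts <= window_seconds]
    # Assumption: growth is measured in percentage points, not relative change.
    if latest_usage - in_window[0] >= growth_threshold:
        alerts.append("memory_growth")
    return alerts
```

For example, samples of 40%, 50%, and 65% taken at 30-minute intervals do not cross the 70% usage threshold, but they do trigger the growth check because usage rose by 25 percentage points within the hour.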