Large keys and hotkeys can lead to degraded service performance, poor user experience, or even system failures. This topic explains how to quickly identify and optimize large keys and hotkeys, analyzes their causes and potential issues, and provides preventive measures to mitigate their impact on business operations.
Quickly identify large keys and hotkeys
Alibaba Cloud self-developed tools
Tair and Redis offer Top Key statistics and offline full key analysis features in the console to assist in quickly identifying large keys and hotkeys.
Method | Limits | Description |
Top Key statistics (recommended) | Supported only on Redis open source edition 5.0 and later, and on Tair (enterprise edition) memory type and persistent memory type instances. |
|
Offline full key analysis | Disk type instances do not support this feature. |
|
If your instance cannot use the above features, consider the following methods.
Other methods to identify large keys and hotkeys
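One generic approach is to walk the keyspace with the SCAN command and measure each key with MEMORY USAGE (available in Redis 4.0 and later). The following is a minimal sketch of that idea, not an official tool. It assumes a client object that exposes redis-py style `scan_iter` and `memory_usage` methods; the function name `find_large_keys` and the default thresholds are hypothetical choices made for illustration.

```python
import heapq
from typing import Any, List, Tuple


def find_large_keys(client: Any, top_n: int = 10, min_bytes: int = 10 * 1024,
                    scan_count: int = 1000) -> List[Tuple[int, str]]:
    """Return up to top_n (size_in_bytes, key) pairs, largest first.

    `client` is assumed to expose redis-py style `scan_iter` and
    `memory_usage` methods. Both thresholds are illustrative defaults.
    """
    heap: List[Tuple[int, str]] = []  # min-heap holding the current top_n
    # SCAN iterates the keyspace incrementally, so it avoids the long
    # blocking call that KEYS * would cause on a large instance.
    for key in client.scan_iter(count=scan_count):
        size = client.memory_usage(key)  # None if the key vanished mid-scan
        if size is None or size < min_bytes:
            continue
        name = key.decode() if isinstance(key, bytes) else key
        if len(heap) < top_n:
            heapq.heappush(heap, (size, name))
        else:
            # Push the new entry and drop the smallest of the top_n + 1.
            heapq.heappushpop(heap, (size, name))
    return sorted(heap, reverse=True)
```

With redis-py installed, a `redis.Redis` connection can be passed as `client`. A full scan still adds load, so it is usually safer to run it during off-peak hours or against a read-only node.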
Optimize large keys and hotkeys
Category | Handling method | Description |
Large key | Compress large keys | Before data is written to the cache, reduce its size by using a compact serialization format or a compression algorithm. If the key is still too large after compression, split it. |
Split large keys | For example, you can split a HASH key that contains tens of thousands of members into multiple HASH keys that each have an appropriate number of members. Splitting large keys can effectively prevent data skew. | |
Delete large keys | You can store unsuitable data in other storage engines and delete the data from the instance. Note
| |
Delete expired data on a regular basis | Accumulated expired data can turn a key into a large key. For example, if data timeliness is ignored, fields may be written to a HASH key incrementally and never removed. You can use scheduled tasks to delete invalid data. Note When you clear HASH data, use the HSCAN command together with the HDEL command to delete invalid data in batches, so that a single large deletion does not block the instance. | |
Hot key | Replicate hotkeys for cluster instances | Because a hotkey is stored as a whole in a single shard, requests cannot be distributed by migrating part of the data. As a result, the pressure on a single data shard cannot be reduced. In this case, you can replicate the corresponding hotkey and migrate it to other data shards. For example, you can replicate the hotkey foo into three identical keys named foo2, foo3, and foo4, and migrate these keys to other data shards to alleviate the pressure on a single data shard caused by the hotkey. Note The disadvantage of this solution is that you need to modify the code to maintain multiple replicas, and it is difficult to ensure data consistency among multiple replicas. For example, update operations need to be synchronized to all replicas. It is recommended to use this solution as a temporary solution to alleviate urgent issues. |
Enable read/write splitting | If a hotkey is caused by read requests, you can enable read/write splitting to reduce the read load on each data shard. If the read load is still high after read/write splitting is enabled, you can add more read-only nodes. Note Read/write splitting has a drawback. Under extremely high request volumes, replication to read-only nodes lags behind the primary node, so read requests may return stale data. Therefore, read/write splitting is not recommended for scenarios that combine heavy read and write pressure with strict data consistency requirements. | |
Enable the proxy query cache feature | After this feature is enabled, Tair and Redis use algorithms to identify hotkeys (usually keys with more than 5,000 QPS). The proxy node caches the requests and query results of hotkeys (only the query results are cached, not the entire key). If the proxy node receives a duplicate request within the validity period of the cached data, it returns the cached response directly to the client without contacting the backend data shards. For more information, see Optimize hotkey issues by using the proxy query cache. |
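Two of the large-key techniques in the table above, splitting a HASH key and clearing it with HSCAN plus HDEL, can be sketched as follows. This is an illustration under assumptions rather than a prescribed implementation: `bucket_key` and `purge_hash_fields` are hypothetical helper names, the bucket count and batch size are arbitrary, and `client` is assumed to expose redis-py style `hscan` and `hdel` methods.

```python
import zlib
from typing import Any, Callable


def bucket_key(base_key: str, field: str, buckets: int = 16) -> str:
    """Map a field to one of `buckets` smaller HASH keys, such as 'cart:7'."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    return f"{base_key}:{zlib.crc32(field.encode()) % buckets}"


def purge_hash_fields(client: Any, key: str,
                      is_stale: Callable[[Any, Any], bool],
                      batch: int = 100) -> int:
    """Delete stale fields in small HSCAN/HDEL batches and return the count.

    `is_stale(field, value)` is a caller-supplied predicate (hypothetical).
    """
    removed = 0
    cursor = 0
    while True:
        # Each HSCAN call returns roughly `batch` fields plus the next cursor.
        cursor, chunk = client.hscan(key, cursor=cursor, count=batch)
        stale = [f for f, v in chunk.items() if is_stale(f, v)]
        if stale:
            # One HDEL per batch keeps every command short.
            removed += client.hdel(key, *stale)
        if cursor == 0:  # a cursor of 0 means the iteration is complete
            break
    return removed
```

Writers and readers must both call `bucket_key` with the same bucket count, so changing that count later requires migrating existing data.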
Causes of large keys and hotkeys
In Tair and Redis, the key is the smallest unit of data distribution. Each key is stored as a whole in a specific data shard and cannot be split across shards. Insufficient workload planning, accumulation of invalid data, and unexpected traffic spikes may cause large keys and hotkeys in an instance, such as:
Large key
Inappropriate use of Tair and Redis may result in excessively large key values, for instance, when a string key is used to store a large binary file.
Lack of workload planning prior to releasing a feature can lead to some keys having more members than necessary.
Accumulation of invalid data: Not regularly deleting invalid data can cause the number of members for a HASH key to continually increase.
Code failures in consumer applications that use LIST keys: if members are written but no longer consumed, the number of members in the list keeps increasing.
Hot key
Unexpected traffic spikes can occur for various reasons, such as viral marketing, a surge of "likes" from a livestream audience, or a large-scale event in a game.
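To illustrate why a key always lands in exactly one place, the sketch below reproduces the key-to-slot mapping defined by the open source Redis Cluster specification: CRC16 (XMODEM variant) of the key, modulo 16384, with hash-tag handling. Whether a particular Tair or Redis cluster instance uses exactly this mapping is an assumption here, and the function names are chosen for illustration only.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the hash slot (0 to 16383) that a key maps to."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty {tag} replaces the full key
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Renamed copies of one key hash independently of each other.
slots = {name: key_slot(name) for name in ("foo", "foo2", "foo3", "foo4")}
```

The whole key name (or its hash tag) determines the slot, so a single key cannot be spread over several shards. A renamed copy maps to its own slot, although different slots are not guaranteed to belong to different shards.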
Potential issues caused by large keys and hotkeys
Category | Description |
Large key |
|
Hot key |
|
How to prevent large keys and hotkeys from affecting business
Method | Description |
Configure an alert rule | You can specify appropriate alert thresholds in the monitoring system for metrics such as CPU utilization, memory usage, and connection usage of an instance. For example, you can specify 70% as the alert threshold for the memory usage of an instance and 20% as the alert threshold for the increase in memory usage over a 1-hour period. When an alert is triggered, you can identify and optimize large keys and hotkeys as described earlier, before they affect your business. For more information, see alert settings. |
Use Tair (enterprise edition) to avoid clearing invalid data | For large keys of the hash type, Tair (enterprise edition) provides TairHash, an enhanced data structure. TairHash supports an expiration time and a version for each field, which removes the limitation of Redis Hash that an expiration time can be set only for the entire key. TairHash also uses an efficient active expiration algorithm to check and delete expired fields with almost no impact on response time. Used properly, TairHash can significantly reduce the maintenance burden, simplify business code, and effectively address the issues caused by large keys and hotkeys. |
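The example thresholds from the alert rule row above (70% memory usage, 20% increase over 1 hour) can be expressed as a small check. This is a sketch only: it interprets the 20% figure as relative growth of used memory, which is an assumption, and the function name, parameters, and alert labels are hypothetical. In practice these rules are configured in the monitoring system rather than in application code.

```python
from typing import List


def memory_alerts(used_now: int, used_hour_ago: int, max_memory: int,
                  usage_threshold: float = 70.0,
                  growth_threshold: float = 20.0) -> List[str]:
    """Return the names of the alert rules that the current sample triggers.

    All memory arguments are in bytes. Growth is measured relative to the
    sample taken one hour earlier (an assumed interpretation).
    """
    alerts: List[str] = []
    usage_pct = 100.0 * used_now / max_memory
    if usage_pct >= usage_threshold:
        alerts.append("memory_usage")
    if used_hour_ago > 0:  # skip the growth rule when no baseline exists
        growth_pct = 100.0 * (used_now - used_hour_ago) / used_hour_ago
        if growth_pct >= growth_threshold:
            alerts.append("memory_growth")
    return alerts
```

For instance, an instance at 600 MB of a 1,000 MB limit that held 400 MB an hour earlier stays under the usage threshold but triggers the growth rule, which is the kind of early signal that a key may be accumulating members.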