By Yanquan
When using Redis, we often encounter BigKey and HotKey. If not detected and processed in time, BigKey and HotKey are likely to degrade service performance, deteriorate user experience, and even cause large-scale system failures.
Definitions of BigKey and HotKey often appear in companies' internal Redis development guidelines and in the many articles about Redis best practices on the Internet. Although the exact criteria differ, the dimensions they use are the same. A BigKey is usually identified by its data size and number of members, while a HotKey is identified by the frequency and number of requests it receives.
Generally, a key that holds a large amount of data or a large number of members is called a BigKey. Some examples help illustrate the characteristics of a BigKey.
Note that the definition of a BigKey varies with the actual use case and business scenario of Redis, so you should take all factors into account when judging. The specific numbers for size, members, and elements given in the examples are not a universal definition of a BigKey; they only simplify the explanation and should not be treated as a standard.
When the access workload on a key is significantly higher than that on other keys, we call it a HotKey. Some examples help show what HotKeys look like.
When using Redis, BigKey and HotKey bring various problems. The most common ones are performance degradation, access timeout, access skew and data skew.
Insufficient business planning, incorrect use of Redis, accumulation of invalid data, and sudden increases in access can all cause BigKey and HotKey issues. For example:
1) BigKey: Using Redis for unsuitable data can introduce BigKey problems. For example, storing large binary files in String keys makes the values of the keys too large.
2) BigKey: Insufficient planning and design before business launch, with no sharding policy or split plan to divide the members of an individual key across multiple keys, results in an excessive number of members in a particular key.
3) BigKey: Invalid data in HASH keys is not cleaned up regularly, so the number of members in the HASH keys keeps growing.
4) HotKey: Access traffic increases unexpectedly because of hot products, hot news, a KOL (Key Opinion Leader) live-streaming event, or an online game battle that involves a large number of players.
5) BigKey: A logic failure on the business side prevents LIST keys from being consumed, so the number of members in the corresponding key keeps increasing with no sign of decreasing.
BigKeys and HotKeys are not hard to discover. There are many ways to analyze keys in Redis and find the "problem" keys, such as Redis built-in commands, open source tools, and the Key Analysis function in CloudDBA, which you can find on the ApsaraDB for Redis console.
Some built-in commands and tools in Redis can help us find these problem keys. If you already have a clear analysis target, for example, some suspicious keys that you think might be BigKeys or HotKeys, run the following commands to analyze them.
debug object <key_name>
(not recommended) You can use the debug object command to analyze a key. The command takes the key name as its argument and returns a variety of information, in which the value of serializedlength is the serialized length of the key. You can compare this value against your own criteria to determine whether the key is a BigKey.
Note that the serialized length of a key is not equal to its actual size in memory. In addition, debug object is a debugging command that is expensive to run and is generally regarded as high risk. It blocks other requests while it runs; in other words, Redis is out of service until the command completes. The execution time depends on the serialized length of the key: the larger the key, the longer it takes. Therefore, this command is not recommended for analyzing BigKeys in production environments.
memory usage <key_name>
(not recommended) Redis has provided the MEMORY USAGE command since version 4.0 to help you analyze the memory usage of keys. Although its execution cost is lower than that of the debug object command, it still risks blocking when it analyzes a BigKey, because its time complexity is O(N).
strlen, hlen, scard, zcard, llen, xlen
(recommended) We recommend analyzing keys in a less risky way. Redis provides a separate command for each data type that returns the length or number of members of a key: STRLEN for strings, HLEN for hashes, LLEN for lists, SCARD for sets, ZCARD for sorted sets, and XLEN for streams.
With these built-in commands, we can analyze keys easily and safely without affecting online services. However, the length they return usually does not equal the key's memory usage, so it can only be used for reference.
If you do not have a clear target key for analysis but want to find the BigKeys in Redis, you can run redis-cli with the '--bigkeys' option appended.
[example] redis-cli -h xxx -p xxx -a xxx --bigkeys
With the --bigkeys option, redis-cli traverses all keys in the entire Redis instance and returns a standard summary report. This is convenient and safe, but the analysis results cannot be customized.
The --bigkeys option can only output the largest key of each Redis data type. If you want to analyze only the STRING keys, or find the HASH keys with more than 10 members, the --bigkeys option cannot help. What can you turn to if you really need a customized report? Some open source projects on GitHub implement an enhanced version of --bigkeys whose results can be customized through configuration. Alternatively, you can write a script in Python or another language that traverses the keys with SCAN and applies the length commands listed above for each data type, which gives you a BigKey analysis tool at the Redis instance level.
Similarly, the result this solution returns is neither fully accurate nor real-time, so it is for reference only.
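The script idea described above can be sketched roughly as follows. This is a minimal illustration rather than a finished tool: the function name find_big_keys and the thresholds argument are our own assumptions, and the client is assumed to be a redis-py style object that exposes scan_iter, type, and the per-type length methods.

```python
# Length method for each data type, keyed by the name that TYPE returns.
LENGTH_METHODS = {
    "string": "strlen",
    "hash": "hlen",
    "list": "llen",
    "set": "scard",
    "zset": "zcard",
    "stream": "xlen",
}


def _text(value):
    """Normalize bytes replies (the redis-py default) to str."""
    return value.decode() if isinstance(value, bytes) else value


def find_big_keys(client, thresholds, scan_count=1000):
    """Yield (key, type, length) for keys longer than their type's threshold.

    `thresholds` (hypothetical parameter) maps a type name to a minimum
    length, e.g. {"hash": 10}. Types that are not listed are skipped.
    """
    for raw_key in client.scan_iter(count=scan_count):
        key_type = _text(client.type(raw_key))
        limit = thresholds.get(key_type)
        method = LENGTH_METHODS.get(key_type)
        if limit is None or method is None:
            # Type not requested, or the key vanished ("none") mid-scan.
            continue
        length = getattr(client, method)(raw_key)
        if length > limit:
            yield _text(raw_key), key_type, length
```

For example, passing a redis-py client and {"hash": 10} would list only the HASH keys with more than 10 members, which is the customization that --bigkeys cannot provide. As with the commands themselves, the reported lengths are member counts or byte lengths, not memory usage.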
Redis has provided the '--hotkeys' option of redis-cli since version 4.0 to facilitate instance-level HotKey analysis. It reports the access frequency of keys, but as a prerequisite the maxmemory-policy must be set to an LFU policy such as 'allkeys-lfu', which is supported only in Redis 4.0 and later.
Every access to Redis comes from the business layer. This means that we can record, asynchronously summarize, and analyze Redis access by adding the corresponding code to the business layer. In this way, you can accurately analyze HotKeys in real time, but at the expense of business code simplicity.
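As a rough illustration of what such business-layer code might look like, the sketch below counts key accesses in memory and lets a background task drain the counters periodically. The class name KeyAccessRecorder and its methods are hypothetical; a real service would also need to decide where record is called and where the drained statistics are sent.

```python
import threading
from collections import Counter


class KeyAccessRecorder:
    """Counts key accesses inside the application (hypothetical helper)."""

    def __init__(self):
        self._counts = Counter()
        self._lock = threading.Lock()

    def record(self, key):
        """Call this next to every Redis access made by the business code."""
        with self._lock:
            self._counts[key] += 1

    def drain_top(self, n):
        """Return the n most accessed keys since the last drain, then reset."""
        with self._lock:
            snapshot, self._counts = self._counts, Counter()
        return snapshot.most_common(n)
```

A scheduled task could call drain_top every few seconds and report the result asynchronously, so the summary work stays off the request path.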
The monitor command of Redis prints every request that Redis processes, including the access time, client IP address, command, and key. Because the command adds CPU, memory, and network overhead, in an emergency we can run it for a short period of time and redirect the output to a file. After terminating the monitor command, we can find the HotKeys of that period by classifying and analyzing the requests in the file.
As mentioned above, the monitor command consumes the CPU, memory, and network resources of ApsaraDB for Redis. Therefore, for a Redis instance that already has a high workload, the monitor command may make things worse. Meanwhile, the timeliness of this asynchronous collection and analysis is poor, and the accuracy of the analysis depends on how long the monitor command runs. In most online scenarios, where the command cannot run for long, the result is not accurate enough.
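The classification step can be sketched as below. The sketch assumes the usual monitor line layout of a timestamp, a bracketed database number and client address, and then the quoted command and arguments. It also assumes that the first argument is the key, which holds for most single-key commands but not for multi-key commands or scripts. The function name count_keys is our own.

```python
import re
import shlex
from collections import Counter

# Expected layout: 1339518083.107412 [0 127.0.0.1:60866] "get" "foo"
LINE_RE = re.compile(r'^(\d+\.\d+) \[(\d+) (\S+)\] (.+)$')


def count_keys(lines):
    """Count how often each key appears as the first argument of a command."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # e.g. the initial "OK" line
        try:
            args = shlex.split(match.group(4))
        except ValueError:
            continue  # unbalanced quotes; skip the line
        if len(args) >= 2:
            # Assumption: args[0] is the command and args[1] is the key.
            counts[args[1]] += 1
    return counts
```

Feeding the saved file to count_keys and calling most_common(10) on the result gives a first list of HotKey candidates for the capture period.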
The popularity of Redis enables us to easily find a large number of open source solutions to solve the current problems we are facing, such as getting accurate analysis reports without affecting online services.
redis-rdb-tools is a great choice if you want to accurately analyze the real memory usage of all keys in a Redis instance according to your own standards. It avoids disrupting online services, and you get a concise, easy-to-understand report when the analysis finishes.
redis-rdb-tools allows you to perform customized analysis on Redis RDB files. Because RDB files are analyzed offline, the analysis has no impact on online services. This is its biggest advantage but also its biggest disadvantage, because offline analysis means poor timeliness of the results. For a large RDB file, the analysis may take a long time.
Some public cloud vendors offer a one-click function such as a 'Key Analysis Service', which analyzes all the keys in the Redis instance in real time to discover current and historical BigKeys and HotKeys. It also shows which BigKeys and HotKeys have appeared over the running timeline of Redis, so that you can judge the running status of the entire Redis instance comprehensively and accurately.
CloudDBA is an intelligent service for Alibaba Cloud database services (ApsaraDB). For ApsaraDB for Redis, it supports real-time analysis and discovery of BigKeys and HotKeys.
The Key Analysis Service is a kernel-based function that discovers and outputs information about BigKeys and HotKeys directly from the Redis kernel layer. Because the data comes from the kernel, which is the original source, the result is accurate and efficient. And because the function only captures information and involves no extra calculation, it has almost no impact on performance. You can use this function by clicking "Key Analysis" under CloudDBA, as shown in figure 1-1:
Figure 1-1: CloudDBA on the ApsaraDB for Redis console
The Key Analysis page has two tabs, 'Real-time' and 'History', which allow you to analyze the keys in the corresponding Redis instance in different time dimensions.
Now that we have found the problematic keys in Redis through various means, it is time to resolve them and prevent potential problems in the future.
For a string type BigKey, consider splitting it into multiple key-value pairs. For a hash or list type BigKey, consider splitting it into multiple hashes or lists, and make sure that the number of members of each key stays within a reasonable range. The splitting plan plays a significant role in avoiding uneven memory usage, especially for a Redis Cluster.
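One common way to split a large hash is to route each field to one of several smaller hashes by hashing the field name. The sketch below shows this idea only; the function name shard_key, the "base:index" naming scheme, and the choice of CRC32 are our assumptions, not a prescribed method.

```python
import zlib


def shard_key(base_key, field, shard_count):
    """Return the name of the sub-hash that should hold `field`."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    index = zlib.crc32(field.encode("utf-8")) % shard_count
    return f"{base_key}:{index}"
```

A write to field item42 of the hash cart would then target shard_key("cart", "item42", 16) instead of cart itself, and reads use the same function to locate the field. The shard count should be chosen so that each sub-hash stays within a reasonable size, and changing it later requires moving the existing fields.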
Choosing the right storage for your data is very important. In particular, you can move data that is not suitable for Redis to other storage media and delete it from Redis. As mentioned before, BigKeys may interrupt Redis cluster synchronization, so when deleting a key, it is better to use the UNLINK command, which has been available since Redis 4.0 and cleans up the given keys gradually in a non-blocking way. With UNLINK, you can safely delete all kinds of BigKeys.
A sudden BigKey problem can catch us off guard. Therefore, finding and dealing with BigKeys before they cause further problems is an important way to keep the service stable. We can monitor the system and set a reasonable memory alarm threshold for Redis to warn us that BigKeys may be forming. For example, an alarm is sent when Redis memory usage exceeds 70% or when Redis memory grows by more than 20% within one hour.
Through monitoring, we can solve the problem before it becomes a failure. For example, when a failed LIST consumer program causes the number of members in a list to grow continuously with no sign of decreasing, an alarm lets the owner of the module notice and fix the problem in time, which avoids failures.
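The two example thresholds can be expressed as a small check. This is only a sketch: the function name memory_alerts and its parameters are hypothetical, and the inputs are assumed to come from two samples of used_memory taken an hour apart plus the configured maxmemory, all in bytes.

```python
def memory_alerts(used_now, used_hour_ago, max_memory,
                  usage_limit=0.70, growth_limit=0.20):
    """Return the names of the alarm rules that the current sample violates."""
    alerts = []
    # Rule 1: memory usage exceeds 70% of the configured limit.
    if max_memory > 0 and used_now / max_memory > usage_limit:
        alerts.append("usage")
    # Rule 2: memory grew by more than 20% within the last hour.
    if used_hour_ago > 0 and (used_now - used_hour_ago) / used_hour_ago > growth_limit:
        alerts.append("growth")
    return alerts
```

The default limits simply mirror the example figures in the text and should be tuned to the actual workload.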
For example, if we keep writing large amounts of data to a HASH key incrementally and ignore how long the data remains valid, the accumulated invalid data will eventually produce a BigKey. You can clean up the invalid data with scheduled tasks. We recommend HSCAN together with HDEL, which cleans up invalid data without blocking.
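A scheduled cleanup task along these lines might look like the following sketch. It assumes a redis-py style client whose hscan_iter yields (field, value) pairs and whose hdel accepts several fields at once. The function name clean_hash and the is_invalid callback are hypothetical, since what counts as invalid depends on the business data.

```python
def clean_hash(client, key, is_invalid, batch_size=500):
    """Delete fields for which is_invalid(field, value) is true.

    Fields are read incrementally with HSCAN and removed in small HDEL
    batches, so no single command has to touch the whole hash.
    Returns the number of fields removed.
    """
    removed = 0
    batch = []
    for field, value in client.hscan_iter(key, count=batch_size):
        if is_invalid(field, value):
            batch.append(field)
        if len(batch) >= batch_size:
            removed += client.hdel(key, *batch)
            batch = []
    if batch:
        removed += client.hdel(key, *batch)
    return removed
```

For instance, if each value stored a timestamp, is_invalid could compare it with the current time and return true for entries older than the retention period.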
You may have too many HASH keys, with a large number of invalid members that need to be cleaned up. In this scenario, scheduled tasks can no longer clean up invalid data in a timely manner. However, Alibaba Cloud's Tair can solve such problems well.
Tair is the Enterprise Edition of Alibaba Cloud ApsaraDB for Redis. In addition to all the features of ApsaraDB for Redis, it provides a large number of advanced features, including high-performance features.
TairHash is a hash data type that allows you to set an expiration time and a version for each field. It supports a wide range of data interfaces and provides high processing performance like Redis Hash, and it also removes the restriction of the native hash, where an expiration time can be set only on the key as a whole. This greatly improves the flexibility of hash data and simplifies business code in many scenarios.
TairHash uses an efficient Active Expire algorithm to discover and delete expired data with almost no impact on response time. Reasonable use of such advanced features can eliminate a large amount of Redis O&M and fault-handling work and reduce the complexity of business code. As a result, O&M personnel can devote their effort to other, more valuable work, and R&D personnel have more time to focus on business code.
In a Redis Cluster, the minimum migration granularity is a key. When the load on a particular node rises to a high level because of a HotKey, it cannot return to normal unless you split the HotKey and distribute the pieces to other nodes. In such a scenario, you can copy the HotKey and migrate the copies to other nodes. For example, you can create three copies of the HotKey 'foo' with exactly the same content, name them 'foo2', 'foo3', and 'foo4', and then migrate the three copies to other nodes to share the workload of the single node.
However, this solution has obvious disadvantages: the business code must be modified accordingly, and the key copies bring data consistency challenges. That is, instead of updating one key, you now need to update multiple keys at the same time. In many cases, this is regarded as a temporary solution for a difficult situation.
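The code change this implies can be sketched as follows, using the 'foo', 'foo2', 'foo3', 'foo4' naming from the example. The helper names are hypothetical and the client is assumed to expose a redis-py style set method. Note that the writes are not atomic across copies, which is exactly the consistency challenge mentioned above.

```python
import random


def replica_names(key, copies):
    """Return the original key plus `copies` suffixed copies: foo, foo2, foo3, ..."""
    return [key] + [f"{key}{i}" for i in range(2, copies + 2)]


def pick_read_key(key, copies, rng=random):
    """Choose one copy at random so that reads spread across all copies."""
    return rng.choice(replica_names(key, copies))


def write_all(client, key, copies, value):
    """Write the same value to every copy. Not atomic: a reader may briefly
    see the old value on one copy and the new value on another."""
    for name in replica_names(key, copies):
        client.set(name, value)
```

Readers would call pick_read_key("foo", 3) before each GET, while every update goes through write_all. Whether the copies actually land on different nodes depends on how the cluster maps each name to a slot, so the placement still has to be checked or arranged separately.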
If HotKeys are generated from read requests, read/write splitting is a good solution. When using the read/write splitting architecture, the read request workload on each Redis node can be decreased by continuously adding secondary nodes.
However, this solution increases the complexity of both the business code and the Redis cluster architecture. For example, at the Redis layer, a forwarding layer (such as Proxy or LVS) is now needed to balance load across multiple secondary nodes. In addition, for high availability, the higher node failure rate and the need for fault tolerance must be taken into account as more secondary nodes are added. These changes to the Redis cluster architecture bring greater challenges to monitoring, O&M, and failure handling.
However, all of this is extremely simple in Alibaba Cloud ApsaraDB for Redis, which provides out-of-the-box services. At the same time, as the business grows, Alibaba Cloud's ApsaraDB for Redis allows users to adjust the cluster architecture by changing configurations. For example, the primary/secondary architecture can be changed to a read/write splitting architecture, the read/write splitting architecture can be changed to a cluster, the primary/secondary architecture can be changed to a cluster that supports read/write splitting, and Redis Community Edition can be directly changed to Redis Enterprise Edition (Tair), which supports a large number of advanced features.
The read/write splitting architecture can be used to deal with HotKey issues, but it also has shortcomings. In scenarios with a large number of requests, read/write splitting inevitably incurs a replication delay, during which dirty (stale) data may be read. Therefore, the read/write splitting architecture is not appropriate in scenarios where both the read and write workloads are high and strong data consistency is required.
Proxy Query Cache is one of the enterprise-level features of Alibaba Cloud Tair (Redis Enterprise Edition). Its principle is shown in figure 2-1:
Figure 2-1: Tair QueryCache principle
Alibaba Cloud ApsaraDB for Redis identifies HotKeys in instances based on efficient sorting and statistical algorithms. After this function is enabled, proxy servers cache requests and the corresponding query results based on the rules you set. Proxy servers cache only the request and response data of a HotKey, instead of the entire key. If a proxy server receives a duplicate request within the validity period, it returns the response directly to the client without interacting with the backend data shards. This improves the read speed and reduces the performance impact of HotKeys on data shards, which avoids request skew.
As a result, the same request from the client no longer needs to reach the Redis node behind the proxy; the proxy returns the data directly. Access requests to the HotKey are processed by multiple proxy servers instead of a single Redis node, which greatly reduces the HotKey workload on any particular Redis node. Moreover, the Proxy Query Cache function of Tair provides a number of commands for viewing and managing HotKeys. For example, run the querycache keys command to view all cached HotKeys, and run querycache listall to obtain all cached commands.
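The general principle of answering duplicate requests from a short-lived cache can be illustrated with a few lines of code. To be clear, this is not how Tair implements Proxy Query Cache, whose internals are not described here. It is a generic time-to-live cache sketch, and the class name TtlCache and its parameters are our own.

```python
import time


class TtlCache:
    """Caches the response to a request for a fixed validity period."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # request -> (expires_at, response)

    def get_or_load(self, request, load):
        """Return the cached response if still valid, else call load(request)."""
        now = self._clock()
        entry = self._entries.get(request)
        if entry is not None and entry[0] > now:
            return entry[1]  # duplicate request within the validity period
        response = load(request)  # falls through to the backend
        self._entries[request] = (now + self._ttl, response)
        return response
```

Here the request, for example the tuple ("GET", "foo"), is the cache key, which mirrors the point that request and response data are cached rather than the entire key. The trade-off is the same as for any such cache: a response may be stale for up to the validity period.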
By combining the Key Analysis Service in CloudDBA with the Tair Proxy Query Cache feature, ApsaraDB for Redis can reduce your O&M cost and improve R&D efficiency.
For HotKeys caused by frequent writes rather than frequent reads, traditional Redis read/write splitting and cluster architectures cannot help. What you need is the Global Distributed Cache feature, which extends the write capability of Redis by allowing parallel writes to the same key on distributed nodes and thereby reduces the write workload on any particular node.
Global Distributed Cache for Redis of Alibaba Cloud, also known as Global Replica, is an active geo-redundancy database system developed based on ApsaraDB for Redis. It is ideal for business scenarios in which multiple sites in different regions provide services at the same time, and it helps enterprises quickly build an active geo-redundancy architecture similar to Alibaba's. Global Distributed Cache for Redis solves the cross-region and cross-country multi-active problem. In addition, because it supports distributed writes on up to three nodes, it can also improve write performance by up to three times. The architecture of Global Distributed Cache for Redis is shown in figure 2-2:
Figure 2-2: Architecture of Global Distributed Cache for Redis
Compared with traditional Redis synchronization middleware, Global Distributed Cache for Redis features high reliability, high throughput, low latency, and high synchronization accuracy.