Design and Implementation of Redis 7.0 Multi-Part AOF

This article focuses on AOF persistence and some of its existing problems, and discusses Multi-Part AOF in Redis 7.0.

Redis is a popular in-memory database. It provides high read/write performance by storing data in memory. However, once the process exits, all of that data is lost.

To solve this problem, Redis provides two persistence mechanisms, RDB and AOF, which save in-memory data to disk to avoid data loss. This article focuses on AOF persistence and some of its existing problems, then discusses the design and implementation details of Multi-Part AOF (MP-AOF) in Redis 7.0 (RC1 has been published). This feature was contributed by the Alibaba Cloud Database Tair Team.

AOF

Append-only file (AOF) persistence records each write command in a separate log file and replays the commands in that file when Redis starts in order to restore the data.

AOF records each Redis write command by appending it to the file. Therefore, as Redis processes more write commands, the AOF file grows larger and replaying it takes longer. Redis introduces the AOF rewrite mechanism (AOFRW) to solve this problem. AOFRW removes redundant write commands and generates a new, equivalent AOF file that is smaller than the original.
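
To make the idea of an equivalent but smaller log concrete, here is a toy sketch in C. It is not Redis code: the `Command` and `Entry` types and the `rewrite_log` function are hypothetical, only integer values and three operations are modeled, and the sketch replays the log to build its table, whereas the real AOFRW serializes the dataset that Redis already holds in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 16

typedef enum { OP_SET, OP_INCR, OP_DEL } OpType;

typedef struct {
    OpType      op;
    const char *key;
    long        value;   /* used by OP_SET only */
} Command;

typedef struct {
    const char *key;
    long        value;
    int         live;    /* 0 while the key does not exist */
} Entry;

/* Return the slot for `key`, creating it if needed; NULL when the table is full. */
static Entry *lookup(Entry *table, size_t *used, const char *key) {
    for (size_t i = 0; i < *used; i++)
        if (strcmp(table[i].key, key) == 0) return &table[i];
    if (*used == MAX_KEYS) return NULL;
    Entry *e = &table[(*used)++];
    e->key = key;
    e->value = 0;
    e->live = 0;
    return e;
}

/* Apply `n` commands to an in-memory table, then emit one SET per surviving
 * key into `out` (which must hold MAX_KEYS entries). Returns the number of
 * commands written. */
size_t rewrite_log(const Command *cmds, size_t n, Command *out) {
    Entry table[MAX_KEYS];
    size_t used = 0, written = 0;

    for (size_t i = 0; i < n; i++) {
        Entry *e = lookup(table, &used, cmds[i].key);
        assert(e != NULL);              /* toy limit: at most MAX_KEYS keys */
        switch (cmds[i].op) {
        case OP_SET:
            e->value = cmds[i].value;
            e->live = 1;
            break;
        case OP_INCR:
            if (!e->live) e->value = 0; /* a missing key counts from zero */
            e->value++;
            e->live = 1;
            break;
        case OP_DEL:
            e->live = 0;
            break;
        }
    }
    for (size_t i = 0; i < used; i++) {
        if (!table[i].live) continue;
        out[written].op = OP_SET;
        out[written].key = table[i].key;
        out[written].value = table[i].value;
        written++;
    }
    return written;
}
```

For the five-command log SET a 1, INCR a, INCR a, SET b 5, DEL b, the rewritten log contains the single command SET a 3.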

AOFRW

Figure 1 shows the implementation principle of AOFRW. When AOFRW is triggered, Redis first forks a child process to perform the rewrite in the background. The child writes a snapshot of all Redis data, as of the moment of the fork, into a temporary AOF file named temp-rewriteaof-bg-pid.aof.

Since the rewrite is performed by the child process in the background, the main process can still respond to user commands normally during the AOF rewrite. So that the child process can eventually obtain the incremental changes generated during the rewrite, the main process writes each executed write command to aof_buf and also writes a copy to aof_rewrite_buf. In the later stage of the rewrite, the main process sends the data accumulated in aof_rewrite_buf to the child process through a pipe, and the child process appends that data to the temporary AOF file.

When the main process receives heavy write traffic, a large amount of data may accumulate in aof_rewrite_buf, and the child process may be unable to consume all of it during the rewrite. In that case, the data remaining in aof_rewrite_buf is processed by the main process at the end of the rewrite.

When the child process completes the rewrite and exits, the main process handles the remaining work in backgroundRewriteDoneHandler. First, it appends the data left unconsumed in aof_rewrite_buf to the temporary AOF file. Second, once everything is ready, it atomically renames the temporary AOF file to server.aof_filename, which overwrites the original AOF file. At this point, the entire AOFRW process ends.

Figure 1: AOFRW Implementation Principle

Problems with AOFRW

Memory Overhead

As shown in Figure 1, during AOFRW the main process writes the data changes made after the fork into aof_rewrite_buf. The contents of aof_rewrite_buf and aof_buf are mostly duplicates, so this adds redundant memory overhead.

The aof_rewrite_buffer_length field in Redis INFO shows how much memory aof_rewrite_buf currently occupies. As shown below, under high write traffic aof_rewrite_buffer_length takes up almost as much memory as aof_buffer_length, so the buffered data occupies nearly twice the memory it actually needs.

aof_pending_rewrite:0
aof_buffer_length:35500
aof_rewrite_buffer_length:34000
aof_pending_bio_fsync:0
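
As a rough illustration of how these two fields can be compared, the following C sketch extracts them from INFO-style text and reports the rewrite buffer as a percentage of the AOF buffer. The field names come from the output above; the helper names `info_field` and `rewrite_buf_overhead_pct` are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find a line of the form "<field>:<number>" in INFO-style text and return
 * the number, or -1 if the field is absent. Matching is done only at the
 * start of a line, so "aof_buffer_length" never matches inside a longer
 * field name. */
long long info_field(const char *info, const char *field) {
    assert(info != NULL && field != NULL);
    size_t flen = strlen(field);
    const char *p = info;

    while (p && *p) {
        if (strncmp(p, field, flen) == 0 && p[flen] == ':') {
            long long v;
            if (sscanf(p + flen + 1, "%lld", &v) == 1) return v;
            return -1;
        }
        p = strchr(p, '\n');   /* jump to the next line, if any */
        if (p) p++;
    }
    return -1;
}

/* Size of aof_rewrite_buf as a percentage of aof_buf, or -1 if either
 * field is missing or aof_buf is empty. */
long long rewrite_buf_overhead_pct(const char *info) {
    long long buf = info_field(info, "aof_buffer_length");
    long long rw  = info_field(info, "aof_rewrite_buffer_length");
    if (buf <= 0 || rw < 0) return -1;
    return rw * 100 / buf;
}
```

For the sample output above (35500 and 34000 bytes), the function returns 95, meaning the rewrite buffer holds roughly 95% as much data as the AOF buffer.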

When the memory occupied by aof_rewrite_buf exceeds a certain threshold, Redis logs messages such as the following. Here, aof_rewrite_buf occupies 100 MB of memory, and 2135 MB of data is transferred between the main process and the child process. (The child process also incurs the memory overhead of an internal read buffer when reading this data through the pipe.) This is a significant overhead for an in-memory database such as Redis.

3351:M 25 Jan 2022 09:55:39.655 * Background append only file rewriting started by pid 6817
3351:M 25 Jan 2022 09:57:51.864 * AOF rewrite child asks to stop sending diffs.
6817:C 25 Jan 2022 09:57:51.864 * Parent agreed to stop sending diffs. Finalizing AOF...
6817:C 25 Jan 2022 09:57:51.864 * Concatenating 2135.60 MB of AOF diff received from parent.
3351:M 25 Jan 2022 09:57:56.545 * Background AOF buffer size: 100 MB

The memory overhead caused by AOFRW may cause Redis to reach the maxmemory limit suddenly, which affects normal command writes. It may even push the process past the operating system's memory limit so that the OOM killer terminates it, leaving Redis unavailable.

Total CPU Overhead

There are three main aspects of CPU overhead, which are explained below:

1.  During AOFRW, the main process spends CPU time writing data to aof_rewrite_buf and uses the event loop to send the aof_rewrite_buf data to the child process:

/* Append data to the AOF rewrite buffer, allocating new blocks if needed. */
void aofRewriteBufferAppend(unsigned char *s, unsigned long len) {
    // Other details are omitted here...
  
    /* Install a file event to send data to the rewrite child if there is
     * not one already. */
    if (!server.aof_stop_sending_diff &&
        aeGetFileEvents(server.el,server.aof_pipe_write_data_to_child) == 0)
    {
        aeCreateFileEvent(server.el, server.aof_pipe_write_data_to_child,
            AE_WRITABLE, aofChildWriteDiffData, NULL);
    } 
  
    // Other details are omitted here...
}

2.  In the later stage of the rewrite, the child process repeatedly reads the incremental data sent by the main process through the pipe and appends it to the temporary AOF file:

int rewriteAppendOnlyFile(char *filename) {
    // Other details are omitted here...
  
    /* Read again a few times to get more data from the parent.
     * We can't read forever (the server may receive data from clients
     * faster than it is able to send data to the child), so we try to read
     * some more data in a loop as soon as there is a good chance more data
     * will come. If it looks like we are wasting time, we abort (this
     * happens after 20 ms without new data). */
    int nodata = 0;
    mstime_t start = mstime();
    while(mstime()-start < 1000 && nodata < 20) {
        if (aeWait(server.aof_pipe_read_data_from_parent, AE_READABLE, 1) <= 0)
        {
            nodata++;
            continue;
        }
        nodata = 0; /* Start counting from zero, we stop on N *contiguous*
                       timeouts. */
        aofReadDiffFromParent();
    }
    // Other details are omitted here...
}

3.  After the child process completes the rewrite, the main process finishes the job in backgroundRewriteDoneHandler. One of its tasks is to write the data that was not consumed from aof_rewrite_buf during the rewrite to the temporary AOF file. If a lot of data is left in aof_rewrite_buf, this also consumes CPU time.

void backgroundRewriteDoneHandler(int exitcode, int bysignal) {
    // Other details are omitted here...
  
    /* Flush the differences accumulated by the parent to the rewritten AOF. */
    if (aofRewriteBufferWrite(newfd) == -1) {
        serverLog(LL_WARNING,
                "Error trying to flush the parent diff to the rewritten AOF: %s", strerror(errno));
        close(newfd);
        goto cleanup;
    }

    // Other details are omitted here...
}

The CPU overhead caused by AOFRW may cause response time (RT) jitter when Redis executes commands and can even cause client timeouts.

Disk IO Overhead

As mentioned earlier, the main process will write the executed write command to aof_buf and a copy to the aof_rewrite_buf during AOFRW. The data in aof_buf is eventually written to the old AOF file that is currently in use, resulting in disk IO. At the same time, the data in the aof_rewrite_buf will be written to the new AOF file generated by the rewrite, resulting in disk IO. Therefore, the same data will generate disk IO twice.

Code Complexity

Redis uses the six pipes shown below for data transfer and control interaction between the main process and the child process, which makes the entire AOFRW logic complex and difficult to understand.

/* AOF pipes used to communicate between parent and child during rewrite. */
 int aof_pipe_write_data_to_child;
 int aof_pipe_read_data_from_parent;
 int aof_pipe_write_ack_to_parent;
 int aof_pipe_read_ack_from_child;
 int aof_pipe_write_ack_to_child;
 int aof_pipe_read_ack_from_parent;

MP-AOF Implementation

Overview of Configuration Methods

As the name implies, MP-AOF splits the original single AOF file into multiple AOF files. In the MP-AOF, we divide AOF into three types:

  • BASE: It indicates the base AOF, which is generally generated by the child process through rewriting. There is only one file at most.
  • INCR: It indicates incremental AOF, which is generally created when AOFRW starts to execute. Multiple files may exist.
  • HISTORY: It indicates historical AOF, which is changed from BASE and INCR AOF. Each time the AOFRW is completed, the BASE and INCR AOF before the current AOFRW are changed to HISTORY, and the HISTORY AOF is automatically deleted by Redis.

We introduce a manifest file to track and manage these AOF files. We also put all AOF files and the manifest file into a separate directory to make AOF backup and copying easier. The directory name is determined by the appenddirname configuration item, which is new in Redis 7.0.

Figure 2: MP-AOF Rewrite Principle

Figure 2 shows the general flow of one AOFRW run in MP-AOF. At the start, we still fork a child process to perform the rewrite. At the same time, the main process opens a new INCR AOF file, and all data changes made while the child process is rewriting are written to this file. The child's rewrite is completely independent: it has no data or control interaction with the main process during the rewrite, and it ends by producing a BASE AOF. The newly generated BASE AOF and the newly opened INCR AOF together represent all the data in Redis at the current time. When AOFRW ends, the main process updates the manifest file, adding the new BASE AOF and INCR AOF and marking the previous BASE AOF and INCR AOFs as HISTORY. (These HISTORY AOFs are deleted asynchronously by Redis.) Updating the manifest file marks the end of the entire AOFRW process.

As shown in Figure 2, we no longer need aof_rewrite_buf during AOFRW, so the corresponding memory consumption is removed. There is also no longer any data transfer or control interaction between the main process and the child process, so the corresponding CPU overhead is removed as well. Accordingly, the six pipes mentioned above and their related code are all deleted, making the AOFRW logic simpler and clearer.

Key Implementations

Manifest

Representation in Memory

The MP-AOF is strongly dependent on the manifest file. The manifest is represented in memory as the following structure:

  • aofInfo: It indicates the information of an AOF file, which only includes the file name, file serial number, and file type.
  • base_aof_info: It indicates BASE AOF information. If BASE AOF does not exist, this field is NULL.
  • incr_aof_list: It stores the information of all INCR AOF files, arranged in the order in which the files were opened.
  • history_aof_list: It is used to store HISTORY AOF information. The elements in the history_aof_list are moved from base_aof_info and incr_aof_list.
typedef struct {
    sds           file_name;  /* file name */
    long long     file_seq;   /* file sequence */
    aof_file_type file_type;  /* file type */
} aofInfo;
typedef struct {
    aofInfo     *base_aof_info;       /* BASE file information. NULL if there is no BASE file. */
    list        *incr_aof_list;       /* INCR AOFs list. We may have multiple INCR AOF when rewrite fails. */
    list        *history_aof_list;    /* HISTORY AOF list. When the AOFRW success, The aofInfo contained in
                                         `base_aof_info` and `incr_aof_list` will be moved to this list. We
                                         will delete these AOF files when AOFRW finish. */
    long long   curr_base_file_seq;   /* The sequence number used by the current BASE file. */
    long long   curr_incr_file_seq;   /* The sequence number used by the current INCR file. */
    int         dirty;                /* 1 Indicates that the aofManifest in the memory is inconsistent with
                                         disk, we need to persist it immediately. */
} aofManifest;

We use pointers to reference aofManifest in the redisServer structure to facilitate atomic modification and rollback operations.

struct redisServer {
    // Other details are omitted here...
    aofManifest *aof_manifest;       /* Used to track AOFs. */
    // Other details are omitted here...
};

Representation on Disk

The manifest is essentially a text file containing multiple lines of records. Each line describes one AOF file as a series of key/value pairs, which is convenient for Redis to process and easy for people to read and modify. The following is the content of a possible manifest file:

file appendonly.aof.1.base.rdb seq 1 type b
file appendonly.aof.1.incr.aof seq 1 type i
file appendonly.aof.2.incr.aof seq 2 type i

The manifest format needs to be extensible so that other features can be added or supported later. For example, new key/value pairs and annotations (similar to the annotations in AOF) can be added easily, which ensures good forward compatibility.

file appendonly.aof.1.base.rdb seq 1 type b newkey newvalue
file appendonly.aof.1.incr.aof type i seq 1 
# this is annotations
seq 2 type i file appendonly.aof.2.incr.aof
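
A parser that tolerates this kind of extension might look like the following C sketch. It is an illustration under simplifying assumptions rather than the Redis implementation: `ManifestRecord` and `parse_manifest_line` are hypothetical names, values are assumed to contain no spaces, and only the three keys shown above are interpreted. Keys may appear in any order, unknown keys are skipped, and lines starting with `#` are treated as annotations.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char      file_name[256];
    long long seq;
    char      type;            /* 'b', 'i' or 'h' */
} ManifestRecord;

/* Parse one manifest line into `rec`.
 * Returns 1 for a complete record, 0 for a blank line or '#' annotation,
 * and -1 if a required key is missing or a key has no value. */
int parse_manifest_line(const char *line, ManifestRecord *rec) {
    char buf[1024];
    int have_file = 0, have_seq = 0, have_type = 0;

    assert(line != NULL && rec != NULL);
    while (*line == ' ' || *line == '\t') line++;
    if (*line == '#' || *line == '\0' || *line == '\n' || *line == '\r')
        return 0;

    /* strtok modifies its input, so work on a private copy. */
    snprintf(buf, sizeof(buf), "%s", line);

    char *key = strtok(buf, " \t\r\n");
    while (key) {
        char *val = strtok(NULL, " \t\r\n");
        if (!val) return -1;                     /* key without a value */
        if (strcmp(key, "file") == 0) {
            snprintf(rec->file_name, sizeof(rec->file_name), "%s", val);
            have_file = 1;
        } else if (strcmp(key, "seq") == 0) {
            rec->seq = atoll(val);
            have_seq = 1;
        } else if (strcmp(key, "type") == 0) {
            rec->type = val[0];
            have_type = 1;
        }
        /* Any other key/value pair is ignored on purpose. */
        key = strtok(NULL, " \t\r\n");
    }
    return (have_file && have_seq && have_type) ? 1 : -1;
}
```

Because unknown pairs are skipped rather than rejected, a reader written this way keeps working when a later version adds keys such as `newkey newvalue`.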

Naming Conventions

Before MP-AOF, the file name of AOF is the setting value of the appendfilename parameter (default is appendonly.aof).

We use basename.suffix to name the multiple AOF files in MP-AOF. The value of appendfilename is used as the basename part. The suffix consists of three parts in the format seq.type.format:

  • Seq is the serial number of the file, which increases monotonically from 1. BASE and INCR have separate file serial numbers.
  • Type is the type of AOF, indicating whether this AOF file is BASE or INCR.
  • Format is used to indicate the encoding method of this AOF. Redis supports the RDB prefix mechanism. Therefore, BASE AOF may be encoded in the RDB or AOF format:
#define BASE_FILE_SUFFIX           ".base"
#define INCR_FILE_SUFFIX           ".incr"
#define RDB_FORMAT_SUFFIX          ".rdb"
#define AOF_FORMAT_SUFFIX          ".aof"
#define MANIFEST_NAME_SUFFIX       ".manifest"

Therefore, when the appendfilename default configuration is used, the possible names of the BASE, INCR, and manifest files are:

appendonly.aof.1.base.rdb // Enable RDB preamble
appendonly.aof.1.base.aof // Disable RDB preamble
appendonly.aof.1.incr.aof
appendonly.aof.2.incr.aof
appendonly.aof.manifest
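
The naming rule can be sketched as a small C helper. The suffix strings match the macros listed above; the function name `build_aof_name`, the `AofType` enum, and the parameters are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { AOF_TYPE_BASE, AOF_TYPE_INCR } AofType;

/* Build "<base_name>.<seq>.<type>.<format>" into `out`.
 * `use_rdb_preamble` only matters for BASE files; in this sketch INCR files
 * always use the AOF format.
 * Returns 0 on success, -1 if `out` is too small. */
int build_aof_name(char *out, size_t outlen, const char *base_name,
                   long long seq, AofType type, int use_rdb_preamble) {
    assert(out != NULL && base_name != NULL);
    const char *type_suffix = (type == AOF_TYPE_BASE) ? ".base" : ".incr";
    const char *fmt_suffix =
        (type == AOF_TYPE_BASE && use_rdb_preamble) ? ".rdb" : ".aof";

    int n = snprintf(out, outlen, "%s.%lld%s%s",
                     base_name, seq, type_suffix, fmt_suffix);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With `base_name` set to `appendonly.aof`, sequence 1 and the RDB preamble enabled, a BASE file is named `appendonly.aof.1.base.rdb`; an INCR file with sequence 2 is named `appendonly.aof.2.incr.aof`.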

Compatibility with Upgrades from Older Versions

Since the MP-AOF is strongly dependent on the manifest file, the corresponding AOF file will be loaded strictly according to the instructions of the manifest when Redis is started. However, when upgrading from old versions of Redis (referring to versions before Redis 7.0) to Redis 7.0, since there is no manifest file at this time, it is necessary to make Redis recognize that it is an upgrade process and load the old AOF correctly and safely.

Recognizing an upgrade is the first step in this process. Before loading the AOF, Redis checks whether a file named server.aof_filename exists in the Redis working directory. If it does, we may be upgrading from an old version of Redis. We then treat the startup as an upgrade if any one of the following three conditions is met:

  1. The appenddirname directory does not exist.
  2. The appenddirname directory exists, but it contains no manifest file.
  3. The appenddirname directory exists and contains a manifest file, the manifest contains only BASE AOF information, the BASE AOF has the same name as server.aof_filename, and the appenddirname directory does not contain a file named server.aof_filename.
/* Load the AOF files according the aofManifest pointed by am. */
int loadAppendOnlyFiles(aofManifest *am) {
    // Other details are omitted here...
  
    /* If the 'server.aof_filename' file exists in dir, we may be starting
     * from an old redis version. We will use enter upgrade mode in three situations.
     *
     * 1. If the 'server.aof_dirname' directory not exist
     * 2. If the 'server.aof_dirname' directory exists but the manifest file is missing
     * 3. If the 'server.aof_dirname' directory exists and the manifest file it contains
     *    has only one base AOF record, and the file name of this base AOF is 'server.aof_filename',
     *    and the 'server.aof_filename' file not exist in 'server.aof_dirname' directory
     * */
    if (fileExist(server.aof_filename)) {
        if (!dirExists(server.aof_dirname) ||
            (am->base_aof_info == NULL && listLength(am->incr_aof_list) == 0) ||
            (am->base_aof_info != NULL && listLength(am->incr_aof_list) == 0 &&
             !strcmp(am->base_aof_info->file_name, server.aof_filename) && !aofFileExist(server.aof_filename)))
        {
            aofUpgradePrepare(am);
        }
    }
  
    // Other details are omitted here...
}

Once it is recognized that this is an upgrade start, we will use the aofUpgradePrepare function to prepare for the upgrade.

The upgrade preparation work is mainly divided into three parts:

  1. Use server.aof_filename as the file name to construct a BASE AOF information
  2. Persist this BASE AOF information to the manifest file
  3. Use rename to move old AOF files to the appenddirname directory
void aofUpgradePrepare(aofManifest *am) {
    // Other details are omitted here...
  
    /* 1. Manually construct a BASE type aofInfo and add it to aofManifest. */
    if (am->base_aof_info) aofInfoFree(am->base_aof_info);
    aofInfo *ai = aofInfoCreate();
    ai->file_name = sdsnew(server.aof_filename);
    ai->file_seq = 1;
    ai->file_type = AOF_FILE_TYPE_BASE;
    am->base_aof_info = ai;
    am->curr_base_file_seq = 1;
    am->dirty = 1;
    /* 2. Persist the manifest file to AOF directory. */
    if (persistAofManifest(am) != C_OK) {
        exit(1);
    }
    /* 3. Move the old AOF file to AOF directory. */
    sds aof_filepath = makePath(server.aof_dirname, server.aof_filename);
    if (rename(server.aof_filename, aof_filepath) == -1) {
        sdsfree(aof_filepath);
        exit(1);
    }
  
    // Other details are omitted here...
}

The upgrade preparation is crash-safe. If a crash occurs during any of the three steps above, the next startup can correctly identify the situation and retry the entire upgrade.

Multi-File Loading and Progress Calculation

When Redis loads AOF, it records the loading progress and exposes it through the loading_loaded_perc field of INFO. In MP-AOF, the loadAppendOnlyFiles function loads the AOF files according to the aofManifest passed in. Before loading, we need to calculate the total size of all AOF files to be loaded, pass it to the startLoading function, and then continuously report the loading progress in loadSingleAppendOnlyFile.

Next, loadAppendOnlyFiles will load BASE AOF and INCR AOF in sequence according to aofManifest. After all AOF files are loaded, the stopLoading will be used to end the loading state.

int loadAppendOnlyFiles(aofManifest *am) {
    // Other details are omitted here...
    /* Here we calculate the total size of all BASE and INCR files in
     * advance, it will be set to `server.loading_total_bytes`. */
    total_size = getBaseAndIncrAppendOnlyFilesSize(am);
    startLoading(total_size, RDBFLAGS_AOF_PREAMBLE, 0);
    /* Load BASE AOF if needed. */
    if (am->base_aof_info) {
        aof_name = (char*)am->base_aof_info->file_name;
        updateLoadingFileName(aof_name);
        loadSingleAppendOnlyFile(aof_name);
    }
    /* Load INCR AOFs if needed. */
    if (listLength(am->incr_aof_list)) {
        listNode *ln;
        listIter li;
        listRewind(am->incr_aof_list, &li);
        while ((ln = listNext(&li)) != NULL) {
            aofInfo *ai = (aofInfo*)ln->value;
            aof_name = (char*)ai->file_name;
            updateLoadingFileName(aof_name);
            loadSingleAppendOnlyFile(aof_name);
        }
    }
  
    server.aof_current_size = total_size;
    server.aof_rewrite_base_size = server.aof_current_size;
    server.aof_fsync_offset = server.aof_current_size;
    stopLoading();
    
    // Other details are omitted here...
}
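
The reason the total size is computed up front is that progress must be measured across all files rather than per file. A minimal C sketch of that arithmetic follows; the function `loading_perc` and its parameters are invented for illustration and do not appear in Redis.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage of the total bytes loaded so far, given that `done_files`
 * whole files have been loaded and `offset` bytes of the next file have
 * been read. `sizes` lists the BASE file first, then the INCR files. */
double loading_perc(const long long *sizes, size_t nfiles,
                    size_t done_files, long long offset) {
    assert(sizes != NULL && done_files <= nfiles && offset >= 0);
    long long total = 0, loaded = offset;

    for (size_t i = 0; i < nfiles; i++) {
        total += sizes[i];
        if (i < done_files) loaded += sizes[i];
    }
    if (total == 0) return 100.0;   /* nothing to load */
    return (double)loaded * 100.0 / (double)total;
}
```

With files of 600, 300 and 100 bytes, being halfway through the second file corresponds to 75% overall, not 50%.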

AOFRW Crash Safety

When the child process completes the rewrite, it will have created a temporary AOF file named temp-rewriteaof-bg-pid.aof. This file is still invisible to Redis because it has not been added to the manifest file. To make Redis recognize it and load it correctly at startup, we need to rename it according to the naming rules described above and add its information to the manifest file.

Although AOF file rename and manifest file modification are two independent operations, we must ensure the atomicity of these two operations so Redis can load the corresponding AOF correctly at startup. MP-AOF uses two designs to solve this problem:

  1. The name of the BASE AOF contains the serial number of the file. This ensures that the created BASE AOF does not conflict with the previous BASE AOF.
  2. Execute the rename operation of AOF before modifying the manifest file.

To make it clear, let's assume the manifest file contains the following content before AOFRW starts:

file appendonly.aof.1.base.rdb seq 1 type b
file appendonly.aof.1.incr.aof seq 1 type i

After AOFRW starts, the content of the manifest file is:

file appendonly.aof.1.base.rdb seq 1 type b
file appendonly.aof.1.incr.aof seq 1 type i
file appendonly.aof.2.incr.aof seq 2 type i

After the child process finishes the rewrite, the main process renames temp-rewriteaof-bg-pid.aof to appendonly.aof.2.base.rdb and adds it to the manifest. At the same time, the previous BASE and INCR AOFs are marked as HISTORY. The content of the manifest file is then:

file appendonly.aof.2.base.rdb seq 2 type b
file appendonly.aof.1.base.rdb seq 1 type h
file appendonly.aof.1.incr.aof seq 1 type h
file appendonly.aof.2.incr.aof seq 2 type i

In this case, the result of this AOFRW is visible to Redis, and HISTORY AOF is asynchronously cleared by Redis.

The backgroundRewriteDoneHandler function implements the preceding logic in seven steps:

  1. Before modifying server.aof_manifest in memory, dup a temporary manifest structure; all subsequent modifications are made to this temporary manifest. The advantage is that if a later step fails, we can simply destroy the temporary manifest to roll back the entire operation and avoid polluting the global server.aof_manifest data structure.
  2. Obtain the new BASE AOF file name (denoted new_base_filename) from the temporary manifest and mark the previous BASE AOF (if any) as HISTORY.
  3. Rename the temp-rewriteaof-bg-pid.aof temporary file generated by the child process to new_base_filename.
  4. Mark all the INCR AOFs that precede this rewrite in the temporary manifest structure as HISTORY.
  5. Persist the temporary manifest to disk. (persistAofManifest ensures that the modification of the manifest file itself is atomic.)
  6. If the steps above succeed, we can safely point the in-memory server.aof_manifest pointer at the temporary manifest structure (and release the previous manifest structure), so the entire modification becomes visible to Redis.
  7. Clean up the AOFs of the HISTORY type. This step is allowed to fail because a failure does not cause data consistency issues.
void backgroundRewriteDoneHandler(int exitcode, int bysignal) {
    snprintf(tmpfile, 256, "temp-rewriteaof-bg-%d.aof",
        (int)server.child_pid);
    /* 1. Dup a temporary aof_manifest for subsequent modifications. */
    temp_am = aofManifestDup(server.aof_manifest);
    /* 2. Get a new BASE file name and mark the previous (if we have)
     * as the HISTORY type. */
    new_base_filename = getNewBaseFileNameAndMarkPreAsHistory(temp_am);
    /* 3. Rename the temporary aof file to 'new_base_filename'. */
    if (rename(tmpfile, new_base_filename) == -1) {
        aofManifestFree(temp_am);
        goto cleanup;
    }
    /* 4. Change the AOF file type in 'incr_aof_list' from AOF_FILE_TYPE_INCR
     * to AOF_FILE_TYPE_HIST, and move them to the 'history_aof_list'. */
    markRewrittenIncrAofAsHistory(temp_am);
    /* 5. Persist our modifications. */
    if (persistAofManifest(temp_am) == C_ERR) {
        bg_unlink(new_base_filename);
        aofManifestFree(temp_am);
        goto cleanup;
    }
    /* 6. We can safely let `server.aof_manifest` point to 'temp_am' and free the previous one. */
    aofManifestFreeAndUpdate(temp_am);
    /* 7. We don't care about the return value of `aofDelHistoryFiles`, because the history
     * deletion failure will not cause any problems. */
    aofDelHistoryFiles();
}
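
Step 5 relies on the manifest file itself being replaced atomically. A common POSIX technique for this is to write the new content to a temporary file, flush it to disk, and then rename it over the target. The C sketch below shows that general pattern; it is not the body of persistAofManifest, and the function name `write_file_atomically` and the `.tmp` suffix are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replace `path` with `content` so that a reader sees either the old file
 * or the new one, never a partially written file.
 * Returns 0 on success, -1 on error. */
int write_file_atomically(const char *path, const char *content) {
    char tmp[512];
    assert(path != NULL && content != NULL);

    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp)) return -1;

    FILE *fp = fopen(tmp, "w");
    if (!fp) return -1;

    size_t len = strlen(content);
    int ok = fwrite(content, 1, len, fp) == len   /* write everything       */
          && fflush(fp) == 0                      /* drain the stdio buffer */
          && fsync(fileno(fp)) == 0;              /* force data to disk     */
    if (fclose(fp) != 0) ok = 0;

    /* rename() replaces the target in a single step on POSIX file systems. */
    if (!ok || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

If the process crashes before the rename, the old manifest is untouched; if it crashes after, the new manifest is complete. This is why the rewritten BASE file can be renamed first and the manifest updated second without leaving Redis in an unloadable state.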

Support for AOF Truncate

When a process crashes, the AOF file is likely to be incomplete. For example, Redis may crash after writing the MULTI of a transaction but before writing EXEC. Redis cannot load such an incomplete AOF as-is, but it supports the AOF truncate feature (controlled by the aof-load-truncated configuration). The principle is to use server.aof_current_size to track the last correct file offset of the AOF and then use the ftruncate function to delete all file contents after that offset. Some data may be lost this way, but the integrity of the AOF is guaranteed.

In MP-AOF, the server.aof_current_size no longer represents the size of a single AOF file but the total size of all AOF files. As only the last INCR AOF may have incomplete writing problems, we have introduced a separate field server.aof_last_incr_size to track the size of the last INCR AOF file. When the last INCR AOF is incompletely written, we only need to delete the contents of the file after the server.aof_last_incr_size.

if (ftruncate(server.aof_fd, server.aof_last_incr_size) == -1) {
      // Other details are omitted here...
 }
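
The effect of the truncation can be sketched with plain POSIX calls. In this illustration the helper `truncate_to_valid` is hypothetical, and the caller is assumed to already know the last valid offset (the role played by server.aof_last_incr_size above).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Cut the file at `path` back to `valid_size` bytes.
 * Returns the new size, or -1 on error or when `valid_size` is negative or
 * larger than the file (in which case there is nothing to cut). */
long long truncate_to_valid(const char *path, long long valid_size) {
    struct stat st;
    assert(path != NULL);

    int fd = open(path, O_WRONLY);
    if (fd == -1) return -1;

    if (fstat(fd, &st) == -1 ||
        valid_size < 0 ||
        valid_size > (long long)st.st_size ||
        ftruncate(fd, (off_t)valid_size) == -1) {
        close(fd);
        return -1;
    }
    close(fd);
    return valid_size;
}
```

For example, if a 19-byte file ends with an unfinished transaction that starts at offset 8, calling the helper with 8 leaves only the complete prefix on disk.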

AOFRW Throttling

Redis can automatically execute AOFRW when the AOF size exceeds a certain threshold. If AOFRW fails, for example because of a disk failure or a code bug, Redis keeps retrying it until it succeeds. Before MP-AOF, this was not a big problem (at worst it consumed some CPU time and fork overhead). However, in MP-AOF, each AOFRW opens a new INCR AOF, and only when AOFRW succeeds are the previous INCR and BASE AOFs converted to HISTORY and deleted. Therefore, successive AOFRW failures are bound to leave multiple INCR AOFs in place. In extreme cases, if AOFRW retries frequently, we will see hundreds of INCR AOF files.

Therefore, we introduce an AOFRW throttling mechanism. When AOFRW has failed three times in a row, the next AOFRW is forcibly delayed by one minute. If that AOFRW also fails, the following one is delayed by two minutes, then four minutes, eight minutes, and so on. The current maximum delay is one hour.

We can still run the bgrewriteaof command during AOFRW throttling to execute AOFRW immediately.

if (server.aof_state == AOF_ON &&
    !hasActiveChildProcess() &&
    server.aof_rewrite_perc &&
    server.aof_current_size > server.aof_rewrite_min_size &&
    !aofRewriteLimited())
{
    long long base = server.aof_rewrite_base_size ?
        server.aof_rewrite_base_size : 1;
    long long growth = (server.aof_current_size*100/base) - 100;
    if (growth >= server.aof_rewrite_perc) {
        rewriteAppendOnlyFileBackground();
    }
}
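
The delay schedule described above (no delay for the first failures, then one minute doubling up to one hour) can be written as a small function. This is a sketch of the schedule as stated in the text, not the code behind aofRewriteLimited; the function and macro names are hypothetical.

```c
#include <assert.h>

#define THROTTLE_AFTER_FAILURES 3    /* failures tolerated before throttling */
#define MAX_DELAY_MINUTES       60   /* cap of one hour                      */

/* Delay, in minutes, imposed before the next automatic rewrite, given the
 * number of consecutive failures so far. */
int aofrw_delay_minutes(int consecutive_failures) {
    assert(consecutive_failures >= 0);
    if (consecutive_failures < THROTTLE_AFTER_FAILURES) return 0;

    int doublings = consecutive_failures - THROTTLE_AFTER_FAILURES;
    /* 2^6 = 64 minutes already exceeds the cap, so stop doubling there. */
    if (doublings >= 6) return MAX_DELAY_MINUTES;
    return 1 << doublings;
}
```

Under this schedule the delays after 3, 4, 5, 6, 7 and 8 consecutive failures are 1, 2, 4, 8, 16 and 32 minutes, and every later failure waits the full 60 minutes.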

The AOFRW throttling mechanism also effectively avoids the CPU and fork overhead caused by high-frequency AOFRW retries. Much of the RT jitter in Redis is related to fork.

Summary

The introduction of MP-AOF successfully solves the adverse effects of memory and CPU overhead of AOFRW on Redis instances and even service access. At the same time, in the process of solving these problems, we encountered many unexpected challenges. These challenges mainly come from a large number of Redis users and diverse usage scenarios. Therefore, we must consider the problems that users may encounter when using MP-AOF in various scenarios, such as compatibility, ease of use, and minimizing intrusiveness on Redis code. This is a top priority in the evolution of Redis community features.

At the same time, the introduction of MP-AOF also brings more opportunities and challenges for Redis data persistence. For example, when aof-use-rdb-preamble is enabled, the BASE AOF is essentially an RDB file, so we do not need to perform a separate BGSAVE when making a full backup; we can back up the BASE AOF directly. MP-AOF supports disabling the automatic cleanup of HISTORY AOFs, so those historical AOFs can be retained. Redis already supports adding timestamp annotations to AOF, so we can even implement a simple point-in-time recovery (PITR) capability based on these.

The design prototype of MP-AOF comes from the binlog implementation of ApsaraDB for Redis Enhanced Edition. This is a core feature that has been proven on Alibaba Cloud Tair services for a long time. On top of this core feature, Alibaba Cloud Tair has successfully built global multi-active, PITR, and other enterprise-level capabilities to meet the needs of users in more business scenarios. Today, we contribute this core capability to the Redis community. We hope community users can also enjoy these enterprise-level features and use them to optimize and build their own business code. Please refer to the relevant GitHub PR (#9788) for more details on MP-AOF, the original designs, and the complete code.
