How Does the Asynchronous Task Processing System Solve the Problems of Time Consuming and High Concurrency?

This article introduces the applicable scenarios and benefits of the asynchronous task processing system and discusses the architecture, functions, an...

By Buchen, Alibaba Cloud Serverless Technical Director

When we build an application, we always hope it can respond quickly and be cheap. However, our system faces various challenges in practice, such as unpredictable traffic peaks, slow response from dependent downstream services, and a small number of requests consuming a lot of CPU or memory resources. The entire system is often slowed down or cannot respond to requests. More computing resources have to be reserved in many cases to make application services always respond quickly, but most of the time, these computing resources are idle. A better way is to separate the time-consuming or resource-consuming processing logic from the main request processing logic and hand it over to a more resource-flexible system for asynchronous execution. This allows the request to be quickly processed and returned to the user but also saves costs.

Generally speaking, the time-consuming, resource-consuming, or prone to error logic is better stripped out of the main request process logic and executed asynchronously. For example, after a new user registers successfully, the system usually sends a welcome email. The logic of sending a welcome email can be stripped from the registration process. Another example is when a user uploads an image, thumbnails of different sizes are usually needed. However, image processing is not included in the image uploading. The user can finish the process after uploading the image successfully, and the processing logic (such as generating thumbnails) can be executed as asynchronous tasks. This way, the application server can avoid being overwhelmed by compute-intensive tasks (such as image processing), and users can get a faster response. Common asynchronous execution tasks include:

Send Email/Instant Message
Check for Spam
Document Processing (Conversion Format and Export)
Audio, Video, and Image Processing (Generating Thumbnails, Watermarking, Pornography Detection, and Transcoding)
Invoke External Third-Party Services
Rebuild Search Index
Import/Export Large Amounts of Data
Web Crawler
Data Cleansing

Slack [1], Pinterest [2], Facebook [3], and other companies are using asynchronous task processing systems to achieve better service availability and lower costs. According to Dropbox statistics [4], there are more than 100 different types of asynchronous tasks in their business scenarios. A fully functional asynchronous task processing system can bring significant benefits:

Faster System Response: The time-consuming and resource-consuming logic is stripped from the main request processing and executed asynchronously, which can reduce the request-response delay and bring a better user experience.
Better Handling of Massive Burst Requests: In many scenarios (such as e-commerce), there are often massive burst requests that impact the system. Similarly, if the resource-consuming logic is stripped from the main request processing and executed asynchronously, the asynchronous task processing system can respond to more requests than a system with the same resource capacity.
Lower Costs: The execution duration of asynchronous tasks usually ranges from hundreds of milliseconds to several hours. According to different task types, choosing task execution duration reasonably and using resources more flexibly can help achieve lower costs.
Better Retry Policy and Error Handling Capabilities: Tasks are reliably executed (at-least-once) and retried based on the configured retry policy to achieve better fault tolerance. For example, if you invoke a third-party downstream service to an asynchronous task, set a reasonable retry policy. Even if the downstream service is occasionally unstable, the success rate of the task is not affected.
Complete Task Processing Faster: The execution of multiple tasks is highly parallelized. A large number of tasks can be completed faster at a reasonable cost by scaling the resources of the asynchronous task processing system.
Better Task Priority Management and Traffic Control: Tasks are usually handled according to different priorities (depending on the type). The asynchronous task processing system can help users process tasks better at different priorities, allowing higher priority tasks to be processed faster without leaving lower priority tasks unprocessed.
More Diverse Task Triggering Methods: Tasks can be triggered in various methods, such as directly submitting them through the API, triggering them through events, or executing them on a scheduled basis.
Better Observability: The asynchronous task processing system typically provides task logs, metrics, status queries, tracing analysis, and other capabilities that allow asynchronous tasks to be observed better and problems to be diagnosed more easily.
Higher Efficiency in Research and Development: Users focus on the implementation of task processing logic. Task scheduling, resource scaling, high availability, traffic control, task priority judgment, and other functions are completed by the task processing system, improving the research and development efficiency.

Task Processing System Architecture

A task processing system usually consists of task API and observability, task distribution, and task execution. We introduce the functions of these three subsystems first and then discuss the technical challenges and solutions faced by the entire system.

Task API/Dashboard

This subsystem provides a set of task-related APIs, including task creation, query, and deletion. Users use system functions through GUI and command line tools that directly invoke API. Observability presented in the Dashboard and other ways is also important. A good task processing system should include the following observable capabilities:

Logs: It can collect and display task logs, and users can quickly query the logs of specified tasks.
Metrics: The system should provide key metrics (such as the number of queued tasks) to help users quickly judge the execution status of tasks.
Tracing Analysis: The time consumed in each step from task submission to execution (such as the time of queuing and the actual execution time). The following figure shows the tracing capabilities of the Netflix Cosmos platform.

Task Distribution

Task distribution is responsible for scheduling and distributing tasks. A task distribution system that can be applied to the production environment usually has the following functions:

Reliable Distribution of Tasks: Once a task is submitted, the system should ensure the task is scheduled for execution in any case.
Scheduled or Delayed Distribution of Tasks: Many tasks should be executed at a specified time, such as sending emails or messages or generating data reports on a scheduled basis. Another situation is that tasks can be delayed for a long time. For example, data analysis tasks submitted before getting off work can be completed before work the next day. Such tasks can be executed in the early morning when resource consumption is low, and the cost can be reduced through off-peak execution.
Task Deduplication: We always don't want tasks to be repeatedly executed. In addition to the waste of resources, the repeated implementation of tasks may have more serious consequences. For example, a metering task miscalculates the bill because of repeated execution. If you want to execute a task exactly-once, each step in the chain of task submission, distribution, and execution needs to be executed exactly once. This means when implementing the task processing code, users must implement exactly-once in various situations (such as successful execution or failed execution). How to implement a complete exactly-once is complex and beyond the scope of this article. In many cases, it is valuable for the system to provide simplified semantics; the task is executed only once successfully. Task deduplication requires the user to specify the task ID when submitting the task. The system uses the ID to determine whether the task has been submitted and executed successfully.
Retry of the Task in Error: A reasonable task retry policy is critical for efficient and reliable task completion. Several factors should be considered for task retry:

1) The processing capability of the downstream task execution system should be considered for task retry. For example, if a traffic control error from the downstream task execution system appears or the task execution has become a bottleneck, the exponential backoff is required to retry. The retry should not increase the pressure on the downstream system or crush the downstream system.

2) The retry strategy should be simple, clear, and easy for users to understand and configure. First of all, it is necessary to classify errors to distinguish the non-retryable error, retryable errors, and traffic control errors. The non-retryable error refers to the error that fails deterministically. On this occasion, task retry is meaningless, such as parameter errors and permission issues. Retryable error means the factors that cause a task to fail are contingent, and the task will eventually succeed by retrying, such as network timeout and other internal system errors. The traffic control error is a special retryable error, which usually means the downstream is already fully loaded, and the task retry requires the backoff mode to control the number of requests sent to the downstream task.

Load Balancing of Tasks: The execution duration of tasks varies significantly from several hundred milliseconds to several hours. The round-robin task distribution mode may cause an uneven load on execution nodes. The common mode in practice is to place tasks in the queues, and the execution nodes actively pull tasks according to their task execution status. The tasks are placed in the queues and distributed to appropriate nodes according to the load capacity of nodes, allowing the load balancing of the nodes. Task load balancing usually requires the coordination of the distribution system and the execution subsystem.
Tasks Are Distributed by Priority: Task processing systems usually process tasks in many business scenarios, and the task types and priorities are different. On the one hand, the execution priority of tasks related to the core business experience is higher than edge tasks. A notification to a buyer about a product review on Taobao is less important than a notification about a nucleic acid test during the COVID-19 pandemic, even though they are both notifications. On the other hand, the system should remain fair and not allow high-priority tasks to hog resources while causing low-priority tasks to starve to death.
Task Traffic Control: Task traffic control typically works to cut traffic peaks and fill traffic valleys. For example, a user submits hundreds of thousands of tasks at a time but expects to process them slowly in several hours. Therefore, the system needs to limit the distribution rate of tasks to match the capability of the downstream task execution system. Task traffic control is also an important means to ensure the reliability of the system. When the number of submitted tasks explodes, traffic control is employed to limit the impact on the system and reduce the impact on other tasks.
Batch Suspension and Deletion of Tasks: It is very important to provide batch suspension and deletion of tasks in the actual production environment. Users are exposed to unexpected situations all the time. For example, if there are some problems in the execution of the task, it is better to suspend the execution of the subsequent tasks. After manually checking that there is no problem, resume the execution or temporarily suspend lower-priority tasks to release computing resources for higher-priority tasks. Another situation is when there is a problem with the submitted task, so the implementation is meaningless. Therefore, the system should allow users to easily delete tasks being executed and queued. The suspension and deletion of tasks require the coordination of the distribution system and the execution subsystem.

The architecture of task distribution can be divided into pull mode and push mode. Pull mode distributes tasks through the task queue. The instance that executes the task proactively pulls the tasks from the task queue and pulls the new task after it is processed. Compared with pull mode, push mode adds an allocator. The allocator reads the task from the task queue, schedules it, and pushes it to the appropriate task execution instance.

The architecture of the pull mode is clear. It can quickly build a task distribution system and perform well in simple task scenarios (based on popular software such as Redis). However, if the functions required by complex service scenarios (such as task deduplication, task priority judgment, batch suspension or deletion of tasks, and flexible resource scaling) are supported by the pull mode, the implementation complexity of the pull mode will increase rapidly. The pull mode faces the following major challenges in practice:

Auto-scaling of resources and load balancing are complex. The task execution instance establishes a connection with the task queue to pull tasks. When the scale of task execution instances is large, the connection resource consumption will increase significantly. Therefore, the operations of mapping and allocation are required to implement that the task instance is only connected to the corresponding task queue. The following figure shows the architecture of Slack's asynchronous task processing system. Worker nodes are only connected to some Redis instances. This implements the large-scale scalability of worker nodes but increases the complexity of scheduling and load balancing.

It is desirable to use different queues to meet the requirements of supporting task priority judgment, isolated processing of tasks at different priorities, and traffic control. However, too many queues increase the consumption of management and connection resources. Learning how to balance everything is challenging.
Functions (such as task deduplication and batch suspension or deletion of tasks) depend on the message queues. However, few message products can meet all requirements, and the development of new messaging products is often required. For example, from the perspective of scalability, it is usually impossible for each type of task to correspond to a separate task queue. When a task queue contains multiple types of tasks, it is complicated to suspend or delete a certain type of task in batches.
The task type of the task queue is coupled with the task processing logic. If the task queue contains multiple types of tasks, the corresponding task processing logic must be implemented, which is not user-friendly. The task processing logic of user A will not receive tasks from other users in practice. Therefore, task queues are usually managed by users, which increases the burden on users.

The core idea of push mode is to decouple the task queue from the task execution instance to clarify the boundary between the platform and the user. Users only need to focus on the implementation of task processing logic, and the platform is responsible for the management of task queues and resource pools of the task execution nodes. The decoupling of the task queue from the task execution instance also enables the capacity expansion of task execution nodes to be no longer limited by the connection resources of the task queue and achieve higher flexibility. However, the push mode also introduces high complexity. The priority management of tasks, load balancing, scheduling and distribution, and traffic control are all performed by the allocator, which needs to be linked with upstream and downstream systems.

In general, when the task scenario becomes complex, the system complexity in both the pull mode and the push mode remains high. However, the push mode makes the boundary between the platform and users clearer and simplifies the system complexity for users. Therefore, teams with strong technical strength usually choose the push mode when implementing a platform-level task processing system.

Task Execution

The task execution subsystem manages a batch of worker nodes that execute tasks flexibly and reliably. A typical task execution subsystem must have the following functions:

Reliable Execution of Tasks: Once the task is submitted, the system should ensure the task is executed in any case. For example, if the node that executes the task breaks down, the task should be scheduled to another node for execution. Reliable execution of tasks requires the coordination of the task distribution system and the task execution subsystem.
Shared Resource Pools: Different types of task processing resources share a unified resource pool to cut traffic peaks and fill traffic valleys, improving resource utilization efficiency and reducing costs. For example, if different types of tasks (such as compute-intensive and io-intensive tasks) are scheduled to the same Worker node, the CPU, memory, network, and other resources can be fully utilized. Shared resource pools impose higher requirements on capacity management, task resource quota management, task priority management, and resource isolation.
The Auto-Scaling of the Resource: The system can reasonably perform resource scaling on execution nodes based on the execution status of workloads, thus reducing costs. The timing and quantity of resource scaling are critical. Resource scaling occurs based on the CPU, memory, and other resource usage conditions of task execution nodes in common scenarios, which take a long time and cannot meet high real-time requirements. Many systems also perform resource scaling based on the metrics (such as the number of queued tasks). In addition, the expansion of execution nodes needs to match the capabilities of upstream and downstream systems. For example, when the task distribution subsystem distributes tasks through task queues, the expansion of worker nodes should match the connection capabilities of the queues.
Resource Isolation of Tasks: When multiple different tasks are executed on a worker node, the resources required for each task are isolated from each other. This is usually implemented with the isolation mechanism of containers.
Resource Quota of Tasks: Users use the task execution subsystem in a variety of scenarios, such as processing tasks with multiple types and priorities. The system should allow users to set resource quotas for processing functions or tasks with different priorities and allow reserving resources for high-priority tasks or limiting the resources available for low-priority tasks.
Simplify the Coding of Task Processing Logic: A good task processing system allows users to focus on implementing a single task processing logic, and the system automatically executes tasks in a parallel, flexible, and reliable manner.
Smooth Upgrade: The upgrade of the underlying system should not interrupt the execution of long-term tasks.
Notification of Execution Result: The system should notify the task execution status and result in real-time. The task that fails to be executed will be saved to the dead-letter queue, so users can manually retry again at any time.

The task execution subsystem typically uses the container cluster managed by Kubernetes as the resource pool. Kubernetes can manage nodes and schedule container instances that execute tasks to the appropriate nodes. Kubernetes also has built-in Jobs and Cron Jobs, which simplifies the complexity for users to use Job. Kubernetes helps implement shared resource pool management and resource isolation of tasks. However, the main capabilities of Kubernetes are POD or instance management. In many cases, more functions need to be developed to meet the requirements of asynchronous task processing. Examples:

It is generally difficult for the HPA in Kubernetes to meet the requirements of auto-scaling in task scenarios. Open-source projects (such as Keda [5]) provide a mode of scaling by metrics (such as the number of queued tasks). AWS also provides a similar solution [6] in conjunction with CloudWatch.
Kubernetes usually implement asynchronous tasks with queues. Users are responsible for the management of queue resources.
The native job scheduling and startup of Kubernetes are relatively slow, and the tps is generally less than 200, so Kubernetes is unsuitable for processing tasks with high tps and low latency.

Note: There are some differences between jobs in Kubernetes and tasks discussed in this article. The job in Kubernetes usually means processing one or more tasks. The task in this article is the atomic concept. A single task is only executed on one instance. The execution duration ranges from tens of milliseconds to several hours.

The Practice of the Large-Scale Multi-Tenant Asynchronous Task Processing System

Next, I will use the asynchronous task processing system of Alibaba Cloud Function Compute (FC) as an example and discuss some technical challenges faced by the large-scale multi-tenant asynchronous task [7] processing system and the corresponding counter strategies. On the Alibaba Cloud Function Compute (FC) platform, users only need to create a task processing function and then submit the task. The asynchronous task processing is flexible, highly available, and observable. We have adopted a variety of strategies to implement isolation, scaling, load balancing, and traffic control in a multi-tenant scenario to smoothly handle the highly dynamically changing load of a large number of users.

Dynamic Queue Resource Scaling and Traffic Routing

As mentioned earlier, asynchronous task systems usually rely on queues to implement task distribution. When the task processing mid-end has to deal with many business sides, it is no longer feasible to allocate separate queue resources for each application or function (or even each user). Since most applications are long-tailed, low-frequency calls will cause a lot of waste of queue and connection resources, and polling a large number of queues also weakens the scalability of the system.

However, if all users share the same batch of queue resources, they may face the noisy neighbor problem in multi-tenant scenarios. The load burst of application A will crowd out the processing capacity of the queue and affect other applications.

In practice, Function Compute built a dynamic queue resource pool. First, some queue resources will be preset in the resource pool, and applications will be mapped to some queues through the hash map. If the traffic of some applications increases rapidly, the system will adopt a variety of policies:

If these applications' traffic continues to remain high, resulting in a queue backlog, the system will automatically create a separate queue for them and divert the traffic to the new queue.
The system will also migrate some latency-sensitive or high-priority applications to other queues to avoid being affected by the queue backlog caused by high-traffic applications.
It allows users to set the expiration time of tasks. For real-time tasks, it quickly discards expired tasks when a backlog occurs to ensure new tasks can be processed faster.

Random Load Sharding

In a multi-tenant scenario, preventing spoilers from causing catastrophic damage to the system is the biggest challenge in system design. The spoiler may be a user attacked by DDoS or the load that may have triggered the system bug in some corner cases. The following figure shows a very popular architecture in which the traffic from all users is evenly sent to multiple servers in round-robin mode. When the traffic from all users meets expectations, the system works well, each server achieves load balancing, and the downtime of some servers does not affect the availability of the overall service. However, when a spoiler appears, the availability of the system will be at great risk.

As shown in the following figure, if the red user is attacked by DDoS or some of his requests may trigger a bug that causes server downtime, his load may destroy all servers and cause the entire system to become unavailable.

The essence of the problems above is that the traffic from any user will be routed to all servers. This mode is quite fragile when faced with spoilers without any load isolation capability. If any user's load will only be routed to some servers, could this problem be solved? As shown in the following figure, the traffic of any user is routed to two servers at most. Even if the two servers are down, the processing of requests from normal users is still not affected. This sharding load mode, which maps the user's load to some servers (but not all), can implement load isolation and reduce the risk of service unavailability. The cost is that the system needs to prepare more redundant resources.

Next, let's adjust how the user load is mapped. As shown in the following figure, the load of each user is evenly mapped to the two servers. The load is more balanced, and even if the two servers are down, no user's load is affected except for the red. If we set the partition size to 2, there are C_{3}^{2}=3 combinations of selecting 2 servers from 3 servers (or 3 possible partitioning methods). Based on the random algorithm, we map the load evenly to the partitions. Then, if any partition is not available, 1/3 of the load will be affected at most. Assuming that we have 100 servers and the size of the partition is still 2, there are C_{100}{2}=4950 types of partitioning methods. The unavailability of a single partition only affects 1/4950=0.2% of the load. As the number of servers increases, the positive effect of random partitioning becomes more obvious. Random partitioning load is a very simple but powerful model, which plays a key role in ensuring the availability of the multi-tenant system.

Task Distribution Adaptive to the Downstream Processing Capacity

Function Compute uses the push mode for task distribution, so users only need to focus on the development of task processing logic, and the boundary between the platform and users is also clear. There is a task allocator in push mode, which is responsible for pulling tasks from the task queue and scheduling them to downstream task processing instances. The task allocator should be able to adjust the task distribution speed adaptively according to the downstream processing capacity. When the queue backlog occurs, we hope to strengthen the dispatch worker pool's task distribution capability continuously. When the upper limit of downstream processing capability is reached, the worker pool must be able to perceive it and maintain a relatively stable distribution speed. When the task is processed, the work pool has to be scaled down to release the distribution capacity to other task processing functions.

In practice, we draw on the idea of the TCP congestion control algorithm and adopt the Additive Increase Multiplicative Decrease (AIMD) algorithm for the scaling of the worker pool. When users submit a large number of tasks in a short time, the allocator does not immediately distribute a large number of tasks to the downstream task. Instead, it linearly increases the distribution speed according to the additive increase policy to avoid the impact on downstream services. After receiving the traffic control error from the downstream service, the multiplicative decrease policy is adopted to scale down the worker pool according to a certain proportion. Only when the traffic control error meets the threshold of error rate and error number, scaling down is triggered to avoid frequent scaling of the worker pool.

Send Back Pressure to the Upstream Task Producer

If the task processing capacity always lags behind the production capacity, there will be more task backlogs in the queue, although multiple queues and traffic routing can be used to reduce the mutual influence between tenants.

However, when the task backlogs exceed a certain threshold, the processing system should be more actively informing this task processing pressure to the upstream task production system, such as the request for traffic control of task submission. In the multi-tenant resource sharing scenario, the implementation of back pressure will be even more challenging. For example, application A and application B share the resources of the task distribution system. If application A has a backlog of tasks, how can we do this?

Fair: Perform traffic control to application A instead of application B as much as possible. Traffic control is essentially a probability issue. Calculate the traffic control probability for each application. The more accurate the probability, the fairer the traffic control.
Timely: Back pressure must be transmitted to the outermost layer of the system. For example, traffic control is performed on application A when the task is submitted, so the impact on the system is minimal.

It is challenging to identify objects that need traffic control in multi-tenant scenarios. We have borrowed Sample and Hold algorithms [8] in practice and achieved good results. Readers interested in it can refer to relevant papers.

Capability Levels of the Asynchronous Task Processing System

Based on the preceding analysis of the architecture and functions of the asynchronous task processing system, we divide the capabilities of the asynchronous task processing system into the following three levels:

Level 1: A research and development team with 1-5 members is usually required, and the system is built by integrating the capabilities of open-source software and cloud services (such as Kubernetes and message queues). The capabilities of the system are limited by the open-source software and cloud services it relies on, and it is difficult to customize services according to the business needs. The usage of resources in such a system is static, and resource scaling and load balancing are not supported. The number of tasks that can be processed is limited. As the scale and complexity of tasks increase, the cost of system development and maintenance increases rapidly.
Level 2: A research and development team with 5-10 members is usually required. Based on open-source software and cloud services, it has certain independent research and development capabilities to meet common business requirements. The system built by such a team does not have the complete capabilities of task priority judgment, isolation, and traffic control. It usually configures different queues and computing resources for different business sides. The resource management is extensive, and the system also lacks the capabilities of real-time resource scaling and capacity management. The scalability and refined resource management capabilities are not supported, making it difficult for the system to deal with large-scale and complex business scenarios.
Level 3: A research and development team with more than ten members is usually required to build a platform-level system. The system has the capabilities to deal with large-scale and complex business scenarios. It has shared resource pools and has complete capabilities in task scheduling, isolation and traffic control, load balancing, and resource scaling. The boundary between the platform and users is clear, and the business side only needs to focus on the development of task processing logic. It is also observable.

Table_1

Summary

The asynchronous task processing system is an important means of building flexible, highly available, and responsive applications. This article introduces the applicable scenarios and benefits of the asynchronous task processing system and discusses the architecture, functions, and engineering practices of the typical asynchronous task system.

Implementing a flexible and scalable asynchronous task processing platform that can meet the needs of multiple business scenarios is highly complex. Alibaba Cloud Function Compute (FC) provides convenient asynchronous task processing services close to Level ß3 capabilities. Users only need to create task processing functions and submit tasks through the console, command line tools, APIs, SDKs, event triggers, or other ways to process tasks in a flexible, reliable, and observable manner.

Using Function Compute to process asynchronous tasks covers scenarios with task processing duration ranging from milliseconds to 24 hours. Function Compute is widely used by customers inside and outside Alibaba Group, including Alibaba Cloud database self-made service DAS, the Alipay applet stress testing platform, Netease CloudMusic, New Oriental, Focus Media, and Milian.