The Details of Asynchronous Tasks: Function Compute Task Triggered Deduplication

This article describes the technical details of triggered-task deduplication in Function Compute Serverless Task, which supports scenarios that have strict requirements for task execution accuracy.

By Jianyi, Alibaba Cloud Serverless Senior Development Engineer

Preface

Whether in big data processing or message processing, task systems need one critical capability: deduplication of triggered tasks. This capability is essential in scenarios that require high accuracy (such as in the financial field). As a serverless platform, Serverless Task needs to guarantee accurate task-triggering semantics both at the user application level and within the system itself.

Starting from the topic of message processing reliability, this article introduces the technical details of Function Compute asynchronous tasks and shows how to use the capabilities provided by Function Compute (FC) to enhance the reliability of task execution in practical applications.

Task Deduplication

Any discussion of asynchronous message processing systems has to start with the basic delivery semantics. In an asynchronous message processing system (task system), the message processing procedure can be simplified as shown in the following figure:

Figure 1

A user sends a task → Enters the queue → The task processing unit monitors and obtains the message → Schedules it to the actual worker for execution.

While a task message is being forwarded, downtime or other problems in any component along this procedure may lead to message delivery errors. A typical task system provides up to three levels of message processing semantics:

At-Most-Once: A message is delivered at most once. Messages may be lost when a network partition occurs or a system component goes down.
At-Least-Once: A message is delivered at least once. The delivery procedure supports error retries, and the resending mechanism ensures that the downstream eventually receives every upstream message. However, the same message may be delivered multiple times in the case of downtime or network partitioning.
Exactly-Once: A message takes effect exactly once. This does not mean there is no retransmission in the case of downtime or network partitioning. It means that retransmission does not change the state of the recipient, so the result is the same as that of a single transmission. In actual production, Exactly-Once is usually achieved by combining the retransmission mechanism with recipient-side deduplication (idempotence).
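To make the Exactly-Once description concrete, here is a minimal sketch of recipient-side deduplication on top of at-least-once delivery. The `Message` and `Receiver` types are hypothetical and exist only for illustration; they are not part of Function Compute.

```go
package main

import "fmt"

// Message is a hypothetical task message carrying a unique ID.
type Message struct {
	ID      string
	Payload int
}

// Receiver remembers processed IDs so that a retransmitted message
// does not change its state a second time.
type Receiver struct {
	seen  map[string]bool
	Total int // state that is modified by processing messages
}

func NewReceiver() *Receiver {
	return &Receiver{seen: make(map[string]bool)}
}

// Handle returns true if the message changed the state, and false if it
// was recognized as a retransmission and ignored.
func (r *Receiver) Handle(m Message) bool {
	if r.seen[m.ID] {
		return false
	}
	r.seen[m.ID] = true
	r.Total += m.Payload
	return true
}

func main() {
	r := NewReceiver()
	m := Message{ID: "task-1", Payload: 10}
	first := r.Handle(m)
	// The sender retries because it never saw an acknowledgment.
	second := r.Handle(m)
	fmt.Println(first, second, r.Total) // true false 10
}
```

The sender is free to retry as often as it likes; the recipient's state is the same as after a single delivery.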

Function Compute can provide Exactly-Once semantics for task distribution. In other words, duplicate tasks are always treated by the system as the same trigger, and the task is distributed only once.

Referring to Figure 1, achieving task deduplication requires the system to provide guarantees in at least two dimensions:

System-Side Guarantee: The failover of the task scheduling system does not affect the correctness and uniqueness of message delivery.
User-Side Mechanism: The system provides a mechanism that users can combine with their business scenarios to deduplicate the triggering and execution of the entire business logic.

Next, we will combine the simplified Serverless Task system architecture to discuss how Function Compute achieves the capabilities above.

The Implementation Background of Function Compute Asynchronous Task Triggered Deduplication

The following figure shows the architecture of the Function Compute task system:

Figure 2

First, the user invokes the Function Compute API to send a task (step 1) into the API-Server of the system. The API-Server passes the message into the internal queue after verification (step 2.1).

An asynchronous module in the background monitors the internal queue in real time (step 2.2). After that, the Resource Management module is invoked to obtain runtime resources (steps 2.2-2.3).

After the runtime resource is obtained, the scheduling module sends the task data to the VM-level client (step 3.1), and the client forwards the task to the user's actual running resource (step 3.2).

We need to support the following levels to achieve the two dimensions mentioned above:

System-Side Guarantee: In steps 2.1-3.1, the failover of any intermediate process triggers the execution of step 3.2 only once, which means the user instance is scheduled to run only once.
User-Side Application-Level Deduplication Capability: Users can execute step 1 multiple times for the same task, but the execution of step 3.2 is triggered only once.

Graceful Upgrade and the Task Distribution Deduplication Guarantee of Failover

After the user's message enters the Function Compute system (step 2.1 is complete), the user's request receives a response with HTTP status code 202, and the user can consider the task submitted once. From the time the task message enters the MQ, its lifecycle is maintained by the Scheduler, so the stability of the Scheduler and of the MQ determines whether the system can achieve Exactly-Once.

In most open-source messaging systems (such as MQ and Kafka), messages are stored in multiple replicas and are consumed uniquely. The same is true for the message queue used by Function Compute (which is built on RocketMQ). Because the underlying storage keeps three replicas, we do not need to pay additional attention to the stability of message storage. In addition, the message queue used in Function Compute has the following characteristics:

The Uniqueness of Consumption: When a message in the queue is consumed, it enters invisible mode. In this mode, other consumers cannot get the message.
Invisibility Renewal and Explicit Deletion: The actual consumer of each message needs to keep extending the invisible time of the message while processing it. When the consumer has completed consumption, it needs to delete the message explicitly.
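The two characteristics above can be sketched with a toy in-memory queue. This is an illustration only, not the queue that Function Compute uses: the `Queue` type, its method names, and the use of logical ticks instead of wall-clock time are all assumptions made to keep the example short.

```go
package main

import "fmt"

// entry is a queued message plus the tick until which it stays hidden.
type entry struct {
	body           string
	invisibleUntil int64
}

// Queue is a toy queue with an invisible period. Time is passed in
// explicitly as logical ticks so the behavior is easy to follow.
type Queue struct {
	entries map[string]*entry
	order   []string
}

func NewQueue() *Queue { return &Queue{entries: map[string]*entry{}} }

func (q *Queue) Enqueue(id, body string) {
	q.entries[id] = &entry{body: body}
	q.order = append(q.order, id)
}

// Receive returns the first visible message and hides it for timeout ticks.
func (q *Queue) Receive(now, timeout int64) (string, bool) {
	for _, id := range q.order {
		if e, ok := q.entries[id]; ok && e.invisibleUntil <= now {
			e.invisibleUntil = now + timeout
			return id, true
		}
	}
	return "", false
}

// Renew extends the invisible period; the active consumer calls it
// periodically while it is still processing the message.
func (q *Queue) Renew(id string, now, timeout int64) {
	if e, ok := q.entries[id]; ok {
		e.invisibleUntil = now + timeout
	}
}

// Delete removes the message explicitly once consumption is complete.
func (q *Queue) Delete(id string) { delete(q.entries, id) }

func main() {
	q := NewQueue()
	q.Enqueue("m1", "run task")
	id, _ := q.Receive(0, 10)           // consumer A takes the message
	_, seenByOther := q.Receive(5, 10)  // hidden from other consumers
	q.Renew(id, 8, 10)                  // A is alive: hidden until tick 18
	_, afterRenew := q.Receive(12, 10)  // still hidden
	_, afterCrash := q.Receive(20, 10)  // A stopped renewing: visible again
	fmt.Println(seenByOther, afterRenew, afterCrash) // false false true
}
```

If the consumer had called `Delete` before going silent, the message would never have reappeared, which is the normal completion path.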

Therefore, the entire lifecycle of a message in the queue is shown in the following figure:

Figure 3

The scheduler is mainly responsible for message processing, and its tasks consist of the following parts:

Monitor the queues it is responsible for, according to the scheduling policy of the Function Compute load balancing module.
When a message appears in the queue, pull the message and maintain its status in memory. Until the message consumption is completed (the user instance returns the function execution result), keep renewing the invisible time of the message to ensure it does not appear in the queue again.
When the task is completed, explicitly delete the message.

In terms of the queue scheduling model, Function Compute adopts a single-queue management model for common users, which means all asynchronous execution requests of a user go through a dedicated queue that one Scheduler is responsible for. This mapping is managed by the Function Compute load balancing service, as shown in the following figure (we will introduce this part in detail in subsequent articles):

Figure 4

When Scheduler 1 goes down or is upgraded, a task can be in one of two execution states:

The message has not been delivered to the user's execution instance (steps 3.1-3.2 in Figure 2). When the queue that Scheduler 1 was responsible for is taken over by another scheduler, the message reappears after its invisible period expires. Scheduler 2 then obtains the message again and triggers it.
The message has started to execute (step 3.2). When the message reappears and is picked up by Scheduler 2, we rely on the Agent in the user VM for status management. Scheduler 2 sends an execution request to the corresponding Agent. If the Agent finds that the message already exists in its memory, it ignores the duplicate execution request and reports the execution result to Scheduler 2 once execution completes, which completes the failover recovery.
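The second case can be sketched as follows. The `Agent` type here is a hypothetical stand-in for the VM-level client, written only to show the idea of in-memory status management; the real Agent's interface is not described in this article, and a real implementation would also need locking for concurrent requests.

```go
package main

import "fmt"

// Agent tracks tasks in memory so that a repeated execution request
// does not start the user function a second time.
type Agent struct {
	state   map[string]string // task ID -> "running" or the final result
	started int               // how many times user code was started
}

func NewAgent() *Agent { return &Agent{state: map[string]string{}} }

// Execute starts the task unless it is already known to the Agent.
// It returns true only when user code was actually started.
func (a *Agent) Execute(taskID string) bool {
	if _, known := a.state[taskID]; known {
		return false // duplicate request, e.g. after a scheduler failover
	}
	a.state[taskID] = "running"
	a.started++
	return true
}

// Complete records the result when the user function returns.
func (a *Agent) Complete(taskID, result string) { a.state[taskID] = result }

// Status lets any scheduler, including a replacement one, read the state.
func (a *Agent) Status(taskID string) string { return a.state[taskID] }

func main() {
	agent := NewAgent()
	agent.Execute("task-1")          // sent by Scheduler 1, which then fails
	again := agent.Execute("task-1") // resent by Scheduler 2 after takeover
	agent.Complete("task-1", "ok")
	status := agent.Status("task-1")
	fmt.Println(again, agent.started, status) // false 1 ok
}
```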

User-Side Business-Level Distribution Deduplication Implementation

The Function Compute system can consume each message accurately even when a single point of failure occurs. However, if the user side repeatedly triggers function execution for the same piece of business data, Function Compute cannot tell whether different messages are logically the same task. This situation often occurs during network partitions. In Figure 2, if the user's invocation in step 1 times out, there are two possible situations:

The message did not reach the Function Compute system, and the task was not submitted.
The message has reached Function Compute and is enqueued. The task is submitted, but the user cannot learn that the submission succeeded because of the timeout.

The user will retry the submission in most cases. In case 2, the same task will be submitted and executed multiple times. Therefore, Function Compute needs to provide a mechanism to ensure the accuracy of the business in this scenario.

Function Compute provides the concept of a TaskID (StatefulAsyncInvocationID). The ID is globally unique, and you can specify it each time you submit a task. When a request times out, the user can retry with the same ID any number of times.

All repeated retries are verified on the Function Compute side. Function Compute uses a database to store the metadata of each task. When an ID that already exists enters the system again, the request is rejected and a 400 error is returned. At this point, the client knows the submission status of the task.
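The server-side check can be pictured with a small sketch. The `TaskStore` type below is a hypothetical in-memory stand-in for the task metadata database, and the status codes follow the ones mentioned in this article (202 for an accepted task, 400 for a repeated ID). A real store would need an atomic insert-if-absent operation rather than a plain map.

```go
package main

import (
	"fmt"
	"net/http"
)

// TaskStore is a hypothetical stand-in for the task metadata database.
type TaskStore struct {
	tasks map[string]string // task ID -> payload
}

func NewTaskStore() *TaskStore { return &TaskStore{tasks: map[string]string{}} }

// Submit records a new task ID and returns 202, or returns 400 when the
// ID has already been recorded by an earlier submission.
func (s *TaskStore) Submit(id, payload string) int {
	if _, exists := s.tasks[id]; exists {
		return http.StatusBadRequest
	}
	s.tasks[id] = payload
	return http.StatusAccepted
}

func main() {
	store := NewTaskStore()
	fmt.Println(store.Submit("TaskUUID", "data")) // 202
	fmt.Println(store.Submit("TaskUUID", "data")) // 400
}
```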

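Let's use the Go SDK as an example. You can adapt the following code to trigger a task:

import fc "github.com/aliyun/fc-go-sdk"

// SubmitJob submits an asynchronous task with a caller-chosen task ID.
// fcClient is assumed to be an already initialized client.
func SubmitJob(fcClient *fc.Client) {
    invokeInput := fc.NewInvokeFunctionInput("ServiceName", "FunctionName")
    // The stateful async invocation ID is the key used for deduplication.
    invokeInput = invokeInput.WithAsyncInvocation().WithStatefulAsyncInvocationID("TaskUUID")
    invokeOutput, err := fcClient.InvokeFunction(invokeInput)
    ...
}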

Then, a unique task is submitted.
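On the client side, the retry logic might look like the sketch below. It is written against a hypothetical `submitFunc` wrapper rather than the SDK itself, and `ErrTimeout` and `SubmitWithRetry` are illustrative names. It also assumes that a 400 response to a retried ID means the ID is already recorded; production code should inspect the error details, since a 400 can have other causes.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTimeout is a hypothetical error for a request whose outcome is unknown.
var ErrTimeout = errors.New("request timed out")

// submitFunc is a hypothetical wrapper around the real invocation call.
// It returns the HTTP status code, or an error if no response arrived.
type submitFunc func(taskID string) (int, error)

// SubmitWithRetry retries with the same task ID until the service answers.
// Both 202 (accepted now) and 400 (ID already recorded) are treated as
// "the task exists on the service side".
func SubmitWithRetry(submit submitFunc, taskID string, maxAttempts int) (bool, error) {
	lastErr := errors.New("no attempts made")
	for attempt := 0; attempt < maxAttempts; attempt++ {
		status, err := submit(taskID)
		if err != nil {
			lastErr = err // outcome unknown, so retry with the same ID
			continue
		}
		switch status {
		case 202, 400:
			return true, nil
		default:
			return false, fmt.Errorf("unexpected status %d", status)
		}
	}
	return false, lastErr
}

func main() {
	recorded := map[string]bool{}
	calls := 0
	// Simulated service: the first call is enqueued, but its response is lost.
	flaky := func(id string) (int, error) {
		calls++
		already := recorded[id]
		recorded[id] = true
		if calls == 1 {
			return 0, ErrTimeout
		}
		if already {
			return 400, nil
		}
		return 202, nil
	}
	ok, err := SubmitWithRetry(flaky, "TaskUUID", 3)
	fmt.Println(ok, err, calls, len(recorded)) // true <nil> 2 1
}
```

Because every retry carries the same ID, the task is recorded once no matter how many attempts the client makes.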

Summary

This article describes the technical details of task triggered deduplication of Function Compute Serverless Task to support scenarios that have strict requirements for task execution accuracy.

With Serverless Task, you do not need to worry about the failover of any system component: each task you submit is executed exactly once. You can also set a globally unique ID when submitting a task to achieve business-level distribution deduplication, using the capabilities provided by Function Compute (FC).
