By Zhu Jinjun
Today, let's talk about how Alibaba Seata version 1.5.1 solves the idempotence, suspension, and empty rollback problems in TCC mode.
Try-Confirm-Cancel (TCC) mode is a classic distributed transaction solution that executes a distributed transaction in two phases. In the try phase, each branch transaction reserves resources. If all branch transactions reserve their resources successfully, the global transaction is committed in the confirm (commit) phase. If any node fails to reserve resources, the global transaction is rolled back in the cancel phase.
Let’s take the traditional order, inventory, and account services as an example. In the try phase, each service reserves resources: operations such as inserting orders, deducting inventory, and deducting amounts are committed as local transactions. Here, you can transfer the resources to an intermediate table. In the commit phase, the resources reserved in the try phase are transferred to the final table. In the cancel phase, the resources reserved in the try phase are released, such as returning the account amount to the customer.
Note: The try phase needs to commit local transactions. For example, when the order amount is deducted, the money must actually be taken from the customer’s account. Otherwise, the customer's balance may be insufficient by the time the commit phase runs, and the transaction cannot complete.
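The reservation idea can be illustrated with a minimal in-memory sketch. This is not Seata code: the class `TccAccountSketch` and its methods are hypothetical names chosen for this example, and a map stands in for the intermediate table.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory account: try freezes money, confirm consumes it, cancel returns it. */
final class TccAccountSketch {
    private long available;                                    // money the customer can still spend
    private final Map<String, Long> frozen = new HashMap<>();  // xid -> amount reserved in the try phase

    TccAccountSketch(long initialBalance) {
        this.available = initialBalance;
    }

    /** Try: really deduct from the available balance and park the amount in the intermediate state. */
    boolean tryDeduct(String xid, long amount) {
        if (amount <= 0 || amount > available || frozen.containsKey(xid)) {
            return false;
        }
        available -= amount;
        frozen.put(xid, amount);
        return true;
    }

    /** Confirm: the reserved amount is consumed for good. */
    boolean confirm(String xid) {
        return frozen.remove(xid) != null;
    }

    /** Cancel: the reserved amount is returned to the customer. */
    boolean cancel(String xid) {
        Long amount = frozen.remove(xid);
        if (amount == null) {
            return false;
        }
        available += amount;
        return true;
    }

    long available() {
        return available;
    }

    long frozenTotal() {
        return frozen.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        TccAccountSketch account = new TccAccountSketch(100);
        account.tryDeduct("xid-1", 30);
        account.confirm("xid-1");   // 30 is spent
        account.tryDeduct("xid-2", 50);
        account.cancel("xid-2");    // 50 is returned
        System.out.println(account.available()); // prints 70
    }
}
```

Because the money leaves the available balance in the try phase, a later confirm cannot fail for lack of funds, which is the point of the note above.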
Resources are reserved in the try phase and deducted in the commit phase, as shown in the following figure:
Resources are reserved first in the try phase. Failure to deduct inventory when resources are reserved causes global transaction rollback, and resources are released in the cancel phase. As shown in the following figure:
The biggest advantage of TCC mode is its high efficiency. The locking of resources in the try phase in the TCC mode is not a real lock but a real commit of local transactions. It reserves resources to an intermediate state without blocking and waiting. Therefore, the efficiency is higher than other modes.
The TCC mode can be further optimized:
After the try phase succeeds, it does not enter the confirm/cancel phase. Instead, it assumes the global transaction has ended and starts a scheduled task to asynchronously execute confirm/cancel to deduct or release resources, which improves performance.
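A rough sketch of this asynchronous second phase is shown below. It is an illustration only, not how Seata implements it: `AsyncSecondPhaseSketch`, `submit`, and `runPending` are hypothetical names, and failure handling and retries are deliberately left out.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: second-phase actions are queued after try and run later by a scheduled task. */
final class AsyncSecondPhaseSketch {
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();

    /** Called right after the try phase succeeds: record the confirm/cancel action and return at once. */
    void submit(Runnable confirmOrCancel) {
        pending.add(confirmOrCancel);
    }

    /** Body of the scheduled task: run every queued action and report how many were run. */
    int runPending() {
        int executed = 0;
        Runnable action;
        while ((action = pending.poll()) != null) {
            action.run();
            executed++;
        }
        return executed;
    }

    /** Starts the scheduled task; the caller is responsible for shutting the scheduler down. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::runPending, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

The caller of `submit` does not wait for confirm or cancel, so the latency seen by the business request covers only the try phase.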
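There are three roles in TCC mode: the Transaction Coordinator (TC), which maintains the state of global and branch transactions; the Transaction Manager (TM), which begins the global transaction and requests its commit or rollback; and the Resource Manager (RM), which manages the resources of branch transactions.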
The following figure is from the Seata official website:
When TM opens a global transaction, each RM needs to send a registration message to TC, which saves the state of the branch transaction. When TM requests a commit or rollback, TC needs to send a commit or rollback message to each RM. As such, there are four RPCs between TC and RM in a distributed transaction that contains two branch transactions.
The following figure shows the optimized process:
TC saves the state of the global transaction. When TM opens a global transaction, RM no longer needs to send a registration message to TC. Instead, it saves the branch transaction status locally. After TM sends a commit or rollback message to TC, an asynchronous thread in RM finds the uncommitted branch transactions saved locally and asks TC for the status of the global transaction each of them belongs to, in order to decide whether to commit or roll back the local transaction.
After this optimization, the number of RPCs is reduced by 50%, and the performance is improved.
Taking inventory service as an example, the following is the RM inventory service interface code:
@LocalTCC
public interface StorageService {
    /**
     * Deduct inventory
     * @param xid Global xid
     * @param productId Product ID
     * @param count Number
     * @return
     */
    @TwoPhaseBusinessAction(name = "storageApi", commitMethod = "commit", rollbackMethod = "rollback", useTCCFence = true)
    boolean decrease(String xid, Long productId, Integer count);
    /**
     * Commit a transaction
     * @param actionContext
     * @return
     */
    boolean commit(BusinessActionContext actionContext);
    /**
     * Roll back the transaction
     * @param actionContext
     * @return
     */
    boolean rollback(BusinessActionContext actionContext);
}
RM registers a branch transaction to TC during initialization using the annotation @LocalTCC. The try-phase method (decrease) carries a @TwoPhaseBusinessAction annotation, which defines the resourceId, commit method, and cancel method of the branch transaction. The useTCCFence property will be discussed in the next section.
The three major problems in TCC mode are idempotence, suspension, and empty rollback. To solve them, Seata 1.5.1 adds a transaction control table named tcc_fence_log. The useTCCFence attribute of the @TwoPhaseBusinessAction annotation mentioned in the previous section specifies whether to enable this mechanism. Its default value is false.
The following is the tcc_fence_log table creation statement (MySQL syntax):
CREATE TABLE IF NOT EXISTS `tcc_fence_log`
(
    `xid`          VARCHAR(128) NOT NULL COMMENT 'global id',
    `branch_id`    BIGINT       NOT NULL COMMENT 'branch id',
    `action_name`  VARCHAR(64)  NOT NULL COMMENT 'action name',
    `status`       TINYINT      NOT NULL COMMENT 'status(tried:1;committed:2;rollbacked:3;suspended:4)',
    `gmt_create`   DATETIME(3)  NOT NULL COMMENT 'create time',
    `gmt_modified` DATETIME(3)  NOT NULL COMMENT 'update time',
    PRIMARY KEY (`xid`, `branch_id`),
    KEY `idx_gmt_modified` (`gmt_modified`),
    KEY `idx_status` (`status`)
) ENGINE = InnoDB
DEFAULT CHARSET = utf8mb4;
In the commit/cancel phase, TC retries the request if it does not receive a response from the branch transaction, so the branch transaction must be idempotent.
Let's see how version 1.5.1 solves it. The following code is in the TCCResourceManager class:
@Override
public BranchStatus branchCommit(BranchType branchType, String xid, long branchId, String resourceId,
                                 String applicationData) throws TransactionException {
    TCCResource tccResource = (TCCResource)tccResourceCache.get(resourceId);
    // Omit the judgment.
    Object targetTCCBean = tccResource.getTargetBean();
    Method commitMethod = tccResource.getCommitMethod();
    // Omit the judgment.
    try {
        //BusinessActionContext
        BusinessActionContext businessActionContext = getBusinessActionContext(xid, branchId, resourceId,
            applicationData);
        Object[] args = this.getTwoPhaseCommitArgs(tccResource, businessActionContext);
        Object ret;
        boolean result;
        // Check whether the useTCCFence property of the annotation is set to true.
        if (Boolean.TRUE.equals(businessActionContext.getActionContext(Constants.USE_TCC_FENCE))) {
            try {
                result = TCCFenceHandler.commitFence(commitMethod, targetTCCBean, xid, branchId, args);
            } catch (SkipCallbackWrapperException | UndeclaredThrowableException e) {
                throw e.getCause();
            }
        } else {
            // Omit the logic.
        }
        LOGGER.info("TCC resource commit result : {}, xid: {}, branchId: {}, resourceId: {}", result, xid, branchId, resourceId);
        return result ? BranchStatus.PhaseTwo_Committed : BranchStatus.PhaseTwo_CommitFailed_Retryable;
    } catch (Throwable t) {
        // Omit
        return BranchStatus.PhaseTwo_CommitFailed_Retryable;
    }
}
As can be seen from the preceding code, the branch commit method first checks whether the useTCCFence attribute is true. If it is, the commitFence logic in the TCCFenceHandler class is used. Otherwise, the normal commit logic is used.
The branchCommit method of the TCCResourceManager class calls the commitFence method of the TCCFenceHandler class, whose code is as follows:
public static boolean commitFence(Method commitMethod, Object targetTCCBean,
                                  String xid, Long branchId, Object[] args) {
    return transactionTemplate.execute(status -> {
        try {
            Connection conn = DataSourceUtils.getConnection(dataSource);
            TCCFenceDO tccFenceDO = TCC_FENCE_DAO.queryTCCFenceDO(conn, xid, branchId);
            if (tccFenceDO == null) {
                throw new TCCFenceException(String.format("TCC fence record not exists, commit fence method failed. xid= %s, branchId= %s", xid, branchId),
                    FrameworkErrorCode.RecordAlreadyExists);
            }
            if (TCCFenceConstant.STATUS_COMMITTED == tccFenceDO.getStatus()) {
                LOGGER.info("Branch transaction has already committed before. idempotency rejected. xid: {}, branchId: {}, status: {}", xid, branchId, tccFenceDO.getStatus());
                return true;
            }
            if (TCCFenceConstant.STATUS_ROLLBACKED == tccFenceDO.getStatus() || TCCFenceConstant.STATUS_SUSPENDED == tccFenceDO.getStatus()) {
                if (LOGGER.isWarnEnabled()) {
                    LOGGER.warn("Branch transaction status is unexpected. xid: {}, branchId: {}, status: {}", xid, branchId, tccFenceDO.getStatus());
                }
                return false;
            }
            return updateStatusAndInvokeTargetMethod(conn, commitMethod, targetTCCBean, xid, branchId, TCCFenceConstant.STATUS_COMMITTED, status, args);
        } catch (Throwable t) {
            status.setRollbackOnly();
            throw new SkipCallbackWrapperException(t);
        }
    });
}
As you can see from the code, when committing a transaction, the method first looks up the record in the tcc_fence_log table. If the record exists, its status decides the result: if the status is STATUS_COMMITTED, the method returns true without committing again, which ensures idempotence. If the status is STATUS_ROLLBACKED or STATUS_SUSPENDED, it returns false. Otherwise, the status is updated to STATUS_COMMITTED and the commit method is invoked. If there is no record in the tcc_fence_log table, a TCCFenceException is thrown, because the record should have been inserted in the try phase.
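The idempotence check can be reduced to a small in-memory sketch. It only mirrors the decision logic described above and is not Seata's implementation: `CommitFenceSketch` is a hypothetical class, a `HashMap` stands in for the tcc_fence_log table, and there is no database transaction around the steps.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory fence: the business commit runs at most once per (xid, branchId). */
final class CommitFenceSketch {
    // Same status values as the comment on the status column of tcc_fence_log.
    static final int TRIED = 1;
    static final int COMMITTED = 2;
    static final int ROLLBACKED = 3;
    static final int SUSPENDED = 4;

    private final Map<String, Integer> fenceLog = new HashMap<>();

    private static String key(String xid, long branchId) {
        return xid + ":" + branchId;
    }

    /** Stands in for the record inserted during the try phase. */
    void recordTry(String xid, long branchId) {
        fenceLog.put(key(xid, branchId), TRIED);
    }

    /** Returns true if the branch is committed after the call, false if it cannot be committed. */
    boolean commit(String xid, long branchId, Runnable businessCommit) {
        String key = key(xid, branchId);
        Integer status = fenceLog.get(key);
        if (status == null) {
            throw new IllegalStateException("fence record does not exist: " + key);
        }
        if (status == COMMITTED) {
            return true;            // retry from TC: already committed, do not run the business logic again
        }
        if (status != TRIED) {
            return false;           // ROLLBACKED or SUSPENDED: unexpected for a commit
        }
        fenceLog.put(key, COMMITTED);
        businessCommit.run();
        return true;
    }
}
```

Calling `commit` twice for the same branch runs `businessCommit` only once, while both calls report success to the caller.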
The logic of rollback is similar to commit. The logic is in the rollbackFence method of the class TCCFenceHandler.
As shown in the following figure, the account service is a cluster with two nodes. In the try phase, the account service 1 node fails. Without considering retries, the global transaction must still reach an end state, so the cancel operation has to be executed on the account service even though the try operation never ran there. This is an empty rollback.
Seata's solution is to insert a record into the tcc_fence_log table in the try phase, with the status field set to STATUS_TRIED. In the rollback phase, it checks whether the record exists. If not, the rollback operation is not performed. As shown in the following code:
// TCCFenceHandler Class
public static Object prepareFence(String xid, Long branchId, String actionName, Callback<Object> targetCallback) {
    return transactionTemplate.execute(status -> {
        try {
            Connection conn = DataSourceUtils.getConnection(dataSource);
            boolean result = insertTCCFenceLog(conn, xid, branchId, actionName, TCCFenceConstant.STATUS_TRIED);
            LOGGER.info("TCC fence prepare result: {}. xid: {}, branchId: {}", result, xid, branchId);
            if (result) {
                return targetCallback.execute();
            } else {
                throw new TCCFenceException(String.format("Insert tcc fence record error, prepare fence failed. xid= %s, branchId= %s", xid, branchId),
                    FrameworkErrorCode.InsertRecordError);
            }
        } catch (TCCFenceException e) {
            // Omit
        } catch (Throwable t) {
            // Omit
        }
    });
}
The following is the processing logic in the Rollback phase:
// TCCFenceHandler Class
public static boolean rollbackFence(Method rollbackMethod, Object targetTCCBean,
                                    String xid, Long branchId, Object[] args, String actionName) {
    return transactionTemplate.execute(status -> {
        try {
            Connection conn = DataSourceUtils.getConnection(dataSource);
            TCCFenceDO tccFenceDO = TCC_FENCE_DAO.queryTCCFenceDO(conn, xid, branchId);
            // non_rollback
            if (tccFenceDO == null) {
                // Do not execute the rollback logic.
                return true;
            } else {
                if (TCCFenceConstant.STATUS_ROLLBACKED == tccFenceDO.getStatus() || TCCFenceConstant.STATUS_SUSPENDED == tccFenceDO.getStatus()) {
                    LOGGER.info("Branch transaction had already rollbacked before, idempotency rejected. xid: {}, branchId: {}, status: {}", xid, branchId, tccFenceDO.getStatus());
                    return true;
                }
                if (TCCFenceConstant.STATUS_COMMITTED == tccFenceDO.getStatus()) {
                    if (LOGGER.isWarnEnabled()) {
                        LOGGER.warn("Branch transaction status is unexpected. xid: {}, branchId: {}, status: {}", xid, branchId, tccFenceDO.getStatus());
                    }
                    return false;
                }
            }
            return updateStatusAndInvokeTargetMethod(conn, rollbackMethod, targetTCCBean, xid, branchId, TCCFenceConstant.STATUS_ROLLBACKED, status, args);
        } catch (Throwable t) {
            status.setRollbackOnly();
            throw new SkipCallbackWrapperException(t);
        }
    });
}
The following are the SQL statements executed by the updateStatusAndInvokeTargetMethod method:
update tcc_fence_log set status = ?, gmt_modified = ?
where xid = ? and branch_id = ? and status = ? ;
In the rollback case, this statement changes the value of the status field in the tcc_fence_log table from STATUS_TRIED to STATUS_ROLLBACKED. The rollback logic is executed only if the update succeeds.
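The conditional update behaves like a compare-and-set on the status column. The sketch below illustrates this together with the empty-rollback check. It is an in-memory approximation under stated assumptions, not Seata code: `RollbackFenceSketch` is a hypothetical class, and `ConcurrentHashMap.replace(key, oldValue, newValue)` plays the role of the `update ... where ... and status = ?` statement.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory fence showing the empty-rollback check and the conditional status update. */
final class RollbackFenceSketch {
    static final int TRIED = 1;
    static final int COMMITTED = 2;
    static final int ROLLBACKED = 3;

    // key: xid + ":" + branchId, value: status
    private final ConcurrentMap<String, Integer> fenceLog = new ConcurrentHashMap<>();

    /** Stands in for the record inserted during the try phase. */
    void recordTry(String key) {
        fenceLog.put(key, TRIED);
    }

    /** Succeeds only if the current status equals the expected one, like the "and status = ?" condition. */
    boolean updateStatus(String key, int expected, int next) {
        return fenceLog.replace(key, expected, next);
    }

    boolean rollback(String key, Runnable businessRollback) {
        Integer status = fenceLog.get(key);
        if (status == null) {
            return true;    // empty rollback: try never ran here, so there is nothing to release
        }
        if (status == ROLLBACKED) {
            return true;    // already rolled back
        }
        if (status == COMMITTED) {
            return false;   // unexpected: a committed branch cannot be rolled back
        }
        if (!updateStatus(key, TRIED, ROLLBACKED)) {
            return false;   // status changed in the meantime
        }
        businessRollback.run();
        return true;
    }
}
```

A rollback for a key that was never tried returns true without touching the business resources, which is how the empty rollback is absorbed.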
Suspension means that RM does not receive the try instruction at first, for example because of network congestion, and TC therefore rolls the transaction back. After the rollback has been executed, the delayed try instruction reaches RM, which reserves resources successfully. At this time, the global transaction has already ended, so the reserved resources can never be released. As shown in the following figure:
Seata solves this problem by checking whether a record for the xid exists in the tcc_fence_log table when the rollback method is executed. If not, it inserts a record with the status STATUS_SUSPENDED into the tcc_fence_log table, and the rollback operation is not performed (as shown in the following code):
public static boolean rollbackFence(Method rollbackMethod, Object targetTCCBean,
                                    String xid, Long branchId, Object[] args, String actionName) {
    return transactionTemplate.execute(status -> {
        try {
            Connection conn = DataSourceUtils.getConnection(dataSource);
            TCCFenceDO tccFenceDO = TCC_FENCE_DAO.queryTCCFenceDO(conn, xid, branchId);
            // non_rollback
            if (tccFenceDO == null) {
                // Insert anti-suspension records
                boolean result = insertTCCFenceLog(conn, xid, branchId, actionName, TCCFenceConstant.STATUS_SUSPENDED);
                // Omit the logic
                return true;
            } else {
                // Omit the logic
            }
            return updateStatusAndInvokeTargetMethod(conn, rollbackMethod, targetTCCBean, xid, branchId, TCCFenceConstant.STATUS_ROLLBACKED, status, args);
        } catch (Throwable t) {
            // Omit the logic
        }
    });
}
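When the delayed try method is executed afterwards, it first inserts a record for the current xid into the tcc_fence_log table. Because the STATUS_SUSPENDED record already exists, the insert causes a primary key conflict and the try is rejected (as shown in the following code):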
// TCCFenceHandler Class
public static Object prepareFence(String xid, Long branchId, String actionName, Callback<Object> targetCallback) {
    return transactionTemplate.execute(status -> {
        try {
            Connection conn = DataSourceUtils.getConnection(dataSource);
            boolean result = insertTCCFenceLog(conn, xid, branchId, actionName, TCCFenceConstant.STATUS_TRIED);
            // Omit the logic
        } catch (TCCFenceException e) {
            if (e.getErrcode() == FrameworkErrorCode.DuplicateKeyException) {
                LOGGER.error("Branch transaction has already rollbacked before,prepare fence failed. xid= {},branchId = {}", xid, branchId);
                addToLogCleanQueue(xid, branchId);
            }
            status.setRollbackOnly();
            throw new SkipCallbackWrapperException(e);
        } catch (Throwable t) {
            // Omit
        }
    });
}
Note: The SQL in the queryTCCFenceDO method uses for update, so you don't have to worry that the Rollback method fails to read the tcc_fence_log record and therefore cannot determine the execution result of the local transaction in the try phase.
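The interplay between the two methods can be sketched in memory as follows. This is an illustration under stated assumptions rather than Seata's implementation: `AntiSuspensionSketch` is a hypothetical class, `putIfAbsent` imitates an insert that fails on a duplicate primary key, and the committed status and row locking are not modeled.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory fence: a rollback that arrives first blocks the late try. */
final class AntiSuspensionSketch {
    static final int TRIED = 1;
    static final int ROLLBACKED = 3;
    static final int SUSPENDED = 4;

    // key: xid + ":" + branchId, value: status
    private final ConcurrentMap<String, Integer> fenceLog = new ConcurrentHashMap<>();

    /** Imitates an insert on the primary key (xid, branch_id): false means duplicate key. */
    private boolean insert(String key, int status) {
        return fenceLog.putIfAbsent(key, status) == null;
    }

    boolean rollback(String key, Runnable businessRollback) {
        if (insert(key, SUSPENDED)) {
            return true;    // no try record yet: leave a marker and skip the business rollback
        }
        if (fenceLog.replace(key, TRIED, ROLLBACKED)) {
            businessRollback.run();
        }
        return true;        // otherwise already ROLLBACKED or SUSPENDED: nothing more to do
    }

    boolean tryReserve(String key, Runnable businessTry) {
        if (!insert(key, TRIED)) {
            return false;   // duplicate key: the rollback already ran, so refuse to reserve
        }
        businessTry.run();
        return true;
    }

    Integer status(String key) {
        return fenceLog.get(key);
    }
}
```

If `rollback` runs before `tryReserve` for the same key, the later `tryReserve` returns false and no resources are reserved, so nothing is left hanging after the global transaction has ended.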
TCC mode is the most used mode for distributed transactions. Idempotence, suspension, and empty rollback have been issues that TCC mode needs to consider. The Seata framework addresses these problems in version 1.5.1.
Operations on tcc_fence_log tables need to consider transaction control. Seata uses proxy data sources to enable the tcc_fence_log table and RM business to be executed in the same local transaction, thus ensuring that both local operations and operations on tcc_fence_log succeed or fail at the same time.