Disclaimer: This is a translated work of Qinxia's 漫谈分布式系统, all rights reserved to the original author.
When the business is very simple and the data volume is small, a traditional stand-alone relational database is sufficient. Relational databases implemented stand-alone transactions very early on, and their guarantees are summarized as the four ACID properties:
Atomicity: all operations within a transaction are either all executed or none are executed.
Consistency: after the transaction executes, the database is still in a valid state, for example no primary key or foreign key constraint is violated.
Isolation: transactions are isolated from one another, so concurrently executing transactions do not interfere with each other.
Durability: once a transaction commits, its changes are persisted and will not be lost due to a system failure.
A closely related but easily confused concept is CAP, mentioned in the previous article:
Consistency: every request, whenever it is made and whichever replica serves it, returns the same data.
Availability: every external request receives a valid response from the system.
Partition tolerance: the system keeps operating normally even when the network is partitioned.
As you can see, the C in ACID and the C in CAP mean completely different things. The former emphasizes that the database state is always legal and valid, while the latter emphasizes that the data on different replicas always looks the same to the outside world.
MySQL, a typical stand-alone relational database, supports ACID well, but once data volume and request concurrency grow beyond a certain point, it inevitably has to scale out horizontally into a distributed database.
A typical approach is database and table sharding on top of stand-alone databases. Both horizontal sharding of the same type of data, such as splitting the user table across 100 database instances, and vertical partitioning of different types of data, such as placing the user table and the product table in different database instances, are unavoidable.
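As a rough illustration of the two kinds of split, the sketch below routes a user id to one of the 100 user-table instances (horizontal sharding) and a table name to its own instance (vertical partitioning). It assumes plain modulo hashing on the user id and a fixed table-to-instance map; real deployments often use consistent hashing, range mapping or sharding middleware instead, and every class and instance name here is hypothetical.

```java
import java.util.Map;

/** Hypothetical router: vertical partitioning by table, horizontal sharding of the user table. */
final class ShardRouter {
    // Vertical partitioning: each type of data lives on its own (hypothetical) instance group.
    private static final Map<String, String> TABLE_TO_INSTANCE =
            Map.of("user", "user-db", "product", "product-db");

    private final int userShards;

    ShardRouter(int userShards) {
        if (userShards <= 0) throw new IllegalArgumentException("userShards must be positive");
        this.userShards = userShards;
    }

    /** Horizontal sharding: rows of the same table are spread across instances by user id. */
    int shardFor(long userId) {
        // floorMod keeps the index in [0, userShards) even for negative ids
        return (int) Math.floorMod(userId, (long) userShards);
    }

    /** Name of the user-table shard instance that owns this user. */
    String userInstanceFor(long userId) {
        return "user-db-" + shardFor(userId);
    }

    /** Looks up which instance group holds a given table. */
    static String instanceForTable(String table) {
        String instance = TABLE_TO_INSTANCE.get(table);
        if (instance == null) throw new IllegalArgumentException("unknown table: " + table);
        return instance;
    }
}
```

For example, `new ShardRouter(100).userInstanceFor(12345L)` routes to `user-db-45`. With modulo hashing, changing the shard count remaps most users, which is one reason resharding is painful.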
Once the database becomes distributed, we can naturally build distributed transactions on top of stand-alone transactions using 2PC, as described in the previous article.
For high availability, data can also be replicated to slave machines via the binlog, and this process can also be implemented with 2PC.
However, no distributed system can escape the curse of CAP.
As explained in detail in the previous article, 2PC-style distributed transactions need multiple rounds of negotiation among multiple nodes, which hurts performance, and they cannot tolerate network partitions.
Although the C in ACID and the C in CAP mean different things, once extended to a distributed setting, ACID does imply a strong consistency guarantee. 2PC-based distributed transactions carry this pursuit of strong consistency into distributed scenarios, which can be called ACID on 2PC. Even further optimization, such as ACID on 3PC, does not fundamentally solve the problem.
At the same time, as mentioned in the previous article, in scenarios with large data volumes and high concurrency, availability and performance (performance can also be seen as a form of availability) are sometimes no less important than consistency. A system that never responds, or responds very slowly, is hard to deploy at scale, no matter how consistent its data is.
On the one hand, consistency is difficult to fully guarantee, and on the other hand, availability and performance cannot be ignored. Is there a better way out for distributed transactions?
The answer is yes, and the idea has already been summarized as the BASE theory:
BA, Basically Available: does not pursue complete availability; partial availability is better than none at all.
S, Soft state: does not require rigid state-machine-style transitions and allows intermediate states. This is also what the so-called "flexible transaction" refers to.
E, Eventual consistency: does not require strong consistency at every moment, only that the data eventually becomes consistent.
Simply put, BASE sacrifices some consistency in exchange for availability: a very important and very typical trade-off.
In chemical terms, ACID means acid and BASE means base, and the names themselves hint at how the two relate: they are opposites.
Dynamo-based distributed transactions
That's right! The last article just introduced Dynamo, a ready-made causally (weakly) consistent distributed database, so why not build transactions directly on top of it?
Implementing transactions on the client side
Amazon provides an official library, dynamodb-transactions, to help applications implement distributed transactions on the client side.
Roughly speaking, it implements a multi-phase commit protocol:
create: create a TX record with a unique primary key and save it as a Dynamo object.
add: add the objects involved in the transaction to the TX record's object list.
lock: set the lock of each related object to this TX id, one by one.
save: save a copy of each related object, for rollback.
verify: re-read the TX record to make sure its status is still pending, guarding against contention from other transactions.
apply: perform the transaction's operations. Modifications are applied directly to the original objects; deletions are not executed yet at this point.
commit: change the TX record's status from pending to committed.
complete: release the locks on the transaction's objects and delete the copies saved in the save phase.
clean: update the TX record's status to complete.
delete: delete the TX record.
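To make the ten steps concrete, here is a single-threaded, in-memory sketch that walks through them for a set of put operations. It is not the dynamodb-transactions API: the classes, fields and methods are hypothetical, deletes are not modeled, and on a conflict it simply rolls back instead of entering the decide flow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory walk-through of the client-side multi-phase commit steps. */
final class ClientSideTx {
    enum State { PENDING, COMMITTED, ROLLED_BACK, COMPLETED }

    static final class Item {
        String value;
        String lockedBy; // id of the TX holding the lock, or null
        Item(String value) { this.value = value; }
    }

    static final class TxRecord {
        State state = State.PENDING;
        final List<String> keys = new ArrayList<>();
        final Map<String, String> savedImages = new HashMap<>();
    }

    final Map<String, Item> table = new HashMap<>();       // the data objects
    final Map<String, TxRecord> txTable = new HashMap<>(); // TX records are objects too

    boolean run(String txId, Map<String, String> writes) {
        TxRecord tx = new TxRecord();                       // create
        txTable.put(txId, tx);
        tx.keys.addAll(writes.keySet());                    // add
        for (String key : tx.keys) {                        // lock, one by one
            Item item = table.computeIfAbsent(key, k -> new Item(null));
            if (item.lockedBy != null && !item.lockedBy.equals(txId)) {
                rollBack(txId, tx);                         // conflict (decide flow omitted)
                return false;
            }
            item.lockedBy = txId;
        }
        for (String key : tx.keys) {                        // save copies for rollback
            tx.savedImages.put(key, table.get(key).value);
        }
        // verify: another client running decide could have flipped this record already
        if (txTable.get(txId).state != State.PENDING) {
            rollBack(txId, tx);
            return false;
        }
        for (String key : tx.keys) {                        // apply in place (visible now!)
            table.get(key).value = writes.get(key);
        }
        tx.state = State.COMMITTED;                         // commit
        for (String key : tx.keys) {                        // complete: release locks
            table.get(key).lockedBy = null;
        }
        tx.savedImages.clear();                             // complete: drop saved copies
        tx.state = State.COMPLETED;                         // clean
        txTable.remove(txId);                               // delete
        return true;
    }

    /** Restores saved copies (if any) and releases only the locks this TX holds. */
    private void rollBack(String txId, TxRecord tx) {
        for (String key : tx.keys) {
            Item item = table.get(key);
            if (item != null && txId.equals(item.lockedBy)) {
                if (tx.savedImages.containsKey(key)) item.value = tx.savedImages.get(key);
                item.lockedBy = null;
            }
        }
        tx.state = State.ROLLED_BACK;
    }
}
```

Note how apply writes straight into `table` before commit: any reader that ignores locks would already see the new values, which is the visibility issue the article raises below.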
With this many steps it looks a bit intimidating, but most of them are operational details; the overall flow is similar to 2PC's Prepare-Commit.
The above is the normal flow when there is no conflict. When a conflict occurs, a different flow is entered:
decide: read the id of the TX that holds the object's lock and check that TX's status. If it is still pending, the TX has not committed yet, so change its status to roll-back (this is why the normal flow above needs the verify step).
complete: if the TX status is committed, carry on with the normal commit flow on its behalf; if it is roll-back, execute the rollback flow.
clean: same as the clean step above.
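The conflict path can be sketched the same way. In this hypothetical model, TX statuses live in a map, and the decide step uses `Map.replace(key, expected, newValue)` as a stand-in for the conditional write a real store would use, so a PENDING owner is flipped to ROLLED_BACK only if nobody changed it first. The `verify` method shows why the normal flow re-reads its own record: once another transaction has rolled it back, it must not go on to apply.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of decide → complete → clean when a lock is held by another TX. */
final class ConflictResolver {
    enum State { PENDING, COMMITTED, ROLLED_BACK, COMPLETED }

    final Map<String, State> txStates = new HashMap<>();

    /** The owner's verify step: it may only proceed while its record is still PENDING. */
    boolean verify(String txId) {
        return txStates.get(txId) == State.PENDING;
    }

    /** Called when we hit an object locked by lockOwner; returns the action taken for it. */
    String resolve(String lockOwner) {
        // decide: flip PENDING -> ROLLED_BACK only if it is still PENDING (conditional write)
        txStates.replace(lockOwner, State.PENDING, State.ROLLED_BACK);
        State s = txStates.get(lockOwner);
        String action;
        if (s == State.COMMITTED) {
            action = "roll-forward";  // complete: finish the owner's commit on its behalf
        } else if (s == State.ROLLED_BACK) {
            action = "roll-back";     // complete: restore saved copies, release locks
        } else {
            return "none";            // already COMPLETED or unknown: nothing to do
        }
        txStates.put(lockOwner, State.COMPLETED); // clean
        return action;
    }
}
```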
In other words, transactions can push one another's progress forward. This aggressive mechanism certainly helps transactions finish as soon as possible, but it also intensifies contention, and frequent mutual rollbacks are to be expected. Adding a wait before the decide step can mitigate this.
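One simple way to add that wait is exponential backoff with random jitter. The text does not prescribe a policy, so this technique is an assumption here, and the class name and parameters are hypothetical.

```java
import java.util.Random;

/** Hypothetical backoff policy applied before running decide against a lock owner. */
final class DecideBackoff {
    private final long baseMillis;
    private final long capMillis;
    private final Random random;

    DecideBackoff(long baseMillis, long capMillis, Random random) {
        if (baseMillis <= 0 || capMillis < baseMillis) {
            throw new IllegalArgumentException("need 0 < baseMillis <= capMillis");
        }
        this.baseMillis = baseMillis;
        this.capMillis = capMillis;
        this.random = random;
    }

    /**
     * Exponential backoff with full jitter: returns a random wait in
     * [0, min(capMillis, baseMillis * 2^attempt)] milliseconds.
     */
    long delayMillis(int attempt) {
        long ceiling = baseMillis;
        // Double once per attempt, clamping at the cap so the value never overflows.
        for (int i = 0; i < attempt && ceiling < capMillis; i++) {
            ceiling = (ceiling > capMillis / 2) ? capMillis : ceiling * 2;
        }
        return (long) (random.nextDouble() * (ceiling + 1.0));
    }
}
```

The random jitter matters as much as the growth: two transactions that conflicted at the same moment are unlikely to retry decide at the same moment again.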
Besides contention between transactions, the library also allows multiple coordinators to process the same transaction, but the contention there is resolved in a similar way, so I won't go into details.
Note that modifications in the apply phase update the objects directly, so they are visible even before the transaction commits. This can be avoided by adding read locks.
Implementing transactions on the server side
A client-side implementation is always inconvenient and error-prone, so in 2018 DynamoDB finally added native transaction support on the server side.
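import java.util.Arrays;
import java.util.Collection;
import com.amazonaws.services.dynamodbv2.model.InternalServerErrorException;
import com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException;
import com.amazonaws.services.dynamodbv2.model.ReturnConsumedCapacity;
import com.amazonaws.services.dynamodbv2.model.TransactWriteItem;
import com.amazonaws.services.dynamodbv2.model.TransactWriteItemsRequest;
import com.amazonaws.services.dynamodbv2.model.TransactionCanceledException;

// AWS SDK for Java v1. The definitions of client (an AmazonDynamoDB instance),
// checkCustomerValid, markItemSold and createOrder are omitted.
// The three operations below succeed or fail together.
Collection<TransactWriteItem> actions = Arrays.asList(
    new TransactWriteItem().withConditionCheck(checkCustomerValid), // check the customer is valid
    new TransactWriteItem().withUpdate(markItemSold),               // mark the item as sold
    new TransactWriteItem().withPut(createOrder));                  // insert the order

TransactWriteItemsRequest placeOrderTransaction = new TransactWriteItemsRequest()
    .withTransactItems(actions)
    .withReturnConsumedCapacity(ReturnConsumedCapacity.TOTAL);

// Execute the transaction and process the result.
try {
    client.transactWriteItems(placeOrderTransaction);
    System.out.println("Transaction Successful");
} catch (ResourceNotFoundException rnf) {
    System.err.println("One of the tables involved in the transaction is not found: " + rnf.getMessage());
} catch (InternalServerErrorException ise) {
    System.err.println("Internal Server Error: " + ise.getMessage());
} catch (TransactionCanceledException tce) {
    System.out.println("Transaction Canceled: " + tce.getMessage());
}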
Whether transactions are implemented on the client or on the server, DynamoDB uses the W+R read/write model, so even after a transaction succeeds, other requests may still read stale replicas. You can, however, set the ConsistentRead parameter to true to force a strongly consistent read that returns the latest data.
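The W+R reasoning can be checked with a toy model. The sketch below is hypothetical and tracks only version numbers: a write updates W of N replicas, and a read takes the newest version among R replicas, deliberately choosing the replicas least likely to have been written. It illustrates the general quorum rule W + R > N, not DynamoDB's internal design; in DynamoDB itself the knob you actually set is ConsistentRead.

```java
/** Toy model of N replicas; writes reach W of them, reads consult R of them. */
final class QuorumModel {
    private final long[] versions; // latest version held by each replica (0 = never written)

    QuorumModel(int n) {
        versions = new long[n];
    }

    /** Writes version v to the first w replicas; the remaining replicas lag behind. */
    void write(long v, int w) {
        for (int i = 0; i < w; i++) versions[i] = v;
    }

    /** Reads the last r replicas (worst case: those the write is least likely to reach). */
    long read(int r) {
        long newest = 0;
        for (int i = versions.length - r; i < versions.length; i++) {
            newest = Math.max(newest, versions[i]);
        }
        return newest;
    }

    /** The quorum rule: every read set must intersect every write set. */
    static boolean readSeesLatestWrite(int n, int w, int r) {
        return w + r > n;
    }
}
```

With N = 3 and W = 2, a read of R = 1 can return the stale version 0, while R = 2 is guaranteed to overlap the write set and returns version 1.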
As shown in the sample code above, with the transaction API built around transactWriteItems and transactGetItems, a few lines of familiar code are enough to get transactional behavior.
In addition, to avoid a poor user experience from large numbers of transaction failures caused by network problems, DynamoDB supports transactions only within a single region (roughly, one AWS geographic location), not across regions on global tables.
BASE-based distributed transactions
DynamoDB is good, but what if our existing system does not use DynamoDB, or uses it but the transaction also involves other databases and services (heterogeneous distributed transactions)? What should we do then?
Sacrificing consistency
Recall that BASE trades consistency for availability and performance (performance being, to some extent, a form of availability). So let's start with the sacrifice: how do we give up consistency?
Looking back at the previous articles, the strongly consistent models we discussed, such as single-master synchronous replication, 2PC and Paxos, differ a lot, but they all have one thing in common: nodes exchange data synchronously.
To sacrifice consistency, simply change synchronous to asynchronous. This is in line with what we discussed in Part 8 of the series, and it also meets the need for better performance.
The most common tool for going asynchronous is a message queue (MQ).
Here is an example from Dan Pritchett of eBay.
There is an e-commerce system in which there are two tables, user and transaction, which record user and transaction information respectively.
Every time a trade happens, a record is inserted into the transaction table, and the seller's amt_sold and the buyer's amt_bought in the user table are updated. Obviously, the data in these two tables must stay correlated and consistent.
At first, an ACID transaction is enough:
begin transaction
    insert into transaction(xid, seller_id, buyer_id, amount);
    update user set amt_sold=amt_sold+$amount where id=$seller_id;
    update user set amt_bought=amt_bought+$amount where id=$buyer_id;
end transaction
After switching to BASE, the pseudocode looks roughly like this:
begin transaction
    insert into transaction(xid, seller_id, buyer_id, amount);
    queue message "update user("seller", seller_id, amount)";
    queue message "update user("buyer", buyer_id, amount)";
end transaction
for each message in queue
    begin transaction
        dequeue message
        if message.balance == "seller"
            update user set amt_sold=amt_sold+message.amount where id=message.id
        else
            update user set amt_bought=amt_bought+message.amount where id=message.id
        end if
    end transaction
end for
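The same flow can be sketched in memory. In this hypothetical Java version, each `synchronized` method stands in for one local database transaction, and an in-process queue stands in for the MQ; the point is to show the intermediate (soft) state between `recordTrade` and `drain`.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical in-memory sketch of the BASE version of the trade example. */
final class BaseTradeLedger {
    /** One queued message: which column to update, for which user, by how much. */
    static final class UserUpdate {
        final String balance; // "seller" or "buyer", mirroring message.balance above
        final long userId;
        final long amount;

        UserUpdate(String balance, long userId, long amount) {
            this.balance = balance;
            this.userId = userId;
            this.amount = amount;
        }
    }

    final Map<Long, long[]> transactions = new HashMap<>(); // xid -> {sellerId, buyerId, amount}
    final Map<Long, Long> amtSold = new HashMap<>();
    final Map<Long, Long> amtBought = new HashMap<>();
    final Queue<UserUpdate> queue = new ArrayDeque<>();

    /** First transaction: insert the trade and enqueue both user updates together. */
    synchronized void recordTrade(long xid, long sellerId, long buyerId, long amount) {
        transactions.put(xid, new long[] {sellerId, buyerId, amount});
        queue.add(new UserUpdate("seller", sellerId, amount));
        queue.add(new UserUpdate("buyer", buyerId, amount));
    }

    /** One small transaction per message: dequeue it and update the matching column. */
    synchronized boolean applyNext() {
        UserUpdate m = queue.poll();
        if (m == null) return false;
        Map<Long, Long> column = "seller".equals(m.balance) ? amtSold : amtBought;
        column.merge(m.userId, m.amount, Long::sum);
        return true;
    }

    /** Runs later and asynchronously; returns how many messages were applied. */
    int drain() {
        int applied = 0;
        while (applyNext()) applied++;
        return applied;
    }
}
```

Right after `recordTrade`, the trade exists but `amtSold` is still empty: that is exactly the intermediate state BASE tolerates. After `drain`, both totals agree with the trade.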
With MQ decoupling, the two tables are no longer updated in one transaction, so strong consistency of the data is no longer guaranteed, but performance improves greatly.
Regaining consistency
Sacrificing consistency is a temporary compromise, not giving up: eventual consistency must still be guaranteed.
How do we guarantee it? To solve a problem, you first have to find it.
Let's first look at where inconsistency can arise:
Inserting into the transaction table and sending a message to the MQ: how do we make these two operations transactional?
Dequeuing a message and updating the user table: how do we make these two operations transactional?
In essence, both are about transactional guarantees across heterogeneous systems: the database and the MQ.
The first method, naturally, is 2PC (otherwise I would have talked about it in vain earlier). This requires the MQ to support message pre-commit, as RocketMQ does.
In the first phase, start the database transaction and pre-commit the message to the MQ.
In the second phase, if the database transaction succeeds, formally commit the message to the MQ; otherwise, cancel the message, or let it be discarded after a timeout.
Of course, since the database already supports transactions, the actual code is not strictly two-phase: the message's pre-commit and formal commit/rollback are embedded in the database's transaction code. A simple pseudocode example:
begin transaction
try
    database.update_row()
    mq.prepare_message()
except
    database.rollback()
    mq.cancel_message()
else
    database.commit()
    mq.commit_message()
end transaction
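Below is a runnable toy version of this pseudocode. `Broker` and `Db` are hypothetical stand-ins, not RocketMQ's API: prepared messages stay invisible to consumers until committed, and the database stages a value until commit. One gap the sketch leaves open is a crash between `db.commit()` and `mq.commit(...)`; transactional-message brokers such as RocketMQ close it by checking back with the producer about the status of unresolved prepared messages.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical demo of embedding message pre-commit in a local database transaction. */
final class HalfMessageDemo {
    /** Toy broker: prepared messages are invisible until committed. */
    static final class Broker {
        final Map<Long, String> prepared = new HashMap<>();
        final Queue<String> visible = new ArrayDeque<>();
        boolean available = true; // set to false to simulate a broker failure
        private long nextId = 1;

        long prepare(String body) {
            if (!available) throw new IllegalStateException("broker unavailable");
            prepared.put(nextId, body);
            return nextId++;
        }

        void commit(long id) {
            String body = prepared.remove(id);
            if (body != null) visible.add(body);
        }

        void cancel(long id) { prepared.remove(id); }
    }

    /** Toy single-row database with a staged (uncommitted) value. */
    static final class Db {
        long committed;
        Long staged;

        void updateRow(long v) { staged = v; }
        void commit() { committed = staged; staged = null; }
        void rollback() { staged = null; }
    }

    /** Same shape as the pseudocode: update + prepare, then commit both or roll back both. */
    static boolean updateAndNotify(Db db, Broker mq, long newValue, String message) {
        Long msgId = null;
        try {
            db.updateRow(newValue);
            msgId = mq.prepare(message);
        } catch (RuntimeException e) {
            db.rollback();
            if (msgId != null) mq.cancel(msgId); // matters if more work follows prepare
            return false;
        }
        db.commit();
        mq.commit(msgId);
        return true;
    }
}
```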
That solves transactional message production. Transactional message consumption is similar and relies on the MQ's ACK mechanism.
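On the consuming side, the key ordering is: run the local database transaction first, and ACK only after it has committed. The hypothetical queue below removes a message only on ACK, so a failed handler leaves the message in place for redelivery. The flip side is that a crash after the commit but before the ACK causes the same message to be delivered again.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/** Hypothetical queue where a message is removed only when the consumer ACKs it. */
final class AckingQueue {
    private final Deque<String> messages = new ArrayDeque<>();

    void publish(String body) { messages.addLast(body); }

    int size() { return messages.size(); }

    /**
     * Delivers the head message to the handler. The handler runs the local database
     * transaction and returns true only after it has committed; only then do we ACK
     * (remove) the message. Otherwise it stays queued and will be delivered again.
     */
    boolean deliverOnce(Predicate<String> handler) {
        String head = messages.peekFirst();
        if (head == null) return false;
        boolean committed;
        try {
            committed = handler.test(head);
        } catch (RuntimeException e) {
            committed = false; // an exception counts as a failed local transaction
        }
        if (committed) messages.pollFirst(); // ACK
        return committed;
    }
}
```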
However, if the MQ in the system does not support message pre-commit, how can heterogeneous transactions be implemented? One option is to send the message directly inside the database transaction:
begin transaction
try
    database.update_row()
    mq.commit_message()
except
    database.rollback()
else
    database.commit()
end transaction
Without pre-commit, a message cannot be withdrawn once sent, and retries can deliver the same message more than once. So this second way requires downstream consumers to process messages idempotently.
Idempotency means that performing the same operation any number of times yields the same result. For example, a = 1 is idempotent, while a++ is not.
One of the easiest ways to ensure idempotency is to give each message a unique id and have the downstream side maintain a cache of ids that have already been consumed. Each time a message arrives, check its id: if it is already in the cache, the message has been processed before and should be discarded.
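A minimal sketch of such a consumer, assuming an in-memory set as the id cache and the amt_sold update from the earlier example as the non-idempotent operation (like a++). The class and method names are hypothetical. In a real system, the consumed id should be recorded in the same local transaction as the business update; otherwise a crash between the two reopens the duplicate window.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical consumer that makes a non-idempotent update safe by deduplicating ids. */
final class IdempotentConsumer {
    private final Set<String> consumedIds = new HashSet<>(); // stands in for the id cache
    final Map<Long, Long> amtSold = new HashMap<>();

    /** Returns true if the message was applied, false if it was a duplicate and discarded. */
    boolean consume(String messageId, long sellerId, long amount) {
        if (!consumedIds.add(messageId)) {
            return false; // id already seen: discard the duplicate
        }
        amtSold.merge(sellerId, amount, Long::sum); // the a++-style, non-idempotent update
        return true;
    }
}
```

Delivering `msg-1` twice leaves `amtSold` incremented only once, which is all the at-least-once delivery above needs.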
(This also touches on the exactly-once problem mentioned several times before. A more complete solution will be covered systematically in a later article.)
In this way, we get a distributed system that provides BASE guarantees.
We use asynchronous decoupling via MQ to improve the overall performance of the system. The price is that only eventual consistency (E) can be provided, and intermediate states (S) appear along the way.
In the decoupled system, a failure of some components leads only to partial unavailability, so the system's overall availability (BA) is preserved as much as possible.
TL;DR
Stand-alone transactions extend naturally to distributed scenarios, yielding a solution like ACID on 2PC, but its performance and availability are not good enough.
BASE relaxes the consistency requirement in exchange for better availability and performance.
The previous article introduced Dynamo, a typical eventually consistent distributed database, so it is natural to think of using it to implement BASE.
DynamoDB can implement transactions in two ways: on the client side and on the server side.
If DynamoDB is not used, or is not the only store involved, you need to implement heterogeneous transactions yourself.
Heterogeneous BASE transactions can be implemented based on 2PC, using MQ message pre-commit.
Heterogeneous BASE transactions can also be implemented by making downstream consumers idempotent.
In Part 10 of the series, we covered distributed transactions under strong consistency; this article covered distributed transactions under weak consistency.
That raises the question: when should we use which?
(If you are still asking which one is better, you will need to gradually get used to this: there is often no best option, only a better trade-off.)
Both models have obvious strengths and weaknesses, so neither can dominate or replace the other. We have to choose flexibly according to the application scenario.
For example, the inventory count on an Amazon product detail page does not need to be accurate at every moment but receives a huge number of visits, so BASE fits; if your system has a cash transfer feature, where your fear of losing money far outweighs your complaints about performance, you can consider sticking with ACID on 2PC.
Well, that concludes the introduction to distributed transactions. Careful readers may notice that I approached distributed transactions gradually, starting from data consistency. This follows the line of thinking behind this series, so the coverage has a particular focus.
Therefore, I don't intend to exhaustively list every implementation of distributed transactions. For example, TCC was only mentioned in passing and SAGA not at all. Nor did I go into the very important topic of transaction isolation levels, which every implementation has to deal with. Interested readers can explore these further.
The next article is the last in this chapter on data consistency. Let's summarize together where consistency problems come from.
This is a carefully planned series of 20-30 articles. I hope to give everyone a basic, core grasp of distributed systems in a story-telling way. Stay tuned for the next one!