By Mengshi
PolarDB-X 2.0 (hereinafter referred to as PolarDB-X) and DRDS (Distributed Relational Database Service, also known as PolarDB-X 1.0) are both distributed database products on Alibaba Cloud. Both adopt a shared-nothing architecture and scale horizontally to overcome the bottlenecks of stand-alone databases. So what are the differences between them?
In essence, DRDS is database- and table-sharding middleware built on ApsaraDB RDS for MySQL, and it is highly flexible. PolarDB-X, by contrast, is a cloud-native distributed database that provides an integrated database experience. Its storage nodes run a highly customized MySQL, which enables capabilities that middleware cannot offer. Examples include strongly consistent distributed transactions based on global MVCC, performance gains from a private RPC protocol, and consistent reads on followers.
This article analyzes the similarities and differences between PolarDB-X and DRDS from different perspectives.
First, let's look at their similarities:
Next, let's look at their differences:
Setting aside the underlying technology, let's first compare the most visible part: the product experience.
Because DRDS is middleware, the boundary between DRDS and MySQL is clear. DRDS itself does not include MySQL (RDS) resources, so you must purchase them separately. You buy the two products in their respective consoles and then combine them in the DRDS console.
By contrast, PolarDB-X is delivered as a complete database service. You only need to create a PolarDB-X instance, which includes the required compute and storage resources.
In DRDS, you create a database in the console, and during this process you must either select existing MySQL resources or purchase new ones:
In PolarDB-X, you connect with the tools you already know, just as you would to a stand-alone database, and then create the database with the CREATE DATABASE statement:
When scaling out DRDS, you need to evaluate the capacity of each MySQL database and decide which databases to move to the new MySQL instances.
When scaling out PolarDB-X, you only need to choose the number of nodes, and data is automatically distributed evenly across the storage nodes.
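The following minimal sketch makes "automatically distributed evenly" more concrete. It assumes that data is pre-split into a fixed number of partitions and that rebalancing keeps each partition on its current node whenever that node still has room. The `rebalance` function and the partition/node model are hypothetical and do not represent PolarDB-X's actual scheduling algorithm.

```python
from collections import defaultdict


def rebalance(assignment, nodes):
    """Return a new partition -> node map in which every node holds floor(P/N)
    or ceil(P/N) partitions, keeping partitions in place where possible.

    assignment: current placement, partition id -> node name
    nodes: node names after scaling (new, empty nodes included; removed nodes omitted)
    """
    base, extra = divmod(len(assignment), len(nodes))
    # The first `extra` nodes in the list may hold one partition more than the rest.
    capacity = {n: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}

    held = defaultdict(list)
    for part, node in sorted(assignment.items()):
        held[node].append(part)

    new_assignment, load, to_move = {}, defaultdict(int), []
    for node, parts in held.items():
        keep = capacity.get(node, 0)        # a removed node keeps nothing
        for part in parts[:keep]:
            new_assignment[part] = node
            load[node] += 1
        to_move.extend(parts[keep:])

    # Hand the displaced partitions to nodes that are still below capacity.
    for node in nodes:
        while to_move and load[node] < capacity[node]:
            new_assignment[to_move.pop(0)] = node
            load[node] += 1
    return new_assignment


before = {p: f"dn{p % 2}" for p in range(8)}       # 8 partitions on 2 storage nodes
after = rebalance(before, ["dn0", "dn1", "dn2"])   # scale out to 3 nodes
print(sum(before[p] != after[p] for p in before))  # 2 partitions move
```

Only the partitions above each node's new share move, so scaling from two nodes to three relocates 2 of 8 partitions rather than reshuffling everything.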
To synchronize data from a DRDS instance to a downstream database, you must use DTS to subscribe to each MySQL instance separately. You also have to handle details carefully, such as the different physical table names that make up the same logical table. In addition, DDL operations can interrupt synchronization.
PolarDB-X provides a unified binlog service that DTS can subscribe to just as it would to a stand-alone MySQL instance. The service is fully compatible with the MySQL binlog and hides all distributed details, so downstream services see what looks like an ordinary stand-alone MySQL. For example, PolarDB-X supports all binlog-related statements, including SHOW BINLOG EVENTS.
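The sketch below illustrates the extra bookkeeping that a downstream consumer of per-shard DRDS binlogs has to do: it must map each physical (database, table) pair back to its logical table. The naming convention assumed here is hypothetical: a logical name followed by an underscore and a numeric suffix, such as `t1_0003` in database `mydb_0001`. Check the actual physical names in your own instance.

```python
import re

# Hypothetical convention: physical names look like "<logical>_<digits>".
_SHARD_SUFFIX = re.compile(r"^(?P<logical>.+?)_(?P<index>\d+)$")


def logical_name(physical_db, physical_table):
    """Map a physical (db, table) pair seen in one shard's binlog to (logical_db, logical_table)."""
    names = []
    for name in (physical_db, physical_table):
        match = _SHARD_SUFFIX.match(name)
        # Names without a numeric suffix (for example, non-sharded tables) pass through.
        names.append(match.group("logical") if match else name)
    return tuple(names)


# Events read from different RDS instances must be merged onto the same logical table.
events = [("mydb_0001", "t1_0003"), ("mydb_0002", "t1_0005"), ("mydb_0001", "config")]
print([logical_name(db, tb) for db, tb in events])
```

A real consumer would also need to handle logical names that themselves end in `_<digits>`, plus DDL on each shard. These are exactly the details that a unified binlog removes.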
In DRDS, you can run resource-intensive SQL statements on read-only instances (secondary databases) to avoid affecting online business. However, you have to identify these statements yourself and route them to the right place with hints, separate connection strings, and so on. You also need to account for replication delay on the secondary databases and adapt your application to tolerate it.
In PolarDB-X, the application needs only one connection string. You don't need to track the type or cost of each SQL statement or add hints to them. The optimizer estimates the cost of each statement automatically and runs it in the appropriate resource pool, which keeps AP queries from affecting TP queries as far as possible. In addition, PolarDB-X storage nodes support consistent reads on followers, so reading from a secondary never returns stale data. You always read the latest data.
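Here is a minimal sketch of the idea of routing by estimated cost instead of by hint. The `Plan` fields, the toy cost function, the threshold, and the pool names are hypothetical stand-ins. PolarDB-X's optimizer uses its own cost model, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    sql: str
    estimated_rows: int   # rows the optimizer expects to scan (hypothetical estimate)
    shards_touched: int   # how many shards the plan fans out to


# Hypothetical cutoff between short transactional queries and analytical ones.
AP_COST_THRESHOLD = 100_000


def estimated_cost(plan: Plan) -> int:
    # Toy cost model: scanned rows weighted by fan-out.
    return plan.estimated_rows * plan.shards_touched


def choose_pool(plan: Plan) -> str:
    """Pick the resource pool for a plan; the application never sees this choice."""
    return "AP" if estimated_cost(plan) >= AP_COST_THRESHOLD else "TP"


point_lookup = Plan("SELECT * FROM t1 WHERE pk = 1", estimated_rows=1, shards_touched=1)
report = Plan("SELECT addr, COUNT(*) FROM t2 GROUP BY addr", estimated_rows=50_000, shards_touched=8)
print(choose_pool(point_lookup), choose_pool(report))  # TP AP
```

The design point is that the routing decision moves from the application, with its hints and separate connection strings, into the database, where the cost estimate is available.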
DRDS uses MySQL instances that you purchase yourself as its storage components, so you have full O&M permissions on them and can do whatever you want with them, for example:
• If the load is unbalanced, you can upgrade the specification of one of the nodes separately.
• You can assign one of the storage nodes to another business.
• You can use ApsaraDB RDS for MySQL 5.6, 5.7, or 8.0.
• You can subscribe to the binlog of any of the ApsaraDB RDS for MySQL instances.
However, this flexibility carries risks. For example, nothing prevents you from directly deleting one of the database shards, after which DRDS can no longer access the data on that shard.
PolarDB-X storage nodes are hidden from the user. You cannot, and do not need to, access them directly, because PolarDB-X presents itself to the user as a single database. Capabilities such as automatic load balancing, the logical binlog, and mixed-workload HTAP remove most of the reasons to access storage nodes directly. At present, PolarDB-X DNs are mainly based on MySQL 5.7, and support for MySQL 8.0 is planned.
Many of the above differences are determined by their architectures. Let's take a look at the differences between PolarDB-X and DRDS in terms of architecture.
This is the architecture diagram of DRDS:
In the DRDS architecture, many features depend on external control systems, for example:
• Scale-out relies on an internal component called Jingwei.
• Metadata is stored in Diamond, a storage service shared within a region.
• Primary/secondary probing and switchover depend on a component called ADHA.
• Others
PolarDB-X architecture diagram:
In PolarDB-X, all core features are integrated into the kernel.
1) PolarDB-X uses X-DB as its DN (data node). X-DB uses Paxos to achieve RPO = 0.
2) Compared with DRDS, PolarDB-X introduces a new component: GMS (Global Meta Service). It plays an important role:
3) DRDS scale-out is based on binlogs and depends on external control systems. PolarDB-X scale-out, by contrast, is performed by the kernel itself using distributed transactions.
4) Zooming further into the architecture, let's look at how data is distributed:
The RDS instances in DRDS use a traditional primary/secondary (or three-node) architecture, with primary and secondary roles assigned per instance. Normally, the secondary database does not serve requests:
The DNs in PolarDB-X all use a three-node architecture, but Paxos groups are formed per shard. A node can be the leader for one shard and a follower for another at the same time, which improves resource utilization.
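A toy model can illustrate both points: Paxos replication for RPO = 0 and per-shard leadership. The model assumes three nodes, six shards, and round-robin leader placement. It captures only the majority-acknowledgment rule and omits leader election, log repair, and failover. `ShardGroup`, `build_groups`, and `survives` are hypothetical names and do not reflect how X-DB is implemented. An entry is acknowledged only once a majority of the group's replicas hold it, so losing any single node still leaves at least one copy.

```python
class ShardGroup:
    """Toy Paxos group for one shard: a leader plus followers across three nodes."""

    def __init__(self, shard, nodes, leader):
        self.shard = shard
        self.leader = leader
        self.logs = {n: [] for n in nodes}

    def commit(self, entry, alive):
        # Ship the entry to every live replica; acknowledge only on a majority.
        # (Unacknowledged entries would be cleaned up by log repair, which is not modeled.)
        acks = 0
        for node, log in self.logs.items():
            if node in alive:
                log.append(entry)
                acks += 1
        return acks >= len(self.logs) // 2 + 1


def build_groups(shards, nodes):
    """Rotate leadership round-robin so each node leads some shards and follows others."""
    return [ShardGroup(s, nodes, nodes[i % len(nodes)]) for i, s in enumerate(shards)]


def survives(group, entry, failed_node):
    """True if some node other than failed_node still holds the entry."""
    return any(entry in log for node, log in group.logs.items() if node != failed_node)


nodes = ["node1", "node2", "node3"]
groups = build_groups([f"shard{i}" for i in range(6)], nodes)
leaders = {n: [g.shard for g in groups if g.leader == n] for n in nodes}
print(leaders)  # {'node1': ['shard0', 'shard3'], 'node2': ['shard1', 'shard4'], 'node3': ['shard2', 'shard5']}

alive = {"node1", "node2"}                             # node3 is down
print(all(g.commit("trx1", alive) for g in groups))    # True: 2 of 3 is a majority
print(survives(groups[0], "trx1", "node1"))            # True: node2 still has it
print(groups[0].commit("trx2", {"node1"}))             # False: 1 of 3 is not a majority
```

By contrast, in an instance-level primary/secondary layout, one node would lead every shard while the others sit idle for writes.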
The transaction implementation mechanism is the most fundamental feature of a database. The transaction mechanism of PolarDB-X is very different from that of DRDS.
DRDS uses MySQL's XA transactions, which guarantee the atomicity of write operations.
However, the problem with standard XA is that a reader may see committed data on one shard and uncommitted data on another.
For example, suppose there are two empty tables: t1(pk,name,addr) dbpartition by hash(pk) and t2(pk,name,addr) dbpartition by hash(name). In transaction 1, the application inserts the same row into both tables with insert into t1 values (1,'sun','hz') and insert into t2 values (1,'sun','hz'). Because the tables are partitioned on different columns, the two rows generally land on different shards, referred to here as p1 and p2:
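-- Statements issued by the application (transaction 1)
begin;
insert into t1 (pk,name,addr) values (1,'sun','hz');  -- routed by hash(pk) to shard p1
insert into t2 (pk,name,addr) values (1,'sun','hz');  -- routed by hash(name) to shard p2
commit;
-- Two-phase commit that DRDS runs on the shards when the application commits (pseudocode)
prepare p1;  -- phase 1 on p1 (XA PREPARE)
prepare p2;  -- phase 1 on p2
commit p1;   -- phase 2 on p1 (XA COMMIT): the t1 row becomes visible
commit p2;   -- phase 2 on p2: the t2 row becomes visible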
Meanwhile, if another, read-only transaction counts the rows in t1 and t2, it may get different results for the two tables.
Look at the following timeline:
At time t1 (a point in time, not the table t1), a transaction that queries tables t1 and t2 gets two different row counts. For example, it sees 1 row in t1 but 0 rows in t2 if p1 has been committed and p2 has not. The result is inconsistent.
To solve this problem, DRDS uses locking, which becomes costly when there are many conflicts.
PolarDB-X uses its own global MVCC transactions. On top of the two-phase commit protocol (2PC), each transaction carries a snapshot timestamp (snapshot_ts) and a commit timestamp (commit_ts). Both are allocated by a global TSO (timestamp oracle), which guarantees externally consistent transactions without additional locking. In the preceding example, time t1 is later than the commit, so the reader sees a count of 1 in both tables.
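The anomaly can be reproduced with a toy model of the two shards in the example. This is a hypothetical simulation, not DRDS code. Each shard knows only whether its own branch has committed, and a reader without a global snapshot counts whatever rows are committed locally at that moment.

```python
class Shard:
    """Toy shard: only knows whether its own XA branch has committed locally."""

    def __init__(self, name):
        self.name = name
        self.committed_rows = []   # rows visible to readers
        self.prepared_rows = []    # rows prepared but not yet committed

    def prepare(self, row):
        # Phase 1: the branch is durable but not yet visible.
        self.prepared_rows.append(row)

    def commit(self):
        # Phase 2 on this shard only: prepared rows become visible.
        self.committed_rows.extend(self.prepared_rows)
        self.prepared_rows.clear()

    def count(self):
        # A read without a global snapshot sees whatever is committed locally right now.
        return len(self.committed_rows)


def run_xa_example():
    p1, p2 = Shard("p1"), Shard("p2")   # p1 holds the t1 row, p2 holds the t2 row
    row = (1, "sun", "hz")
    p1.prepare(row)
    p2.prepare(row)
    p1.commit()
    seen_at_t1 = (p1.count(), p2.count())   # time t1: p1 committed, p2 not yet
    p2.commit()
    seen_after = (p1.count(), p2.count())
    return seen_at_t1, seen_after


print(run_xa_example())  # ((1, 0), (1, 1))
```

The transaction is atomic in the end, as shown by `(1, 1)`, but a reader that arrives between the two phase-2 commits observes the mixed state `(1, 0)`.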
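The visibility rule can be sketched as a toy model, under the following hypothetical assumptions. A single counter stands in for the global TSO, and each shard stores committed versions tagged with their commit_ts. A reader that meets a row that is only prepared must wait for its outcome. That waiting rule is common in TSO-based MVCC designs and is assumed here rather than taken from PolarDB-X. Class and function names are illustrative only.

```python
import itertools


class TSO:
    """Hypothetical global timestamp oracle: hands out strictly increasing timestamps."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_ts(self):
        return next(self._counter)


class MvccShard:
    def __init__(self):
        self.versions = []   # committed (row, commit_ts) pairs
        self.prepared = []   # rows whose commit_ts is not known yet

    def prepare(self, row):
        self.prepared.append(row)

    def commit(self, commit_ts):
        self.versions.extend((row, commit_ts) for row in self.prepared)
        self.prepared.clear()

    def count(self, snapshot_ts):
        if self.prepared:
            # A prepared row may still commit with commit_ts <= snapshot_ts, so the
            # reader cannot decide yet; this sketch conservatively signals "wait".
            raise RuntimeError("wait for prepared transaction")
        return sum(1 for _, ts in self.versions if ts <= snapshot_ts)


def run_mvcc_example():
    tso = TSO()
    p1, p2 = MvccShard(), MvccShard()
    row = (1, "sun", "hz")
    early_snapshot = tso.next_ts()        # taken before the writer commits
    p1.prepare(row)
    p2.prepare(row)
    commit_ts = tso.next_ts()             # one commit_ts shared by both branches
    p1.commit(commit_ts)
    late_snapshot = tso.next_ts()         # like time t1: later than commit_ts
    try:
        p2.count(late_snapshot)           # p2 is still only prepared here
        blocked = False
    except RuntimeError:
        blocked = True                    # the reader waits instead of returning 0
    p2.commit(commit_ts)
    early = (p1.count(early_snapshot), p2.count(early_snapshot))
    late = (p1.count(late_snapshot), p2.count(late_snapshot))
    return blocked, early, late


print(run_mvcc_example())  # (True, (0, 0), (1, 1))
```

The early snapshot sees 0 rows in both tables, and the late snapshot sees 1 in both. Neither snapshot can observe the mixed `(1, 0)` state, because visibility is decided by comparing timestamps rather than by whichever shard happened to commit first.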
Compared with DRDS, the performance of PolarDB-X is greatly improved in several aspects.
DRDS connects to ApsaraDB RDS through the standard RDS access path, which goes through SLB and adds one extra network hop of latency.
The PolarDB-X CN and DN nodes are on the same physical network and connect to each other directly, point-to-point, without any SLB or LVS in between, which keeps network latency to a minimum. The following figure shows the network topology from CN to DN:
DRDS connects to RDS over the standard MySQL protocol and sends standard SQL statements, which incurs considerable overhead, for example:
• After the DRDS optimizer optimizes an SQL statement, the MySQL optimizer has to optimize it again, and this work is repeated on every MySQL shard involved.
• The MySQL protocol carries a lot of redundant data, such as the result set header, which holds information that is unnecessary here, like the name and type of each column.
• The data format returned by the MySQL protocol differs from the format DRDS uses for internal computation, so another conversion is needed.
• DRDS connects to MySQL through a connection pool. Because MySQL binds each connection to a thread, only one SQL statement can run on a connection at a time, so DRDS must maintain a large number of connections to RDS.
To address these problems, PolarDB-X customizes MySQL extensively and uses a private RPC protocol for communication between CNs and DNs. Compared with the MySQL protocol, the RPC protocol has the following advantages:
• It transmits execution plans instead of SQL statements, so MySQL does not have to parse and optimize them again.
• It uses an asynchronous model. Connections are not bound one-to-one to threads or sessions, so far fewer connections are needed.
• It removes unnecessary information, such as the result set header.
• Data is transmitted in the same format the CN uses for computation, which avoids an extra conversion.
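The asynchronous, multiplexed model can be sketched with Python's asyncio. Several requests share one connection, each request carries an ID, and a single reader matches replies to callers by ID, even when the replies come back out of order. The `MultiplexedConnection` class, the `fake_dn` coroutine, and the plan strings are hypothetical. They show the general technique, not PolarDB-X's private protocol.

```python
import asyncio
import itertools


class MultiplexedConnection:
    """One shared connection; many in-flight requests matched to replies by request ID."""

    def __init__(self, server):
        self._server = server              # coroutine standing in for the remote DN
        self._ids = itertools.count(1)
        self._pending = {}                 # request_id -> Future awaiting the reply
        self._inbox = asyncio.Queue()      # replies arriving on the shared connection
        self._tasks = set()                # keep references so tasks are not collected
        self._reader = None
        self.arrivals = []                 # order in which replies came back

    async def call(self, plan):
        if self._reader is None:
            self._reader = asyncio.create_task(self._read_loop())
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        task = asyncio.create_task(self._send(request_id, plan))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return await future                # the caller waits; the connection does not

    async def _send(self, request_id, plan):
        # Stand-in for writing a request frame and the DN later writing a reply frame.
        result = await self._server(plan)
        await self._inbox.put((request_id, result))

    async def _read_loop(self):
        # A single reader dispatches every reply to its caller by request ID.
        while True:
            request_id, result = await self._inbox.get()
            self.arrivals.append(request_id)
            self._pending.pop(request_id).set_result(result)

    def close(self):
        if self._reader is not None:
            self._reader.cancel()


async def fake_dn(plan):
    # Simulated data node: longer plans take longer, so replies come back out of order.
    await asyncio.sleep(0.01 * len(plan))
    return f"rows for {plan}"


async def main():
    conn = MultiplexedConnection(fake_dn)
    plans = ["scan t1 where pk=1", "agg t2", "scan t1 where pk=2"]
    try:
        results = await asyncio.gather(*(conn.call(p) for p in plans))
    finally:
        conn.close()
    return results, conn.arrivals


print(asyncio.run(main()))  # request 2 ("agg t2") is the first reply to arrive
```

Each caller still gets its own result in order, while the connection is never blocked by a slow request. This is what lets a small number of connections serve many concurrent sessions.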
With the private RPC protocol, PolarDB-X outperforms DRDS in many scenarios, as the following two test runs show.
• 160 million rows of data
• 300 concurrent requests
• Compute node and storage node specifications: 16 cores, 64 GB
• QPS improvement of PolarDB-X over DRDS: +39%
Product | Node CPU | QPS | RT-AVG | RT-MAX | RT-95% |
PolarDB-X | 710.6% | 97067.20 | 3.09ms | 108.12ms | 6.53ms |
DRDS | 1289% | 69787.34 | 4.30ms | 110.30ms | 10.67ms |
• 160 million rows of data
• 150 concurrent requests
• Compute node and storage node specifications: 16 cores, 64 GB
• QPS improvement of PolarDB-X over DRDS: +14.4%
Product | Node CPU | QPS | RT-AVG | RT-MAX | RT-95% |
PolarDB-X | 1139% | 22587.23 | 119.52ms | 757.47ms | 471.02ms |
DRDS | 1236% | 19732.12 | 136.82ms | 798.47ms | 415.74ms |
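The percentage gains quoted for the two runs are QPS ratios, and they can be rechecked directly from the tables. The labels in this sketch are only descriptive keys.

```python
# QPS figures copied from the two benchmark tables above.
results = {
    "300 concurrent requests": {"PolarDB-X": 97067.20, "DRDS": 69787.34},
    "150 concurrent requests": {"PolarDB-X": 22587.23, "DRDS": 19732.12},
}

gains = {}
for scenario, qps in results.items():
    gains[scenario] = qps["PolarDB-X"] / qps["DRDS"] - 1
    print(f"{scenario}: QPS +{gains[scenario]:.1%}")
```

The first ratio comes to about 39.1%. The second comes to about 14.47%, which the run summary reports as +14.4%. In both runs, PolarDB-X reaches the higher QPS with lower node CPU usage.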
DRDS uses SMP (single-machine parallel) technology, while PolarDB-X uses MPP (multi-machine parallel) technology, which lets PolarDB-X devote more resources to accelerating complex analytical queries. The difference is especially significant in TPC-H. The following is a TPC-H comparison of DRDS and PolarDB-X with the same resources:
The total duration of DRDS is 386 sec, and the total duration of PolarDB-X is 274 sec.
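MPP helps analytical queries partly because aggregation can be split into two phases. Each node computes a partial result over its local data, and a coordinator then merges those partial results, so the CPUs of every node contribute. The sketch below shows that two-phase aggregation pattern, with threads standing in for compute nodes. The data layout and function names are hypothetical, and this is not PolarDB-X's executor.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Phase 1, run on each node: COUNT(*) GROUP BY addr over its local rows."""
    return Counter(addr for _pk, _name, addr in rows)


def final_aggregate(partials):
    """Phase 2, run on the coordinator: merge the per-node partial counts."""
    total = Counter()
    for part in partials:
        total.update(part)
    return dict(total)


# Hypothetical rows of table t1 already spread across three compute nodes.
node_partitions = [
    [(1, "sun", "hz"), (2, "li", "bj")],
    [(3, "wang", "hz"), (4, "zhao", "sh")],
    [(5, "qian", "hz"), (6, "zhou", "bj")],
]

with ThreadPoolExecutor(max_workers=len(node_partitions)) as pool:
    partials = list(pool.map(partial_aggregate, node_partitions))

print(final_aggregate(partials))  # {'hz': 3, 'bj': 2, 'sh': 1}
```

An SMP engine applies the same split only across the cores of one machine. MPP spreads phase 1 across several machines, and only the small partial results travel to the coordinator.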
PolarDB-X improves on DRDS in several respects. DRDS is database- and table-sharding middleware, while PolarDB-X is a cloud-native distributed database. The two will continue to coexist, serving users with different needs.
If you have any questions, please feel free to leave a message in the comments section.