What are the Differences Between PolarDB-X and DRDS?

By Mengshi

PolarDB-X 2.0 (hereinafter referred to as PolarDB-X) and DRDS (Distributed Relational Database Service, also known as PolarDB-X 1.0) are both distributed database products on Alibaba Cloud. Both adopt a shared-nothing architecture and scale horizontally to overcome the bottlenecks of stand-alone databases. So what are the differences between them?

In essence, DRDS is a database-sharding and table-sharding middleware built on ApsaraDB RDS for MySQL, and it is highly flexible. PolarDB-X is a cloud-native distributed database that provides an integrated database experience. Its storage nodes run a highly customized MySQL, which provides capabilities that middleware cannot, such as strongly consistent distributed transactions based on global MVCC, performance gains from a private RPC protocol, and consistent reads on followers.

This article analyzes the similarities and differences between PolarDB-X and DRDS from different perspectives.

First, let's look at their similarities:

They are both based on the shared-nothing architecture and have strong horizontal scalability.
They are both based on the MySQL ecosystem and are highly compatible with MySQL.
They use the same SQL engine and have similar SQL execution capabilities.
They both provide advanced capabilities that are not available in common middleware, such as distributed transactions and global indexes.
They are both widely used within Alibaba, and their reliability and stability have stood the test of years of the Double 11 Shopping Festival.

Next, let's look at their differences:

User Experience

Setting aside technical principles, let's first look at the similarities and differences in the most intuitive product experiences.

1. Purchase the Instance

Since DRDS is middleware, the boundary between DRDS and MySQL is clear. DRDS itself does not include MySQL (RDS) resources, so you must purchase them yourself: you buy the two products separately in their respective consoles and then assemble them in the DRDS console.

By contrast, PolarDB-X provides a complete database service, so you only need to create a PolarDB-X instance, which contains the required computing and storage resources.

2. Build a Database

In DRDS, you need to create a database in the console, and during this process you need to select existing MySQL resources or purchase new ones.

In PolarDB-X, you connect with the tools you already know, just as you would to a stand-alone database, and then run the CREATE DATABASE command to create the database.

3. Expansion

In DRDS, you need to evaluate the capacity of each MySQL database and decide which databases to move to the new MySQL instance.

In PolarDB-X, you only need to select the number of nodes, and the data is automatically redistributed evenly across the storage nodes.
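To make "evenly distributed" concrete, here is a minimal sketch of rebalancing after a scale-out. It assumes a fixed set of logical shards and a simple greedy strategy; the node names (`dn0`, `dn1`, ...) and the function itself are hypothetical illustrations, not the actual balancing algorithm in PolarDB-X.

```python
from collections import defaultdict


def rebalance(placement, nodes):
    """Return (new_placement, moves) so shard counts per node differ by at most 1.

    placement: dict mapping shard id -> node name (the current layout)
    nodes: list of all node names after scale-out (hypothetical names);
           it is assumed to include every node already used in placement
    """
    by_node = defaultdict(list)
    for shard, node in sorted(placement.items()):
        by_node[node].append(shard)
    for node in nodes:
        by_node.setdefault(node, [])  # newly added nodes start empty

    new_placement = dict(placement)
    moves = []
    while True:
        fullest = max(nodes, key=lambda n: len(by_node[n]))
        emptiest = min(nodes, key=lambda n: len(by_node[n]))
        if len(by_node[fullest]) - len(by_node[emptiest]) <= 1:
            break  # balanced: no node holds 2+ more shards than another
        shard = by_node[fullest].pop()
        by_node[emptiest].append(shard)
        new_placement[shard] = emptiest
        moves.append((shard, fullest, emptiest))
    return new_placement, moves


if __name__ == "__main__":
    # Eight shards on two nodes, then scale out to four nodes.
    before = {i: f"dn{i % 2}" for i in range(8)}
    after, moves = rebalance(before, ["dn0", "dn1", "dn2", "dn3"])
    print(after)
    print(moves)
```

The point of the sketch is that the user supplies only the target node list; which shards move, and where, is derived automatically.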

4. Data Synchronization

If you want to synchronize data from a DRDS instance to a downstream database, you must use DTS to subscribe to each MySQL instance and carefully handle details such as the differing physical table names of the shards of one logical table. In addition, DDL operations can interrupt the synchronization.

PolarDB-X provides a unified binlog service that DTS can subscribe to just as it would to a stand-alone MySQL instance. The binlog service is fully compatible with MySQL: it hides all distributed details and looks like an ordinary stand-alone MySQL to downstream services. For example, PolarDB-X supports all binlog-related statements, including SHOW BINLOG EVENTS.
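One way to picture a unified binlog is as a merge of per-node change streams into a single ordered stream with logical table names. The sketch below assumes each storage node emits events already sorted by a global commit timestamp and that physical tables carry a numeric suffix; the event layout, the suffix convention, and both function names are hypothetical and are not taken from the PolarDB-X implementation.

```python
import heapq
import re


def logical_name(physical_table):
    """Strip a hypothetical shard suffix such as '_0003' from a physical table name."""
    return re.sub(r"_\d+$", "", physical_table)


def merge_binlogs(*streams):
    """Merge per-node event streams, each sorted by commit_ts, into one ordered stream.

    Each event is a tuple (commit_ts, physical_table, change).
    """
    for commit_ts, table, change in heapq.merge(*streams, key=lambda e: e[0]):
        yield commit_ts, logical_name(table), change


if __name__ == "__main__":
    dn0 = [(1, "orders_0000", "insert a"), (4, "orders_0000", "update a")]
    dn1 = [(2, "orders_0001", "insert b"), (3, "orders_0001", "delete b")]
    for event in merge_binlogs(dn0, dn1):
        print(event)
```

A downstream consumer of the merged stream sees one table named `orders` and one timeline, which is the property the text describes as "shielding distributed details".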

5. Read/write Splitting

In DRDS, you can use read-only instances (secondary databases) to run resource-intensive SQL statements without affecting online business. However, you need to identify these SQL statements yourself and route them to the right place by using HINTs, different connection strings, and so on. You also need to be aware of the replication delay on the secondary database and adapt your business system to tolerate it.

In PolarDB-X, the application needs only one connection string. You do not need to classify SQL statements by type or cost, nor add HINTs to them. The optimizer estimates the cost of each statement and runs it in the appropriate resource pool, which keeps AP (analytical) statements from affecting TP (transactional) statements as far as possible. In addition, PolarDB-X storage nodes support consistent reads on followers, so reads served by a follower do not return stale data; you always read the latest data.
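The routing idea can be sketched as a cost check in front of two resource pools. Everything in this example is an assumption made for illustration: the row-count threshold, the `Plan` fields, and the pool names are hypothetical, and the real optimizer uses its own cost model.

```python
from dataclasses import dataclass

# Hypothetical threshold: estimated rows scanned above which a read is treated as analytical.
AP_ROW_THRESHOLD = 100_000


@dataclass
class Plan:
    sql: str
    estimated_rows: int  # assumed to come from the optimizer's statistics
    is_write: bool = False


def choose_pool(plan: Plan) -> str:
    """Pick a resource pool: writes and cheap reads stay on the leader pool."""
    if plan.is_write:
        return "tp-leader"
    if plan.estimated_rows > AP_ROW_THRESHOLD:
        return "ap-follower"  # heavy reads go to follower / read-only resources
    return "tp-leader"


if __name__ == "__main__":
    print(choose_pool(Plan("select * from t1 where pk = 1", estimated_rows=1)))
    print(choose_pool(Plan("select addr, count(*) from t1 group by addr", estimated_rows=5_000_000)))
```

The application sends both statements over the same connection; the decision is made per statement on the server side, which is why no HINT is needed.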

6. O&M

DRDS uses MySQL instances that you purchase yourself as its components, so you have full O&M permissions on these MySQL instances. You can do whatever you want on them, for example:

• If the load is unbalanced, you can upgrade the specification of one of the nodes separately.

• You can assign one of the storage nodes to another business.

• You can use ApsaraDB RDS 5.6, 5.7, or 8.0.

• You can subscribe to the binlog of any ApsaraDB RDS for MySQL instance.

However, this flexibility has risks. For example, there is no way to prevent you from directly deleting one of the database shards. This deletion will cause DRDS to fail to access the data on this database shard.

PolarDB-X storage nodes are hidden from the user. You cannot access them directly, and you do not need to: PolarDB-X presents itself to the user as a single database. Automatic load balancing, the logical binlog, mixed-load HTAP, and other capabilities reduce the need for direct access to the storage nodes. At present, the PolarDB-X DN is mainly based on MySQL 5.7; support for 8.0 is planned.

Architecture Difference

Many of the above differences are determined by their architectures. Let's take a look at the differences between PolarDB-X and DRDS in terms of architecture.

This is the architecture diagram of DRDS:

In the architecture of DRDS, many features depend on the peripheral control system, such as:

• Expansion uses an internal component called Jingwei.

• Metadata requires a store named Diamond, which is shared within a region.

• Primary/secondary probing and switchover depend on a component called ADHA.

• Others

PolarDB-X architecture diagram:

In PolarDB-X, all core features are integrated into the kernel.

1) PolarDB-X uses X-DB as its DN (Data Node). X-DB uses Paxos to achieve RPO=0.

2) Compared with DRDS, PolarDB-X introduces a new component: GMS (Global Meta Service). It plays an important role:

Provides the globally increasing timestamps that are used by distributed transactions.
Evenly distributes data among nodes based on the loads.
Provides unified metadata, such as INFORMATION_SCHEMA.
Manages CNs and DNs, such as switchover and bringing nodes online or taking them offline.

3) The scale-out of DRDS is based on binlogs and depends on the peripheral control system. The scale-out of a PolarDB-X instance is completed inside the kernel and is based on distributed transactions.

4) Looking at the architecture in finer detail, consider the data distribution:

The RDS instances behind DRDS use a traditional primary/secondary (or three-node) architecture. Primary and secondary roles are assigned per instance, and under normal conditions the secondary database does not serve traffic.

The DNs in PolarDB-X all use a three-node architecture, and Paxos groups are formed per shard. A node can be the leader of one shard and a follower of another shard at the same time, which improves resource utilization.
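The difference between instance-level and shard-level roles can be sketched with a simple leader assignment. The round-robin policy and the names below are hypothetical; they only illustrate how every node ends up leading some shards while following others.

```python
def assign_roles(shards, nodes):
    """Round-robin leadership so every node leads some shards and follows the rest."""
    roles = {}
    for i, shard in enumerate(shards):
        leader = nodes[i % len(nodes)]
        roles[shard] = {
            "leader": leader,
            "followers": [n for n in nodes if n != leader],
        }
    return roles


if __name__ == "__main__":
    roles = assign_roles([f"shard{i}" for i in range(6)], ["node-a", "node-b", "node-c"])
    for shard, role in roles.items():
        print(shard, role)
```

With instance-level roles, one node would lead all six shards and the other two would mostly idle; with shard-level roles, each of the three nodes leads two shards and serves as a follower for the remaining four.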

Transaction Model

The transaction implementation mechanism is the most fundamental feature of a database. The transaction mechanism of PolarDB-X is very different from that of DRDS.

DRDS uses the XA transaction provided by MySQL. XA transactions ensure the atomicity of write operations.

However, a problem with standard XA is that a concurrent reader may see a transaction's changes as committed on one shard while they are still uncommitted on another shard.

For example, suppose there are two empty tables, t1(pk,name,addr) dbpartition by hash(pk) and t2(pk,name,addr) dbpartition by hash(name), and the application inserts one row into each table within transaction 1:

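begin;
insert into t1 (pk,name,addr) values (1,'sun','hz');
insert into t2 (pk,name,addr) values (1,'sun','hz');
-- The lines below are pseudocode for the two-phase commit, not literal SQL;
-- p1 and p2 presumably denote the two shards (XA branches) touched above.
prepare p1;
prepare p2;
commit p1;
commit p2;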

Meanwhile, if another read-only transaction runs a count on t1 and on t2, the two counts may differ.

Look at the following timeline:

At time t1 on the timeline (a point in time, not the table t1), for example after commit p1 has taken effect but before commit p2 has, a transaction that counts the rows in tables t1 and t2 gets two different numbers. This is an inconsistent result.

DRDS solves this problem with locking, which becomes costly when conflicts are frequent.

PolarDB-X uses self-developed global MVCC transactions. In addition to the two-phase commit protocol (2PC), each transaction carries a snapshot timestamp (snapshot_ts) and a commit timestamp (commit_ts). The timestamps are allocated by a global TSO (timestamp oracle), which ensures externally consistent transactions without additional locking. In the preceding example, time t1 is later than the transaction's commit time, so the query reads a count of 1 from both tables.
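The contrast between the two read behaviors can be simulated in a few lines. This is a toy model under stated assumptions, not the PolarDB-X implementation: the `TSO` and `Shard` classes are hypothetical, and the sketch simplifies by making the commit timestamp known on both shards as soon as the commit decision is taken, whereas a real system would have the reader wait for or resolve a branch that is still in the prepared state.

```python
import itertools


class TSO:
    """Toy timestamp oracle handing out strictly increasing integers."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_ts(self):
        return next(self._counter)


class Shard:
    def __init__(self):
        self.versions = []  # each: {"row", "commit_ts", "locally_committed"}

    def prepare(self, row):
        version = {"row": row, "commit_ts": None, "locally_committed": False}
        self.versions.append(version)
        return version

    def count_local(self):
        """XA-style read: sees whatever this shard has committed locally so far."""
        return sum(v["locally_committed"] for v in self.versions)

    def count_snapshot(self, snapshot_ts):
        """MVCC-style read: sees versions whose commit_ts <= snapshot_ts."""
        return sum(
            v["commit_ts"] is not None and v["commit_ts"] <= snapshot_ts
            for v in self.versions
        )


def run_scenario():
    tso = TSO()
    s1, s2 = Shard(), Shard()
    early_snapshot = tso.next_ts()      # a reader that started before the commit
    v1 = s1.prepare({"pk": 1})          # phase 1 on both shards
    v2 = s2.prepare({"pk": 1})
    commit_ts = tso.next_ts()           # one global commit timestamp
    v1["commit_ts"] = v2["commit_ts"] = commit_ts
    v1["locally_committed"] = True      # shard 1 finished phase 2, shard 2 has not
    late_snapshot = tso.next_ts()       # "time t1" in the text: after the commit
    return {
        "xa": (s1.count_local(), s2.count_local()),
        "mvcc_late": (s1.count_snapshot(late_snapshot), s2.count_snapshot(late_snapshot)),
        "mvcc_early": (s1.count_snapshot(early_snapshot), s2.count_snapshot(early_snapshot)),
    }


if __name__ == "__main__":
    print(run_scenario())
```

In this model the XA-style reader sees one row in the first table and none in the second, while a snapshot reader sees either both rows or neither, depending only on whether its snapshot timestamp is before or after the commit timestamp.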

Higher Performance

Compared with DRDS, the performance of PolarDB-X is greatly improved in several aspects.

1) Streamlined Network Structure

DRDS connects to ApsaraDB RDS over the standard RDS access link, which passes through SLB. This adds one network hop of latency.

The PolarDB-X CN and DN nodes are in the same physical network and connect to each other directly, point to point, without any SLB or LVS in between, so network latency is kept to a minimum. The following figure shows the network topology from CN to DN:

2) Private RPC Protocol

DRDS uses the standard MySQL protocol to connect to RDS and sends standard SQL statements. This incurs considerable overhead, for example:

• After the SQL statement is optimized by the DRDS optimizer, it needs to be optimized again by the MySQL optimizer. If multiple MySQL shards are involved, there will be more repetitions.

• There are many redundant elements in the MySQL protocol, such as the header of the result set, which carries unnecessary information such as the name and type of each column of the result set.

• The data format returned by the MySQL protocol is not the same as the data format used by DRDS for internal computing. Another conversion is needed.

• DRDS uses a connection pool to connect to MySQL. MySQL connections are bound to threads, and only one SQL statement can be executed on a connection at a time. As a result, a large number of connections must be maintained between DRDS and RDS.

To solve these problems, PolarDB-X has introduced many customizations to MySQL and uses a private RPC protocol for communication between CN and DN. Compared with the MySQL protocol, the RPC protocol has the following advantages:

• The CN passes execution plans rather than SQL statements, which avoids the cost of MySQL repeatedly parsing and optimizing SQL statements.

• An asynchronous model is used. Connections are not bound one-to-one to threads or sessions, so fewer connections are needed.

• Unnecessary information, such as the result set header, is removed from the communication.

• Data is transmitted in the same format that the CN uses for computing, which avoids an extra conversion.

Using the private RPC protocol, PolarDB-X has better performance in many scenarios compared with DRDS.

sysbench-select

• 160 million rows of data

• 300 concurrent requests

• Compute node and storage node specifications: 16 cores 64 GB

• QPS improvement of PolarDB-X over DRDS: +39%

	Node CPU	QPS	RT-AVG	RT-MAX	RT-95%
PolarDB-X	710.6%	97067.20	3.09ms	108.12ms	6.53ms
DRDS	1289%	69787.34	4.30ms	110.30ms	10.67ms

sysbench-oltp

• 160 million rows of data

• 150 concurrent requests

• Compute node and storage node specifications: 16 cores 64 GB

• QPS improvement of PolarDB-X over DRDS: +14.4%

	Node CPU	QPS	RT-AVG	RT-MAX	RT-95%
PolarDB-X	1139%	22587.23	119.52ms	757.47ms	471.02ms
DRDS	1236%	19732.12	136.82ms	798.47ms	415.74ms
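The improvement figures quoted for the two sysbench runs can be reproduced from the QPS columns of the tables. The helper below is just the relative-change formula; the function name is an illustrative choice.

```python
def improvement_percent(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


if __name__ == "__main__":
    # QPS values taken from the two tables above (PolarDB-X, DRDS).
    runs = {
        "sysbench-select": (97067.20, 69787.34),
        "sysbench-oltp": (22587.23, 19732.12),
    }
    for name, (polardbx_qps, drds_qps) in runs.items():
        print(f"{name}: {improvement_percent(polardbx_qps, drds_qps):.2f}%")
```

The select run works out to roughly 39% and the oltp run to a little over 14.4%, in line with the figures listed for each test.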

3) The MPP Engine Accelerates Analysis Queries

DRDS uses SMP (single-machine parallel) technology, whereas PolarDB-X uses MPP (multi-machine parallel) technology. As a result, PolarDB-X can bring more resources to bear on complex analytical queries than DRDS can. The performance difference is significant in TPC-H. The following is a TPC-H comparison between DRDS and PolarDB-X with the same resources:

The total duration of DRDS is 386 sec, and the total duration of PolarDB-X is 274 sec.

Summary

PolarDB-X stands out from DRDS in several aspects. While DRDS represents the middleware of database and table sharding, PolarDB-X is a cloud-native distributed database. They will continue to coexist and cater to users with varying needs.

If you have any questions, please feel free to leave a message in the comments section.
