PolarDB-X provides binary logs (binlogs) in two modes. The two modes can be used at the same time.
Single-stream mode: The single-stream mode is also referred to as the Global Binlog mode. In this mode, the binlogs of all data nodes (DNs) are merged into a global queue to provide a single log stream that preserves the integrity and order of transactions. The single-stream mode provides a stronger guarantee of data consistency. For example, in a funds transfer scenario, a consistent balance can be queried at any time from a downstream MySQL database that subscribes to the single-stream binlogs of a PolarDB-X instance.
Multi-stream mode: The multi-stream mode is also referred to as the Binlog-X mode. In this mode, the binlogs of all DNs are not merged into a global queue. Instead, they are distributed to different log streams by using a hash algorithm. The multi-stream mode compromises transaction integrity to some extent but greatly improves scalability, which resolves the single-point bottleneck that single-stream binlogs face in large-scale clusters.
Single-stream mode
The raw binlogs of all DNs are sorted and merged into one queue, and the internal details are removed. This way, PolarDB-X provides a log stream that is compatible with the binlog format and dump protocol of MySQL. By default, the single-stream binlog feature is enabled when you purchase a PolarDB-X instance.
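Because the global binlog is exposed through the MySQL-compatible dump protocol, a quick way to confirm that the stream is available is to run the standard MySQL status statements on a connection to the PolarDB-X instance. This is a minimal sketch that assumes these statements behave as they do on a standalone MySQL server; the file names returned on your instance will differ.

```sql
-- Check the current global binlog file and position (MySQL-compatible behavior assumed).
SHOW MASTER STATUS;

-- List the global binlog files that are currently available.
SHOW BINARY LOGS;
```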
Usage notes
The master and slave nodes of the change data capture (CDC) component replicate binlog files between each other to ensure data consistency between the two nodes. Downstream systems consume binlogs based on file names and positions, which do not change even if a switchover occurs between the master and slave nodes.
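For example, a downstream MySQL instance can subscribe to the global binlog with the standard replication statements, using the file name and position reported by the PolarDB-X instance. The endpoint, account, file, and position below are placeholders. Because the file name and position survive a CDC switchover, the replication settings do not need to be adjusted after a failover.

```sql
-- On the downstream MySQL instance (endpoint, account, file, and position are placeholders):
CHANGE MASTER TO
  MASTER_HOST = 'polardbx-instance-endpoint',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl_user',
  MASTER_PASSWORD = '***',
  MASTER_LOG_FILE = 'binlog.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
```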
Distributed transactions can be merged only if the transaction policy is set to Timestamp Oracle (TSO). Otherwise, only eventual consistency can be ensured for data. By default, PolarDB-X uses TSO as the transaction policy.
If you need to modify the partition key value of a data row, make sure that the value is modified within a distributed transaction whose transaction policy is set to TSO. This way, the DELETE event is recorded in binlogs earlier than the INSERT event, which ensures data consistency. Specifically, to modify a partition key value with data consistency ensured, first set the transaction policy to TSO, and then perform one of the following operations (a sketch follows this list):
Execute an UPDATE statement to modify the partition key value.
Execute a REPLACE statement to modify the partition key value.
Explicitly start a transaction to execute a DELETE statement, modify the partition key value, and then execute an INSERT statement.
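The following sketch shows the third option. It assumes that TSO is already the instance-level transaction policy (the default) and uses a hypothetical orders table whose partition key is seller_id.

```sql
-- Move order 1001 to a new seller_id (partition key) inside one explicit transaction.
BEGIN;
DELETE FROM orders WHERE order_id = 1001;           -- removes the row from the old partition
INSERT INTO orders (order_id, seller_id, amount)
VALUES (1001, 42, 99.00);                           -- re-inserts it with the new partition key value
COMMIT;
-- In the global binlog, the DELETE event is recorded before the INSERT event.
```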
Multi-stream mode
Multi-stream binlogs are also fully compatible with the binlog format and dump protocol of MySQL. Each binlog stream can be treated as a log stream from a standalone MySQL database. SQL statements such as CHANGE MASTER and SHOW BINLOG EVENTS can be executed on each log stream to consume or view binlogs.
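For example, on a connection to a single log stream, you can inspect events just as you would on a standalone MySQL server. The file name below is a placeholder, and whether additional statements such as SHOW BINARY LOGS are available per stream is an assumption based on the MySQL compatibility described above.

```sql
-- View the first events of a binlog file in this log stream.
SHOW BINLOG EVENTS IN 'binlog.000001' FROM 4 LIMIT 10;

-- List the binlog files of this log stream (assumed to be available per stream).
SHOW BINARY LOGS;
```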
By default, the multi-stream binlog feature is disabled. To use the multi-stream binlog feature, you must separately enable this feature in the console. You can create multiple multi-stream groups for a PolarDB-X instance. Each multi-stream group contains multiple log streams. Different groups are isolated from each other. You can configure parameters such as the number of log streams and the splitting level for a multi-stream group based on your business requirements.
Splitting levels
The multi-stream binlog feature provides three splitting levels. You can configure splitting levels based on your business scenario when you create a multi-stream group.
Database level (in sequence)
Binlogs are distributed to different log streams based on the hash values that are calculated by using database names. This way, the binlogs of a database are always routed to the same log stream in sequence. This splitting level is suitable for scenarios in which a single PolarDB-X instance has a large number of databases. If a transaction does not involve cross-database operations, the integrity of the transaction can be ensured in binlogs that are split at the database level.
Table level (in sequence)
Binlogs are distributed to different log streams based on the hash values that are calculated by using table names. This way, the binlogs of a table are always routed to the same log stream in sequence. This splitting level is suitable for scenarios in which a large number of tables exist, and the operations on a single table, such as DML and DDL operations, are expected to be kept in sequence in a log stream.
Row level (in sequence)
Binlogs are distributed to different log streams based on the hash values that are calculated by using the primary key values of data rows. This way, the binlogs of a data row are always routed to the same log stream in sequence. This splitting level is suitable for scenarios in which binlogs are expected to be fully dispersed and do not need to be kept in sequence for a database or a table. To use this splitting level, make sure that the data tables contain a primary key. Binlogs of a table that does not contain a primary key are directly discarded.
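As a minimal illustration of the primary key requirement, the hypothetical table below is eligible for row-level splitting only because it declares a primary key; without the PRIMARY KEY clause, its binlogs would be discarded under this splitting level.

```sql
CREATE TABLE user_events (
  event_id BIGINT NOT NULL,
  user_id  BIGINT NOT NULL,
  payload  VARCHAR(255),
  PRIMARY KEY (event_id)   -- required for row-level splitting; rows are routed by this key
);
```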
You can configure splitting levels at the service layer and at the database and table layer. After a splitting level takes effect at a layer, you cannot modify it. Otherwise, the same binlogs would appear in different log streams, which causes data inconsistency. Before you create a multi-stream group, we recommend that you plan the splitting levels based on your business requirements.
Service layer
The splitting level configured at the service layer is the default splitting level of a multi-stream group. If you do not configure a splitting level for a database or a table, the splitting level configured at the service layer is used.
Database and table layer
You can separately configure a splitting level for a database or table. The splitting level configured for a database or table overrides the splitting level configured at the service layer. This meets the requirements for differentiated management.
If the splitting level is set to row level (in sequence), take note of the following restriction: if a data table contains a UNIQUE constraint and unique key swapping occurs, data inconsistency may occur. For example, the value a of the unique key name is held first by the data row whose id value is 1 and then by the data row whose id value is 2. Because the two rows can be routed to different log streams, the execution order of the delete(id=1, name=a) and insert(id=2, name=a) statements in the destination database is uncertain. If the insert(id=2, name=a) statement is executed before the delete(id=1, name=a) statement, a write conflict occurs. In similar scenarios, we recommend that you set the splitting level to table level (in sequence).
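The following sketch spells out the scenario described above, using a hypothetical table t with a primary key id and a UNIQUE constraint on name. Under row-level splitting, the two rows can be routed to different log streams, so the apply order in the destination database is not guaranteed.

```sql
-- The unique value 'a' moves from the row with id = 1 to the row with id = 2.
DELETE FROM t WHERE id = 1;                  -- frees name = 'a'; routed by primary key id = 1
INSERT INTO t (id, name) VALUES (2, 'a');    -- reuses name = 'a'; routed by primary key id = 2
-- If the downstream system applies the INSERT before the DELETE,
-- the UNIQUE constraint on name is violated and a write conflict occurs.
```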
Usage notes
After you create a multi-stream group, you cannot adjust the number of log streams. Plan the number of log streams before you create a multi-stream group. We recommend that you configure the number of log streams to be greater than or equal to the number of DNs.
After you create a multi-stream group, you cannot modify the splitting levels that take effect. We recommend that you plan the splitting levels before you create a multi-stream group.
If you want to separately configure a splitting level for a new table, configure the splitting level before data is written to the table.
If the splitting level is set to table level (in sequence), you can still rename tables. The system always distributes data based on the initial table names.
If the splitting level is set to table level (in sequence), the binlogs of a large table may always be routed to some specific log streams. This causes a data skew issue. In this case, you can separately configure a splitting level for the large table.
If you want to adjust the number of log streams or the splitting levels that take effect, you can create another multi-stream group to replace the original multi-stream group. In this case, some O&M operations need to be performed in downstream systems to adjust the log consumption.
The following describes how DDL statements are distributed to the log streams at each splitting level.
Database level: Unicast + broadcast. The CREATE DATABASE and DROP DATABASE statements are broadcast to all log streams, because a separate splitting level can be set for a table in a database, so database creation and deletion must be visible in every log stream. Other types of DDL statements are unicast to a fixed log stream.
Table level: Unicast + broadcast. As at the database level, the CREATE DATABASE and DROP DATABASE statements are broadcast to all log streams, and other types of DDL statements are unicast to a fixed log stream.
Row level: Broadcast. All types of DDL statements are broadcast to all log streams.
Transparent consumption of binlogs
The CDC component preferentially saves binlog files on local disks and can upload the files to remote storage, such as Object Storage Service (OSS), in real time. Generally, the files are stored on the local disks for a short period of time and on the remote storage for a longer period of time, such as 15 days. The CDC component provides the transparent consumption feature, which shields the storage differences between the local disks and the remote storage. Downstream systems can access the binlog files on the remote storage without any adaptation. The transparent consumption feature is supported in CDC V2.0.0 and later.
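In other words, a downstream system keeps using the same statements and positions regardless of where a binlog file currently resides. As a sketch (the file name is a placeholder), requesting events from a file that has already aged out of the local disks is assumed to be served from the remote storage without any change on the consumer side, provided the file is still within the remote retention period.

```sql
-- The file may reside on local disk or already on OSS; the request is the same either way.
SHOW BINLOG EVENTS IN 'binlog.000001' FROM 4 LIMIT 10;
```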