By Changfeng
Change Data Capture (CDC) refers to an application scenario that listens to upstream data changes and synchronizes the changed information to downstream services for further processing. In recent years, event-driven architecture (EDA) has grown in popularity and has become a preferred choice for project architects. EDA is a natural fit for CDC, which treats data changes as events: each service drives its own business logic by listening to the events it is interested in. EventBridge is a serverless event bus service launched by Alibaba Cloud that helps users build EDA-based applications. Recently, EventBridge event streams added CDC support based on the Alibaba Cloud DTS [1] service. This article introduces CDC, explains how CDC is applied on EventBridge, and walks through several best-practice scenarios to show how to build CDC applications with EventBridge.
CDC captures incremental data and schema changes from the source database and synchronizes them, in order, to the destination database, data lake, or other data analysis services with high reliability and low latency. Currently, the mainstream open-source CDC tools in the industry include Debezium [2], Canal [3], and Maxwell [4].
Image source: https://dbconvert.com
The industry mainly has the following types of CDC implementations:
The timestamp-based method requires the database table to have a field that records the last update time. When a row is inserted or updated, this timestamp field is updated accordingly. The CDC component periodically retrieves the records whose update time is later than the last synchronization time, which captures the changes made during the current period. Version-based tracking works on the same principle, except that developers must update a version number whenever they change the data.
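The polling step described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `users` table with an integer `updated_at` column and using an in-memory SQLite database; none of these names come from a specific CDC tool.

```python
import sqlite3


def fetch_changes(conn, last_sync):
    """Return rows updated after last_sync and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM users "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_sync,),
    ).fetchall()
    # Advance the mark only when something changed.
    new_mark = rows[-1][2] if rows else last_sync
    return rows, new_mark


# Hypothetical source table (names are assumptions for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "alice", 100), (2, "bob", 105)],
)

# First poll: everything is newer than the initial mark of 0.
changes, mark = fetch_changes(conn, 0)

# A later update bumps the timestamp, so only that row appears in the next poll.
conn.execute("UPDATE users SET name = 'bobby', updated_at = 110 WHERE id = 2")
changes2, mark2 = fetch_changes(conn, mark)
```

Note that a query like this cannot observe deleted rows, which is one reason this approach is usually paired with soft deletes.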
The snapshot-based CDC implementation uses three copies of the data source at the storage level: the original data, the previous snapshot, and the current snapshot. The data changes are obtained by comparing the previous snapshot with the current one.
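The comparison between two snapshots can be sketched as below. This is only an illustration, assuming each snapshot fits in memory as a dictionary keyed by primary key; the function name `diff_snapshots` is hypothetical.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots (dicts keyed by primary key) and return the changes."""
    inserted = {k: current[k] for k in current.keys() - previous.keys()}
    deleted = {k: previous[k] for k in previous.keys() - current.keys()}
    updated = {
        k: (previous[k], current[k])
        for k in previous.keys() & current.keys()
        if previous[k] != current[k]
    }
    return inserted, updated, deleted


previous = {1: {"name": "alice"}, 2: {"name": "bob"}}
current = {2: {"name": "bobby"}, 3: {"name": "carol"}}

inserted, updated, deleted = diff_snapshots(previous, current)
```

Unlike timestamp polling, this comparison also detects deletions, at the cost of storing and scanning full copies of the data.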
The trigger-based CDC implementation establishes a trigger on the source table to store the data change operation (INSERT, UPDATE, and DELETE) records. For example, a table is created to record user changes, and three types of triggers are created to synchronize user changes to this table.
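A minimal sketch of this idea, using SQLite through Python so that it is self-contained. The `users` and `user_changes` tables and the trigger names are assumptions made for illustration; a production database would use its own trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);

    -- Change table that records every operation on users.
    CREATE TABLE user_changes (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        op TEXT, user_id INTEGER, old_name TEXT, new_name TEXT
    );

    -- One trigger per operation type.
    CREATE TRIGGER users_ai AFTER INSERT ON users BEGIN
        INSERT INTO user_changes (op, user_id, old_name, new_name)
        VALUES ('INSERT', NEW.id, NULL, NEW.name);
    END;
    CREATE TRIGGER users_au AFTER UPDATE ON users BEGIN
        INSERT INTO user_changes (op, user_id, old_name, new_name)
        VALUES ('UPDATE', NEW.id, OLD.name, NEW.name);
    END;
    CREATE TRIGGER users_ad AFTER DELETE ON users BEGIN
        INSERT INTO user_changes (op, user_id, old_name, new_name)
        VALUES ('DELETE', OLD.id, OLD.name, NULL);
    END;
    """
)

conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("UPDATE users SET name = 'alicia' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")

# The change table now holds one row per operation, in order.
changes = conn.execute(
    "SELECT op, user_id, old_name, new_name FROM user_changes ORDER BY seq"
).fetchall()
```

A downstream CDC component would then read `user_changes` in `seq` order instead of scanning the source table.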
The three methods above are intrusive to the source database, while the log-based method is non-intrusive. Databases use transaction logs to implement disaster recovery. For example, the MySQL binlog records all user changes to the database. Log-based CDC continuously monitors the transaction log to obtain database changes in real time.
CDC has a wide range of application scenarios, including (but not limited to) these aspects: remote data center database synchronization, heterogeneous database data synchronization, microservice decoupling, cache update, and CQRS.
Alibaba Cloud Data Transmission Service (DTS) is a real-time data streaming service. DTS supports data transmission between data sources (such as relational, NoSQL, and online analytical processing (OLAP) databases). DTS provides data synchronization, data migration, change tracking, data integration, and data processing features, enabling you to manage data within a secure, scalable, and high-availability architecture. A DTS data subscription [5] helps you obtain real-time incremental data from user-created MySQL, ApsaraDB RDS for MySQL, and Oracle databases.
Alibaba Cloud EventBridge provides event bus [6] and event stream [7] routing services in different application scenarios.
An event bus persists events in its underlying layer and can route them to multiple event targets as needed.
An event stream is suitable for end-to-end streaming data processing scenarios. Events generated at the source are extracted, transformed, and analyzed in real time and then loaded to the destination, without the need to create an event bus. This end-to-end transfer is more efficient and easier to use.
EventBridge supports the data subscription feature of Alibaba Cloud DTS at the event stream source to help support your needs in CDC scenarios. You can synchronize database changes to EventBridge event streams with simple configurations.
EventBridge implements a custom DTS Source Connector based on the DTS SDK. When you configure an event stream whose event provider is DTS, the source connector pulls DTS records from the DTS server in real time. After a record is pulled, the connector wraps it in a structure that retains the record data (such as id, operationType, topicPartition, beforeImage, and afterImage) and adds the system attributes required for streaming events.
Please see the EventBridge official documentation for a sample DTS event.
EventBridge Streaming preserves the order of DTS events, but an event may be delivered more than once. EventId maps one-to-one to a DTS record, so you can use this field to process events idempotently.
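Idempotent processing keyed on the event ID can be sketched as follows. This is an illustration only: it assumes each event arrives as a dictionary with an `id` key, and the `IdempotentConsumer` class and in-memory set are hypothetical stand-ins for whatever durable store a real consumer would use.

```python
class IdempotentConsumer:
    """Runs the handler at most once per event ID, even if events are redelivered."""

    def __init__(self, handler):
        self._handler = handler
        # Assumption: an in-memory set; a real consumer would need durable storage.
        self._seen = set()

    def consume(self, event):
        event_id = event["id"]
        if event_id in self._seen:
            return False  # duplicate delivery, skip
        self._handler(event)
        # Mark as processed only after the handler succeeds.
        self._seen.add(event_id)
        return True


processed = []
consumer = IdempotentConsumer(processed.append)

first = consumer.consume({"id": "evt-1", "operationType": "INSERT"})
duplicate = consumer.consume({"id": "evt-1", "operationType": "INSERT"})
second = consumer.consume({"id": "evt-2", "operationType": "UPDATE"})
```

Recording the ID only after the handler returns means a failed event can be retried, which matches at-least-once delivery.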
The following shows how to create an event stream whose source is DTS in the EventBridge console.
1) Log on to the EventBridge console. Click Event Stream on the left-side navigation pane. Click Create Event Stream on the Event Stream page.
2) Fill in the Event Flow Name and Description in Basic Information as required.
3) When creating an event stream and selecting an event provider, select Database DTS from the drop-down list.
4) In the Data Subscription Tasks column, select the created DTS data subscription task. In the Consumer Group column, select the consumer group to consume subscription data and enter the consumer group password and initial consumption time.
5) Enter event stream rules and targets as required. Save and start to create an event stream that uses a DTS data subscription as the event source.
Note the following points when using:
In the Command Query Responsibility Segregation (CQRS) model, the command model is used to perform write and update operations, and the query model is used to support efficient read operations. There are certain differences between the data models used for read operations and write operations. You need to use certain methods to ensure data synchronization. Based on EventBridge event streams, CDC can meet this requirement.
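As a rough sketch of how change events keep a query model in sync with the command model, the function below applies events to a denormalized, in-memory view. The flat `beforeImage` and `afterImage` dictionaries, the `order_id`, `customer`, and `status` columns, and the `apply_change` helper are all assumptions for illustration; the actual DTS event payload is described in the EventBridge documentation.

```python
def apply_change(read_model, event):
    """Apply one change event from the write side to the read model."""
    op = event["operationType"]
    if op in ("INSERT", "UPDATE"):
        row = event["afterImage"]
        # Keep only the fields the query side needs.
        read_model[row["order_id"]] = {
            "customer": row["customer"],
            "status": row["status"],
        }
    elif op == "DELETE":
        read_model.pop(event["beforeImage"]["order_id"], None)
    return read_model


order_1_unpaid = {"order_id": 1, "customer": "alice", "status": "Unpaid"}
order_1_paid = {"order_id": 1, "customer": "alice", "status": "Paid"}
order_2 = {"order_id": 2, "customer": "bob", "status": "Unpaid"}

events = [
    {"operationType": "INSERT", "beforeImage": None, "afterImage": order_1_unpaid},
    {"operationType": "UPDATE", "beforeImage": order_1_unpaid, "afterImage": order_1_paid},
    {"operationType": "INSERT", "beforeImage": None, "afterImage": order_2},
    {"operationType": "DELETE", "beforeImage": order_2, "afterImage": None},
]

orders_view = {}
for event in events:
    apply_change(orders_view, event)
```

Because the events arrive in order, replaying them leaves the view in the same final state as the write-side table.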
Based on cloud services, users can easily build CQRS based on EventBridge in the following ways:
CDC can be used for microservice decoupling. Take the order processing system of an e-commerce platform as an example. When a new unpaid order is generated, the database performs an INSERT operation. When the status of an order changes from Unpaid to Paid, the database performs an UPDATE operation. Different backend microservices handle these changes according to the order status.
If the interface call method is used, the order system must call the cache update interface, the new order interface, and the order payment interface after an order is placed, which couples the services too tightly. With the CDC-based approach, the data consumer does not need to care about the semantics of the content returned by the upstream order processing interface. As long as the storage model remains unchanged, it can decide directly at the data level whether and how a change needs to be processed. The message queue's built-in ability to accumulate messages also helps users smooth out traffic peaks when orders surge.
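The routing decision in the order example can be sketched as a small function over the change event. The flat `beforeImage` and `afterImage` dictionaries, the `status` column, and the service names are hypothetical; in practice this decision could also be expressed through event stream rules.

```python
def route_order_event(event):
    """Pick a downstream service based on the operation and the order status."""
    op = event["operationType"]
    before = event.get("beforeImage") or {}
    after = event.get("afterImage") or {}

    # A new unpaid order shows up as an INSERT.
    if op == "INSERT" and after.get("status") == "Unpaid":
        return "new-order-service"
    # A payment shows up as an UPDATE from Unpaid to Paid.
    if (
        op == "UPDATE"
        and before.get("status") == "Unpaid"
        and after.get("status") == "Paid"
    ):
        return "order-payment-service"
    # Other changes are not of interest to these consumers.
    return None


created = route_order_event(
    {"operationType": "INSERT", "beforeImage": None, "afterImage": {"status": "Unpaid"}}
)
paid = route_order_event(
    {
        "operationType": "UPDATE",
        "beforeImage": {"status": "Unpaid"},
        "afterImage": {"status": "Paid"},
    }
)
ignored = route_order_event(
    {"operationType": "DELETE", "beforeImage": {"status": "Paid"}, "afterImage": None}
)
```

The order system itself never calls these services; each consumer reacts only to the data changes it cares about.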
EventBridge Streaming supports other messaging products (such as RabbitMQ, Kafka, and MNS). Users can select based on their needs in practice.
Database disaster recovery and heterogeneous database data synchronization are important application scenarios for CDC. You can use Alibaba Cloud EventBridge to build such applications quickly.
You can also use EventBridge to build your own SQL audit.
This article introduced some concepts of CDC, the application of CDC on EventBridge, and several best-practice scenarios. As the number of supported products grows, the EventBridge ecosystem keeps expanding. From messaging to databases and from logs to big data, EventBridge continues to extend its applicable fields and consolidate its position as an event hub on the cloud. In the future, EventBridge will continue to develop in this direction with deeper technology and a wider ecosystem.
[1] DTS: https://www.alibabacloud.com/product/data-transmission-service
[2] Debezium: https://debezium.io/
[3] Canal: https://github.com/alibaba/canal
[4] Maxwell: https://github.com/zendesk/maxwell
[5] Overview of Change Tracking Scenarios: https://www.alibabacloud.com/help/en/data-transmission-service/latest/overview-of-change-tracking-scenarios
[6] Event Bus: https://www.alibabacloud.com/help/en/eventbridge/latest/event-bus-overview
[7] EventStreamings: https://www.alibabacloud.com/help/en/eventbridge/latest/eventstreamings-overview
[8] SUBSCRIBE Consumption Pattern: https://www.alibabacloud.com/help/en/data-transmission-service/latest/consume-tracked-data-use-the-sdk-demo-code-to-consume-tracked-data