This topic describes how to use the cluster script feature of E-MapReduce (EMR) to quickly deploy and use the MirrorMaker 2.0 (MM2) service to synchronize data.
Background information
In this topic, an EMR Dataflow cluster is used as the cluster to which data is to be synchronized, and MM2 is deployed in the cluster in dedicated mode. This way, the EMR Dataflow cluster is used as both a destination cluster and a dedicated MirrorMaker cluster. In actual business scenarios, you can deploy a MirrorMaker cluster on a separate server.
Kafka MM2 is applicable to the following scenarios:
- Remote data synchronization: You can use Kafka MM2 to synchronize data between clusters in different regions.
- Disaster recovery: You can use Kafka MM2 to build a disaster recovery architecture that consists of primary and secondary clusters in different data centers. Data in the two clusters can be synchronized in real time. If one cluster becomes unavailable, you can switch the applications of that cluster over to the other cluster. This ensures geo-disaster recovery.
- Data migration: In scenarios such as cloud migration of businesses, hybrid clouds, and cluster upgrades, data needs to be migrated from the original cluster to a new cluster. You can use Kafka MM2 to migrate data to ensure business continuity.
- Data aggregation: You can use Kafka MM2 to synchronize data from multiple Kafka sub-clusters to a Kafka central cluster. This way, data can be aggregated.
Kafka MM2 provides the following features:
- Replicates the data and configuration information of topics.
- Replicates the offsets of consumer groups for the consumed topics.
- Replicates access control lists (ACLs).
- Automatically detects new topics and partitions.
- Provides Kafka MM2 metrics.
- Provides high-availability architectures that are horizontally scalable.
You can deploy Kafka MM2 by using one of the following methods:
- Method 1 (Recommended): Run MM2 connector tasks in an existing distributed Kafka Connect cluster. For more information, see Use Kafka MM2 to synchronize data across clusters.
- Method 2: Deploy a dedicated MirrorMaker cluster. In this mode, you run driver programs to manage all MM2 tasks. This topic describes this method; a minimal configuration sketch is provided after this list.
- Method 3: Run a MirrorSourceConnector task on a single Connect worker. This method is suitable for test scenarios.
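The following is a minimal sketch of what a dedicated-mode (Method 2) deployment looks like. The cluster aliases, broker addresses, replication factor, file path, and KAFKA_HOME variable are placeholder assumptions; replace them with the values of your own source and destination clusters. In this topic, these steps are performed by the cluster script feature rather than by hand.

```bash
# Minimal MM2 configuration for dedicated mode. All values below are
# placeholders: replace the cluster aliases, bootstrap servers, and
# replication factor with those of your own clusters.
cat > /tmp/mm2.properties << 'EOF'
clusters = source, dest
source.bootstrap.servers = core-1-1.source-cluster:9092
dest.bootstrap.servers = core-1-1.dest-cluster:9092

# Replicate all topics and consumer groups from source to dest.
source->dest.enabled = true
source->dest.topics = .*
source->dest.groups = .*

# Replication factor for the topics that MM2 creates in the destination cluster.
replication.factor = 2
EOF

# Start the MM2 driver. connect-mirror-maker.sh ships with Kafka 2.4 and
# later; KAFKA_HOME is assumed to point to the Kafka installation directory
# on the EMR node.
"${KAFKA_HOME}/bin/connect-mirror-maker.sh" /tmp/mm2.properties
```

After the driver starts, the replicated topics appear in the destination cluster with the source cluster alias as a prefix (for example, source.topic-name), which is the default MM2 topic naming policy.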
Prerequisites
- Two clusters are created, and the Kafka service is selected from the optional services during cluster creation. One is the source cluster, and the other is the destination EMR Dataflow cluster. For more information about how to create a Dataflow cluster, see Create a cluster.
Note In this example, the source and destination clusters are both Dataflow clusters of EMR V3.42.0.
- A bucket is created in Object Storage Service (OSS). For more information, see Create buckets.
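If you prefer the command line, the following sketch creates the bucket with ossutil instead of the OSS console. The bucket name is a placeholder, and the bucket's region is determined by the endpoint configured for ossutil; this is only an alternative to the console steps referenced above.

```bash
# Create an OSS bucket with ossutil (hypothetical bucket name).
# The region is taken from the endpoint configured for ossutil.
ossutil mb oss://emr-mm2-demo-bucket
```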
Limits
The version of the Kafka service that is selected for EMR Dataflow clusters must be 2.12_2.4.1 or later.
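To confirm that an existing cluster meets this requirement, you can print the version of the Kafka installation on a broker node, as in the sketch below. KAFKA_HOME is an assumption and must point to the Kafka installation directory of your EMR image.

```bash
# Print the version of the local Kafka installation.
# KAFKA_HOME is assumed to point to the Kafka installation directory.
"${KAFKA_HOME}/bin/kafka-topics.sh" --version
```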