×
Community Blog Synchronize MySQL Data to Elasticsearch in Real-Time with Canal

Synchronize MySQL Data to Elasticsearch in Real-Time with Canal

This article outlines the step-by-step process to achieve efficient data synchronization and enhanced real-time search performance.

Alibaba Cloud Elasticsearch, a fully-managed service based on the open-source Elasticsearch engine, powers real-time search and analytics capabilities for a variety of applications. Pairing it up with Canal—an Alibaba Group's open-source tool—allows users to perform real-time data synchronization from MySQL to an Elasticsearch cluster, ensuring that the search engine reflects the most current data available.

Background information

Canal is adept at parsing binary log data from MySQL, providing a subscription and consumption model for incremental data updates. By using it as a subordinate to a MySQL instance, like ApsaraDB RDS for MySQL, it captures the binary logs and sends them to Elasticsearch. For further understanding of Canal's principles, refer to its comprehensive documentation. Here, we demonstrate how to use Canal with Alibaba Cloud's Elasticsearch to maintain high-performance data synchronization.

Before exploring the steps, ensure that you have set up an ApsaraDB RDS for MySQL instance an Elasticsearch cluster and an Elastic Compute Service (ECS) instance within the same Virtual Private Cloud (VPC) for optimal connectivity and security.

Prerequisites

  • An ApsaraDB RDS for MySQL instance running MySQL 5.7
  • An Alibaba Cloud Elasticsearch cluster, version 6.7 or higher
  • An ECS instance with CentOS 7.6 64-bit for deploying Canal

Limitations

1)Only incremental data synchronization is supported with this approach.

2)JDK version 1.8.0 or later is necessary.

3)Different Canal versions support specific Elasticsearch versions, e.g., Canal 1.1.5 for ES 7.X, and Canal 1.1.7 for ES 8.X.

4)Data types in MySQL must match the mappings in Elasticsearch.

5)Maintain Canal's availability to prevent disruptions in synchronization.

Procedure

Step 1: Setting up a MySQL Data Source

Firstly, you need to create a table within ApsaraDB RDS for MySQL. Below is an example SQL command to create a table:

-- create table
CREATE TABLE `es_test` (
    `id` bigint(32) NOT NULL,
    `name` text NOT NULL,
    `count` text NOT NULL,
    `color` text NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARACTER SET=utf8;

Step 2: Creating an Index and Configuring Mappings in Elasticsearch

PUT /es_test?include_type_name=true
{
  "settings": {
    "index": {
      "number_of_shards": "5",
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "count": {"type": "text"},
        "id": {"type": "integer"},
        "name": {"type": "text", "analyzer": "ik_smart"},
        "color": {"type": "text"}
      }
    }
  }
}

This creates an index es_test with the specified settings and mappings.

Step 3: Installing the JDK on ECS

SSH into your ECS instance and install the JDK required for Canal:

# Install JDK 1.8
sudo yum install java-1.8.0-openjdk-devel.x86_64
# Set environment variables
echo 'export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk' >> ~/.bash_profile
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> ~/.bash_profile
source ~/.bash_profile
# Verify installation
java -version

Step 4: Installing and Configuring Canal Server

Download and configure the Canal server appropriate for your Elasticsearch cluster version:

# Download Canal server
wget https://github.com/alibaba/canal/releases/download/v1.1.4/canal.deployer-1.1.4.tar.gz
# Decompress the package
tar -zxvf canal.deployer-1.1.4.tar.gz
# Modify the Canal server configuration. Insert your RDS and Elasticsearch details
vi conf/example/instance.properties
# Start the Canal server
./bin/startup.sh

Monitor the Canal logs to ensure everything is running smoothly.

Step 5: Installing the Canal Adapter

Similar to the server, download, decompress, and configure the Canal adapter before starting it:

# Download Canal adapter
wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.adapter-1.1.4.tar.gz
# Decompress the package
tar -zxvf canal.adapter-1.1.4.tar.gz
# Modify the application.yml file to set up connections and data mappings
vi conf/application.yml

Ensure that the connection details for your MySQL instance and Elasticsearch cluster are correctly inputted, focusing on the database URL, username, password, and Elasticsearch endpoint along with credentials.

canal.conf:
  canalServerHost: 127.0.0.1:11111
  srcDataSources:
    defaultDS:
      url: jdbc:mysql://<Your-RDS-Endpoint>:3306/<DB-Name>?useUnicode=true
      username: <DB-Username>
      password: <DB-Password>
  canalAdapters:
  - instance: default
    groups:
    - outerAdapters:
      - name: es
        hosts: <Your-ES-Endpoint>:9200
        properties:
          mode: rest
          security.auth: <ES-Username>:<ES-Password>
          cluster.name: <ES-Cluster-ID>

After editing the application.yml and other desired configurations, start the adapter:

# Start the Canal adapter
./bin/startup.sh

Monitor logs to ensure the adapter started correctly and is operational:

cat logs/adapter/adapter.log

Step 6: Verify Data Synchronization

To confirm that data synchronization from MySQL to Elasticsearch is properly set up through Canal, perform a test by inserting some data into the es_test table in your MySQL database.

INSERT INTO `ES`.`es_test`(`id`, `count`, `name`, `color`) VALUES (2, '11', 'canal_test2', 'red');

Then, query this data from Elasticsearch to verify it has been synchronized:

GET /es_test/_search

Successful synchronization will result in query results reflecting the new data you've inserted into MySQL.

Conclusion

Synchronizing data from MySQL to Elasticsearch enhances search capabilities, offering real-time search performance that is crucial for dynamic applications. Leveraging Alibaba Cloud Elasticsearch and Canal simplifies this task, ensuring that your search engine remains up-to-date with the latest database changes. Whether you're integrating complex datasets or looking for an efficient way to keep your search indices current, this setup caters well to high-performance requirements.

Start Your Journey with Alibaba Cloud Elasticsearch

Eager to experience seamless data synchronization and robust search capabilities? Alibaba Cloud Elasticsearch provides a fully-managed service that scales with your needs, empowering you to focus more on your application development and less on infrastructure management.

Embark on Your 30-Day Free Trial with Alibaba Cloud Elasticsearch and discover how you can transform your data into actionable insights and enhanced search experiences.

0 1 0
Share on

Data Geek

100 posts | 4 followers

You may also like

Comments

Data Geek

100 posts | 4 followers

Related Products