Alibaba Cloud Elasticsearch, a fully managed service built on the open-source Elasticsearch engine, powers real-time search and analytics for a wide range of applications. Pairing it with Canal, an open-source tool from Alibaba Group, lets you synchronize data in real time from MySQL to an Elasticsearch cluster, so the search engine always reflects the most current data.
Canal parses MySQL binary log (binlog) data and exposes the incremental changes through a subscription and consumption model. It registers itself as a replica of a MySQL instance, such as ApsaraDB RDS for MySQL, receives the binary logs, and forwards the changes to Elasticsearch. For a deeper look at Canal's principles, refer to its official documentation. Here, we demonstrate how to use Canal with Alibaba Cloud Elasticsearch to maintain high-performance data synchronization.
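Conceptually, each row change Canal reads from the binlog becomes an index, update, or delete operation on a document in Elasticsearch. A toy Python sketch of that transformation (illustrative only; the event shape and function name here are hypothetical, not Canal's actual code or API):

```python
# Illustrative sketch of the idea behind Canal's Elasticsearch adapter:
# turn a parsed binlog row-change event into an Elasticsearch bulk action.
# The event dict shape and function name are hypothetical, not Canal's API.

def event_to_action(event, index):
    """Map a row-change event to an (action, document) pair for the bulk API."""
    row = event["data"]
    doc_id = row["id"]
    if event["type"] in ("INSERT", "UPDATE"):
        # Index (upsert) the full row as a document keyed by the primary key.
        return {"index": {"_index": index, "_id": doc_id}}, row
    if event["type"] == "DELETE":
        # Remove the corresponding document.
        return {"delete": {"_index": index, "_id": doc_id}}, None
    raise ValueError(f"unsupported event type: {event['type']}")

action, doc = event_to_action(
    {"type": "INSERT", "data": {"id": 2, "name": "canal_test2", "count": "11", "color": "red"}},
    "es_test",
)
print(action)  # {'index': {'_index': 'es_test', '_id': 2}}
```

In the real pipeline, the Canal server does the binlog parsing and the Canal adapter performs this mapping and writes to Elasticsearch; you configure both rather than write code.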
Before exploring the steps, ensure that you have set up an ApsaraDB RDS for MySQL instance, an Elasticsearch cluster, and an Elastic Compute Service (ECS) instance within the same Virtual Private Cloud (VPC) for optimal connectivity and security. Also note the following limits:
1) Only incremental data synchronization is supported with this approach.
2) JDK 1.8.0 or later is required.
3) Each Canal version supports specific Elasticsearch versions, e.g., Canal 1.1.5 for Elasticsearch 7.X and Canal 1.1.7 for Elasticsearch 8.X.
4) Data types in MySQL must match the field mappings in Elasticsearch.
5) Keep Canal available at all times to prevent disruptions in synchronization.
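As a rough guide to point 4, here are some common MySQL-to-Elasticsearch type pairings (a non-exhaustive sketch; the right choice, especially text versus keyword for strings, depends on how you query the field):

```python
# Rough, non-exhaustive pairing of MySQL column types to Elasticsearch
# field types. For strings, "text" enables full-text search while
# "keyword" enables exact matching, sorting, and aggregations.
MYSQL_TO_ES = {
    "bigint": "long",
    "int": "integer",
    "float": "float",
    "double": "double",
    "datetime": "date",
    "varchar": "keyword",  # or "text" for full-text search
    "text": "text",
}

print(MYSQL_TO_ES["bigint"])  # long
```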
First, create a table in your ApsaraDB RDS for MySQL instance. Below is an example SQL statement:
-- create table
CREATE TABLE `es_test` (
`id` bigint(32) NOT NULL,
`name` text NOT NULL,
`count` text NOT NULL,
`color` text NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARACTER SET=utf8;
Next, create the destination index in your Elasticsearch cluster with settings and mappings that match the table:
PUT /es_test?include_type_name=true
{
"settings": {
"index": {
"number_of_shards": "5",
"number_of_replicas": "1"
}
},
"mappings": {
"_doc": {
"properties": {
"count": {"type": "text"},
"id": {"type": "long"},
"name": {"type": "text", "analyzer": "ik_smart"},
"color": {"type": "text"}
}
}
}
}
This creates an index es_test with the specified settings and mappings.
SSH into your ECS instance and install the JDK required for Canal:
# Install JDK 1.8
sudo yum install java-1.8.0-openjdk-devel.x86_64
# Set environment variables
echo 'export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk' >> ~/.bash_profile
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> ~/.bash_profile
source ~/.bash_profile
# Verify installation
java -version
Download and configure the Canal server that matches your Elasticsearch cluster version (version 1.1.4 is used in the commands below; substitute the version you need):
# Download Canal server
wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.deployer-1.1.4.tar.gz
# Decompress the package
tar -zxvf canal.deployer-1.1.4.tar.gz
# Modify the Canal server configuration. Insert your RDS and Elasticsearch details
vi conf/example/instance.properties
# Start the Canal server
./bin/startup.sh
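In instance.properties, the main settings to edit are the source database connection details. A sketch of the relevant keys (these are standard Canal instance settings; the placeholders are yours to fill in):

```properties
# Source MySQL (ApsaraDB RDS for MySQL) connection
canal.instance.master.address=<Your-RDS-Endpoint>:3306
canal.instance.dbUsername=<DB-Username>
canal.instance.dbPassword=<DB-Password>
canal.instance.connectionCharset=UTF-8
# Optional: limit binlog capture to the table being synchronized
canal.instance.filter.regex=<DB-Name>.es_test
```

The MySQL account must have replication privileges (REPLICATION SLAVE, REPLICATION CLIENT) so Canal can read the binlog.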
Monitor the Canal server logs (for example, logs/canal/canal.log and logs/example/example.log) to ensure everything is running smoothly.
Similar to the server, download, decompress, and configure the Canal adapter before starting it:
# Download Canal adapter
wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.adapter-1.1.4.tar.gz
# Decompress the package
tar -zxvf canal.adapter-1.1.4.tar.gz
# Modify the application.yml file to set up connections and data mappings
vi conf/application.yml
Make sure the connection details for your MySQL instance and Elasticsearch cluster are entered correctly: the database URL, username, and password, plus the Elasticsearch endpoint and its credentials.
canal.conf:
  canalServerHost: 127.0.0.1:11111
  srcDataSources:
    defaultDS:
      url: jdbc:mysql://<Your-RDS-Endpoint>:3306/<DB-Name>?useUnicode=true
      username: <DB-Username>
      password: <DB-Password>
  canalAdapters:
  - instance: example # must match the Canal server instance name
    groups:
    - outerAdapters:
      - name: es
        hosts: <Your-ES-Endpoint>:9200
        properties:
          mode: rest
          security.auth: <ES-Username>:<ES-Password>
          cluster.name: <ES-Cluster-ID>
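The adapter also needs a table-to-index mapping file under conf/es/ (the directory is conf/es7/ or conf/es8/ in newer Canal versions, matching your Elasticsearch major version). A sketch for the es_test table and index used above:

```yaml
# conf/es/es_test.yml
dataSourceKey: defaultDS        # matches the key under srcDataSources
destination: example            # matches the Canal server instance name
groupId: g1
esMapping:
  _index: es_test
  _type: _doc
  _id: _id
  sql: "select t.id as _id, t.id, t.name, t.count, t.color from es_test t"
  commitBatch: 3000
```

The SQL statement defines which columns are synchronized and which value becomes the Elasticsearch document _id.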
After editing the application.yml and other desired configurations, start the adapter:
# Start the Canal adapter
./bin/startup.sh
Monitor logs to ensure the adapter started correctly and is operational:
cat logs/adapter/adapter.log
To confirm that data synchronization from MySQL to Elasticsearch is properly set up through Canal, perform a test by inserting a row into the es_test table in your MySQL database:
INSERT INTO `ES`.`es_test`(`id`, `count`, `name`, `color`) VALUES (2, '11', 'canal_test2', 'red');
Then, query this data from Elasticsearch to verify it has been synchronized:
GET /es_test/_search
Successful synchronization will result in query results reflecting the new data you've inserted into MySQL.
Synchronizing data from MySQL to Elasticsearch enhances search capabilities, offering real-time search performance that is crucial for dynamic applications. Leveraging Alibaba Cloud Elasticsearch and Canal simplifies this task, ensuring that your search engine remains up-to-date with the latest database changes. Whether you're integrating complex datasets or looking for an efficient way to keep your search indices current, this setup caters well to high-performance requirements.
Eager to experience seamless data synchronization and robust search capabilities? Alibaba Cloud Elasticsearch provides a fully-managed service that scales with your needs, empowering you to focus more on your application development and less on infrastructure management.
Embark on Your 30-Day Free Trial with Alibaba Cloud Elasticsearch and discover how you can transform your data into actionable insights and enhanced search experiences.