×
Community Blog Sync MySQL Data to Alibaba Cloud Elasticsearch with DataWorks

Sync MySQL Data to Alibaba Cloud Elasticsearch with DataWorks

This step-by-step guide simplifies the process, empowering you to perform full-text searches, multi-dimensional queries, and statistical analysis on your MySQL data with ease.

Performing full-text searches, multi-dimensional queries, and statistical analysis on data stored in MySQL can be a cumbersome task. Thankfully, Alibaba Cloud DataWorks offers a streamlined solution for synchronizing data from MySQL to Alibaba Cloud Elasticsearch unlocking powerful search and analytical capabilities. In this article, we'll walk you through the entire process, complete with code examples and best practices.

Prerequisites

Before we dive in, ensure you have the following prerequisites in place:

1)An ApsaraDB RDS for MySQL instance (follow this guide to create one).

2)An Alibaba Cloud Elasticsearch cluster with the Auto Indexing feature enabled (learn how to create a cluster and configure the YML file).

3)A DataWorks workspace .

Note: The MySQL instance, Elasticsearch cluster, and DataWorks workspace must reside in the same region and time zone.

Step 1: Prepare Source Data

Let's start by creating a database and table in your ApsaraDB RDS for MySQL instance. Run the following SQL statements:

1-- Create table
2CREATE TABLE `es_test` (
3    `id` bigint(32) NOT NULL,
4    `name` varchar(32) NULL,
5    `age` bigint(32) NULL,
6    `hobby` varchar(32) NULL,
7    PRIMARY KEY (`id`)
8) ENGINE=InnoDB
9DEFAULT CHARACTER SET=utf8;
10
11-- Insert data
12INSERT INTO `es_test` (`id`,`name`,`age`,`hobby`) VALUES (1,'user1',22,'music');
13INSERT INTO `es_test` (`id`,`name`,`age`,`hobby`) VALUES (2,'user2',23,'sport');
14INSERT INTO `es_test` (`id`,`name`,`age`,`hobby`) VALUES (3,'user3',43,'game');
15INSERT INTO `es_test` (`id`,`name`,`age`,`hobby`) VALUES (4,'user4',24,'run');
16INSERT INTO `es_test` (`id`,`name`,`age`,`hobby`) VALUES (5,'user5',42,'basketball');

Step 2: Create and Configure an Exclusive Resource Group

Next, create an exclusive resource group for Data Integration and associate it with a Virtual Private Cloud (VPC) and your DataWorks workspace. This ensures fast and stable data transmission.

1)Log in to the DataWorks console

2)Select your region and navigate to Resource Groups > Exclusive Resource Groups.

3)Click Create Resource Group for Data Integration, configure the resource group name, and complete the purchase.

4)Find the newly created resource group, click Network Settings, and associate it with the VPC where your MySQL instance and Elasticsearch cluster reside.

5)Add the CIDR blocks of the vSwitches (for MySQL and Elasticsearch) to their respective IP address whitelists.

6)Return to the Resource Groups page and click Change Workspace to associate the resource group with your DataWorks workspace.

Step 3: Add Data Sources

It's time to add your MySQL instance and Elasticsearch cluster as data sources in Data Integration.

1)In the DataWorks console, navigate to Workspaces > Shortcuts > Data Integration.

2)In the left-side navigation pane, click Data Source > Add Data Source.

3)Add your MySQL data source by selecting MySQL and configuring the basic information (follow this guide).

4)Repeat the process to add your Elasticsearch data source (follow this guide).

Step 4: Configure and Run a Batch Synchronization Task

Now, it's time to configure and run the batch synchronization task using the exclusive resource group.

1)In the DataWorks console, navigate to Workspaces > Shortcuts > Data Integration.

2)Create a new workflow and add an Offline Synchronization node.

3)Configure the network and resources:

  • Set Source to MySQL and Data Source Name to your MySQL data source.
  • Select the exclusive resource group you created earlier.
  • Set Destination to Elasticsearch and Data Source Name to your Elasticsearch data source.

4)Click Next and configure the task:

  • Select the table you want to synchronize in the Source section.
  • Configure the destination parameters and field mappings in the Destination and Field Mapping sections.
  • Adjust the Channel Control settings as needed.

5)(Optional) Configure scheduling properties for the task.

6)Save and submit the task. You can run it immediately or set it to run on a schedule.

Step 5: Verify Data Synchronization

To verify that the data has been successfully synchronized to Elasticsearch, follow these steps:

1)Log in to the Kibana console of your Elasticsearch cluster.

2)Click the icon in the top-left corner and select Dev Tools.

3)On the Console tab, run the following command (replace es_test with your index name):

POST /es_test/_search?pretty
{
  "query": { "match_all": {}}
}

If the data synchronization was successful, you should see the synchronized data in the response.

Congratulations! You've successfully synchronized data from MySQL to Alibaba Cloud Elasticsearch using DataWorks. Now, you can leverage Elasticsearch's powerful search and analytical capabilities on your MySQL data.

Ready to start your journey with Elasticsearch on Alibaba Cloud? Explore our tailored Cloud solutions and services to take the first step towards transforming your data into a visual masterpiece. Click here to embark on your 30-Day Free Trial

0 1 0
Share on

Data Geek

100 posts | 4 followers

You may also like

Comments

Data Geek

100 posts | 4 followers

Related Products