×
Community Blog Synchronize ApsaraDB RDS for SQL Server Data to Alibaba Cloud Elasticsearch using DataWorks

Synchronize ApsaraDB RDS for SQL Server Data to Alibaba Cloud Elasticsearch using DataWorks

Discover how to leverage Alibaba Cloud's DataWorks to seamlessly synchronize your valuable data from ApsaraDB RDS for SQL Server to Alibaba Cloud Elasticsearch.

In today's data-driven landscape, the ability to effectively search and analyze information is vital. The combination of ApsaraDB RDS for SQL Server with Alibaba Cloud Elasticsearch offers a powerful synergy for data analysis and search capabilities. By utilizing the Data Integration service from DataWorks, this article guides you through the process of synchronizing data to Alibaba Cloud Elasticsearch.

Background Information

Alibaba Cloud's DataWorks is a comprehensive big data platform providing data development, task scheduling, and data management capabilities. Its Data Integration feature allows for efficient data collection at intervals as low as every five minutes, supporting batch synchronization tasks to various data destinations, including Alibaba Cloud Elasticsearch.
Supported data sources include:

  • Alibaba Cloud databases: ApsaraDB variants and PolarDB, MaxCompute, OSS, Tablestore
  • Self-managed databases: HDFS, Oracle, FTP, Db2, MySQL, SQL Server, MongoDB, HBase

Supported synchronization scenarios cover offline synchronization to Alibaba Cloud Elasticsearch and full table data transfer.

Prerequisites

Before proceeding, ensure you have:

1)An existing ApsaraDB RDS for SQL Server instance.

2)An Alibaba Cloud Elasticsearch cluster with Auto Indexing enabled.

3)A DataWorks workspace in the same region and time zone as your RDS and Elasticsearch instances.

Refer to the respective documentation for guidance on setting up these resources.

Procedure

Step 1: Prepare Source Data

Set up a test environment within your ApsaraDB RDS for SQL Server instance. Here's a sample SQL script snippet:

CREATE TABLE students (
    id INT,
    name VARCHAR(20),
    age INT
)

INSERT INTO students (id, name, age) VALUES ('1', 'Bob', '21');

Step 2: Configure Exclusive Resource Group for Data Integration

Exclusive resource groups ensure stable and rapid data transmission. To set one up:

- Navigate to the DataWorks console.
- Select your region and go to **Resource Groups** > **Exclusive Resource Groups** and click **Create Resource Group for Data Integration**.
- Associate your group with the necessary VPC and your DataWorks workspace.

For a detailed process, see here

Step 3: Add Data Sources

Add both your ApsaraDB RDS for SQL Server instance and Elasticsearch cluster as data sources within the Data Integration service.

Step 4: Configure and Run a Batch Synchronization Task

Create a new batch synchronization task in DataWorks.

- Go to your workspace's DataStudio.
- Create a workflow and then a new Offline synchronization node within.
- Configure your network, set up SQL Server as your source, and your Elasticsearch cluster as your destination.
- Map fields accordingly and run your task.

Step 5: Verify Data Synchronization

Check data has successfully synchronized by running a query in your Elasticsearch cluster's Kibana console.

POST /dbo.students/_search?pretty
{
   "query": { "match_all": {}}
}

Conclusion

Synchronizing data between ApsaraDB RDS for SQL Server and Alibaba Cloud Elasticsearch is made intuitive using DataWorks. This integration unlocks the potential for advanced data search and analytics.

Ready to start your journey with Elasticsearch on Alibaba Cloud? Explore our tailored Cloud solutions and services to take the first step towards transforming your data into a visual masterpiece. Embark on Your 30-Day Free Trial

0 1 0
Share on

Data Geek

97 posts | 4 followers

You may also like

Comments