Automatically synchronize the metadata of a PolarDB-X instance to a Data Lakehouse Edition cluster - AnalyticDB

You can enable the column-based storage engine for a PolarDB-X instance and associate the instance with a Data Lakehouse Edition (V3.0) cluster to achieve one-stop data synchronization and management for the PolarDB-X instance. Compared with other data synchronization solutions, this solution is more cost-effective and easy to use, and can synchronize data in real time to achieve real-time data analysis. This topic describes how to synchronize data from a PolarDB-X instance to an AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster in one stop.

Prerequisites

The Data Lakehouse Edition (V3.0) cluster and the PolarDB-X 2.0 instance belong to the same VPC in the same region. For more information, see Create a Data Lakehouse Edition cluster and Create a PolarDB-X instance.
The PolarDB-X 2.0 instance is of Enterprise Edition. The version of the PolarDB-X instance is 5.4.19-16989811 and later.
Note
For more information about how to view and upgrade the version of an instance, see View and update the version of an instance.
The minor version of the Data Lakehouse Edition (V3.0) cluster is 3.2.0.0 or later.
Note
For information about how to query the minor version of a cluster, see How do I query the version of an AnalyticDB for MySQL cluster? To update the minor version of a cluster, contact technical support.
A database account is created for the AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster.
- If you use an Alibaba Cloud account, you must create a privileged account. For more information, see the "Create a privileged account" section of the Create a database account topic.
- If you use a Resource Access Management (RAM) user, you must create both a privileged account and a standard account and associate the standard account with the RAM user. For more information, see Create a database account and Associate or disassociate a database account with or from a RAM user.

Background information

AnalyticDB for MySQL is a real-time lakehouse that is built based on the data lakehouse architecture and provides high-performance query capabilities for large amounts of data. PolarDB-X provides high-performance cloud native database services that can handle tens of millions of concurrent requests and store hundreds of petabytes of data. After you enable the column-based storage engine for a PolarDB-X 2.0 instance and associate the instance with a Data Lakehouse Edition (V3.0) cluster, the Data Lakehouse Edition (V3.0) cluster automatically creates a metadata discovery task. The metadata information of the column-oriented tables in the PolarDB-X 2.0 instance is automatically synchronized to the Data Lakehouse Edition (V3.0) cluster. You can directly perform data analysis in the Data Lakehouse Edition (V3.0) cluster.

The one-stop data synchronization solution for PolarDB-X 2.0 instances provides the following benefits:

Real-time synchronization: After you modify data in the PolarDB-X 2.0 instance, the data is synchronized to the Data Lakehouse Edition (V3.0) cluster within seconds. You can query the data changes in the Lake Warehouse Edition (3.0) cluster within seconds to implement real-time data analysis.
Easy to use: After you modify the data and table structure in the PolarDB-X 2.0 instance, the changes are automatically synchronized to the Data Lakehouse Edition (V3.0) cluster. You do not need to perform any additional operations.
Cost-effective: Enabling the column-based storage engine for a PolarDB-X 2.0 instance is free of charge. You can use job resource groups to develop SQL statements when you perform data analysis in the Data Lakehouse Edition (V3.0) cluster. Job resource groups provide elastic computing and storage resources on demand at low costs.

Scenarios

The one-stop data synchronization solution for PolarDB-X 2.0 instances is applicable to the following scenarios:

Real-time data analysis for PolarDB-X 2.0 instances: analyzes data in a PolarDB-X 2.0 instance in real time without affecting online business.
Association analysis across multiple data sources: analyzes data across the PolarDB-X 2.0 instance and other data sources.
Data analysis of multiple PolarDB-X 2.0 instances: analyzes data across multiple PolarDB-X 2.0 isntances.

Usage notes

During data synchronization, databases and tables that have the same names as the databases and tables in the PolarDB-X 2.0 instance are created in the Data Lakehouse Edition (V3.0) cluster. Take note of the following items:

If databases and tables that have the same names as the databases and tables in the PolarDB-X 2.0 instance already exist in the Data Lakehouse Edition (V3.0) cluster, data synchronization fails.
If databases that have the same names as the databases in the PolarDB-X 2.0 instance already exist in the Data Lakehouse Edition (V3.0) cluster, tables are synchronized to the existing databases.

Data preparation

In this example, the name of the PolarDB-X 2.0 database is test_adb, and a table named person is created in the database. See the following sample code:

CREATE DATABASE test_adb MODE='AUTO'
CREATE TABLE  person (
`id` bigint NOT NULL AUTO_INCREMENT,
`age` int(11) NOT NULL DEFAULT '0',
`name` char(60) NOT NULL DEFAULT '',
`city` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
INDEX `age_1` (`age`)
) PARTITION BY KEY(`id`) PARTITIONS 4;

Insert data records to the person table. Sample code:

INSERT INTO person VALUES (1,23,'Bob','Hangzhou'),(2,33,'Make','Wuhan');

Create a columnar index named person_col_index for the id column of the person table.

CREATE clustered columnar INDEX `person_col_index` ON person(`id`) PARTITION BY hash(`id`) partitions 4;

Procedure

Step 1: Enable the column-based storage engine for the PolarDB-X 2.0 instance

Log on to the PolarDB for Xscale console.
In the top navigation bar, select the region where the target instance is located.
On the Instance List page, click the PolarDB-X 2.0 tab. Click the ID of the instance that you want to manage.
In the left-side navigation pane, click Column-based Storage Engine.
Click Create Column-based Storage Engine. In the Create Column-based Storage Engine dialog box that appears, click OK.
Note
You can click in the upper-right corner of the page to view the Task Progress. When the Status of the nodes in the Topological information section is In use, the column-based storage engine is created.
In the upper-right corner of the page, click Activate AnalyticDB for MySQL Data Lakehouse Edition.
In the AnalyticDB for MySQL Data Lakehouse Edition clusters dialog box, select the Data Lakehouse Edition (V3.0) cluster that you want to use and click OK.
Note
After the association, the Data Lakehouse Edition (V3.0) cluster automatically creates a metadata discovery task.

Step 2: Analyze data in the Data Lakehouse Edition (V3.0) cluster

Log on to the AnalyticDB for MySQL console. In the upper-left corner of the console, select a region. In the left-side navigation pane, click Clusters. On the Data Lakehouse Edition (V3.0) tab, find the cluster that you want to manage and click the cluster ID.
In the left-side navigation pane, choose Data Ingestion > Metadata Discovery.
In the Tasks section, view the Status of the metadata discovery task that is automatically generated.

If the task is successfully run, the Latest Status value of the task is Successful. Choose Job Development > SQL Development to view the databases and tables that are synchronized to the Data Lakehouse Edition (V3.0) cluster.

SELECT * FORM test_adb.person;

Sample result:

+--------+---------+----------+----------+
|   id   |  name   |   age    |   city   |
+--------+---------+----------+----------+
|   1    |   23    |   Bob    | Hangzhou |
+--------+---------+----------+----------+
|   2    |   33    |   Make   |   Wuhan  |
+--------+---------+----------+----------+