
Tablestore: Data Integration

Last Updated: Aug 22, 2024

If your business requires database capabilities such as high concurrent read and write performance, high scalability, high availability, complex retrieval, and big data analysis, but your existing database architecture cannot meet these requirements or the cost of transforming the databases is high, you can use DataWorks Data Integration to migrate data from the existing databases to Tablestore tables. You can also use DataWorks Data Integration to migrate Tablestore data across instances or Alibaba Cloud accounts, or to migrate Tablestore data to Object Storage Service (OSS) or MaxCompute. This way, you can back up Tablestore data and use it in other services.

Scenarios

DataWorks Data Integration is a stable, efficient, and scalable data synchronization platform. It is suitable for data migration and synchronization between multiple disparate data sources, such as MySQL, Oracle, MaxCompute, and Tablestore.

Tablestore allows you to use DataWorks Data Integration to migrate database data to Tablestore, migrate Tablestore data across instances or Alibaba Cloud accounts, and migrate Tablestore data to OSS or MaxCompute.

Migrate database data to Tablestore

DataWorks provides stable and efficient data synchronization between disparate data sources. You can use it to migrate data from various databases to Tablestore.

Note

For information about the data sources and Reader and Writer plug-ins supported by DataWorks, see Supported data source types, Reader plug-ins, and Writer plug-ins.


Migrate or synchronize Tablestore data across instances or Alibaba Cloud accounts

You can configure the Tablestore-related Reader and Writer plug-ins in DataWorks to synchronize data between Tablestore data tables or time series tables. The following plug-ins are available:

  • OTSReader: reads data from Tablestore tables. You can specify the range of data that you want to extract, which also supports incremental extraction.

  • OTSStreamReader: exports data from Tablestore tables in incremental mode.

  • OTSWriter: writes data to Tablestore.
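
These plug-ins are typically configured through the codeless UI, but they can also be expressed as a script-mode JSON configuration. The following sketch shows what the Reader and Writer parameter blocks for a Tablestore-to-Tablestore synchronization might look like. All endpoints, instance names, tables, columns, and credential placeholders are hypothetical, and the exact parameter names can vary by plug-in version, so treat this as an outline rather than a complete job configuration.

    "reader": {
      "name": "otsreader",
      "parameter": {
        "endpoint": "https://source-instance.cn-hangzhou.ots.aliyuncs.com",
        "accessId": "<AccessKey ID>",
        "accessKey": "<AccessKey secret>",
        "instanceName": "source-instance",
        "table": "source_table",
        "column": [{"name": "pk"}, {"name": "col1"}],
        "range": {
          "begin": [{"type": "INF_MIN"}],
          "end": [{"type": "INF_MAX"}]
        }
      }
    },
    "writer": {
      "name": "otswriter",
      "parameter": {
        "endpoint": "https://dest-instance.cn-hangzhou.ots.aliyuncs.com",
        "accessId": "<AccessKey ID>",
        "accessKey": "<AccessKey secret>",
        "instanceName": "dest-instance",
        "table": "dest_table",
        "primaryKey": [{"name": "pk", "type": "string"}],
        "column": [{"name": "col1", "type": "int"}],
        "writeMode": "UpdateRow"
      }
    }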

Migrate Tablestore data to OSS or MaxCompute

You can migrate Tablestore data to OSS or MaxCompute based on your business scenario.

  • MaxCompute is a fully managed data warehouse service that can process terabytes or petabytes of data at high speeds. You can migrate Tablestore data to MaxCompute to back it up or to use it in MaxCompute.

  • OSS is a secure, cost-effective, and highly reliable service that can store large amounts of data. You can synchronize Tablestore data to OSS as a backup and download the resulting objects to your local devices.

Migration solutions

You can use DataWorks Data Integration to migrate data between Tablestore and various data sources.

  • You can use a data import solution to synchronize data from sources such as MySQL, Oracle, Kafka, HBase, MaxCompute, and PolarDB-X 2.0 to Tablestore. You can also synchronize data across Tablestore data tables or time series tables.

  • You can use a data export solution to synchronize data from Tablestore to MaxCompute or OSS.

Import data

The following list describes the data import solutions.

  • Synchronize MySQL data to Tablestore: You can migrate data from MySQL databases only to Tablestore data tables. During migration, the Reader script configurations of MySQL and the Writer script configurations of Tablestore are used.

  • Synchronize Oracle data to Tablestore: You can migrate data from Oracle databases only to Tablestore data tables. During migration, the Reader script configurations of Oracle and the Writer script configurations of Tablestore are used.

  • Synchronize Kafka data to Tablestore: You can migrate Kafka data to Tablestore data tables or time series tables. During migration, the Reader script configurations of Kafka and the Writer script configurations of Tablestore are used.

  • Synchronize HBase data to Tablestore: You can migrate data from HBase databases only to Tablestore data tables. During migration, the Reader script configurations of HBase and the Writer script configurations of Tablestore are used.

  • Synchronize MaxCompute data to Tablestore: You can migrate MaxCompute data only to Tablestore data tables. During migration, the Reader script configurations of MaxCompute and the Writer script configurations of Tablestore are used.

  • Synchronize PolarDB-X 2.0 data to Tablestore: You can migrate data from PolarDB-X 2.0 only to Tablestore data tables. During migration, the Reader script configurations of PolarDB-X 2.0 and the Writer script configurations of Tablestore are used.

  • Synchronize data between Tablestore data tables: You can migrate data from one Tablestore data table only to another Tablestore data table. During migration, the Reader and Writer script configurations of Tablestore are used. For information about the source and destination configurations, see Tablestore data source, and refer to the configurations that are used to read and write data in tables of the Wide Column model.

  • Synchronize data between Tablestore time series tables: You can migrate data from one Tablestore time series table only to another Tablestore time series table. During migration, the Reader and Writer script configurations of Tablestore are used. For information about the source and destination configurations, see Tablestore data source, and refer to the configurations that are used to read and write data in tables of the TimeSeries model.

Export data

The following list describes the data export solutions.

  • Synchronize Tablestore data to MaxCompute: You can migrate Tablestore data to MaxCompute to back it up or to use it in MaxCompute. During migration, the Reader script configurations of Tablestore and the Writer script configurations of MaxCompute are used.

  • Synchronize Tablestore data to OSS: You can synchronize Tablestore data to OSS as a backup and download the resulting objects from OSS when needed. During migration, the Reader script configurations of Tablestore and the Writer script configurations of OSS are used.
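
For the export path to OSS, the Tablestore Reader is paired with an OSS Writer in the same way as in the script-mode sketch shown earlier. The following parameter block is a rough sketch of the OSS Writer side only; the endpoint, bucket, object prefix, and credential values are hypothetical placeholders, and the available parameters may differ by plug-in version.

    "writer": {
      "name": "osswriter",
      "parameter": {
        "endpoint": "https://oss-cn-hangzhou.aliyuncs.com",
        "accessId": "<AccessKey ID>",
        "accessKey": "<AccessKey secret>",
        "bucket": "my-backup-bucket",
        "object": "tablestore/backup/data",
        "fileFormat": "csv",
        "fieldDelimiter": ",",
        "writeMode": "truncate"
      }
    }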

Prerequisites

After you determine a migration solution, make sure that the following preparations are made:

  • Network connections are established between DataWorks and the source and between DataWorks and the destination.

  • The following operations are performed on the source service: confirm the version, prepare the account, configure the required permissions, and perform service-specific configurations. For more information, see the configuration requirements in the documentation of the source.

  • The destination service is activated, and the required resources are created. For more information, see the configuration requirements in the documentation of the destination.

Usage notes

Important

If you require technical support when you migrate data, submit a ticket.

  • Make sure that DataWorks Data Integration supports data migration of the specific product version.

  • The data type of the destination must match the data type of the source. Otherwise, dirty data may be generated during migration.

  • After you determine the migration solution, make sure to read the limits and usage notes in the documentation of the source and destination.

  • Before you migrate Kafka data, you must select a Tablestore data model to store the data based on your business scenario.

Configuration process

Determine your migration solution, and then configure data migration in DataWorks Data Integration by performing the following steps.


The configuration process consists of the following steps:

  1. Add and manage data sources: Create the required data sources based on the migration solution.

    • If you want to migrate data to Tablestore, the destination is Tablestore and the source can be Tablestore or another service, such as MySQL or MaxCompute.

    • If you want to migrate data from Tablestore, the source is Tablestore and the destination can be MaxCompute or OSS.

  2. Configure a batch synchronization task by using the codeless UI: DataWorks Data Integration provides the codeless UI with step-by-step instructions to help you configure a batch synchronization task. The codeless UI is easy to use but provides only limited features.

  3. Verify the migration results: View the data in the destination based on the migration solution.

    • After a data import operation, view the imported data in the Tablestore console.

    • After a data export operation, view the exported data in the MaxCompute or OSS console.

Examples

Import data

DataWorks Data Integration allows you to import data from sources, such as MySQL, Oracle, and MaxCompute, to Tablestore data tables. In this example, data is imported from MaxCompute to a Tablestore data table.

Prerequisites

Make sure that the following preparations are made before you perform subsequent operations:

  • Information about the MaxCompute table that you want to synchronize to a Tablestore table is confirmed and recorded.

  • A RAM user is created and granted Tablestore management permissions (AliyunOTSFullAccess) and DataWorks management permissions (AliyunDataWorksFullAccess). For more information, see Create a RAM user and Grant permissions to the RAM user.

  • An AccessKey pair is created for the RAM user. For more information, see Create an AccessKey pair.

  • DataWorks is activated and a workspace is created. For more information, see Activate DataWorks and Create a workspace.

  • Tablestore is activated. An instance and a data table are created. For more information, see Step 1: Activate Tablestore, Step 2: Create an instance, and Step 3: Create a data table.

    Important

    The names and types of the primary key columns of the Tablestore data table must match the names and types of the primary key columns of the MaxCompute table. You do not need to define attribute columns in the Tablestore data table. You can specify the names of attribute columns when you write data to the Tablestore data table.
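
    For example, suppose the MaxCompute table uses a two-column primary key consisting of a hypothetical user_id column of the STRING type and a hypothetical order_id column of the BIGINT type. Because Tablestore primary key columns support only the String, Integer, and Binary types, the Tablestore data table must be created with a matching primary key schema in the same order, in line with the field type mappings in the appendix:

      MaxCompute table primary key:   user_id STRING, order_id BIGINT
      Tablestore primary key schema:  user_id (String), order_id (Integer)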

Step 1: Add a Tablestore data source and a MaxCompute data source

In this example, a Tablestore data source is added. Add the required data sources based on your business requirements.

Important

If you want to add a MaxCompute data source, click MaxCompute in the Add Data Source dialog box and configure the parameters.

  1. Go to the Data Integration page.

    1. Log on to the DataWorks console as the project administrator.

    2. In the left-side navigation pane, click Workspaces. In the top navigation bar, select a region.

    3. On the Workspaces page, find the workspace that you want to manage and choose Shortcuts > Data Integration in the Actions column.

  2. In the left-side navigation pane, click Data Source.

  3. On the Data Source page, click Add Data Source.

  4. In the Add Data Source dialog box, click the Tablestore block.

  5. In the Add OTS data source dialog box, configure the following parameters.

    • Data Source Name: the name of the data source. The name can contain letters, digits, and underscores (_), and must start with a letter.

    • Data Source Description: the description of the data source. The description cannot exceed 80 characters in length.

    • Endpoint: the endpoint of the Tablestore instance. For more information, see Endpoints. If the Tablestore instance and the resources of the destination data source are in the same region, enter a virtual private cloud (VPC) endpoint. Otherwise, enter a public endpoint.

    • Table Store instance name: the name of the Tablestore instance. For more information, see Instance.

    • AccessKey ID and AccessKey Secret: the AccessKey ID and AccessKey secret of your Alibaba Cloud account or RAM user. For more information about how to create an AccessKey pair, see Create an AccessKey pair.
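
    For reference, Tablestore endpoints typically follow a fixed pattern. For a hypothetical instance named myinstance in the China (Hangzhou) region, the endpoints would look like the following; substitute your own instance name and region ID:

      Public endpoint: https://myinstance.cn-hangzhou.ots.aliyuncs.com
      VPC endpoint:    https://myinstance.cn-hangzhou.vpc.tablestore.aliyuncs.com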

  6. Test the network connectivity between the data source and the resource group that you select.

    To ensure that your synchronization nodes run as expected, you need to test the connectivity between the data source and all types of resource groups on which your synchronization nodes will run.

    Important

    A synchronization task can use only one type of resource group. By default, only shared resource groups for Data Integration are displayed in the resource group list. To ensure the stability and performance of data synchronization, we recommend that you use an exclusive resource group for Data Integration.

    1. Click Purchase to create a new resource group or click Associate Purchased Resource Group to associate an existing resource group. For more information, see Create and use an exclusive resource group for Data Integration.

    2. Find the resource group that you want to manage and click Test Network Connectivity in the Connection Status column.

      If Connected is displayed in the Connection Status column, the connectivity test is passed.

  7. If the data source passes the network connectivity test, click Complete.

    The newly created data source is displayed in the data source list.

Step 2: Configure a batch synchronization task by using the codeless UI

  1. Add a synchronization task node.

    1. Go to the DataStudio console.

      1. Log on to the DataWorks console as a project administrator.

        Important

        Only the project administrator role can be used to add data sources. Members who are assigned other roles can only view data sources.

      2. In the left-side navigation pane, click Workspaces. In the top navigation bar, select a region.

      3. On the Workspaces page, find the workspace that you want to manage and choose Shortcuts > Data Development in the Actions column.

    2. In the Scheduled Workflow section of the DataStudio console, click Business Flow and select a business flow.

      For information about how to create a workflow, see Create a workflow.

    3. Right-click Data Integration and choose Create Node > Offline synchronization.

    4. In the Create Node dialog box, enter a name in the Name field and click Confirm.

      You can find the offline synchronization node that you created under Data Integration.

  2. Configure network and resource links.

    Select the source and destination for the batch synchronization task, and the resource group that is used to run the batch synchronization task. Establish network connections between the resource group and data sources, and test the connectivity.

    1. Double-click the created batch synchronization node under Data Integration.

    2. In the Configure Network Connections and Resource Group step, set the Source parameter to MaxCompute(ODPS) and the Data Source Name parameter to the name of the MaxCompute data source.

    3. Select a resource group.

      After you select a resource group, the system displays the region and specifications of the resource group and automatically tests the connectivity between the resource group and the selected data source.

      Important

      Make sure that the resource group is the same as the resource group that you selected when you added the data source.

    4. Set the Destination parameter to Tablestore and the Data Source Name parameter to the name of the Tablestore data source.

      The system automatically tests the connectivity between the resource group and the destination.

    5. After the connectivity test is passed, click Next.

  3. Configure the source and destination.

    1. In the Configure Source and Destination section, configure the following parameters for the source.

      • Tunnel Resource Group: the data transmission resources of MaxCompute. The default value is Common transmission resources, which is a free quota for MaxCompute.

      • Table: the name of the MaxCompute table.

      • Partition information: the names of the partitions from which you want to read data.
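
      For example, if the MaxCompute table is partitioned by a hypothetical pt column, the partition information can be a fixed value or a scheduling parameter; both values below are placeholders:

        pt=20240801
        pt=${bizdate}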

    2. In the Configure Source and Destination section, configure the following parameters for the destination.

      • Table: the name of the Tablestore data table.

      • primaryKeyInfo: the primary key information about the Tablestore data table. The value of this parameter is a JSON array. The primary key of a data table consists of one to four primary key columns. Specify the primary key information based on the actual number of primary key columns.

        Important: The number, sequence, and types of the primary key columns that you specify must be the same as the number, sequence, and types of the primary key columns in the data table.

        Example:

        [
         {
         "name": "pk",
         "type": "string"
         }
        ]

      • WriteMode: the mode in which data is written to Tablestore. Valid values:

        • PutRow: inserts data into the specified row. This mode corresponds to the Tablestore PutRow API operation. If the specified row does not exist, a new row is added. If the specified row exists, the entire row is overwritten.

        • UpdateRow: updates data in the specified row. This mode corresponds to the Tablestore UpdateRow API operation. If the specified row does not exist, a new row is added. If the specified row exists, the values of the specified columns in the row are added, modified, or removed based on the content of the request.
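
      To illustrate the difference between the two modes, assume the destination table already contains a row with the primary key pk="a" and the attribute columns col1=1 and col2=2, and the task writes pk="a" with only col1=9. All names and values are hypothetical:

        PutRow:    the row becomes {pk: "a", col1: 9}; col2 is removed because the entire row is replaced.
        UpdateRow: the row becomes {pk: "a", col1: 9, col2: 2}; only col1 is modified.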

  4. Configure field mappings.

    After you configure the source and destination, you must configure mappings between source fields and destination fields. After you configure the mappings, the batch synchronization task writes the values of the source fields to the destination fields of the specific data type based on the mappings. By default, the system maps the field in a row of the source to the field in the same row of the destination.

    Important
    • If you want to change the mappings, click the edit icon next to Target field. Specify a source field by its name. Specify a destination field in the JSON format, for example, {"name":"age","type":"INTEGER"}. Each line represents one field.

    • The data types of the attribute columns in the Tablestore data table must match the data types of the MaxCompute fields that are written to them. The names of the attribute columns in the Tablestore data table can be different from the names of the MaxCompute fields.

    The primary key information about the destination is already specified when you configure the parameters for the destination. Therefore, you must manually delete the primary key columns from the destination fields. Otherwise, the batch synchronization task fails.

    1. Click the edit icon next to Target field.

    2. In the Manually Edit Table Fields dialog box, delete the primary key information from the destination fields and add or modify the attribute column information based on your business requirements. See the example after these steps.

    3. Click Confirmation.
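
    For example, if the hypothetical destination table uses a single primary key column named pk and you want to write two attribute columns named name and age, the field list contains only the attribute columns after you delete the primary key information:

      {"name":"name","type":"STRING"}
      {"name":"age","type":"INTEGER"}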

  5. Configure channel control policies.

    In the Channel Control section, configure the parameters for task execution, such as the Synchronization rate and Policy for Dirty Data Records parameters. For more information, see Step 5: Configure channel control policies.

  6. Click the save icon to save the configurations.

    Note

    If you do not save the configurations, a message that prompts you to save the configurations appears when you perform subsequent operations. In this case, click Confirmation in the message to save the configurations.

  7. Run the synchronization task.

    Important

    In most cases, you need to synchronize full data only once. You do not need to configure scheduling properties.

    1. Click the Run icon in the toolbar.

    2. In the Parameters dialog box, select the name of the resource group from the drop-down list.

    3. Click Run.

      After the synchronization task is complete, click the URL of the run log on the Runtime Log tab to go to the details page of the run log. On the details page of the run log, check the value of Current task status.

      If the value of Current task status is FINISH, the task is complete.

Step 3: View the data imported to Tablestore

Log on to the Tablestore console to view data.

  1. Log on to the Tablestore console.

  2. In the upper part of the Overview page, select a region.

  3. Click the name of the Tablestore instance to which the data is imported.

  4. On the Instance Details tab, click the name of the destination table on the Tables tab.

  5. On the Query Data tab, you can view the data imported to Tablestore.

Export data

You can use DataWorks Data Integration to export Tablestore data to MaxCompute or OSS.

Billing

  • After you import data to Tablestore, you are charged by Tablestore for storage usage based on the amount of stored data.

  • When you use a migration tool to access Tablestore, you are charged by Tablestore for the read and write throughput that your read and write requests consume. Metered read and write CUs and reserved read and write CUs are billed separately. Whether metered or reserved read and write CUs are consumed depends on the type of the instance that you access.

    Note

    For more information about instance types and CUs, see Instance and Read and write throughput.

  • When you use DataWorks tools, you are charged for specific features and resources. For more information, see Purchase guide.

Other solutions

You can download Tablestore data to a local file based on your business requirements. For more information, see Download data in Tablestore to a local file.

You can also use other migration tools, such as Tunnel Service, to import data.

  • DataX: DataX abstracts the synchronization between different data sources into a Reader plug-in that reads data from the source and a Writer plug-in that writes data to the destination. Migration solution: synchronize data from one table to another table in Tablestore.

  • Tunnel Service: Tunnel Service is an integrated service that is used to consume full and incremental data based on the Tablestore API. It is suitable for scenarios in which the source of the migration or synchronization is Tablestore. Tunnel Service provides tunnels that export and consume data in full, incremental, and differential modes. After you create a tunnel, you can consume the historical and incremental data that is exported from a specific table. Migration solution: synchronize data from one table to another table in Tablestore.

Appendix: Field type mappings

This section describes the field type mappings between common services and Tablestore. In actual scenarios, configure field mappings based on the field type mappings.

Field type mapping between MaxCompute and Tablestore

MaxCompute field type → Tablestore field type:

  • STRING → STRING
  • BIGINT → INTEGER
  • DOUBLE → DOUBLE
  • BOOLEAN → BOOLEAN
  • BINARY → BINARY

Field type mapping between MySQL and Tablestore

MySQL field type → Tablestore field type:

  • STRING → STRING
  • INT or INTEGER → INTEGER
  • DOUBLE, FLOAT, or DECIMAL → DOUBLE
  • BOOL or BOOLEAN → BOOLEAN
  • BINARY → BINARY

Field type mapping between Kafka and Tablestore

Kafka schema type → Tablestore field type:

  • STRING → STRING
  • INT8, INT16, INT32, or INT64 → INTEGER
  • FLOAT32 or FLOAT64 → DOUBLE
  • BOOLEAN → BOOLEAN
  • BYTES → BINARY