
Tablestore: Data Integration

Last Updated: Aug 22, 2024

If your business requires database capabilities such as high concurrent read and write performance, high scalability, high availability, complex retrieval, and big data analysis, but your existing database architecture cannot meet these requirements or the cost of transforming the databases is high, you can use DataWorks Data Integration to migrate data from the existing databases to Tablestore tables. You can also use DataWorks Data Integration to migrate Tablestore data across instances or Alibaba Cloud accounts, or to migrate Tablestore data to Object Storage Service (OSS) or MaxCompute. This way, you can back up Tablestore data and use it in other services.

Scenarios

DataWorks Data Integration is a stable, efficient, and scalable data synchronization platform. It is suitable for data migration and synchronization between multiple disparate data sources, such as MySQL, Oracle, MaxCompute, and Tablestore.

Tablestore allows you to use DataWorks Data Integration to migrate database data to Tablestore, migrate Tablestore data across instances or Alibaba Cloud accounts, and migrate Tablestore data to OSS or MaxCompute.

Migrate database data to Tablestore

DataWorks provides stable and efficient data synchronization between disparate data sources. You can use it to migrate data from various databases to Tablestore.

Note

For information about the data sources and Reader and Writer plug-ins supported by DataWorks, see Supported data source types, Reader plug-ins, and Writer plug-ins.


Migrate or synchronize Tablestore data across instances or Alibaba Cloud accounts

You can configure the Tablestore-related Reader and Writer plug-ins in DataWorks to synchronize data between Tablestore data tables or time series tables. The following plug-ins are available:

  • OTSReader: reads data from Tablestore tables. You can specify the range of data that you want to extract, which also supports incremental extraction.

  • OTSStreamReader: exports data from Tablestore tables in incremental mode.

  • OTSWriter: writes data to Tablestore.
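
These plug-ins are typically configured through the codeless UI, but they can also be expressed as a script-mode JSON configuration. The following sketch shows what the Reader and Writer parameter blocks for a Tablestore-to-Tablestore synchronization might look like. All endpoints, instance names, tables, columns, and credential placeholders are hypothetical, and the exact parameter names can vary by plug-in version, so treat this as an outline rather than a complete job configuration.

    "reader": {
      "name": "otsreader",
      "parameter": {
        "endpoint": "https://source-instance.cn-hangzhou.ots.aliyuncs.com",
        "accessId": "<AccessKey ID>",
        "accessKey": "<AccessKey secret>",
        "instanceName": "source-instance",
        "table": "source_table",
        "column": [{"name": "pk"}, {"name": "col1"}],
        "range": {
          "begin": [{"type": "INF_MIN"}],
          "end": [{"type": "INF_MAX"}]
        }
      }
    },
    "writer": {
      "name": "otswriter",
      "parameter": {
        "endpoint": "https://dest-instance.cn-hangzhou.ots.aliyuncs.com",
        "accessId": "<AccessKey ID>",
        "accessKey": "<AccessKey secret>",
        "instanceName": "dest-instance",
        "table": "dest_table",
        "primaryKey": [{"name": "pk", "type": "string"}],
        "column": [{"name": "col1", "type": "int"}],
        "writeMode": "UpdateRow"
      }
    }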

Migrate Tablestore data to OSS or MaxCompute

You can migrate Tablestore data to OSS or MaxCompute based on your business scenario.

  • MaxCompute is a fully managed data warehouse service that can process terabytes or petabytes of data at high speeds. You can migrate Tablestore data to MaxCompute to back it up or to use it in MaxCompute.

  • OSS is a secure, cost-effective, and highly reliable service that can store large amounts of data. You can synchronize Tablestore data to OSS as a backup and download the resulting objects to your local devices.

Migration solutions

You can use DataWorks Data Integration to migrate data between Tablestore and various data sources.

  • You can use a data import solution to synchronize data from sources such as MySQL, Oracle, Kafka, HBase, MaxCompute, and PolarDB-X 2.0 to Tablestore. You can also synchronize data across Tablestore data tables or time series tables.

  • You can use a data export solution to synchronize data from Tablestore to MaxCompute or OSS.

Import data

The following list describes the data import solutions.

  • Synchronize MySQL data to Tablestore: You can migrate data from MySQL databases only to Tablestore data tables. During migration, the Reader script configurations of MySQL and the Writer script configurations of Tablestore are used.

  • Synchronize Oracle data to Tablestore: You can migrate data from Oracle databases only to Tablestore data tables. During migration, the Reader script configurations of Oracle and the Writer script configurations of Tablestore are used.

  • Synchronize Kafka data to Tablestore: You can migrate Kafka data to Tablestore data tables or time series tables. During migration, the Reader script configurations of Kafka and the Writer script configurations of Tablestore are used.

  • Synchronize HBase data to Tablestore: You can migrate data from HBase databases only to Tablestore data tables. During migration, the Reader script configurations of HBase and the Writer script configurations of Tablestore are used.

  • Synchronize MaxCompute data to Tablestore: You can migrate MaxCompute data only to Tablestore data tables. During migration, the Reader script configurations of MaxCompute and the Writer script configurations of Tablestore are used.

  • Synchronize PolarDB-X 2.0 data to Tablestore: You can migrate data from PolarDB-X 2.0 only to Tablestore data tables. During migration, the Reader script configurations of PolarDB-X 2.0 and the Writer script configurations of Tablestore are used.

  • Synchronize data between Tablestore data tables: You can migrate data from one Tablestore data table only to another Tablestore data table. During migration, the Reader and Writer script configurations of Tablestore are used. For information about the source and destination configurations, see Tablestore data source, and refer to the configurations that are used to read and write data in tables of the Wide Column model.

  • Synchronize data between Tablestore time series tables: You can migrate data from one Tablestore time series table only to another Tablestore time series table. During migration, the Reader and Writer script configurations of Tablestore are used. For information about the source and destination configurations, see Tablestore data source, and refer to the configurations that are used to read and write data in tables of the TimeSeries model.

Export data

The following list describes the data export solutions.

  • Synchronize Tablestore data to MaxCompute: You can migrate Tablestore data to MaxCompute to back it up or to use it in MaxCompute. During migration, the Reader script configurations of Tablestore and the Writer script configurations of MaxCompute are used.

  • Synchronize Tablestore data to OSS: You can synchronize Tablestore data to OSS as a backup and download the resulting objects from OSS when needed. During migration, the Reader script configurations of Tablestore and the Writer script configurations of OSS are used.
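
For the export path to OSS, the Tablestore Reader is paired with an OSS Writer in the same way as in the script-mode sketch shown earlier. The following parameter block is a rough sketch of the OSS Writer side only; the endpoint, bucket, object prefix, and credential values are hypothetical placeholders, and the available parameters may differ by plug-in version.

    "writer": {
      "name": "osswriter",
      "parameter": {
        "endpoint": "https://oss-cn-hangzhou.aliyuncs.com",
        "accessId": "<AccessKey ID>",
        "accessKey": "<AccessKey secret>",
        "bucket": "my-backup-bucket",
        "object": "tablestore/backup/data",
        "fileFormat": "csv",
        "fieldDelimiter": ",",
        "writeMode": "truncate"
      }
    }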

Prerequisites

After you determine a migration solution, make sure that the following preparations are made:

  • Network connections are established between DataWorks and the source and between DataWorks and the destination.

  • The following operations are performed on the source service: confirm the version, prepare the account, configure the required permissions, and perform service-specific configurations. For more information, see the configuration requirements in the documentation of the source.

  • The destination service is activated, and the required resources are created. For more information, see the configuration requirements in the documentation of the destination.

Usage notes

Important

If you require technical support when you migrate data, submit a ticket.

  • Make sure that DataWorks Data Integration supports data migration of the specific product version.

  • The data type of the destination must match the data type of the source. Otherwise, dirty data may be generated during migration.

  • After you determine the migration solution, make sure to read the limits and usage notes in the documentation of the source and destination.

  • Before you migrate Kafka data, you must select a Tablestore data model to store the data based on your business scenario.

Configuration process

Determine your migration solution, and then configure data migration in DataWorks Data Integration by performing the following steps.


The configuration process consists of the following steps:

  1. Add and manage data sources: Create the required data sources based on the migration solution.

    • If you want to migrate data to Tablestore, the destination is Tablestore and the source can be Tablestore or another service, such as MySQL or MaxCompute.

    • If you want to migrate data from Tablestore, the source is Tablestore and the destination can be MaxCompute or OSS.

  2. Configure a batch synchronization task by using the codeless UI: DataWorks Data Integration provides the codeless UI with step-by-step instructions to help you configure a batch synchronization task. The codeless UI is easy to use but provides only limited features.

  3. Verify the migration results: View the data in the destination based on the migration solution.

    • After a data import operation, view the imported data in the Tablestore console.

    • After a data export operation, view the exported data in the MaxCompute or OSS console.

Examples

Import data

DataWorks Data Integration allows you to import data from sources, such as MySQL, Oracle, and MaxCompute, to Tablestore data tables. In this example, data is imported from MaxCompute to a Tablestore data table.

Prerequisites

Make sure that the following preparations are made before you perform subsequent operations:

  • Information about the MaxCompute table that you want to synchronize to a Tablestore table is confirmed and recorded.

  • A RAM user is created and granted Tablestore management permissions (AliyunOTSFullAccess) and DataWorks management permissions (AliyunDataWorksFullAccess). For more information, see Create a RAM user and Grant permissions to the RAM user.

  • An AccessKey pair is created for the RAM user. For more information, see Create an AccessKey pair.

  • DataWorks is activated and a workspace is created. For more information, see Activate DataWorks and Create a workspace.

  • Tablestore is activated. An instance and a data table are created. For more information, see Step 1: Activate Tablestore, Step 2: Create an instance, and Step 3: Create a data table.

    Important

    The names and types of the primary key columns of the Tablestore data table must match the names and types of the primary key columns of the MaxCompute table. You do not need to define attribute columns in the Tablestore data table. You can specify the names of attribute columns when you write data to the Tablestore data table.
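
    For example, suppose the MaxCompute table uses a two-column primary key consisting of a hypothetical user_id column of the STRING type and a hypothetical order_id column of the BIGINT type. Because Tablestore primary key columns support only the String, Integer, and Binary types, the Tablestore data table must be created with a matching primary key schema in the same order, in line with the field type mappings in the appendix:

      MaxCompute table primary key:   user_id STRING, order_id BIGINT
      Tablestore primary key schema:  user_id (String), order_id (Integer)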

Step 1: Add a Tablestore data source and a MaxCompute data source

In this example, a Tablestore data source is added. Add the required data sources based on your business requirements.

Important

If you want to add a MaxCompute data source, click MaxCompute in the Add Data Source dialog box and configure the parameters.

  1. Go to the Data Integration page.

    1. Log on to the DataWorks console as the project administrator.

    2. In the left-side navigation pane, click Workspaces. In the top navigation bar, select a region.

    3. On the Workspaces page, find the workspace that you want to manage and choose Shortcuts > Data Integration in the Actions column.

  2. In the left-side navigation pane, click Data Source.

  3. On the Data Source page, click Add Data Source.

  4. In the Add Data Source dialog box, click the Tablestore block.

  5. In the Add OTS data source dialog box, configure the following parameters.

    • Data Source Name: the name of the data source. The name can contain letters, digits, and underscores (_), and must start with a letter.

    • Data Source Description: the description of the data source. The description cannot exceed 80 characters in length.

    • Endpoint: the endpoint of the Tablestore instance. For more information, see Endpoints. If the Tablestore instance and the resources of the destination data source are in the same region, enter a virtual private cloud (VPC) endpoint. Otherwise, enter a public endpoint.

    • Table Store instance name: the name of the Tablestore instance. For more information, see Instance.

    • AccessKey ID and AccessKey Secret: the AccessKey ID and AccessKey secret of your Alibaba Cloud account or RAM user. For more information about how to create an AccessKey pair, see Create an AccessKey pair.
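
    For reference, Tablestore endpoints typically follow a fixed pattern. For a hypothetical instance named myinstance in the China (Hangzhou) region, the endpoints would look like the following; substitute your own instance name and region ID:

      Public endpoint: https://myinstance.cn-hangzhou.ots.aliyuncs.com
      VPC endpoint:    https://myinstance.cn-hangzhou.vpc.tablestore.aliyuncs.com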

  6. Test the network connectivity between the data source and the resource group that you select.

    To ensure that your synchronization nodes run as expected, you need to test the connectivity between the data source and all types of resource groups on which your synchronization nodes will run.

    Important

    A synchronization task can use only one type of resource group. By default, only shared resource groups for Data Integration are displayed in the resource group list. To ensure the stability and performance of data synchronization, we recommend that you use an exclusive resource group for Data Integration.

    1. Click Purchase to create a new resource group or click Associate Purchased Resource Group to associate an existing resource group. For more information, see Create and use an exclusive resource group for Data Integration.

    2. Find the resource group that you want to manage and click Test Network Connectivity in the Connection Status column.

      If Connected is displayed in the Connection Status column, the connectivity test is passed.

  7. If the data source passes the network connectivity test, click Complete.

    The newly created data source is displayed in the data source list.

Step 2: Configure a batch synchronization task by using the codeless UI

  1. Add a synchronization task node.

    1. Go to the DataStudio console.

      1. Log on to the DataWorks console as a project administrator.

        Important

        Only the project administrator role can be used to add data sources. Members who are assigned other roles can only view data sources.

      2. In the left-side navigation pane, click Workspaces. In the top navigation bar, select a region.

      3. On the Workspaces page, find the workspace that you want to manage and choose Shortcuts > Data Development in the Actions column.

    2. In the Scheduled Workflow section of the DataStudio console, click Business Flow and select a business flow.

      For information about how to create a workflow, see Create a workflow.

    3. Right-click Data Integration and choose Create Node > Offline synchronization.

    4. In the Create Node dialog box, enter a name in the Name field and click Confirm.

      You can find the offline synchronization node that you created under Data Integration.

  2. Configure network and resource links.

    Select the source and destination for the batch synchronization task, and the resource group that is used to run the batch synchronization task. Establish network connections between the resource group and data sources, and test the connectivity.

    1. Double-click the created batch synchronization node under Data Integration.

    2. In the Configure Network Connections and Resource Group step, set the Source parameter to MaxCompute(ODPS) and the Data Source Name parameter to the name of the MaxCompute data source.

    3. Select a resource group.

      After you select a resource group, the system displays the region and specifications of the resource group and automatically tests the connectivity between the resource group and the selected data source.

      Important

      Make sure that the resource group is the same as the resource group that you selected when you added the data source.

    4. Set the Destination parameter to Tablestore and the Data Source Name parameter to the name of the Tablestore data source.

      The system automatically tests the connectivity between the resource group and the destination.

    5. After the connectivity test is passed, click Next.

  3. Configure the source and destination.

    1. In the Configure Source and Destination section, configure the following parameters for the source.

      • Tunnel Resource Group: the data transmission resources of MaxCompute. The default value is Common transmission resources, which is a free quota for MaxCompute.

      • Table: the name of the MaxCompute table.

      • Partition information: the names of the partitions from which you want to read data.
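
      For example, if the MaxCompute table is partitioned by a hypothetical pt column, the partition information can be a fixed value or a scheduling parameter; both values below are placeholders:

        pt=20240801
        pt=${bizdate}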

    2. In the Configure Source and Destination section, configure the following parameters for the destination.

      • Table: the name of the Tablestore data table.

      • primaryKeyInfo: the primary key information about the Tablestore data table. The value of this parameter is a JSON array. The primary key of a data table consists of one to four primary key columns. Specify the primary key information based on the actual number of primary key columns.

        Important: The number, sequence, and types of the primary key columns that you specify must be the same as the number, sequence, and types of the primary key columns in the data table.

        Example:

        [
         {
         "name": "pk",
         "type": "string"
         }
        ]

      • WriteMode: the mode in which data is written to Tablestore. Valid values:

        • PutRow: inserts data into the specified row. This mode corresponds to the Tablestore PutRow API operation. If the specified row does not exist, a new row is added. If the specified row exists, the entire row is overwritten.

        • UpdateRow: updates data in the specified row. This mode corresponds to the Tablestore UpdateRow API operation. If the specified row does not exist, a new row is added. If the specified row exists, the values of the specified columns in the row are added, modified, or removed based on the content of the request.
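
      To illustrate the difference between the two modes, assume the destination table already contains a row with the primary key pk="a" and the attribute columns col1=1 and col2=2, and the task writes pk="a" with only col1=9. All names and values are hypothetical:

        PutRow:    the row becomes {pk: "a", col1: 9}; col2 is removed because the entire row is replaced.
        UpdateRow: the row becomes {pk: "a", col1: 9, col2: 2}; only col1 is modified.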

  4. Configure field mappings.

    After you configure the source and destination, you must configure mappings between source fields and destination fields. After you configure the mappings, the batch synchronization task writes the values of the source fields to the destination fields of the specific data type based on the mappings. By default, the system maps the field in a row of the source to the field in the same row of the destination.

    Important
    • If you want to change the mappings, click the edit icon next to Target field. Specify a source field by its name. Specify a destination field in the JSON format, for example, {"name":"age","type":"INTEGER"}. Each line represents one field.

    • The data types of the attribute columns in the Tablestore data table must match the data types of the MaxCompute fields that are written to them. The names of the attribute columns in the Tablestore data table can be different from the names of the MaxCompute fields.

    The primary key information about the destination is already specified when you configure the parameters for the destination. Therefore, you must manually delete the primary key columns from the destination fields. Otherwise, the batch synchronization task fails.

    1. Click the edit icon next to Target field.

    2. In the Manually Edit Table Fields dialog box, delete the primary key information from the destination fields and add or modify the attribute column information based on your business requirements. See the example after these steps.

    3. Click Confirmation.
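
    For example, if the hypothetical destination table uses a single primary key column named pk and you want to write two attribute columns named name and age, the field list contains only the attribute columns after you delete the primary key information:

      {"name":"name","type":"STRING"}
      {"name":"age","type":"INTEGER"}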

  5. Configure channel control policies.

    In the Channel Control section, configure the parameters for task execution, such as the Synchronization rate and Policy for Dirty Data Records parameters. For more information, see Step 5: Configure channel control policies.

  6. Click the save icon to save the configurations.

    Note

    If you do not save the configurations, a message that prompts you to save the configurations appears when you perform subsequent operations. In this case, click Confirmation in the message to save the configurations.

  7. Run the synchronization task.

    Important

    In most cases, you need to synchronize full data only once. You do not need to configure scheduling properties.

    1. Click the Run icon in the toolbar.

    2. In the Parameters dialog box, select the name of the resource group from the drop-down list.

    3. Click Run.

      After the synchronization task is complete, click the URL of the run log on the Runtime Log tab to go to the details page of the run log. On the details page of the run log, check the value of Current task status.

      If the value of Current task status is FINISH, the task is complete.

Step 3: View the data imported to Tablestore

Log on to the Tablestore console to view data.

  1. Log on to the Tablestore console.

  2. In the upper part of the Overview page, select a region.

  3. Click the name of the Tablestore instance to which the data is imported.

  4. On the Instance Details tab, click the name of the destination table on the Tables tab.

  5. On the Query Data tab, you can view the data imported to Tablestore.

Export data

You can use DataWorks Data Integration to export Tablestore data to MaxCompute or OSS.

Billing

  • After you import data to Tablestore, you are charged by Tablestore for storage usage based on the amount of stored data.

  • When you use a migration tool to access Tablestore, you are charged by Tablestore for the read and write throughput that your read and write requests consume. Metered read and write CUs and reserved read and write CUs are billed separately. Whether metered or reserved read and write CUs are consumed depends on the type of the instance that you access.

    Note

    For more information about instance types and CUs, see Instance and Read and write throughput.

  • When you use DataWorks tools, you are charged for specific features and resources. For more information, see Purchase guide.

Other solutions

You can download Tablestore data to a local file based on your business requirements. For more information, see Download data in Tablestore to a local file.

You can also use other migration tools, such as Tunnel Service, to import data.

  • DataX: DataX abstracts the synchronization between different data sources into a Reader plug-in that reads data from the source and a Writer plug-in that writes data to the destination. Migration solution: synchronize data from one table to another table in Tablestore.

  • Tunnel Service: Tunnel Service is an integrated service that is used to consume full and incremental data based on the Tablestore API. It is suitable for scenarios in which the source of the migration or synchronization is Tablestore. Tunnel Service provides tunnels that export and consume data in full, incremental, and differential modes. After you create a tunnel, you can consume the historical and incremental data that is exported from a specific table. Migration solution: synchronize data from one table to another table in Tablestore.

Appendix: Field type mappings

This section describes the field type mappings between common services and Tablestore. In actual scenarios, configure field mappings based on the field type mappings.

Field type mapping between MaxCompute and Tablestore

MaxCompute field type → Tablestore field type:

  • STRING → STRING
  • BIGINT → INTEGER
  • DOUBLE → DOUBLE
  • BOOLEAN → BOOLEAN
  • BINARY → BINARY

Field type mapping between MySQL and Tablestore

MySQL field type → Tablestore field type:

  • STRING → STRING
  • INT or INTEGER → INTEGER
  • DOUBLE, FLOAT, or DECIMAL → DOUBLE
  • BOOL or BOOLEAN → BOOLEAN
  • BINARY → BINARY

Field type mapping between Kafka and Tablestore

Kafka schema type → Tablestore field type:

  • STRING → STRING
  • INT8, INT16, INT32, or INT64 → INTEGER
  • FLOAT32 or FLOAT64 → DOUBLE
  • BOOLEAN → BOOLEAN
  • BYTES → BINARY