If you want to back up full data in Tablestore in a cost-effective manner or export Tablestore data as a file to your local device, you can use DataWorks Data Integration to export full data from Tablestore to Object Storage Service (OSS). After the full data is exported to OSS, you can download the OSS object that contains the exported Tablestore data to your local device.
Usage notes
This feature is applicable to the Wide Column model and TimeSeries model of Tablestore.
Wide Column model: You can use the codeless user interface (UI) or code editor to export data from a data table in Tablestore to OSS.
TimeSeries model: You can use only the code editor to export data from a time series table in Tablestore to OSS.
Prerequisites
OSS is activated and an OSS bucket is created. For more information, see Activate OSS and Create buckets.
The information about the instances, data tables, or time series tables whose data you want to synchronize from Tablestore to OSS is confirmed and recorded.
DataWorks is activated and a workspace is created. For more information, see Activate DataWorks and Create a workspace.
A Resource Access Management (RAM) user is created and granted the AliyunOSSFullAccess and AliyunOTSFullAccess permissions. AliyunOSSFullAccess and AliyunOTSFullAccess grant full permissions on OSS and Tablestore, respectively. For more information, see Create a RAM user and Grant permissions to a RAM user.
Important: To prevent security risks caused by the leakage of the AccessKey pair of your Alibaba Cloud account, we recommend that you use the AccessKey pair of a RAM user.
An AccessKey pair is created for the RAM user. For more information, see Create an AccessKey pair.
Step 1: Add a Tablestore data source
To add a Tablestore instance as a data source, perform the following steps.
Go to the Data Integration page.
Log on to the DataWorks console, select a region in the upper-left corner, choose , select a workspace from the drop-down list, and then click Go to Data Integration.
In the left-side navigation pane, click Data Source.
On the Data Source page, click Add Data Source.
In the Add Data Source dialog box, click the Tablestore block.
In the Add OTS data source dialog box, configure the following parameters.

Data Source Name: The name of the data source. The name can contain letters, digits, and underscores (_), and must start with a letter.

Data Source Description: The description of the data source. The description cannot exceed 80 characters in length.

Endpoint: The endpoint of the Tablestore instance. For more information, see Endpoints. If the Tablestore instance and the resources of the destination data source are in the same region, enter a virtual private cloud (VPC) endpoint. Otherwise, enter a public endpoint.

Table Store instance name: The name of the Tablestore instance. For more information, see Instance.

AccessKey ID and AccessKey Secret: The AccessKey ID and AccessKey secret of your Alibaba Cloud account or RAM user. For more information about how to create an AccessKey pair, see Create an AccessKey pair.
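To illustrate the Endpoint parameter, the following sketch shows the typical shape of Tablestore public and VPC endpoints. The instance name (myinstance) and region (cn-hangzhou) are placeholders; copy the exact endpoint of your instance from the Tablestore console.

```
# Public endpoint (accessible over the Internet):
https://myinstance.cn-hangzhou.ots.aliyuncs.com

# VPC endpoint (use this if the resource group and the instance are in the same region):
https://myinstance.cn-hangzhou.vpc.tablestore.aliyuncs.com
```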
Test the network connectivity between the data source and the resource group that you select.
To ensure that your synchronization nodes run as expected, you need to test the connectivity between the data source and all types of resource groups on which your synchronization nodes will run.
Important: A synchronization task can use only one type of resource group. By default, only shared resource groups for Data Integration are displayed in the resource group list. To ensure the stability and performance of data synchronization, we recommend that you use an exclusive resource group for Data Integration.
Click Purchase to create a new resource group or click Associate Purchased Resource Group to associate an existing resource group. For more information, see Create and use an exclusive resource group for Data Integration.
After the resource group is started, click Test Network Connectivity in the Connection Status (Production Environment) column of the resource group.
If Connected is displayed, the connectivity test is passed.
If the data source passes the network connectivity test, click Complete.
The newly created data source is displayed in the data source list.
Step 2: Add an OSS data source
The procedure is similar to that in Step 1, except that you click OSS in the Add Data Source dialog box.
In this example, the OSS data source is named OTS2OSS, as shown in the following figure.
When you configure parameters for an OSS data source, make sure that the endpoint does not contain the name of the specified OSS bucket and starts with http:// or https://.

You can select RAM role authorization mode or Access Key mode as the access mode of the OSS data source.
Access Key mode: You can use the AccessKey pair of an Alibaba Cloud account or a RAM user to access the OSS data source.
RAM role authorization mode: DataWorks can assume related roles to access the OSS data source by using Security Token Service (STS) tokens. This ensures higher security. For more information, see Use the RAM role-based authorization mode to add a data source.
If you select RAM role authorization mode as Access Mode for the first time, the Warning dialog box appears to prompt you to create a service-linked role for DataWorks. Click Enable authorization. Then, select the service-linked role that you created for DataWorks and complete authorization.
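As an example of the endpoint requirement for an OSS data source, the following sketch contrasts a valid endpoint with an invalid one. The region (cn-hangzhou) and bucket name (mybucket) are placeholders.

```
# Valid: region endpoint only, no bucket name
https://oss-cn-hangzhou.aliyuncs.com

# Invalid: the bucket name must not appear in the endpoint
https://mybucket.oss-cn-hangzhou.aliyuncs.com
```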
Step 3: Create a batch synchronization node
Go to the DataStudio console.
Log on to the DataWorks console, select a region in the upper-left corner, choose , select a workspace from the drop-down list, and then click Go to DataStudio.
On the Scheduled Workflow page of the DataStudio console, click Business Flow and select a business flow.
For information about how to create a workflow, see Create a workflow.
Right-click the Data Integration node and choose Create Node > Offline synchronization.
In the Create Node dialog box, select a path and enter a node name.
Click Confirm.
The newly created offline synchronization node will be displayed under the Data Integration node.
Step 4: Configure and start a batch synchronization task
When you configure a task to synchronize full data from Tablestore to OSS, select a task configuration method based on the data storage model that you use.
If you use the Wide Column model, your data is stored in a data table, and you synchronize data from that data table. For more information, see the Configure a task to synchronize data from a data table section of this topic.
If you use the TimeSeries model, your data is stored in a time series table, and you synchronize data from that time series table. For more information, see the Configure a task to synchronize data from a time series table section of this topic.
Configure a task to synchronize data from a data table
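If you use the code editor for a data table, the synchronization task is defined as a JSON script with an OTS reader and an OSS writer. The following is a minimal sketch, not a complete configuration: the data source names (my_ots_source, OTS2OSS), table name, column names, and object prefix are placeholder assumptions, and the exact parameters supported may vary with your DataWorks version. See the OTS Reader and OSS Writer documentation for the authoritative parameter list.

```json
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "ots",
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "datasource": "my_ots_source",
        "table": "mytable",
        "column": [{ "name": "pk1" }, { "name": "col1" }, { "name": "col2" }],
        "range": {
          "begin": [{ "type": "INF_MIN" }],
          "end": [{ "type": "INF_MAX" }]
        }
      }
    },
    {
      "stepType": "oss",
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "datasource": "OTS2OSS",
        "object": "tablestore_backup/mytable",
        "writeMode": "truncate",
        "fileFormat": "csv",
        "fieldDelimiter": ","
      }
    }
  ],
  "setting": {
    "errorLimit": { "record": "0" },
    "speed": { "throttle": false, "concurrent": 2 }
  },
  "order": { "hops": [{ "from": "Reader", "to": "Writer" }] }
}
```

The range with INF_MIN and INF_MAX exports the full primary-key range of the table, which matches the full-export goal of this topic.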
Configure a task to synchronize data from a time series table
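For a time series table, only the code editor is supported, and the reader must be switched to time series mode. The following is a hedged sketch only: the mode, newVersion, and isTimeseriesTable switches, the reserved time series columns (_m_name, _data_source, _tags, _time), and all names such as my_timeseries_table and the field column temperature are assumptions based on common OTS Reader configurations, and must be verified against the Tablestore data source documentation for your DataWorks version.

```json
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "ots",
      "category": "reader",
      "name": "Reader",
      "parameter": {
        "datasource": "my_ots_source",
        "table": "my_timeseries_table",
        "mode": "normal",
        "newVersion": "true",
        "isTimeseriesTable": "true",
        "column": [
          { "name": "_m_name" },
          { "name": "_data_source" },
          { "name": "_tags" },
          { "name": "_time" },
          { "name": "temperature", "type": "DOUBLE" }
        ]
      }
    },
    {
      "stepType": "oss",
      "category": "writer",
      "name": "Writer",
      "parameter": {
        "datasource": "OTS2OSS",
        "object": "timeseries_backup/my_timeseries_table",
        "writeMode": "truncate",
        "fileFormat": "csv",
        "fieldDelimiter": ","
      }
    }
  ],
  "setting": {
    "errorLimit": { "record": "0" },
    "speed": { "throttle": false, "concurrent": 2 }
  },
  "order": { "hops": [{ "from": "Reader", "to": "Writer" }] }
}
```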
Step 5: View the data exported to OSS
Log on to the OSS console.
Click Buckets in the left-side navigation pane. On the Buckets page, find the bucket to which data is synchronized and click the name of the bucket.
On the Objects page, select an object and download it to check whether the data is synchronized as expected.
References
After you export full data from Tablestore to OSS, you can synchronize incremental data in Tablestore to OSS. For more information, see Synchronize incremental data to OSS.
After you export full data from Tablestore to OSS, you can use the time-to-live (TTL) management feature to clear historical data that is no longer needed in Tablestore tables. For more information, see the Data versions and TTL topic or the "Appendix: Manage a time series table" section of the Use the TimeSeries model in the Tablestore console topic.
You can download the OSS object that contains the exported Tablestore data to your local device by using the OSS console or ossutil. For more information, see Simple download.
To prevent important data from being unavailable due to accidental deletion or malicious tampering, you can use Cloud Backup to back up data in wide tables of Tablestore instances on a regular basis and restore lost or damaged data at the earliest opportunity. For more information, see Overview.
If you want to implement tiered storage for the hot and cold data of Tablestore, full backup of Tablestore data, and large-scale real-time data analysis, you can use the data delivery feature of Tablestore. For more information, see Overview.