You can use DataWorks Data Integration to synchronize full and incremental data from Tablestore to Object Storage Service (OSS). This backs up your Tablestore data and makes it available for use in OSS.
How it works
The offline synchronization feature of DataWorks Data Integration splits synchronization between a data source and a destination into two plug-ins. A Reader plug-in reads data from the data source, and a Writer plug-in writes data to the destination. You define the data source and the destination, and then combine them with DataWorks scheduling parameters to synchronize full or incremental data from the source to the destination.
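The Reader/Writer split can be illustrated with a small in-memory model. This is a conceptual sketch, not the DataWorks plug-in API. `Reader`, `Writer`, `InMemoryTableReader`, `InMemoryObjectWriter`, and `run_sync` are hypothetical names, and the two in-memory classes only stand in for a Tablestore-related Reader and the OSS-related Writer.

```python
from typing import Any, Dict, Iterable, Iterator, List, Protocol

Record = Dict[str, Any]


class Reader(Protocol):
    """Hypothetical reader interface: yields records from a data source."""

    def read(self) -> Iterator[Record]: ...


class Writer(Protocol):
    """Hypothetical writer interface: persists records and returns how many were written."""

    def write(self, records: Iterable[Record]) -> int: ...


class InMemoryTableReader:
    """Stands in for a Tablestore-related Reader plug-in."""

    def __init__(self, rows: List[Record]) -> None:
        self.rows = rows

    def read(self) -> Iterator[Record]:
        yield from self.rows


class InMemoryObjectWriter:
    """Stands in for the OSS-related Writer plug-in; keeps delimited text lines."""

    def __init__(self, columns: List[str], delimiter: str = ",") -> None:
        self.columns = columns
        self.delimiter = delimiter
        self.lines: List[str] = []

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            # Missing columns are written as empty fields.
            self.lines.append(
                self.delimiter.join(str(record.get(c, "")) for c in self.columns)
            )
            count += 1
        return count


def run_sync(reader: Reader, writer: Writer) -> int:
    """Wire a reader to a writer; neither side knows anything about the other."""
    return writer.write(reader.read())


if __name__ == "__main__":
    reader = InMemoryTableReader([{"pk": 1, "name": "a"}, {"pk": 2, "name": "b"}])
    writer = InMemoryObjectWriter(["pk", "name"])
    print(run_sync(reader, writer))  # 2
    print(writer.lines)  # ['1,a', '2,b']
```

Because `run_sync` depends only on the two interfaces, you can swap the reader (for example, a full-export reader for an incremental one) without touching the writer. This mirrors how the same OSS Writer serves both synchronization modes.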
To synchronize Tablestore data to OSS, you must configure a Tablestore-related Reader plug-in and the OSS-related Writer plug-in for the offline synchronization task. The following sections describe these plug-ins.
Tablestore-related Reader plug-ins
The Tablestore-related Reader plug-in you need depends on the synchronization mode. The following table maps each synchronization mode to its Reader plug-in.
Synchronization mode | Tablestore-related Reader plug-in | Plug-in description |
Full export | Tablestore Reader | Reads data from Tablestore tables. You can also specify a range of data to extract, which lets you perform incremental extraction. For more information, see Tablestore data source. |
Incremental synchronization | OTSStream Reader | Exports incremental data from Tablestore tables. For more information, see OTSStream data source. |
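Range extraction is what makes Tablestore Reader usable for incremental extraction, for example when the first primary key column holds a monotonically increasing value such as a write timestamp. The following sketch simulates such a range read over an in-memory list. It assumes the begin-inclusive, end-exclusive convention of Tablestore range reads and uses `None` in place of the INF_MIN and INF_MAX boundaries. The `extract_range` function and the `ts` column are hypothetical.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def extract_range(
    rows: List[Row],
    pk: str,
    begin: Optional[int] = None,  # None stands in for INF_MIN (no lower bound)
    end: Optional[int] = None,    # None stands in for INF_MAX (no upper bound)
) -> List[Row]:
    """Return rows whose primary key lies in [begin, end), ordered by that key."""
    selected = [
        row
        for row in rows
        if (begin is None or row[pk] >= begin) and (end is None or row[pk] < end)
    ]
    return sorted(selected, key=lambda row: row[pk])


if __name__ == "__main__":
    rows = [{"ts": t, "v": t * 10} for t in (100, 150, 200, 250, 300)]
    # The end boundary is exclusive, so the row with ts=250 is not included.
    print(extract_range(rows, "ts", begin=150, end=250))
    # [{'ts': 150, 'v': 1500}, {'ts': 200, 'v': 2000}]
```

If each run uses the previous run's `end` as its own `begin`, successive runs cover adjacent, non-overlapping key ranges.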
OSS-related Writer plug-in
DataWorks uses the OSS-related Writer plug-in to write data to OSS in both full export mode and incremental synchronization mode. For more information, see OSS data source.
Synchronization modes
You can configure data filters and use scheduling parameters in offline synchronization tasks to determine whether to synchronize full data or incremental data. The following table describes the synchronization modes.
Synchronization mode | Description |
Full export | All data in Tablestore is exported to OSS in a single run. You need to run the offline synchronization task only once and do not need to configure scheduling parameters for it. |
Incremental synchronization | New and modified data in Tablestore is synchronized to OSS on a recurring schedule. You must configure scheduling parameters for the offline synchronization task so that incremental data is synchronized periodically. |
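In incremental mode, scheduling parameters typically decide which time window each run covers. As a sketch, assume a daily task that receives a business date in yyyymmdd format (for example, from a scheduling parameter such as `${bizdate}`) and needs to turn it into a time window in milliseconds. The function name `daily_window_ms` and the explicit UTC offset argument are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def daily_window_ms(bizdate: str, utc_offset_hours: int) -> Tuple[int, int]:
    """Convert a yyyymmdd business date into a [start, end) window in epoch milliseconds.

    The window spans exactly one day in the given fixed UTC offset.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime.strptime(bizdate, "%Y%m%d").replace(tzinfo=tz)
    end = start + timedelta(days=1)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


if __name__ == "__main__":
    print(daily_window_ms("20240101", utc_offset_hours=0))
    # (1704067200000, 1704153600000)
```

Each window ends exactly where the next day's window starts, so consecutive runs neither overlap nor leave gaps. Periodic incremental synchronization depends on this property.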
Scenarios
Use this solution when you want to back up Tablestore data at a lower cost or export Tablestore data as files to local devices.
Procedure
The procedure depends on the synchronization mode. For detailed steps, see Export full data from Tablestore to OSS and Synchronize incremental data to OSS.
Full export procedure
The following table describes the major steps in full export mode.
Step | Operation | Description |
1 | Add a data source | Specify information about the Tablestore instance that contains the table whose data you want to synchronize. The data source is Tablestore. |
2 | Add a destination | Specify information about the OSS bucket to which you want to synchronize data. The destination is OSS. |
3 | Create an offline task node | Offline synchronization runs on offline task nodes. Create one offline task node for each synchronization operation. |
4 | Configure and start an offline synchronization task | DataWorks Data Integration lets you configure offline synchronization tasks in wizard mode or script mode. Select the mode that meets your business requirements. |
5 | Verify migration results | After the export is complete, view the exported data in the OSS console. |
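In script mode, the task is described as a JSON job with a reader step and a writer step. The Python sketch below assembles such a job for a full export. The keys follow the script-mode samples in the Tablestore data source and OSS data source topics as we understand them, for example the `stepType` values `ots` and `oss` and a `range` running from `INF_MIN` to `INF_MAX`. Treat every key as an assumption and check it against those topics before you paste the output into DataWorks. The data source names, table, columns, and object prefix are hypothetical.

```python
import json
from typing import Any, Dict, List


def full_export_job(
    ots_datasource: str,
    table: str,
    columns: List[str],
    pk_count: int,
    oss_datasource: str,
    object_prefix: str,
) -> Dict[str, Any]:
    """Assemble a script-mode style job that copies a whole table to OSS.

    Key names are assumptions modeled on DataWorks script-mode samples.
    """
    reader = {
        "stepType": "ots",
        "name": "Reader",
        "category": "reader",
        "parameter": {
            "datasource": ots_datasource,
            "table": table,
            "column": [{"name": c} for c in columns],
            # Assumption: one boundary per primary key column. INF_MIN..INF_MAX
            # spans the whole key space, which makes this a full export.
            "range": {
                "begin": [{"type": "INF_MIN"} for _ in range(pk_count)],
                "end": [{"type": "INF_MAX"} for _ in range(pk_count)],
            },
        },
    }
    writer = {
        "stepType": "oss",
        "name": "Writer",
        "category": "writer",
        "parameter": {
            "datasource": oss_datasource,
            "object": object_prefix,
            "fileFormat": "csv",
            "fieldDelimiter": ",",
            "encoding": "UTF-8",
            "writeMode": "truncate",
        },
    }
    return {
        "type": "job",
        "version": "2.0",
        "steps": [reader, writer],
        "setting": {
            "errorLimit": {"record": "0"},
            "speed": {"concurrent": 1, "throttle": False},
        },
        # Data flows from the reader step to the writer step.
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }


if __name__ == "__main__":
    job = full_export_job(
        "my_ots_source", "orders", ["order_id", "amount"], 1,
        "my_oss_source", "backup/orders/",
    )
    print(json.dumps(job, indent=2))
```

Generating the JSON from a function mainly helps when you create several similar tasks, such as one per table. For a single task, editing the JSON directly in the script-mode editor is simpler.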
Incremental synchronization procedure
The following table describes the major steps in incremental synchronization mode.
Step | Operation | Description |
1 | Add a data source | Specify information about the Tablestore instance that contains the table whose data you want to synchronize. The data source is Tablestore. If an existing Tablestore data source meets your business requirements, skip this step. |
2 | Add a destination | Specify information about the OSS bucket to which you want to synchronize data. The destination is OSS. If an existing OSS data source meets your business requirements, skip this step. |
3 | Create an offline task node | Offline synchronization runs on offline task nodes. Create one offline task node for each synchronization operation. |
4 | Configure and start an offline synchronization task | DataWorks Data Integration lets you configure offline synchronization tasks in wizard mode or script mode. Select the mode that meets your business requirements. |
5 | Configure scheduling parameters | Configure the execution time, rerun property, and scheduling dependencies of the synchronization task so that it runs periodically. |
6 | Debug code and submit the task | After debugging succeeds, submit the offline synchronization task to the server so that it runs periodically based on its scheduling properties. |
7 | View task execution results | View the task running status in the DataWorks console and the synchronized data in the OSS console. |
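In incremental mode, the reader step reads changes within a time window that scheduling parameters supply on each run. The sketch below builds an OTSStream Reader step with `${startTime}` and `${endTime}` placeholders and then simulates run-time substitution locally with Python's `string.Template`. The parameter keys (`dataTable`, `statusTable`, `startTimeString`, `endTimeString`, `isExportSequenceInfo`, `maxRetries`), the scheduling parameter names `startTime` and `endTime`, and the yyyymmddhh24miss value format are assumptions based on our reading of the OTSStream data source topic. Verify them before use. DataWorks performs the real substitution when the scheduled task runs.

```python
import json
from string import Template
from typing import Any, Dict


def otsstream_reader_step(datasource: str, data_table: str, status_table: str) -> Dict[str, Any]:
    """Build a reader step whose time window is filled in from scheduling parameters."""
    return {
        "stepType": "otsstream",
        "name": "Reader",
        "category": "reader",
        "parameter": {
            "datasource": datasource,
            "dataTable": data_table,
            # Assumption: table the reader uses to track its synchronization status.
            "statusTable": status_table,
            # Placeholders that are replaced with scheduling parameter values per run.
            "startTimeString": "${startTime}",
            "endTimeString": "${endTime}",
            "isExportSequenceInfo": False,
            "maxRetries": 30,
        },
    }


def render(step: Dict[str, Any], params: Dict[str, str]) -> Dict[str, Any]:
    """Local simulation of placeholder substitution; raises KeyError if a value is missing."""
    return json.loads(Template(json.dumps(step)).substitute(params))


if __name__ == "__main__":
    step = otsstream_reader_step("my_ots_source", "orders", "orders_stream_status")
    run = render(step, {"startTime": "20240101000000", "endTime": "20240102000000"})
    print(run["parameter"]["startTimeString"])  # 20240101000000
```

Keeping the placeholders in the stored configuration, rather than fixed timestamps, lets the same task definition cover a new window on every scheduled run.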
Billing rules
When you synchronize data from Tablestore to OSS, Tablestore charges you for reading data based on the number of read capacity units (CUs) consumed. Metered read CUs and reserved read CUs are billed separately, and the type that is consumed depends on the type of the instance that you access. For more information, see Billing overview.
Note: For more information about instance types and CUs, see Instances and Read and write throughput.
After data is synchronized to OSS, OSS charges you for storing the data files based on storage usage and duration. If you download objects from OSS to local devices, OSS also charges you for the number of GET requests and for outbound traffic over the Internet. For more information, see OSS billing overview.
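The billing rules above reduce to simple arithmetic. The sketch below assumes that each read request consumes one read CU per 4 KB of data read, rounded up, and that the OSS storage fee scales linearly with stored volume and duration. Both are simplifications. Confirm the exact rules in Billing overview and OSS billing overview, and supply your own unit price, because none is given here. The function names are hypothetical.

```python
import math
from typing import Iterable


def read_cus(request_sizes_bytes: Iterable[int], cu_unit_bytes: int = 4 * 1024) -> int:
    """Estimate read CUs, assuming each request's data size is rounded up to whole 4 KB units."""
    return sum(math.ceil(size / cu_unit_bytes) for size in request_sizes_bytes)


def oss_storage_cost(stored_gb: float, months: float, price_per_gb_month: float) -> float:
    """Storage fee proportional to usage and duration; the unit price is caller-supplied."""
    return stored_gb * months * price_per_gb_month


if __name__ == "__main__":
    # 10,000 bytes -> 3 CUs, 4,096 bytes -> 1 CU, 1 byte -> 1 CU
    print(read_cus([10_000, 4096, 1]))  # 5
```

Rounding happens per request, so reading the same data in many small requests can consume more CUs than reading it in fewer large ones.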