Tablestore: Synchronize incremental data to OSS

Last Updated: Nov 20, 2024

You can periodically synchronize incremental data from Tablestore to Object Storage Service (OSS) for backup or further use. This topic describes how to configure a batch synchronization task in the DataWorks console to perform the synchronization.

Usage notes

  • This feature is applicable to the Wide Column model and TimeSeries model of Tablestore.

    • Wide Column model: You can use the codeless user interface (UI) or code editor to export data from a data table in Tablestore to OSS.

    • TimeSeries model: You can use only the code editor to export data from a time series table in Tablestore to OSS.

  • When you use Tablestore Stream to synchronize incremental data, make sure that each write to Tablestore covers a whole row. The whole-row write mode suits append-only data, such as IoT time series data, that does not need to be modified after it is written. For an illustration of a whole-row write, see the sketch after this list.

  • Incremental data is synchronized every 5 minutes, and the Tablestore plug-in may introduce a latency of up to 5 minutes. Therefore, the total latency of incremental synchronization ranges from 5 to 10 minutes.
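
The following is a minimal sketch of a whole-row write that uses the Tablestore SDK for Python. The endpoint, AccessKey pair, instance name, table name, and column values are placeholders; replace them with your own.

      from tablestore import OTSClient, Row, Condition, RowExistenceExpectation

      # Placeholders: replace the endpoint, AccessKey pair, instance name,
      # table name, primary key columns, and attribute columns with your own.
      client = OTSClient(
          'https://myinstance.cn-hangzhou.ots.aliyuncs.com',
          '<AccessKeyId>',
          '<AccessKeySecret>',
          'myinstance'
      )

      # Write the whole row in a single PutRow request instead of updating
      # individual columns later, so that Tablestore Stream captures the
      # complete row as one incremental change.
      primary_key = [('pk1', 'device-001'), ('pk2', 1700000000)]
      attribute_columns = [('col1', 23.5)]
      client.put_row('mytable', Row(primary_key, attribute_columns),
                     Condition(RowExistenceExpectation.IGNORE))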

Prerequisites

  • OSS is activated, and an OSS bucket is created. For more information, see Activate OSS and Create buckets.

  • The information about the instances, data tables, or time series tables whose data you want to synchronize from Tablestore to OSS is confirmed and recorded.

  • DataWorks is activated, and a workspace is created. For more information, see Activate DataWorks and Create a workspace.

  • A Resource Access Management (RAM) user is created, and the OSS and Tablestore policies are attached to the RAM user. For more information, see Create a RAM user and Grant permissions to a RAM user.

    Important

    To prevent security risks caused by the leakage of the AccessKey pair of your Alibaba Cloud account, we recommend that you use the AccessKey pair of a RAM user.

  • An AccessKey pair is created for the RAM user. For more information, see Create an AccessKey pair.

  • A Tablestore data source and an OSS data source are added. For more information, see the Step 1: Add a Tablestore data source and Step 2: Add an OSS data source sections of the "Export full data to OSS" topic.

Step 1: Create a batch synchronization node

  1. Go to the DataStudio console.

    1. Log on to the DataWorks console as the project administrator.

    2. In the top navigation bar, select a region. In the left-side navigation pane, click Workspaces.

    3. On the Workspaces page, find the workspace that you want to manage and choose Shortcuts > Data Development in the Actions column.

  2. On the Scheduled Workflow page of the DataStudio console, click Business Flow and select a business flow.

    For information about how to create a workflow, see Create a workflow.

  3. Right-click the Data Integration node and choose Create Node > Offline synchronization.

  4. In the Create Node dialog box, select a path and enter a node name.

  5. Click Confirm.

    The new batch synchronization node is displayed under the Data Integration node.

Step 2: Configure and start a batch synchronization task

To configure a task to synchronize incremental data from Tablestore to OSS, select an appropriate configuration method based on the data storage model.

Configure a task to synchronize data from a data table

  1. Double-click the created batch synchronization node in the Data Integration folder.

  2. Establish network connections between the resource group and data sources.

    Select the source and destination data sources for the data synchronization task and the resource group that is used to run the data synchronization task. Establish network connections between the resource group and data sources and test the connectivity.

    Important

    Data synchronization tasks are run by using resource groups. Select a resource group and make sure that network connections between the resource group and data sources are established.

    1. In the Configure Network Connections and Resource Group step, set the Source parameter to Tablestore Stream and the Data Source Name parameter to the name of the Tablestore data source that you added.

    2. Select a resource group.

      After you select a resource group, the system displays the region and specifications of the resource group and automatically tests the connectivity between the resource group and the source data source.

      Important

      Make sure that the resource group is the same as that you selected when you added the data source.

    3. Set the Destination parameter to OSS and the Data Source Name parameter to the name of the OSS data source that you added.

      The system automatically tests the connectivity between the resource group and the destination data source.

    4. After the network connectivity test is passed, click Next.

  3. Configure and save the task.

    If you configure the task by using the codeless UI, incremental data can be exported only based on changes in the column values of rows. The code editor supports both this column-value mode and the row mode. If you want to export incremental data based on the row mode, use the code editor. A sketch that parses the column-value output is provided after the code editor instructions.

    (Recommended) Use the codeless UI

    1. In the Configure Source and Destination section of the Configure tasks step, configure the source and destination data sources based on your business requirements.

      Configure the source data source

      • Table: The name of the data table in Tablestore.

      • Start Time and End Time: The start time and end time of the time range during which incremental data is read. Specify the parameters in the ${startTime} and ${endTime} formats. The specific formats are configured in the scheduling properties. The time range is a left-closed, right-open interval.

      • State Table: The name of the table that stores the status of Tablestore Stream. Default value: TableStoreStreamReaderStatusTable.

      • Maximum Retries: The maximum number of retries allowed for each request that reads incremental data from Tablestore.

      • Export Time Series Information: Specifies whether to export time series information, which includes the time when data is written.

      Configure the destination data source

      • Text type: The format of the object that is written to OSS, such as .csv or .txt.

        Note: Different object formats require different configurations. After you select an object format, specify the parameters that are displayed for that format.

      • File name (including path): Displayed only when you set the Text type parameter to text, csv, or orc. The name of the object in OSS. The name can contain a path, such as tablestore/20231130/myotsdata.csv.

      • path: Displayed only when you set the Text type parameter to parquet. The path of the object in OSS, such as tablestore/20231130/.

      • fileName: Displayed only when you set the Text type parameter to parquet. The name of the object in OSS.

      • Column Delimiter: Displayed only when you set the Text type parameter to text or csv. The delimiter that is used to separate columns when data is written to the OSS object.

      • Row Delimiter: Displayed only when you set the Text type parameter to text. The custom delimiter that is used to separate rows. Example: \u0001. Use a delimiter that does not exist in the data. If you want to use the default row delimiter (\n for Linux or \r\n for Windows), leave this parameter empty. The system then writes data based on the default row delimiter.

      • Coding: Displayed only when you set the Text type parameter to text or csv. The encoding format of the OSS object to which data is written.

      • null value: Displayed only when you set the Text type parameter to text, csv, or orc. The string that is interpreted as null in the source data. For example, if you set this parameter to null and the value of a field in the source is null, the system recognizes the field as null.

      • Time format: Displayed only when you set the Text type parameter to text or csv. The time format in which date data is written to the OSS object, such as yyyy-MM-dd.

      • Prefix conflict: The processing method that is used when the specified object name is the same as the name of an existing object in OSS. Valid values:

        • Replace: deletes the existing object and creates a new object of the same name.

        • Retain: retains the existing object and creates a new object whose name combines the existing object name and a random suffix.

        • Error: reports an error and stops the batch synchronization task.

      • Split File: Displayed only when you set the Text type parameter to text or csv. The maximum size of a single OSS object to which data is written. Unit: MB. The maximum size is 100 GB. When the size of an object reaches this value, the system creates a new object and continues to write data until all data is written.

      • Write as Single File: Displayed only when you set the Text type parameter to text or csv. Specifies whether to write a single object to OSS at a time. By default, multiple objects are written. In this case, if no data is read and an object header is configured, an empty object that contains only the header is generated; if no header is configured, a completely empty object is generated. If you select Write as Single File and no data is read, no empty object is generated.

      • First Row as Table Header: Displayed only when you set the Text type parameter to text or csv. Specifies whether to write the first row as the table header when data is written to an object. By default, no table header is generated.

    2. In the Field Mappings section, the system automatically completes field mappings. In this section, retain the default settings.

      The source fields include the primary key columns and incremental data changes in the data table. The destination fields cannot be configured.

    3. In the Channel Control section, configure the parameters for task execution, such as the Task Expected Maximum Concurrency, Synchronization rate, Policy for Dirty Data Records, and Distributed Execution parameters. For more information about the parameters, see Configure channel control policies.

    4. Click the save icon in the toolbar to save the configurations.

      Note

      If you do not save the configurations, a message prompting you to save the configurations appears when you perform subsequent operations. Click OK to save the configurations.

    Use the code editor

    To synchronize incremental data, you must use OTSStream Reader and OSS Writer. For more information about how to configure the task by using the code editor, see Tablestore Stream data source and OSS data source.

    Important

    After you switch from the codeless UI to the code editor, you cannot switch back to the codeless UI. Proceed with caution.

    1. In the Configure tasks step, click the conversion icon in the toolbar to switch to the code editor. In the message that appears, click OK.

    2. In the code editor, specify the parameters based on the following sample code.

      Important

      Comments are provided in the sample code to help you understand the configurations. Delete all comments when you use the sample code.

      Export incremental data based on the row mode

      {
          "type": "job",
          "version": "2.0",
          "steps": [
              {
                  "stepType": "otsstream", // The name of the reader. You cannot change the value of this parameter. 
                  "parameter": {
                      "statusTable": "TableStoreStreamReaderStatusTable", // The name of the table that stores the status of Tablestore Stream. In most cases, you do not need to change the value of this parameter. 
                      "maxRetries": 30, // The maximum number of retries allowed for each request. 
                      "isExportSequenceInfo": false, // Specifies whether to export time series information. The time series information includes the time when data is written. 
      				        "mode": "single_version_and_update_only", // The mode in which Tablestore Stream is used to export data. Set this parameter to single_version_and_update_only. If the configuration template does not contain this parameter, add this parameter. 
                      "datasource": "otssource", // The name of the source data source. Specify this parameter based on your business requirements. 
                      "envType": 1,
                      "column": [ 
      				          {
                           "name": "pk1" 
                        },
                        {
                           "name": "pk2"  
                        },
                        {
                           "name": "col1"  
                        }
                      ],
                      "startTimeString": "${startTime}", // The start time of data export. The task must be started in loops because this task is used for incremental data export. The start time for each loop varies. Therefore, you must use a variable, such as ${startTime}. 
                      "table": "mytable", // The name of the source table in Tablestore. 
                      "endTimeString": "${endTime}" // The end time of data export. You must use a variable, such as ${endTime}. 
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "oss", // The name of the writer. You cannot change the value of this parameter. 
                  "parameter": { // If you specify parameters based on this sample code, data can be exported only in the CSV and TEXT formats. If you want to export data in the Parquet or ORC format, specify parameters based on the content in the "OSS data source" topic. 
                      "fieldDelimiterOrigin": ",",
                      "nullFormat": "null", // The string used to identify the null field value. The value can be an empty string. 
                      "dateFormat": "yyyy-MM-dd HH:mm:ss", // The format of the time. 
                      "datasource": "osssource", // The name of the OSS data source. Specify this parameter based on your business requirements. 
                      "envType": 1,
                      "writeSingleObject": true, // Specifies whether to write a single file to OSS at a time. A value of true specifies that you write a single file to OSS at a time. If no data is read, no empty file is generated. A value of false specifies that you write multiple files to OSS at a time. If no data is read and a file header is configured, an empty file that contains only the file header is generated. Otherwise, an empty file is generated. 
                      "writeMode": "truncate", // The operation to be performed by the system if an object with the specified source file name exists in the destination data source. Valid values: truncate, append, and nonConflict. A value of truncate specifies that the system clears the object in the destination data source. A value of append specifies that the system appends the data to the object in the destination data source. A value of nonConflict specifies that an error is reported.  
                      "encoding": "UTF-8", // The encoding type. 
                      "fieldDelimiter": ",", // The delimiter used to separate columns. 
                      "fileFormat": "csv", // The file format. Valid values: csv, text, parquet, and orc. 
                      "object": "" // The prefix of the name of the file that you want to synchronize to OSS. We recommend that you use the "Tablestore instance name/Table name/Date" format. Example: "instance/table/{date}". 
                  },
                  "name": "Writer",
                  "category": "writer" 
              },
              {
                  "copies": 1,
                  "parameter": {
                      "nodes": [],
                      "edges": [],
                      "groups": [],
                      "version": "2.0"
                  },
                  "name": "Processor",
                  "category": "processor"
              }
          ],
          "setting": {
              "errorLimit": {
                  "record": "0" // The number of errors allowed. If the number of errors exceeds this value, the synchronization task fails. 
              },
              "locale": "zh",
              "speed": {
                  "throttle": false, // Specifies whether to enable throttling. A value of false specifies that throttling is disabled, and a value of true specifies that throttling is enabled. The mbps parameter takes effect only if the throttle parameter is set to true. 
                  "concurrent": 3 // The maximum number of concurrent tasks. 
              }
          },
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          }
      }

      Export incremental data based on the incremental changes in the column values of rows

      {
          "type": "job",
          "version": "2.0",
          "steps": [
              {
                  "stepType": "otsstream", // The name of the reader. You cannot change the value of this parameter. 
                  "parameter": {
                      "statusTable": "TableStoreStreamReaderStatusTable", // The name of the table that stores the status of Tablestore Stream. In most cases, you do not need to change the value of this parameter. 
                      "maxRetries": 30, // The maximum number of retries allowed for each request. 
                      "isExportSequenceInfo": false, // Specifies whether to export time series information. The time series information includes the time when data is written. 
                      "datasource": "otssource",
                      "envType": 1,
                      "column": [ // The columns that you want to export from the data table to OSS. If the configuration template does not contain this parameter, add this parameter. In this example, the default settings are used. 
                          "pk1", // The name of the primary key column. If the data table has multiple primary key columns, you must specify all of them. 
                          "pk2", // The name of the primary key column. If the data table has multiple primary key columns, you must specify all of them. 
                          "colName", // The name of the attribute column whose value is incremental data. You do not need to change the value of this parameter. 
                          "version", // The data version number of the column whose value is incremental data. You do not need to change the value of this parameter. The value of this parameter is a 64-bit timestamp. Unit: millisecond. 
                          "colValue", // The value of the attribute column whose value is incremental data. You do not need to change the value of this parameter. 
                          "opType", // The type of the incremental data operation. You do not need to change the value of this parameter. 
                          "sequenceInfo" // The ID of the auto-increment sequence. You do not need to change the value of this parameter. 
                      ],
                      "startTimeString": "${startTime}", // The start time of data export. The task must be started in loops because this task is used for incremental data export. The start time for each loop varies. Therefore, you must use a variable, such as ${startTime}. 
                      "table": "mytable", // The name of the source table in Tablestore. 
                      "endTimeString": "${endTime}" // The end time of data export. You must use a variable, such as ${endTime}. 
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "oss", // The name of the writer. You cannot change the value of this parameter. 
                  "parameter": { // If you specify parameters based on this sample code, data can be exported only in the CSV and TEXT formats. If you want to export data in the Parquet or ORC format, specify parameters based on the content in the "OSS data source" topic. 
                      "fieldDelimiterOrigin": ",",
                      "nullFormat": "null", // The string used to identify the null field value. The value can be an empty string. 
                      "dateFormat": "yyyy-MM-dd HH:mm:ss", // The format of the time. 
                      "datasource": "osssource", // The name of the OSS data source. Specify this parameter based on your business requirements. 
                      "envType": 1,
                      "writeSingleObject": true, // Specifies whether to write a single file to OSS at a time. A value of true specifies that you write a single file to OSS at a time. If no data is read, no empty file is generated. A value of false specifies that you write multiple files to OSS at a time. If no data is read and a file header is configured, an empty file that contains only the file header is generated. Otherwise, an empty file is generated. 
                      "column": [ // The columns that you want to synchronize to OSS. Specify this parameter by using the sequence number of a column. You do not need to change the value of this parameter. 
                          "0",
                          "1",
                          "2",
                          "3",
                          "4",
                          "5",
                          "6"
                      ],
                      "writeMode": "truncate", // The operation to be performed by the system if an object with the specified source file name exists in the destination data source. Valid values: truncate, append, and nonConflict. A value of truncate specifies that the system clears the object in the destination data source. A value of append specifies that the system appends the data to the object in the destination data source. A value of nonConflict specifies that an error is reported.  
                      "encoding": "UTF-8", // The encoding type. 
                      "fieldDelimiter": ",", // The delimiter used to separate columns. 
                      "fileFormat": "csv", // The file format. Valid values: csv and text. 
                      "object": "" // The prefix of the name of the file that you want to synchronize to OSS. We recommend that you use the "Tablestore instance name/Table name/Date" format. Example: "instance/table/{date}". 
                  },
                  "name": "Writer",
                  "category": "writer"
              },
              {
                  "copies": 1,
                  "parameter": {
                      "nodes": [],
                      "edges": [],
                      "groups": [],
                      "version": "2.0"
                  },
                  "name": "Processor",
                  "category": "processor"
              }
          ],
          "setting": {
              "errorLimit": {
                  "record": "0" // The number of errors allowed. If the number of errors exceeds this value, the synchronization task fails. 
              },
              "locale": "zh",
              "speed": {
                  "throttle": false, // Specifies whether to enable throttling. A value of false specifies that throttling is disabled, and a value of true specifies that throttling is enabled. The mbps parameter takes effect only if the throttle parameter is set to true. 
                  "concurrent": 3 // The maximum number of concurrent tasks. 
              }
          },
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          }
      }
    3. Click the save icon in the toolbar to save the configurations.

      Note

      If you do not save the script, a message prompting you to save the script appears when you perform subsequent operations. Click OK to save the script.
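
    In the column-value mode shown in the preceding sample, each exported record contains seven fields in the order of the reader's column list: pk1, pk2, colName, version, colValue, opType, and sequenceInfo. The following Python sketch parses a local copy of an exported CSV object, assuming the writer settings in the sample (the csv format, a comma delimiter, and null as the nullFormat string). The local file name is a placeholder.

      import csv

      # The field order matches the reader's "column" list in the preceding sample.
      FIELDS = ['pk1', 'pk2', 'colName', 'version', 'colValue', 'opType', 'sequenceInfo']

      # 'myotsdata.csv' is a placeholder for a local copy of the exported object.
      with open('myotsdata.csv', newline='', encoding='utf-8') as f:
          for row in csv.reader(f):
              record = dict(zip(FIELDS, row))
              # 'null' is the nullFormat string that is configured in the writer.
              col_value = None if record['colValue'] == 'null' else record['colValue']
              print(record['pk1'], record['colName'], record['version'], col_value, record['opType'])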

Configure a task to synchronize data from a time series table

  1. Double-click the created batch synchronization node in the Data Integration folder.

  2. Establish network connections between the resource group and data sources.

    Select the source and destination data sources for the data synchronization task and the resource group that is used to run the data synchronization task. Establish network connections between the resource group and data sources and test the connectivity.

    Important

    Data synchronization tasks are run by using resource groups. Select a resource group and make sure that network connections between the resource group and data sources are established.

    1. In the Configure Network Connections and Resource Group step, set the Source parameter to Tablestore Stream and the Data Source Name parameter to the name of the Tablestore data source that you added.

    2. Select a resource group.

      After you select a resource group, the system displays the region and specifications of the resource group and automatically tests the connectivity between the resource group and the source data source.

      Important

      Make sure that the resource group is the same as that you selected when you added the data source.

    3. Set the Destination parameter to OSS and the Data Source Name parameter to the name of the OSS data source that you added.

      The system automatically tests the connectivity between the resource group and the destination data source.

    4. After the network connectivity test is passed, click Next.

  3. Configure the task.

    To synchronize data from time series tables, you must use the code editor. To synchronize incremental data, you must use OTSStream Reader and OSS Writer. For more information about how to configure the task by using the code editor, see Tablestore Stream data source and OSS data source.

    Important

    After you switch from the codeless UI to the code editor, you cannot switch back to the codeless UI. Proceed with caution.

    1. In the Configure tasks step, click the conversion icon in the toolbar to switch to the code editor. In the message that appears, click OK.

    2. In the code editor, specify the parameters based on the following sample code.

      The incremental data in a time series table can be exported only based on the row mode.

      Important

      Comments are provided in the sample code to help you understand the configurations. Delete all comments when you use the sample code.

      {
          "type": "job",
          "version": "2.0",
          "steps": [
              {
                  "stepType": "otsstream", // The name of the reader. You cannot change the value of this parameter. 
                  "parameter": {
                      "statusTable": "TableStoreStreamReaderStatusTable", // The name of the table that stores the status of Tablestore Stream. In most cases, you do not need to change the value of this parameter. 
                      "maxRetries": 30, // The maximum number of retries allowed for each request. 
                      "isExportSequenceInfo": false, // Specifies whether to export time series information. The time series information includes the time when data is written. 
                      "mode": "single_version_and_update_only", // The mode in which Tablestore Stream is used to export data. Set this parameter to single_version_and_update_only. If the configuration template does not contain this parameter, add this parameter. 
                      "isTimeseriesTable":"true", // Specifies whether data is synchronized from a time series table. If data is synchronized from a time series table, you must set this parameter to true. 
                      "datasource": "otssource", // The name of the source data source. Specify this parameter based on your business requirements. 
                      "envType": 1,
                      "column": [ // The columns that you want to export from the time series table to OSS. If the configuration template does not contain this parameter, add this parameter. In this example, the default settings are used. 
                          {
                              "name": "_m_name" // The name of the measurement column. You do not need to change the value of this parameter. If you do not need to synchronize the column, remove this parameter. 
                          },
                          {
                              "name": "_data_source", // The name of the data source column. You do not need to change the value of this parameter. If you do not need to synchronize the column, remove this parameter. 
                          },
                          {
                              "name": "_tags", // The name of the time series tag column. You do not need to change the value of this parameter. If you do not need to synchronize the column, remove this parameter. 
                          },
                          {
                              "name": "colname", // The name of the time series data column. Specify this parameter based on your business requirements. If you want to export multiple columns, add the columns. 
                          }
                       ],
                      "startTimeString": "${startTime}", // The start time of data export. The task must be started in loops because this task is used for incremental data export. The start time for each loop varies. Therefore, you must use a variable, such as ${startTime}. 
                      "table": "timeseriestable", // The name of the time series table in Tablestore. 
                      "endTimeString": "${endTime}" // The end time of data export. You must use a variable, such as ${endTime}. 
                  },
                  "name": "Reader",
                  "category": "reader"
              },
              {
                  "stepType": "oss", // The name of the writer. You cannot change the value of this parameter. 
                  "parameter": { // If you specify parameters based on this sample code, data can be exported only in the CSV and TEXT formats. If you want to export data in the Parquet or ORC format, specify parameters based on the content in the "OSS data source" topic. 
                      "fieldDelimiterOrigin": ",", 
                      "nullFormat": "null", // The string used to identify the null field value. The value can be an empty string. 
                      "dateFormat": "yyyy-MM-dd HH:mm:ss", // The format of the time. 
                      "datasource": "osssource", // The name of the OSS data source. Specify this parameter based on your business requirements. 
                      "envType": 1, 
                      "writeSingleObject": false, // Specifies whether to write a single file to OSS at a time. A value of true specifies that you write a single file to OSS at a time. If no data is read, no empty file is generated. A value of false specifies that you write multiple files to OSS at a time. If no data is read and a file header is configured, an empty file that contains only the file header is generated. Otherwise, an empty file is generated. 
                      "writeMode": "truncate", // The operation to be performed by the system if an object with the specified source file name exists in the destination data source. Valid values: truncate, append, and nonConflict. A value of truncate specifies that the system clears the object in the destination data source. A value of append specifies that the system appends the data to the object in the destination data source. A value of nonConflict specifies that an error is reported.  
                      "encoding": "UTF-8", // The encoding type. 
                      "fieldDelimiter": ",", // The delimiter used to separate columns. 
                      "fileFormat": "csv", // The file format. Valid values: csv and text. 
                      "object": "" // The prefix of the name of the file that you want to synchronize to OSS. We recommend that you use the "Tablestore instance name/Table name/Date" format. Example: "instance/table/{date}". 
                  },
                  "name": "Writer",
                  "category": "writer"
              },
              {
                  "name": "Processor",
                  "stepType": null,
                  "category": "processor",
                  "copies": 1,
                  "parameter": {
                      "nodes": [],
                      "edges": [],
                      "groups": [],
                      "version": "2.0"
                  }
              }
          ],
          "setting": {
              "executeMode": null,
              "errorLimit": {
                  "record": "0" // The number of errors allowed. If the number of errors exceeds this value, the synchronization task fails. 
              },
              "speed": {
                  "concurrent": 2, // The maximum number of concurrent tasks. 
                  "throttle": false  // Specifies whether to enable throttling. A value of false specifies that throttling is disabled, and a value of true specifies that throttling is enabled. The mbps parameter takes effect only if the throttle parameter is set to true. 
              }
          },
          "order": {
              "hops": [
                  {
                      "from": "Reader",
                      "to": "Writer"
                  }
              ]
          }
      }
    3. Click the save icon in the toolbar to save the configurations.

      Note

      If you do not save the script, a message prompting you to save the script appears when you perform subsequent operations. Click OK to save the script.

Step 3: Configure scheduling properties

In the Properties panel, you can configure the scheduling properties of the batch synchronization task, such as the run time, rerun properties, and scheduling dependencies.

  1. Click Properties in the right-side navigation pane of the task configuration tab.

  2. In the Scheduling Parameter section of the Properties panel, click Add Parameter and add the following parameters. For more information, see Supported formats of scheduling parameters.

    • startTime: $[yyyymmddhh24-2/24]$[miss-10/24/60]

    • endTime: $[yyyymmddhh24-1/24]$[miss-10/24/60]

    For example, if the task is scheduled to run at 19:00:00 on April 23, 2023, the startTime parameter evaluates to 20230423175000 and the endTime parameter evaluates to 20230423185000. In this case, the task synchronizes the data that is generated from 17:50:00 to 18:50:00.
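
    To verify the window, the following Python sketch reproduces how the two expressions are evaluated. This sketch is for illustration only; DataWorks evaluates the expressions itself. The date-and-hour part and the minute-and-second part are computed independently and then concatenated.

        from datetime import datetime, timedelta

        def window(scheduled):
            # $[miss-10/24/60]: the minute-and-second part of the scheduled time,
            # shifted back 10 minutes.
            mmss = (scheduled - timedelta(minutes=10)).strftime('%M%S')
            # $[yyyymmddhh24-2/24] and $[yyyymmddhh24-1/24]: the date-and-hour part,
            # shifted back 2 hours for startTime and 1 hour for endTime.
            start = (scheduled - timedelta(hours=2)).strftime('%Y%m%d%H') + mmss
            end = (scheduled - timedelta(hours=1)).strftime('%Y%m%d%H') + mmss
            return start, end

        print(window(datetime(2023, 4, 23, 19, 0, 0)))
        # ('20230423175000', '20230423185000'): the data window is [17:50:00, 18:50:00).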

  3. In the Schedule section, configure the scheduling properties. For more information, see Configure time properties.

    For example, you can configure the task to be scheduled to run by the hour.

  4. In the Dependencies section, select Add Root Node. The system automatically generates the information about the ancestor node of the current node.

    If you select Add Root Node, the task that runs on the current node does not depend on the ancestor node of the current node.

  5. After the configuration is complete, close the Properties panel.

Step 4: Debug the script and commit the task

  1. Optional. Debug the script.

    Debug the script to ensure that the synchronization task can synchronize incremental data from Tablestore to OSS.

    Important

    When you debug the script, data generated within the specified time range may be imported to OSS multiple times. If the same data rows are written to OSS multiple times, the relevant data rows in OSS are overwritten.

    1. Click the run icon in the toolbar.

    2. In the Parameters dialog box, select a resource group and configure the custom parameters.

      Specify the custom parameter values in the yyyyMMddHHmmss format. Example: 20230423175000.

    3. Click Run.

  2. Commit the synchronization task.

    After a synchronization task is committed, the synchronization task is run based on the scheduling properties that you configured.

    1. Click the submit icon in the toolbar.

    2. In the Submit dialog box, specify the change description based on your business requirements.

    3. Click Confirm.

Step 5: View the result of the synchronization task

  1. To view the status of the task, perform the following steps in the DataWorks console:

    1. Click Operation Center in the upper-right corner of the task configuration tab.

    2. In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Instance. On the Instance Perspective tab of the Cycle Instance page, view the status of the instance.

  2. To view the result of the task, perform the following steps in the OSS console:

    1. Log on to the OSS console.

    2. Click Buckets in the left-side navigation pane. On the Buckets page, find the bucket to which data is synchronized and click the name of the bucket.

    3. On the Objects page, select an object and download the object to check whether the data is synchronized as expected.
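
If you want to check the exported objects programmatically instead of in the OSS console, the following is a minimal sketch that uses the OSS SDK for Python (oss2). The endpoint, bucket name, and object prefix are placeholders; replace them with the values that you configured for the synchronization task.

      import oss2

      # Placeholders: replace with your AccessKey pair, endpoint, bucket name,
      # and the object prefix that the synchronization task writes to.
      auth = oss2.Auth('<AccessKeyId>', '<AccessKeySecret>')
      bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'mybucket')

      # List the exported objects under the prefix and download a local copy of each.
      for obj in oss2.ObjectIterator(bucket, prefix='tablestore/20231130/'):
          print(obj.key, obj.size)
          bucket.get_object_to_file(obj.key, obj.key.replace('/', '_'))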

FAQ

Errors that may occur when OTSStream Reader is running

References

  • If you want to download the OSS objects that contain the exported Tablestore data to your local device, you can use the OSS console or ossutil. For more information, see Simple download.

  • To prevent important data from being unavailable due to accidental deletion or malicious tampering, you can use Cloud Backup to back up the data in the wide tables of Tablestore instances on a regular basis and promptly restore lost or damaged data. For more information, see Overview.

  • If you want to implement tiered storage for the hot and cold data of Tablestore, full backup of Tablestore data, and large-scale real-time data analysis, you can use the data delivery feature of Tablestore. For more information, see Overview.