Migrate data to TSDB by using DataWorks - Time Series Database

This article describes how to use the Data Integration feature of DataWorks to migrate data from OpenTSDB to TSDB.

Background information

This topic describes how to migrate data from OpenTSDB to Time Series Database (TSDB) by using the Data Integration service of DataWorks.

DataWorks is a platform as a service (PaaS) offering of Alibaba Cloud. It provides a wide range of services, including data aggregation, data development, DataService Studio, data analysis, and data governance. DataWorks also provides a one-stop data development and management console, which helps enterprises implement data mining and unlock the full potential of valuable data. The data development service of DataWorks is used in the data migration process in this topic. If you are new to DataWorks, see the DataWorks documentation for more information.

Currently, DataWorks supports migrating data from the following types of data sources to TSDB: TSDB, OpenTSDB, Prometheus, InfluxDB, and MySQL.

Quick Start

Step 1: Log on to the DataWorks console

Log on to the DataWorks console. If no workspaces are available in the console, you must create a workspace.

Step 2: Create a sync node on the DataStudio page

In the upper-left corner of the page, right-click Business Flow, and then click Create Workflow. Figure 1 shows the position of the Create Workflow option.

In the dialog box that appears, enter a workflow name, for example, migration_from_opentsdb_to_tsdb. Figure 2 shows the dialog box where you can create a workflow.

Follow the three steps shown in Figure 3 to create the data synchronization task.

In the dialog box that appears, enter a name for the sync node, for example, node1. Figure 4 shows the dialog box where you can create a node.

After the sync node is created, node1 is displayed in the blank section on the right of the page. Double-click node1. On the page that appears, configure the sync node. Figure 5 shows the page where the node is displayed.

By default, the sync node node1 is configured based on the codeless UI. If you want to configure the node by using the code editor, you can click the rightmost icon in the top toolbar. Figure 6 shows the page where you can configure the node.

The default sync node synchronizes data from Stream Reader to Stream Writer. Stream Reader is the source that generates random strings, and Stream Writer is the target that receives and prints the generated random strings. For more information about how to configure Stream Reader and Stream Writer, click the corresponding topics at the top of the page.

Stream Reader and Stream Writer can synchronize data without depending on external resources. To run the sync node, you can click the Run icon in the upper-left corner. Then, you can view the execution process in the section that appears at the bottom of the page.

Step 3: Modify the configuration

Change the configurations of the default sync node to migrate data from OpenTSDB to TSDB.

Click the icon shown in Figure 7 to import a configuration template.

In the dialog box that appears, set the source connection type to OpenTSDB and the target connection type to TSDB. Figure 8 shows the dialog box where you can configure the template.

Click OK. The values of the two stepType parameters change to opentsdb and tsdb, and the other configuration items are automatically updated to migrate data from OpenTSDB to TSDB. The topic names in the help documentation also change. You can click the new topic names to obtain details about how to configure "OpenTSDB Reader" and "TSDB Writer". Figure 9 shows the new topic names.

Then, modify the configurations based on the help documentation. You must specify five parameters: endpoint, column, beginDateTime, and endDateTime for the reader, and endpoint for the writer. The reader endpoint parameter specifies the OpenTSDB endpoint, and the writer endpoint parameter specifies the TSDB endpoint. The column parameter specifies the metrics to migrate. The beginDateTime and endDateTime parameters specify the time range of the data to migrate. The following sample shows a complete configuration:

{
  "type": "job",
  "steps": [
    {
      "stepType": "opentsdb",
      "parameter": {
        "endpoint": "http://host:4242",
        "column": [
          "m"
        ],
        "beginDateTime": "20190101000000",
        "endDateTime": "20190101030000"
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "tsdb",
      "parameter": {
        "endpoint": "http://host:8242"
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "version": "2.0",
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  },
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": false,
      "concurrent": 1,
      "dmu": 1
    }
  }
}
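If you generate this configuration for several metrics or time ranges, a small script can catch typos before you paste the result into the code editor. The following Python sketch is one possible way to do that, not part of DataWorks. The function name build_job and its checks are hypothetical. It assumes that the time values use the 14-digit form shown in the sample, and it simply reproduces the structure of the sample configuration.

```python
import json
from datetime import datetime

# Assumption: time values follow the 14-digit form used in the sample,
# for example 20190101000000.
TIME_FORMAT = "%Y%m%d%H%M%S"


def parse_time(value):
    """Parse a 14-digit time string and reject anything else."""
    if len(value) != 14 or not value.isdigit():
        raise ValueError(f"expected 14 digits, got {value!r}")
    return datetime.strptime(value, TIME_FORMAT)


def build_job(source_endpoint, target_endpoint, metrics, begin, end):
    """Return a dict with the same structure as the sample configuration."""
    if not metrics:
        raise ValueError("column must list at least one metric")
    if parse_time(begin) >= parse_time(end):
        raise ValueError("beginDateTime must be earlier than endDateTime")
    reader = {
        "stepType": "opentsdb",
        "parameter": {
            "endpoint": source_endpoint,
            "column": list(metrics),
            "beginDateTime": begin,
            "endDateTime": end,
        },
        "name": "Reader",
        "category": "reader",
    }
    writer = {
        "stepType": "tsdb",
        "parameter": {"endpoint": target_endpoint},
        "name": "Writer",
        "category": "writer",
    }
    return {
        "type": "job",
        "steps": [reader, writer],
        "version": "2.0",
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
        "setting": {
            "errorLimit": {"record": "0"},
            "speed": {"throttle": False, "concurrent": 1, "dmu": 1},
        },
    }


if __name__ == "__main__":
    job = build_job(
        "http://host:4242",
        "http://host:8242",
        ["m"],
        "20190101000000",
        "20190101030000",
    )
    print(json.dumps(job, indent=2))
```

Running the script prints a JSON document that you can paste into the code editor. Replace the placeholder endpoints and the metric name with your own values.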

Step 4: Modify the whitelist

To run the sync node on the default resource group of DataWorks, you must add the CIDR blocks of the region where the workspace resides to the whitelists of the data sources. To migrate data from OpenTSDB to TSDB, you must configure a whitelist for OpenTSDB and another for TSDB.

Find the CIDR blocks that must be added to the whitelist based on the region where the DataWorks workspace resides. For more information, you can navigate through User Guide > Data Integration > Common configurations > Configure a whitelist in the DataWorks V2.0 documentation. The China (Shanghai) region is used as an example to describe how to configure the whitelist.

If your user-created OpenTSDB instances are hosted on an ECS instance, add the corresponding CIDR blocks to the security groups of the ECS instance. The added CIDR blocks must include those for the HBase nodes and TSD nodes. HBase is the underlying data storage system for OpenTSDB.

Then, add the corresponding CIDR blocks to the whitelist of the TSDB instance that runs on the cloud. For more information, you can navigate through Quick Start > Set the IP address whitelist in the TSDB documentation.
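Before you run the node, it can help to confirm that the whitelist entries you entered actually cover a given address. The following Python sketch uses only the standard ipaddress module. The function name covering_blocks is hypothetical, and the CIDR blocks in the example are reserved documentation ranges, not the real DataWorks blocks. Take the real blocks for your region from the DataWorks documentation.

```python
import ipaddress


def covering_blocks(ip, cidr_blocks):
    """Return the entries of cidr_blocks that contain the address ip."""
    address = ipaddress.ip_address(ip)
    return [
        block
        for block in cidr_blocks
        # strict=False tolerates entries with host bits set, such as 192.0.2.1/24.
        if address in ipaddress.ip_network(block, strict=False)
    ]


if __name__ == "__main__":
    # Placeholder whitelist: RFC 5737 documentation ranges, for illustration only.
    whitelist = ["192.0.2.0/24", "198.51.100.0/24"]
    print(covering_blocks("192.0.2.15", whitelist))   # ['192.0.2.0/24']
    print(covering_blocks("203.0.113.7", whitelist))  # []
```

An empty result means that no whitelist entry covers the address, so connections from that address would be rejected.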

Step 5: Synchronize data

Click the Run icon to run the sync node. Figure 10 shows an example of the execution process.
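After the node finishes, you may want a rough check that the data arrived. The following Python sketch compares the number of data points returned by the source and the target for one metric. It rests on assumptions that you should verify for your setup: that both endpoints are reachable from where you run the script, and that both accept OpenTSDB-style POST requests on /api/query and return a list of series that each carry a dps map. All function names are hypothetical. The time range is passed as epoch seconds because this topic does not state which time zone beginDateTime and endDateTime use.

```python
import json
import urllib.request


def build_query_body(metric, start_epoch_s, end_epoch_s, aggregator="sum"):
    """Build an OpenTSDB-style /api/query request body for one metric."""
    return {
        "start": start_epoch_s,
        "end": end_epoch_s,
        "queries": [{"aggregator": aggregator, "metric": metric}],
    }


def count_points(series_list):
    """Count the data points across all series of a query response."""
    return sum(len(series.get("dps", {})) for series in series_list)


def query(endpoint, body, timeout=30):
    """POST the body to <endpoint>/api/query and return the decoded JSON."""
    request = urllib.request.Request(
        endpoint.rstrip("/") + "/api/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def compare(source_endpoint, target_endpoint, metric, start_epoch_s, end_epoch_s):
    """Return (source_count, target_count) for the same query on both sides."""
    body = build_query_body(metric, start_epoch_s, end_epoch_s)
    source_count = count_points(query(source_endpoint, body))
    target_count = count_points(query(target_endpoint, body))
    return source_count, target_count
```

For example, compare("http://host:4242", "http://host:8242", "m", 1546300800, 1546311600) queries three hours starting at 2019-01-01 00:00:00 UTC on both sides. Matching counts suggest that the migration is complete for that metric, but they do not prove that the values are identical.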

Step 6: Create exclusive resource groups

By default, shared resource groups of DataWorks are used to run nodes. Because shared resources can be preempted, data migration performance may be negatively affected. If your migration requires stable performance, we recommend that you create exclusive resource groups.