DataWorks Data Integration provides TSDB Writer for you to write data points to Lindorm Time Series Database (TSDB) provided by Alibaba Cloud ApsaraDB for Lindorm. This topic describes the capabilities of synchronizing data to TSDB data sources.
Supported TSDB versions
TSDB Writer supports all versions of ApsaraDB for Lindorm and HiTSDB V2.4.X or later.
Limits
TSDB Writer supports only exclusive resource groups for Data Integration.
You can configure TSDB Writer only by using the code editor.
How it works
TSDB Writer connects to a TSDB instance by using the TSDB client hitsdb-client and writes data points by using the HTTP API endpoint. For more information, see TSDB SDK documentation.
Data type mappings
If the sourceDbType parameter is set to TSDB, source data is read by using TSDB Reader or OpenTSDB Reader. In this case, TSDB Writer writes the source data to Lindorm TSDB in the format of JSON strings. If the sourceDbType parameter is set to RDB, the source is a relational database. In this case, TSDB Writer parses the source data based on the records of the relational database. The following table lists the valid values of the columnType parameter and the data types that match the column types when the sourceDbType parameter is set to RDB.
Data model | Valid value of columnType | Data type |
Tag | tag | A string data type. A tag describes a feature of the data source. In most cases, a tag does not change over time. |
Timestamp | timestamp | The TIMESTAMP data type. A timestamp specifies the point in time at which data is generated. The timestamp can be manually specified when data is written or automatically generated by the system. |
Field | field_string | A string data type. A field describes a measurement metric of the data source. In most cases, a field changes over time. |
Field | field_double | A numeric data type. A field describes a measurement metric of the data source. In most cases, a field changes over time. |
Field | field_boolean | A Boolean data type. A field describes a measurement metric of the data source. In most cases, a field changes over time. |
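The mapping in the preceding table can be illustrated with a small Python sketch. It is not the plug-in's implementation: the function name row_to_point and the shape of the returned dictionary are hypothetical and serve only to show how the column and columnType lists line up with the values of one relational record.

```python
from typing import Any, Dict, List

# Valid columnType values, taken from the table above.
VALID_COLUMN_TYPES = {"tag", "timestamp", "field_string", "field_double", "field_boolean"}


def row_to_point(row: List[Any], column: List[str], column_type: List[str]) -> Dict[str, Any]:
    """Group one relational record into tags, a timestamp, and typed fields.

    Hypothetical helper for illustration only; the result layout is an assumption.
    """
    if not (len(row) == len(column) == len(column_type)):
        raise ValueError("row, column, and columnType must have the same length")
    point: Dict[str, Any] = {"tags": {}, "timestamp": None, "fields": {}}
    for value, name, ctype in zip(row, column, column_type):
        if ctype not in VALID_COLUMN_TYPES:
            raise ValueError(f"unsupported columnType: {ctype}")
        if ctype == "tag":
            point["tags"][name] = str(value)
        elif ctype == "timestamp":
            point["timestamp"] = value
        elif ctype == "field_string":
            point["fields"][name] = str(value)
        elif ctype == "field_double":
            point["fields"][name] = float(value)
        else:  # field_boolean; assumes the reader already delivers a real boolean
            point["fields"][name] = bool(value)
    return point


example = row_to_point(
    ["z1", "c1", "ok", "3.5", 1546300800, True],
    ["tag1", "tag2", "field1", "field2", "timestamp", "field3"],
    ["tag", "tag", "field_string", "field_double", "timestamp", "field_boolean"],
)
print(example)
```

The position of each entry matters: the third value is treated as a string field only because the third entry of columnType is field_string.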
Develop a data synchronization task
For information about the configuration procedure, see Configure a batch synchronization task by using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.
Appendix: Code and parameters
Appendix: Configure a batch synchronization task by using the code editor
If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader and writer of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader and writer in the code editor.
Code for TSDB Writer
Write data from RDB to TSDB by using the following default configurations (recommended)
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream", // You can replace the stream plug-in with the specific RDB plug-in. RDB databases include MySQL, Oracle, PostgreSQL, and DRDS databases.
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "tsdb",
      "parameter": {
        "endpoint": "http://localhost:8242",
        "username": "xxx",
        "password": "xxx",
        "sourceDbType": "RDB",
        "batchSize": 256,
        "columnType": [
          "tag",
          "tag",
          "field_string",
          "field_double",
          "timestamp",
          "field_boolean"
        ],
        "column": [
          "tag1",
          "tag2",
          "field1",
          "field2",
          "timestamp",
          "field3"
        ],
        "multiField": "true",
        "table": "testmetric",
        "ignoreWriteError": "false",
        "database": "default"
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true, // Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
      "concurrent": 1, // The maximum number of parallel threads.
      "mbps": "12" // The maximum transmission rate. Unit: MB/s.
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
Write data from a database that supports the OpenTSDB protocol to TSDB
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "opentsdb",
      "parameter": {
        "endpoint": "http://localhost:4242",
        "column": [
          "m1",
          "m2",
          "m3",
          "m4",
          "m5",
          "m6"
        ],
        "startTime": "2019-01-01 00:00:00",
        "endTime": "2019-01-01 03:00:00"
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "tsdb",
      "parameter": {
        "endpoint": "http://localhost:8242"
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true, // Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
      "concurrent": 1, // The maximum number of parallel threads.
      "mbps": "12" // The maximum transmission rate. Unit: MB/s.
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
Use the OpenTSDB protocol to write a univariate data point to TSDB (not recommended)
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream", // You can replace the stream plug-in with the specific RDB plug-in. RDB databases include MySQL, Oracle, PostgreSQL, and DRDS databases.
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "tsdb",
      "parameter": {
        "endpoint": "http://localhost:8242",
        "username": "xxx",
        "password": "xxx",
        "sourceDbType": "RDB",
        "batchSize": 256,
        "columnType": [
          "tag",
          "tag",
          "field_string",
          "field_double",
          "timestamp",
          "field_boolean"
        ],
        "column": [
          "tag1",
          "tag2",
          "field_metric_1",
          "field_metric_2",
          "timestamp",
          "field_metric_3"
        ],
        "ignoreWriteError": "false"
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true, // Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
      "concurrent": 1, // The maximum number of parallel threads.
      "mbps": "12" // The maximum transmission rate. Unit: MB/s.
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
Note: The names of the TSDB metrics are determined by the names of the field columns that are specified for the column parameter. In the preceding code, each row of data in the relational database is written to three metrics: field_metric_1, field_metric_2, and field_metric_3.
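The fan-out described in the note can be sketched in Python as follows. This is an illustration under assumptions, not the plug-in's code: the function name split_row_into_metrics is hypothetical, and the dictionary keys (metric, timestamp, value, tags) loosely follow the data point layout of the OpenTSDB protocol.

```python
from typing import Any, Dict, List, Optional, Tuple


def split_row_into_metrics(
    row: List[Any], column: List[str], column_type: List[str]
) -> List[Dict[str, Any]]:
    """Turn one relational record into one univariate data point per field column.

    Hypothetical helper for illustration only.
    """
    tags: Dict[str, str] = {}
    timestamp: Optional[Any] = None
    fields: List[Tuple[str, Any]] = []
    for value, name, ctype in zip(row, column, column_type):
        if ctype == "tag":
            tags[name] = str(value)
        elif ctype == "timestamp":
            timestamp = value
        elif ctype.startswith("field_"):
            fields.append((name, value))
    # Every field column becomes its own metric; tags and timestamp are shared.
    return [
        {"metric": name, "timestamp": timestamp, "value": value, "tags": dict(tags)}
        for name, value in fields
    ]


points = split_row_into_metrics(
    ["z1", "c1", "ok", 3.5, 1546300800, True],
    ["tag1", "tag2", "field_metric_1", "field_metric_2", "timestamp", "field_metric_3"],
    ["tag", "tag", "field_string", "field_double", "timestamp", "field_boolean"],
)
for point in points:
    print(point["metric"], point["value"])
```

One input row yields three data points here, which is why the metric names come from the column parameter rather than from a table parameter.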
Parameters in code for TSDB Writer
Parameter type | Parameter | Description | Required | Default value |
Common parameters | sourceDbType | The type of the source database. | No | TSDB. Note: Valid values: TSDB and RDB. The value TSDB indicates that the source database is an OpenTSDB, Prometheus, or Timescale database. The value RDB indicates that the source database is a relational database, such as a MySQL, Oracle, PostgreSQL, or DRDS database. |
Common parameters | endpoint | The HTTP URL of the destination TSDB database. Specify the endpoint in the format of http://IP address:Port number. You can obtain the HTTP endpoint in the ApsaraDB for Lindorm console. | Yes | No default value |
Common parameters | database | The name of the TSDB database to which data is written. | No | default. Note: You must create the database in advance. |
Common parameters | username | The username of the TSDB database. You must specify a value for this parameter if you configure authentication for the TSDB database. | No | No default value |
Common parameters | batchSize | The number of data records to write at a time. The value of this parameter is of the INT type and must be greater than 0. If you want to configure a large value for the batchSize parameter, you must reserve more memory space. | No | 100 |
Parameters for TSDB | maxRetryTime | The maximum number of retries allowed after a failure. The value of this parameter is of the INT type and must be greater than 1. | No | 3 |
Parameters for TSDB | ignoreWriteError | Specifies whether to ignore write errors. The value of this parameter is of the BOOLEAN type. If you set this parameter to true, TSDB Writer continues to perform the write operation after a write error occurs. If you set this parameter to false and the write operation still fails after the specified number of retries, the synchronization task is terminated. | No | false |
Parameters for RDB | table | The name of the metric that you want to import to TSDB. If the multiField parameter is set to false, you can leave this parameter empty. In this case, you need to specify the names of the metrics for the column parameter. If the multiField parameter is set to true, you must configure this parameter. | No | No default value |
Parameters for RDB | multiField | Specifies whether to write a multivariate data point to TSDB by using the HTTP API endpoint. Note: If you want to use the native SQL capabilities of Lindorm TSDB to access data that is written by using the HTTP API endpoint, you must create a table in TSDB. Otherwise, you can query a multivariate data point only by using the TSDB HTTP API endpoint. For more information, see Query a multivariate data point. | Yes | false. Note: To write a multivariate data point to TSDB, you must set the value to true. |
Parameters for RDB | column | The names of the columns whose data you want to write to the TSDB database. | Yes | No default value. Note: You must specify the columns in the same order as the columns specified for a reader. |
Parameters for RDB | columnType | The data types of the columns in the relational database. The following types are supported: tag, timestamp, field_string, field_double, and field_boolean. | Yes | No default value. Note: You must specify the column types in the same order as the columns specified for a reader. |
Parameters for RDB | batchSize | The number of data records to write at a time. The value of this parameter is of the INT type and must be greater than 0. | No | 100 |
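The constraints in the preceding table can be checked before a task is submitted. The following Python sketch is a hypothetical pre-flight check, not part of DataWorks: the function name check_writer_parameter is invented for illustration, and the rule that column and columnType must have the same length is an assumption derived from the ordering requirement.

```python
from typing import Any, Dict, List

# Valid columnType values, as listed in the data type mappings.
VALID_COLUMN_TYPES = {"tag", "timestamp", "field_string", "field_double", "field_boolean"}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_writer_parameter(parameter: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a TSDB Writer parameter block (sketch only)."""
    problems: List[str] = []
    if not parameter.get("endpoint"):
        problems.append("endpoint is required")
    source = parameter.get("sourceDbType", "TSDB")
    if source not in ("TSDB", "RDB"):
        problems.append("sourceDbType must be TSDB or RDB")
    batch_size = parameter.get("batchSize", 100)
    if not _is_int(batch_size) or batch_size <= 0:
        problems.append("batchSize must be an integer greater than 0")
    max_retry = parameter.get("maxRetryTime", 3)
    if not _is_int(max_retry) or max_retry <= 1:
        problems.append("maxRetryTime must be an integer greater than 1")
    if source == "RDB":
        column = parameter.get("column") or []
        column_type = parameter.get("columnType") or []
        if not column:
            problems.append("column is required")
        if not column_type:
            problems.append("columnType is required")
        if len(column) != len(column_type):  # assumption, see the lead-in
            problems.append("column and columnType must have the same length")
        for ctype in column_type:
            if ctype not in VALID_COLUMN_TYPES:
                problems.append(f"unsupported columnType: {ctype}")
        # The samples pass multiField as the string "true" or "false".
        multi_field = str(parameter.get("multiField", "false")).lower() == "true"
        if multi_field and not parameter.get("table"):
            problems.append("table is required when multiField is true")
    return problems


sample = {
    "endpoint": "http://localhost:8242",
    "sourceDbType": "RDB",
    "batchSize": 256,
    "columnType": ["tag", "tag", "field_string", "field_double", "timestamp", "field_boolean"],
    "column": ["tag1", "tag2", "field1", "field2", "timestamp", "field3"],
    "multiField": "true",
    "table": "testmetric",
}
print(check_writer_parameter(sample))
```

Returning a list of messages instead of raising on the first problem lets all configuration mistakes be reported in one pass.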
Performance test report
Characteristics of test data
metric: a single metric named m.
tag_k and tag_v: the key and value of a tag. The key-value pairs of the first four tags form 2,000,000 time series. The number of time series is calculated by using the following formula: 10 (zones) × 20 (clusters) × 100 (groups) × 100 (applications) = 2,000,000. The ip tag is the index of each of the 2,000,000 time series, starting from 1.
tag_k | tag_v |
zone | z1 to z10 |
cluster | c1 to c20 |
group | g1 to g100 |
app | a1 to a100 |
ip | ip1 to ip2,000,000 |
value: a random value from 1 to 100.
interval: a collection interval of 10 seconds. The total duration of data collection is 3 hours, and a total of 2,160,000,000 data points are collected. The number of data points is calculated by using the following formula: 3 × 60 × 60/10 × 2,000,000 = 2,160,000,000.
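The two formulas above can be verified with a few lines of Python. The variable names are chosen for this illustration only.

```python
# Tag cardinalities from the test data description.
zones, clusters, groups, applications = 10, 20, 100, 100

# Number of distinct tag combinations (time series).
series_count = zones * clusters * groups * applications

# One sample every 10 seconds for 3 hours.
interval_seconds = 10
duration_seconds = 3 * 60 * 60
samples_per_series = duration_seconds // interval_seconds

# Total number of data points in the test data set.
total_points = samples_per_series * series_count

print(series_count)        # 2000000
print(samples_per_series)  # 1080
print(total_points)        # 2160000000
```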
Performance test results
Number of channels | Data integration speed (record/s) | Data integration bandwidth (Mbit/s) |
1 | 129,753 | 15.45 |
2 | 284,953 | 33.70 |
3 | 385,868 | 45.71 |
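To read the results as a scaling curve, the speed for each channel count can be compared with the single-channel speed. The following Python sketch only rearranges the reported numbers; the function name scaling is hypothetical.

```python
from typing import List, Tuple

# (channels, records per second) as reported in the test results.
RESULTS: List[Tuple[int, int]] = [(1, 129_753), (2, 284_953), (3, 385_868)]


def scaling(results: List[Tuple[int, int]]) -> List[Tuple[int, float, float]]:
    """Return (channels, speedup vs. the first row, records per second per channel)."""
    base_speed = results[0][1]
    return [
        (channels, speed / base_speed, speed / channels)
        for channels, speed in results
    ]


for channels, speedup, per_channel in scaling(RESULTS):
    print(f"{channels} channel(s): speedup {speedup:.2f}x, {per_channel:.0f} records/s per channel")
```

In this test, the speed with three channels is close to three times the single-channel speed.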