This topic describes how to migrate full data from a Time Series Database (TSDB) instance to the Lindorm time series engine (LindormTSDB).
Prerequisites
The client runs Linux or macOS, and the following software is installed on the client:
Java Development Kit (JDK) 1.8 or later
Python 2.x or 3.x
The version of the TSDB instance is 2.7.4 or later.
A Lindorm instance is created and LindormTSDB is activated for the instance. For more information, see Create an instance.
Background information
LindormTSDB is developed by Alibaba Cloud and is compatible with most APIs of TSDB. Compared with TSDB, LindormTSDB offers higher performance with lower costs and supports more features. TSDB is no longer available for sale. We recommend that you migrate all data in your TSDB instances to LindormTSDB.
Process
You can perform the following steps to migrate full data from a TSDB instance to LindormTSDB:
Use the migration tool provided by LindormTSDB to read all time series data in the TSDB instance and save the data to a local file.
The migration tool splits the migration task into multiple time groups based on the task configurations, including the start time, end time, and interval. The tool then splits each time group into multiple read subtasks based on the value of the oidBatch parameter in the task configurations. Each read subtask reads the data of multiple time series within the specified time range and sends the data to the write component.
After all read subtasks in a time group are complete, the migration tool records the ID of the time group, the ID of the migration task, and the task status in a status list named internal_datax_jobjobName, where jobName is the name of the migration task.
Note: The migration tool provided by LindormTSDB supports multiple concurrent migration tasks. The ID of each migration task is recorded in a task ID list. The migration of data in a time group does not start until all read subtasks in the previous time group are complete.
The write component receives the data sent by each read subtask and writes the data to LindormTSDB by using the multi-value data model.
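The splitting logic described above can be sketched as follows. This is a simplified illustration, not the migration tool's actual code; the parameter names follow the task configuration described later in this topic (splitIntervalMs, oidBatch).

```python
def split_migration_task(begin_ms, end_ms, split_interval_ms, oids, oid_batch):
    """Split a migration task into time groups, and each time group
    into read subtasks of at most oid_batch time series each."""
    time_groups = []
    group_start = begin_ms
    while group_start < end_ms:
        group_end = min(group_start + split_interval_ms, end_ms)
        # Each subtask reads a batch of time series for this time range.
        subtasks = [
            {"start": group_start, "end": group_end, "oids": oids[i:i + oid_batch]}
            for i in range(0, len(oids), oid_batch)
        ]
        time_groups.append(subtasks)
        group_start = group_end
    return time_groups

# Example: a 2-day range split into 1-day time groups, 5 time series per subtask.
groups = split_migration_task(0, 172800000, 86400000,
                              [f"oid{i}" for i in range(12)], 5)
```

Each inner list is one time group; the migration tool processes the groups in order and only starts a group after the previous group has fully completed.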
Usage notes
If your application is deployed on an Elastic Compute Service (ECS) instance, we recommend that you deploy the ECS instance and the Lindorm instance in the same virtual private cloud (VPC) as the TSDB instance from which you want to migrate data. This ensures network connectivity between the instances.
If you migrate data from a TSDB instance to LindormTSDB over the Internet, make sure that the public endpoints of the Lindorm and TSDB instances are enabled, and the IP address of your client is added to the whitelists of the Lindorm and TSDB instances. For more information, see Configure whitelists.
During the migration process, data is read from the TSDB instance and written to LindormTSDB. Therefore, before you migrate data, evaluate the potential impact on your business from the following dimensions:
The specification of the TSDB instance
The specifications of the environment (such as an ECS instance) on which applications are deployed
The number of time series in the TSDB instance
The total size of data that you want to migrate
The average frequency at which data in each time series is reported
The time range of the data that you want to migrate
The interval at which each migration task is split
Note: For more information about performance evaluation, see the Performance test section of this topic.
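The dimensions above can be combined into a rough duration estimate. The following sketch is an illustration only, with throughput values taken from the performance test results later in this topic; your actual throughput depends on instance specifications and configurations.

```python
def estimate_migration_hours(num_series, report_interval_s, range_hours,
                             points_per_second):
    """Roughly estimate the migration duration in hours.

    points_per_second is the observed migration throughput; the performance
    test results in this topic range from about 140,000 to 250,000.
    """
    total_points = num_series * (range_hours * 3600 // report_interval_s)
    return total_points / points_per_second / 3600

# Example: 30,000,000 time series reporting hourly over 6 days (144 hours)
# at a throughput of 140,000 points per second.
hours = estimate_migration_hours(30_000_000, 3600, 144, 140_000)
```

With these inputs the estimate is close to nine hours, which matches the third test in the performance test results.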
Data written by using the multi-value data model cannot be queried by using SQL statements. To use SQL statements to query the migrated data, create a time series table before you migrate data to LindormTSDB.
By default, the timestamps used in LindormTSDB are 13 digits in length and indicate time values in milliseconds. The timestamps used in TSDB are 10 digits in length and indicate time values in seconds. After data is migrated from the TSDB instance to LindormTSDB, the timestamps of the data are converted to 13 digits in length.
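The conversion is a multiplication by 1,000. A minimal sketch, not the tool's actual code:

```python
def seconds_to_milliseconds(ts_seconds: int) -> int:
    """Convert a 10-digit second-level timestamp (as used in TSDB)
    to a 13-digit millisecond-level timestamp (as used in LindormTSDB)."""
    return ts_seconds * 1000

# Example: 1657004460 (seconds) becomes 1657004460000 (milliseconds).
converted = seconds_to_milliseconds(1657004460)
```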
LindormTSDB does not recommend the single-value model for writing data. Therefore, after the migration, data that was written to the TSDB instance by using the single-value model must be queried in LindormTSDB by using the multi-value data model. The following sample statements show how to query the same single-value data in TSDB and in LindormTSDB:
# The statement used to query data in TSDB.
curl -u username:password ts-xxxxx:3242/api/query -XPOST -d '{
  "start": 1657004460,
  "queries": [
    {
      "aggregator": "none",
      "metric": "test_metric"
    }
  ]
}'
# The query results in TSDB.
[
  {
    "aggregateTags": [],
    "dps": {
      "1657004460": 1.0
    },
    "fieldName": "",
    "metric": "test_metric",
    "tags": {
      "tagkey1": "1"
    }
  }
]
# The statement used to query data in LindormTSDB.
curl -u username:password ld-xxxxx:8242/api/mquery -XPOST -d '{
  "start": 1657004460,
  "queries": [
    {
      "metric": "test_metric",
      "fields": [
        {
          "field": "*",
          "aggregator": "none"
        }
      ],
      "aggregator": "none"
    }
  ]
}'
# The query results in LindormTSDB.
[
  {
    "aggregatedTags": [],
    "columns": [
      "timestamp",
      "value"
    ],
    "metric": "test_metric",
    "tags": {
      "tagkey1": "1"
    },
    "values": [
      [
        1657004460000,
        1.0
      ]
    ]
  }
]
Configure a data migration task
Configure the parameters described in the following three tables and save the configurations as a JSON file such as job.json.
Configure parameters related to the task.
| Parameter | Required | Description |
| --- | --- | --- |
| channel | No | The number of tasks that can run concurrently. Default value: 1. |
| errorLimit | No | The number of write errors that are allowed during the migration task. Default value: 0. |
Configure parameters related to data reading. Specify the values of the parameters based on the specification of the TSDB instance.
| Parameter | Required | Description |
| --- | --- | --- |
| sinkDbType | Yes | The type of the destination database. Set this parameter to LINDORM-MIGRATION. |
| endpoint | Yes | The endpoint that is used to connect to the TSDB instance. For more information, see Network connection. |
| beginDateTime | Yes | The beginning of the time range of the data that you want to migrate. |
| endDateTime | Yes | The end of the time range of the data that you want to migrate. |
| splitIntervalMs | Yes | The interval at which the migration task is split. The value is calculated based on the total duration of the migration task and the average frequency at which data in each time series is reported. Example: 604800000 (7 days). If data in each time series is reported at intervals of one second or shorter, we recommend that you set the interval to a value shorter than one day. If data is reported at intervals of hours, you can set the interval to a larger value based on your requirements. |
| selfId | Yes | The custom ID of the migration task. If you use multiple concurrent tasks to migrate data, specify the IDs of all tasks in the value of the jobIds parameter. If you use only one task to migrate data, specify the ID of that task in the value of the jobIds parameter. |
| jobIds | Yes | The IDs of the migration tasks. |
| jobName | Yes | The name of the migration task. The task name is used as the suffix of the name of the task status list. If you use multiple concurrent tasks to migrate data, all tasks must use the same name. |
| oidPath | Yes | The local path of the file in which the IDs of the time series that you want to migrate are stored. |
| oidBatch | Yes | The number of time series that are read by each read subtask at a time. |
| oidCache | Yes | Specifies whether to cache the time series that are migrated by the migration task in the memory. If you want to migrate tens of billions of time series, not all time series can be cached in the memory. |
| metrics | No | The metrics (tables) that you want to migrate. This parameter does not have a default value. Example: ["METRIC_1","METRIC_2"]. |
Note: The amount of data that is read in each read operation of a migration task is determined by the splitIntervalMs and oidBatch parameters and the average frequency at which data in each time series is reported. For example, if splitIntervalMs is set to 604800000, oidBatch is set to 100, and data in each time series is reported on an hourly basis, the number of data records that are read in each read operation is calculated by using the following formula: 100 × 604800000/3600000 = 16800.
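The formula in the preceding note can be checked with a few lines of arithmetic:

```python
split_interval_ms = 604800000   # the interval of one time group: 7 days
oid_batch = 100                 # time series read by each subtask at a time
report_interval_ms = 3600000    # data reported on an hourly basis

# Records per read = time series per batch x data points per series per group.
records_per_read = oid_batch * split_interval_ms // report_interval_ms
```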
Configure parameters related to data writing.
| Parameter | Required | Description |
| --- | --- | --- |
| endpoint | Yes | The endpoint that is used to access LindormTSDB. For more information, see View endpoints. |
| batchSize | Yes | The maximum number of data points that can be sent to LindormTSDB at a time. |
| multiField | Yes | Specifies whether to use the multi-value data model to write data. To write data to LindormTSDB by using the multi-value data model, set this parameter to true. |
The following example shows the content contained in the job.json file:
{
"job": {
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 0,
"percentage": 0.00
}
},
"content": [
{
"reader": {
"name": "tsdbreader",
"parameter": {
"sinkDbType": "LINDORM-MIGRATION",
"endpoint": "ts-xxxx:3242",
"beginDateTime": "2022-5-2 00:00:00",
"endDateTime": "2022-7-2 00:00:00",
"splitIntervalMs": 86400000,
"jobName":"myjob",
"selfId":1,
"jobIds":[1],
"oidPath":"{$myworkplace}/oidfile",
"oidBatch":100,
"oidCache":true
}
},
"writer": {
"name": "tsdbwriter",
"parameter": {
"endpoint": "ld-xxxx:8242",
"multiField":true,
"batchSize":500
}
}
}
]
}
}
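Before you start the task, you can check that the configuration file is valid JSON and that the required reader parameters are present. The following is a convenience sketch, not part of the migration tool; the required parameter names follow the tables above.

```python
import json

# Required reader parameters, as listed in the data-reading table above.
REQUIRED_READER_KEYS = {
    "sinkDbType", "endpoint", "beginDateTime", "endDateTime",
    "splitIntervalMs", "selfId", "jobIds", "jobName",
    "oidPath", "oidBatch", "oidCache",
}

def check_job_config(text):
    """Parse a job configuration and return the missing reader parameters.

    Raises ValueError if the text is not valid JSON.
    """
    config = json.loads(text)
    reader = config["job"]["content"][0]["reader"]["parameter"]
    return sorted(REQUIRED_READER_KEYS - reader.keys())

# Example with a deliberately incomplete configuration.
sample = ('{"job": {"content": [{"reader": '
          '{"parameter": {"sinkDbType": "LINDORM-MIGRATION"}}}]}}')
missing = check_job_config(sample)
```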
Start the data migration task
Download the migration tool for time series data.
Run the following command to decompress the downloaded package of the migration tool:
tar -zxvf tsdb2lindorm.tar.gz
Run the following command to start the data migration task:
python datax/bin/datax.py --jvm="-Xms8G -Xmx8G" job.json > job.result
Note: Replace job.json in the preceding command with the name of the configuration file of your migration task.
After the command is run, check whether error information is recorded in the job.result file. If no error information is recorded, the migration task is successful.
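For example, you can scan the job.result file for error information with a short script. This is a convenience sketch; the exact log format depends on the output of the migration tool.

```python
import os
import tempfile

def find_error_lines(path):
    """Return the lines in a result file that contain error information."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if "error" in line.lower()]

# Example with a hypothetical result file.
with tempfile.NamedTemporaryFile("w", suffix=".result", delete=False) as f:
    f.write("task started\nERROR: write failed\ntask finished\n")
errors = find_error_lines(f.name)
os.unlink(f.name)
```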
(Optional) If the migration task fails, you can execute the following multi-value query statement to query the task status list in the TSDB instance:
curl -u username:password ts-****:3242/api/mquery -XPOST -d '{
  "start": 1,
  "queries": [
    {
      "metric": "internal_datax_jobjobName",
      "fields": [
        {
          "field": "*",
          "aggregator": "none"
        }
      ]
    }
  ]
}'
Note:
username:password: Replace this value with the username and password that you use to access the TSDB instance. For more information, see Manage accounts.
ts-****: Replace this value with the ID of the TSDB instance.
jobName: Replace this value with the name of the actual migration task. Example: internal_datax_jobmyjob.
The following table describes the returned task status list.
| Timestamp (endtime) | jobId (tag) | state (field) |
| --- | --- | --- |
| 1651795199999 (2022-05-05 23:59:59.999) | 3 | ok |
| 1651795199999 (2022-05-05 23:59:59.999) | 2 | ok |
| 1651795199999 (2022-05-05 23:59:59.999) | 1 | ok |
| 1651881599999 (2022-05-06 23:59:59.999) | 2 | ok |
To prevent data that has already been migrated from being migrated again, modify the value of beginDateTime in the job.json file before you restart the task. In this example, all tasks have completed the time groups up to 2022-05-05 23:59:59.999. Therefore, the value of beginDateTime is changed to 2022-05-06 00:00:00.
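The new beginDateTime can be derived from the status list. The following sketch assumes the timestamps in the status list are UTC-based, as in the example above, and uses the last endtime that every task ID has completed (1651795199999 in this example):

```python
from datetime import datetime, timezone

def next_begin_datetime(last_endtime_ms):
    """Convert the millisecond endtime of the last time group completed by
    all tasks into the next beginDateTime string for job.json."""
    # Truncate to seconds and step to the next second, which is the start
    # of the first time group that still needs to be migrated.
    next_begin_s = last_endtime_ms // 1000 + 1
    return datetime.fromtimestamp(next_begin_s, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S")

resumed = next_begin_datetime(1651795199999)
```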
Performance test
Before you migrate data from a TSDB instance, we recommend that you evaluate the performance of the TSDB instance. The following tables show the performance test results of TSDB Basic Edition and TSDB Standard Edition instances for reference.
Test results of two TSDB Basic Edition II instances each with 4 CPU cores and 8 GB of memory
| Test | Amount of data | Number of processes in a task | Configurations | Size of time series files | Number of data points migrated per second | Migration duration | TSDB resource utilization |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 30,000 time series; 86,400,000 data points | 1 | channel:2, oidCache:true, oidBatch:100, splitInterval:6h, mem:-Xms6G -Xmx6G | 1.5 MB | 230,000 | 12 minutes 30 seconds | CPU utilization: 30% |
| 2 | 6,000,000 time series; 2,592,000,000 data points | 1 | channel:10, oidCache:true, oidBatch:100, splitInterval:6h, mem:-Xms8G -Xmx8G | 292 MB | 200,000 | 2 hours 55 minutes 30 seconds | CPU utilization: 70% to 90% |
| 3 | 30,000,000 time series; 4,320,000,000 data points | 1 | channel:10, oidCache:false, oidBatch:100, splitInterval:6h, mem:-Xms28G -Xmx28G | 1.5 GB | 140,000 | 9 hours | CPU utilization: 40% to 80% |
| 4 | 30,000,000 time series; 4,320,000,000 data points | 3 | channel:10, oidCache:false, oidBatch:100, splitInterval:6h, mem:-Xms8G -Xmx8G | 1.5 GB | 250,000 | 5 hours | CPU utilization: 90% |
Test results of two TSDB Standard Edition I instances each with 8 CPU cores and 16 GB of memory
| Amount of data | Number of processes in a task | Configurations | Size of time series files | Number of data points migrated per second | Migration duration | TSDB resource utilization |
| --- | --- | --- | --- | --- | --- | --- |
| 40,000,000 time series; 5,760,000,000 data points | 3 | channel:10, oidCache:false, oidBatch:100, splitInterval:6h, mem:-Xms8G -Xmx8G | 2 GB | 150,000 to 200,000 | 9 hours | CPU utilization: 10% to 20% |