Lindorm: Migrate full data from TSDB to LindormTSDB

Last Updated: May 21, 2024

This topic describes how to migrate full data from a Time Series Database (TSDB) instance to the Lindorm time series engine (LindormTSDB).

Prerequisites

  • The client runs Linux or macOS, and the following software is installed on the client:

    • Java Development Kit (JDK) 1.8 or later is installed.

    • Python 2.x or 3.x is installed.

  • The version of the TSDB instance is 2.7.4 or later.

  • A Lindorm instance is created and LindormTSDB is activated for the instance. For more information, see Create an instance.

Background information

LindormTSDB is developed by Alibaba Cloud and is compatible with most APIs of TSDB. Compared with TSDB, LindormTSDB offers higher performance with lower costs and supports more features. TSDB is no longer available for sale. We recommend that you migrate all data in your TSDB instances to LindormTSDB.

Process

You can perform the following steps to migrate full data from a TSDB instance to LindormTSDB:

  1. Use the migration tool provided by LindormTSDB to read the information about all time series in the TSDB instance and save the information to a local file.

  2. Split the migration task into multiple time groups based on the task configuration, including the start time, end time, and interval. Split each time group into multiple read subtasks based on the value of the oidBatch parameter in the task configuration. Each read subtask reads the data of a batch of time series within the specified time range and sends the data to the write component (see the sketch after this list).

  3. After all read subtasks in a time group are complete, record the end time of the time group, the ID of the migration task, and the task status in a status list whose name is in the following format: internal_datax_jobjobName, where jobName is the name of the migration task.

    Note

    The migration tool provided by LindormTSDB supports multiple migration tasks. The ID of each migration task is recorded in a task ID list. The migration of data in a time group does not start until all read subtasks in the previous time group are complete.

  4. The write component receives the data sent by each read subtask and writes the data to LindormTSDB by using the multi-value data model.
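
The following minimal Python sketch illustrates the splitting logic described in step 2 of this list. It is not part of the migration tool; the parameter values and the number of time series are assumptions used only for this illustration.

from datetime import datetime

# Illustrative values that mirror the migration task configuration described later
# (beginDateTime, endDateTime, splitIntervalMs, and oidBatch). The number of time
# series is an assumption made only for this example.
begin_ms = int(datetime(2022, 5, 2).timestamp() * 1000)   # beginDateTime
end_ms = int(datetime(2022, 7, 2).timestamp() * 1000)     # endDateTime
split_interval_ms = 86_400_000                            # splitIntervalMs: one day
oid_batch = 100                                           # oidBatch
total_series = 30_000                                     # assumed number of time series

# Split the overall time range into time groups of splitIntervalMs each.
time_groups = [
    (start, min(start + split_interval_ms, end_ms))
    for start in range(begin_ms, end_ms, split_interval_ms)
]

# Each time group is further split into read subtasks that each cover oidBatch time series.
subtasks_per_group = -(-total_series // oid_batch)  # ceiling division

print(f"{len(time_groups)} time groups, {subtasks_per_group} read subtasks per group")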

Usage notes

  • If your application is deployed on an Elastic Compute Service (ECS) instance, we recommend that you deploy the ECS instance and the Lindorm instance in the same VPC as the TSDB instance from which you want to migrate data. This ensures network connectivity between the instances.

  • If you migrate data from a TSDB instance to LindormTSDB over the Internet, make sure that the public endpoints of the Lindorm and TSDB instances are enabled, and the IP address of your client is added to the whitelists of the Lindorm and TSDB instances. For more information, see Configure whitelists.

  • During the migration, data is read from the TSDB instance and written to LindormTSDB. Therefore, before you migrate data, evaluate whether your business may be affected during the migration based on the following dimensions:

    • The specification of the TSDB instance

    • The specifications of the environment (such as an ECS instance) on which applications are deployed

    • The number of time series in the TSDB instance

    • The total size of data that you want to migrate

    • The average frequency at which data in each time series is reported

    • The time range of the data that you want to migrate

    • The interval at which each migration task is split

    Note

    For more information about performance evaluation, see the Performance test section of this topic.

  • Data written by using the multi-value data model cannot be queried by using SQL statements. To use SQL statements to query the migrated data, create a time series table before you migrate data to LindormTSDB.

  • By default, the timestamps used in LindormTSDB are 13 digits in length and indicate time values in milliseconds, whereas the timestamps used in TSDB are 10 digits in length and indicate time values in seconds. After data is migrated from the TSDB instance to LindormTSDB, the timestamps of the data are converted to the 13-digit format (see the sketch at the end of this list).

  • We recommend that you do not use the single-value data model to write data to LindormTSDB. Therefore, data that was written to the TSDB instance by using the single-value data model is written to LindormTSDB by using the multi-value data model during the migration and must be queried by using multi-value queries. The following sample code shows how to query such data in TSDB and in LindormTSDB:

    // The statement used to query data in TSDB.
    curl -u username:password ts-xxxxx:3242/api/query -XPOST -d '{
        "start": 1657004460,
        "queries": [
            {
                "aggregator": "none",
                "metric": "test_metric"
            }
        ]
    }'
    // The query results in TSDB.
    [
        {
            "aggregateTags": [],
            "dps": {
                "1657004460": 1.0
            },
            "fieldName": "",
            "metric": "test_metric",
            "tags": {
                "tagkey1": "1"
            }
        }
    ]
    
    // The statement used to query data in LindormTSDB.
    curl -u username:password ld-xxxxx:8242/api/mquery -XPOST -d '{
        "start":1657004460,
        "queries": [
            {
                "metric": "test_metric",
                "fields": [
                    {
                        "field": "*",
                        "aggregator": "none"
                    }
                ],
                "aggregator": "none"
            }
        ]
    }'
    // The query results in LindormTSDB.
    [
      {
        "aggregatedTags": [],
        "columns": [
          "timestamp",
          "value"
        ],
        "metric": "test_metric",
        "tags": {
          "tagkey1": "1"
        },
        "values": [
          [
            1657004460000,
            1.0
          ]
        ]
      }
    ]
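
The timestamp conversion described in the note above can be illustrated with a minimal Python sketch. The values are taken from the query examples above: the 10-digit second-level timestamp returned by TSDB corresponds to the 13-digit millisecond-level timestamp returned by LindormTSDB.

# A 10-digit TSDB timestamp in seconds, as returned in the TSDB query result above.
tsdb_timestamp_s = 1657004460

# LindormTSDB stores the same data point with a 13-digit timestamp in milliseconds.
lindormtsdb_timestamp_ms = tsdb_timestamp_s * 1000

print(lindormtsdb_timestamp_ms)  # 1657004460000, as shown in the LindormTSDB query result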

Configure a data migration task

Configure the parameters described in the following three groups and save the configuration as a JSON file, for example, job.json.

  • Configure parameters related to the task.

    channel (optional): The number of concurrent tasks that can run at the same time. Default value: 1.

    errorLimit (optional): The number of write errors that are allowed during the migration task. Default value: 0.

  • Configure parameters related to data reading. Specify the values of the parameters based on the specification of the TSDB instance.

    sinkDbType (required): The type of the destination database. Set this parameter to LINDORM-MIGRATION.

    endpoint (required): The endpoint that is used to connect to the TSDB instance. For more information, see Network connection.

    beginDateTime (required): The start time of the data that you want to migrate.

    endDateTime (required): The end time of the data that you want to migrate.

    splitIntervalMs (required): The interval at which the migration task is split. The value is calculated based on the total duration of the migration task and the average frequency at which data in each time series is reported. Example: 604800000 (7 days).

      • If data in each time series is reported at a frequency of seconds or less, we recommend that you set the interval to a value shorter than one day.

      • If data in each time series is reported at a frequency of hours, you can set the interval to a larger value based on your requirements.

    selfId (required): The custom ID of the current migration task.

      • If you use multiple concurrent tasks to migrate data, specify the IDs of all tasks in the value of the jobIds parameter.

      • If you use only one task to migrate data, specify the ID of the task in the value of the jobIds parameter.

    jobIds (required): The IDs of all migration tasks.

    jobName (required): The name of the migration task. The task name is used as the suffix of the name of the task status list. If you use multiple concurrent tasks to migrate data, all tasks must use the same name.

    oidPath (required): The local path of the file in which the time series that you want to migrate from the TSDB instance are stored.

    oidBatch (required): The number of time series that are read by each read subtask at a time.

    oidCache (required): Specifies whether to cache the time series that are migrated by the migration task in memory. If you want to migrate tens of billions of time series, not all time series can be cached in memory.

    metrics (optional): The metrics (tables) that you want to migrate. This parameter does not have a default value. Example: ["METRIC_1","METRIC_2"...].

    Note

    The amount of data that is read each time in a migration task is determined by the splitIntervalMs and oidBatch parameters and the average frequency at which data in each time series is reported. For example, if splitIntervalMs is set to 604800000, oidBatch is set to 100, and data in each time series is reported on an hourly basis, the number of data records that are read each time is calculated by using the following formula: 100 × 604800000/3600000 = 16800. For a rough illustration of this calculation, see the sketch after these parameter descriptions.

  • Configure parameters related to data writing.

    endpoint (required): The endpoint that is used to access LindormTSDB. For more information, see View endpoints.

    batchSize (required): The maximum number of data points that can be sent to LindormTSDB at a time.

    multiField (required): Specifies whether to use the multi-value data model to write data. To write data to LindormTSDB by using the multi-value data model, set this parameter to true.
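
As a rough illustration of the calculation in the note above, the following Python sketch estimates how many data records a single read operation covers. The values match the example in the note; the one-hour reporting interval is an assumption used only for this calculation.

# Values from the example in the note above.
split_interval_ms = 604_800_000   # splitIntervalMs: 7 days
oid_batch = 100                   # oidBatch: time series read by a subtask at a time
report_interval_ms = 3_600_000    # assumed reporting frequency: one data point per hour

# Data records covered by a single read: oidBatch x (splitIntervalMs / reporting interval)
records_per_read = oid_batch * split_interval_ms // report_interval_ms
print(records_per_read)  # 16800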

The following example shows the content contained in the job.json file:

{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.00
            }
        },
        "content": [
            {
                "reader": {
                    "name": "tsdbreader",
                    "parameter": {
                        "sinkDbType": "LINDORM-MIGRATION",
                        "endpoint": "ts-xxxx:3242",
                        "beginDateTime": "2022-5-2 00:00:00",
                        "endDateTime": "2022-7-2 00:00:00",
                        "splitIntervalMs": 86400000,
                        "jobName":"myjob",
                        "selfId":1,
                        "jobIds":[1],
                        "oidPath":"{$myworkplace}/oidfile",
                        "oidBatch":100,
                        "oidCache":true
                    }
                },
                "writer": {
                    "name": "tsdbwriter",
                    "parameter": {
                        "endpoint": "ld-xxxx:8242",
                        "multiField":true,
                        "batchSize":500
                    }
                }
            }
        ]
    }
}
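
The following Python sketch is a hypothetical pre-flight check, not part of the migration tool. It loads a job.json file like the one above and verifies a few of the constraints described in the parameter descriptions, such as selfId being included in jobIds and multiField being set to true.

import json

# Load the migration task configuration (the job.json file shown above).
with open("job.json") as f:
    job = json.load(f)

content = job["job"]["content"][0]
reader = content["reader"]["parameter"]
writer = content["writer"]["parameter"]

# selfId must be one of the IDs listed in jobIds.
assert reader["selfId"] in reader["jobIds"], "selfId must be included in jobIds"

# Data is written to LindormTSDB by using the multi-value data model.
assert writer["multiField"] is True, "multiField should be set to true"

# batchSize limits how many data points are sent to LindormTSDB at a time.
assert writer["batchSize"] > 0, "batchSize must be a positive integer"

print("Basic checks passed for job", reader["jobName"],
      "(task", reader["selfId"], "of", len(reader["jobIds"]), "tasks)")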
                

Start the data migration task

  1. Download the migration tool for time series data.

  2. Run the following command to decompress the downloaded package of the migration tool:

    tar -zxvf tsdb2lindorm.tar.gz
  3. Run the following command to start the data migration task:

    python datax/bin/datax.py  --jvm="-Xms8G -Xmx8G" job.json > job.result
    Note

    Replace job.json in the preceding command with the name of the configuration file of your migration task.

    After the command is run, check whether error information is recorded in the job.result file. If no error information is returned, the migration task is successful.

  4. (Optional) If the migration task fails, you can run the following multi-value query to check the task status list of the TSDB instance:

    curl -u username:password ts-****:3242/api/mquery -XPOST -d '{
        "start": 1,
        "queries": [
            {
                "metric": "internal_datax_jobjobName",
                "fields": [
                    {
                        "field": "*",
                        "aggregator": "none"
                    }
                ]
            }
        ]
    }'
    Note
    • username:password: Replace this value with the account and password that you use to access the TSDB instance. For more information, see Manage accounts.

    • ts-****: Replace this value with the ID of the TSDB instance.

    • jobName: Replace this value with the name of the actual migration task. Example: internal_datax_jobmyjob.

    The following table describes the returned task status list.

    Timestamp (endtime)                        jobId (tag)    state (field)
    1651795199999 (2022-05-05 23:59:59.999)    3              ok
    1651795199999 (2022-05-05 23:59:59.999)    2              ok
    1651795199999 (2022-05-05 23:59:59.999)    1              ok
    1651881599999 (2022-05-06 23:59:59.999)    2              ok

    To prevent time groups that have already been migrated from being migrated again, modify the value of beginDateTime in the job.json file before you restart the task. In this example, the value of beginDateTime is changed to 2022-05-06 00:00:00.
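
The following Python sketch illustrates, under the assumptions of this example, how a new beginDateTime can be derived from the task status list: it takes the latest time group that is complete for all task IDs and resumes from the next millisecond. The timestamps are copied from the table above; adapt the logic and the time zone to your own status list.

from datetime import datetime, timezone

# End timestamps (in milliseconds) of the completed time groups for each task ID,
# copied from the example task status list above.
completed = {
    1: [1651795199999],
    2: [1651795199999, 1651881599999],
    3: [1651795199999],
}

# Resume from the first time group that is not yet complete for every task ID:
# take the earliest of the latest completed end times across all task IDs.
last_group_complete_for_all = min(max(ts) for ts in completed.values())

# The example timestamps correspond to UTC; use the time zone that applies to your data.
new_begin = datetime.fromtimestamp((last_group_complete_for_all + 1) / 1000, tz=timezone.utc)
print(new_begin.strftime("%Y-%m-%d %H:%M:%S"))  # 2022-05-06 00:00:00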

Performance test

Before you migrate data from a TSDB instance, you must evaluate the performance of the TSDB instance. The following tables show the performance test results of TSDB Basic Edition and TSDB Standard Edition instances for reference.

  • Test results of two TSDB Basic Edition II instances, each with 4 CPU cores and 8 GB of memory

    Test 1
      Amount of data: 30,000 time series; 86,400,000 data points
      Number of processes in a task: 1
      Configurations: channel:2, oidCache:true, oidBatch:100, splitInterval:6h, mem:-Xms6G -Xmx6G
      Size of time series files: 1.5 MB
      Number of data points migrated per second: 230,000
      Migration duration: 12 minutes 30 seconds
      TSDB resource utilization: 30% CPU utilization

    Test 2
      Amount of data: 6,000,000 time series; 2,592,000,000 data points
      Number of processes in a task: 1
      Configurations: channel:10, oidCache:true, oidBatch:100, splitInterval:6h, mem:-Xms8G -Xmx8G
      Size of time series files: 292 MB
      Number of data points migrated per second: 200,000
      Migration duration: 2 hours 55 minutes 30 seconds
      TSDB resource utilization: 70% to 90% CPU utilization

    Test 3
      Amount of data: 30,000,000 time series; 4,320,000,000 data points
      Number of processes in a task: 1
      Configurations: channel:10, oidCache:false, oidBatch:100, splitInterval:6h, mem:-Xms28G -Xmx28G
      Size of time series files: 1.5 GB
      Number of data points migrated per second: 140,000
      Migration duration: 9 hours
      TSDB resource utilization: 40% to 80% CPU utilization

    Test 4
      Amount of data: 30,000,000 time series; 4,320,000,000 data points
      Number of processes in a task: 3
      Configurations: channel:10, oidCache:false, oidBatch:100, splitInterval:6h, mem:-Xms8G -Xmx8G
      Size of time series files: 1.5 GB
      Number of data points migrated per second: 250,000
      Migration duration: 5 hours
      TSDB resource utilization: 90% CPU utilization

  • Test results of two TSDB Standard Edition I instances, each with 8 CPU cores and 16 GB of memory

      Amount of data: 40,000,000 time series; 5,760,000,000 data points
      Number of processes in a task: 3
      Configurations: channel:10, oidCache:false, oidBatch:100, splitInterval:6h, mem:-Xms8G -Xmx8G
      Size of time series files: 2 GB
      Number of data points migrated per second: 150,000 to 200,000
      Migration duration: 9 hours
      TSDB resource utilization: 10% to 20% CPU utilization