All Products
Search
Document Center

Data Management:Task orchestration

Last Updated:Nov 15, 2024

This topic describes how to migrate the historical data that was generated six months ago from a source table to a destination table by using the Single Instance SQL node provided by the task orchestration feature of Data Management (DMS). After the migration, the data is deleted from the source table and archived to the destination table.

Prerequisites

  • The instance in which you want to use the task orchestration feature is registered with DMS. For more information about how to register an ApsaraDB instance, see Register an Alibaba Cloud database instance.

  • The status of the instance is normal. To view the status of the instance, perform the following operations: Go to the Home page of the DMS console. In the left-side navigation pane, move the pointer over the instance that you want to query. Then, view the instance status from the tip that appears.

  • The following types of databases are supported:

    Supported database types

    • Relational databases

      • MySQL: ApsaraDB RDS for MySQL, PolarDB for MySQL, ApsaraDB MyBase for MySQL, PolarDB for Xscale, and MySQL databases from other sources

      • SQL Server: ApsaraDB RDS for SQL Server, ApsaraDB MyBase for SQL Server, and SQL Server databases from other sources

      • PostgreSQL: ApsaraDB RDS for PostgreSQL, PolarDB for PostgreSQL, ApsaraDB MyBase for PostgreSQL, and PostgreSQL databases from other sources

      • OceanBase: ApsaraDB for OceanBase in MySQL mode, ApsaraDB for OceanBase in Oracle mode, and self-managed OceanBase databases

      • PolarDB for PostgreSQL (Compatible with Oracle)

      • Oracle

      • Dameng (DM)

      • Db2

    • NoSQL database: Lindorm

    • Data warehouses:

      • AnalyticDB for MySQL

      • AnalyticDB for PostgreSQL

      • Data Lake Analytics (DLA)

      • MaxCompute

      • Hologres

    • Object storage: Object Storage Service (OSS)

Usage notes

If the control mode of the instance is Flexible Management or Stable Change, make sure that you are the owner of the task flow that you want to edit. Otherwise, the task flow may fail after you edit the node configurations of the task flow. If the control mode of the instance is Security Collaboration, this requirement is unnecessary. For more information about how to specify the owner of a task flow, see Overview.

Preparations

In this example, a source table named test is used. You can execute the following statements to create the test table and insert data into the test table:

-- Create a table.
CREATE TABLE test (
    `id` bigint(20) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT COMMENT 'Primary key',
      `gmt_create` DATETIME NOT NULL COMMENT 'Creation time',
      `gmt_modified` DATETIME NOT NULL COMMENT 'Modification time',
    `content` TEXT COMMENT 'Test data'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Table for a task orchestration test';

-- Insert data into the table.
INSERT INTO test( `gmt_create`, `gmt_modified`, `content`) VALUES ('2020-01-01 01:00:00', '2020-01-01 01:00:00', 'value1');
INSERT INTO test( `gmt_create`, `gmt_modified`, `content`) VALUES ('2020-02-01 01:00:00', '2020-02-01 01:00:00', 'value2');
INSERT INTO test( `gmt_create`, `gmt_modified`, `content`) VALUES ('2020-03-01 01:00:00', '2020-03-01 01:00:00', 'value3');
INSERT INTO test( `gmt_create`, `gmt_modified`, `content`) VALUES ('2020-04-01 01:00:00', '2020-04-01 01:00:00', 'value4');
INSERT INTO test( `gmt_create`, `gmt_modified`, `content`) VALUES ('2020-05-01 01:00:00', '2020-05-01 01:00:00', 'value5');
INSERT INTO test( `gmt_create`, `gmt_modified`, `content`) VALUES ('2020-06-01 01:00:00', '2020-06-01 01:00:00', 'value6');
INSERT INTO test( `gmt_create`, `gmt_modified`, `content`) VALUES ('2020-07-01 01:00:00', '2020-07-01 01:00:00', 'value7');
INSERT INTO test( `gmt_create`, `gmt_modified`, `content`) VALUES ('2020-08-01 01:00:00', '2020-08-01 01:00:00', 'value8');
INSERT INTO test( `gmt_create`, `gmt_modified`, `content`) VALUES ('2020-09-01 01:00:00', '2020-09-01 01:00:00', 'value9');
INSERT INTO test( `gmt_create`, `gmt_modified`, `content`) VALUES ('2020-10-01 01:00:00', '2020-10-01 01:00:00', 'value10');

You can also create the test table on the SQLConsole tab of DMS or by submitting a data change ticket. For more information, see the "Create a table" section of the Manage a database on the SQLConsole tab topic and the Normal data modify topic.

Procedure

Step 1: Create a task flow

  1. Log on to the DMS console V5.0.
  2. In the top navigation bar, choose DTS > Data Development > Task Orchestration.

    Note

    If you are using the DMS console of the simple mode, click the 2022-10-21_15-25-22.png icon in the upper-left corner and choose All Functions > DTS > Data Development > Task Orchestration.

  3. On the page that appears, click New Task Flow.

  4. In the New Task Flow dialog box, specify the Task Flow Name and Description parameters. Then, click OK.

Step 2: Configure the variables of the task flow

The variables of the task flow are time variables. You can use the variables in all nodes in the format of ${Variable name}. This example describes how to configure the time variables as filter conditions in SQL statements to periodically migrate the historical data that was generated six months ago in the test table.

In this example, the yearmonth6_name and yearmonth6_date variables are created.

In the lower part of the page, click the Variable configuration tab. In the left-side pane, click Task Flow Variable.

  • Create the yearmonth6_name variable. Set the Time Format parameter to yyyy-MM. Specify -, 6, and Month in the fields below the Time Format parameter.

    This specifies the month that is six months previous to the current month. The value is accurate to the month. If the current date is June 27, 2023, the value of the variable is 2022-12.

  • Create the yearmonth6_date variable. Set the Time Format parameter to yyyy-MM-01. Specify -, 6, and Month in the fields below the Time Format parameter.

    This specifies the first day of the month that is six months previous to the current month. The value is accurate to the day. If the current date is June 27, 2023, the value of the variable is 2022-12-01.

For information about the rules and usage of variables, see Variables.

Step 3: Create and configure nodes in the task flow

  • Create a node to migrate data

    1. From the left-side node type list of the Task Type section on the Task Orchestration page, drag the Single Instance SQL node to the blank area on the canvas.

    2. Right-click the node. In the shortcut menu that appears, click Rename to rename the node to Migrate Data.

    3. Double-click the node to go to the page to write SQL statements.

    4. Search for the desired database in the database search box and enter the following SQL statements in the SQL editor:

      CREATE TABLE IF NOT EXISTS `test_${yearmonth6_name}` LIKE test;
      INSERT INTO `test_${yearmonth6_name}`
      SELECT * FROM `test`
      WHERE gmt_create < '${yearmonth6_date}';

      The preceding statements are used to create a table named test_${yearmonth6_name} for archiving historical data in the current database and migrating historical data that meets the date requirements from the test table to the test_${yearmonth6_name} table.

    5. Click Preview to check the validity of the SQL statements.

  • Create a node to clear historical data

    1. From the left-side node type list of the Task Type section on the Task Orchestration page, drag the Single Instance SQL node to the blank area on the canvas.

    2. Right-click the node. In the shortcut menu that appears, click Rename to rename the node to Clear Historical Data.

    3. Double-click the node to go to the page to write SQL statements.

    4. Search for the desired database in the database search box and enter the following SQL statement in the SQL editor:

      Warning

      To delete a large amount of data, we recommend that you use the Single Instance SQL node to execute the DELETE statement during off-peak hours. This helps prevent business interruptions caused by long-term table locks.

      DELETE FROM `test` WHERE gmt_create < '${yearmonth6_date}';

      The preceding statement is used to delete the migrated data from the test table.

    5. Click Preview to check the validity of the SQL statement.

Step 4: Connect the nodes and configure scheduling information

  1. On the canvas, move the pointer over the Migrate Data node, click the hollow circle on the right side, and then drag the connection line to the Clear Historical Data node.

    Note

    You can specify the execution order of the nodes by dragging a connection line. In this example, the Migrate data node is run before the Clear Historical Data node.

  2. In the lower part of the page, click the Task Flow Information tab. In the Scheduling Settings section, configure the following parameters.

    Parameter

    Example

    Enable Scheduling

    Specifies whether to enable the scheduling feature. Turn on Enable Scheduling.

    Scheduling Type

    The scheduling type. Select Timed Scheduling/Cyclic scheduling.

    Effective Time

    The period during which the scheduling properties take effect. The default value is 1970-01-01 - 9999-01-01, which indicates that the scheduling properties permanently take effect.

    Scheduling Cycle

    The scheduling cycle. Select Month.

    Specified Time

    The time at which the scheduling properties take effect. Select 1st of each month.

    Specific Point in Time

    The specific point in time at which the scheduling properties take effect. Set this parameter to 01:00.

    Cron Expression

    The CRON expression, which is automatically generated based on the specified scheduling cycle and time settings.

    Note

    In this example, the task flow is scheduled to run at 01:00 on the first day of each month. You can schedule the task flow to run as needed. For more information, see Overview.

Step 5: Run the task flow and verify the results

  1. Click Try Run in the upper-left corner of the canvas.

    • If status SUCCEEDED appears in the last line of the logs, the test run is successful.

    • If status FAILED appears in the last line of the logs, the test run fails.

      Note

      If the test run fails, view the node on which the failure occurs and the reason for the failure in the logs. Then, modify the configurations of the node and try again.

  2. Go to the SQLConsole tab to check whether historical data in the test table is migrated to the destination table and whether the migrated data is deleted from the test table. For more information, see the "Query data" section of the Manage a database on the SQLConsole tab topic.