This topic describes how to use the Single Instance SQL node provided by the task orchestration feature of Data Management (DMS) to migrate historical data that was generated more than six months ago from a source table to a destination table. After the migration, the data is archived in the destination table and deleted from the source table.
Prerequisites
The instance in which you want to use the task orchestration feature is registered with DMS. For more information about how to register an ApsaraDB instance, see Register an Alibaba Cloud database instance.
The instance is in a normal state. To view the state of the instance, go to the Home page of the DMS console, move the pointer over the instance in the left-side navigation pane, and then check the status in the tooltip that appears.
The database is of a type that is supported by the task orchestration feature.
Usage notes
If the control mode of the instance is Flexible Management or Stable Change, make sure that you are the owner of the task flow that you want to edit. Otherwise, the task flow may fail after you edit its node configurations. This requirement does not apply if the control mode of the instance is Security Collaboration. For more information about how to specify the owner of a task flow, see Overview.
Preparations
In this example, a source table named test is used. You can execute the following statements to create the test table and insert data into it:
-- Create a table.
CREATE TABLE test (
`id` bigint(20) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT COMMENT 'Primary key',
`gmt_create` DATETIME NOT NULL COMMENT 'Creation time',
`gmt_modified` DATETIME NOT NULL COMMENT 'Modification time',
`content` TEXT COMMENT 'Test data'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Table for a task orchestration test';
-- Insert data into the table.
INSERT INTO test (`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-01-01 01:00:00', '2020-01-01 01:00:00', 'value1');
INSERT INTO test (`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-02-01 01:00:00', '2020-02-01 01:00:00', 'value2');
INSERT INTO test (`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-03-01 01:00:00', '2020-03-01 01:00:00', 'value3');
INSERT INTO test (`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-04-01 01:00:00', '2020-04-01 01:00:00', 'value4');
INSERT INTO test (`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-05-01 01:00:00', '2020-05-01 01:00:00', 'value5');
INSERT INTO test (`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-06-01 01:00:00', '2020-06-01 01:00:00', 'value6');
INSERT INTO test (`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-07-01 01:00:00', '2020-07-01 01:00:00', 'value7');
INSERT INTO test (`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-08-01 01:00:00', '2020-08-01 01:00:00', 'value8');
INSERT INTO test (`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-09-01 01:00:00', '2020-09-01 01:00:00', 'value9');
INSERT INTO test (`gmt_create`, `gmt_modified`, `content`) VALUES ('2020-10-01 01:00:00', '2020-10-01 01:00:00', 'value10');
You can also create the test table on the SQLConsole tab of DMS or by submitting a data change ticket. For more information, see the "Create a table" section of the Manage a database on the SQLConsole tab topic and the Normal data modify topic.
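If you want to preview which of the sample rows will be treated as historical data before you build the task flow, you can run a query like the following sketch on the SQLConsole tab. It approximates, in plain MySQL, the cutoff that the yearmonth6_date variable computes in Step 2; the DATE_FORMAT and DATE_SUB expressions are only an illustration and are not part of the task flow configuration.
-- Rows whose gmt_create is earlier than the first day of the month six months ago.
SELECT * FROM `test`
WHERE gmt_create < DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 6 MONTH), '%Y-%m-01');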
Procedure
Step 1: Create a task flow
Log on to the DMS console V5.0.
In the top navigation bar, choose DTS > Data Development > Task Orchestration.
Note: If you are using the DMS console in simple mode, click the icon in the upper-left corner and choose All Functions > DTS > Data Development > Task Orchestration.
On the page that appears, click New Task Flow.
In the New Task Flow dialog box, specify the Task Flow Name and Description parameters. Then, click OK.
Step 2: Configure the variables of the task flow
Task flow variables are time variables. You can reference a variable in any node of the task flow in the ${Variable name} format. This example describes how to use time variables as filter conditions in SQL statements to periodically migrate the historical data that was generated more than six months ago from the test table. Two variables are created in this example: yearmonth6_name and yearmonth6_date.
In the lower part of the page, click the Variable configuration tab. In the left-side pane, click Task Flow Variable.
Create the yearmonth6_name variable. Set the Time Format parameter to yyyy-MM and specify -, 6, and Month in the fields below the Time Format parameter. This variable represents the month that is six months before the current month, accurate to the month. For example, if the current date is June 27, 2023, the value of the variable is 2022-12.
Create the yearmonth6_date variable. Set the Time Format parameter to yyyy-MM-01 and specify -, 6, and Month in the fields below the Time Format parameter. This variable represents the first day of the month that is six months before the current month, accurate to the day. For example, if the current date is June 27, 2023, the value of the variable is 2022-12-01.
For information about the rules and usage of variables, see Variables.
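To see how variable substitution works at run time, consider the following sketch. It uses a hypothetical query that is not part of the task flow and assumes that the current date is June 27, 2023; the actual archive and cleanup statements are configured in Step 3.
-- Statement as written in a node, before variable substitution:
SELECT COUNT(*) FROM `test` WHERE gmt_create < '${yearmonth6_date}';
-- Statement that is executed after substitution (current date: June 27, 2023):
-- SELECT COUNT(*) FROM `test` WHERE gmt_create < '2022-12-01';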
Step 3: Create and configure nodes in the task flow
Create a node to migrate data
From the left-side node type list of the Task Type section on the Task Orchestration page, drag the Single Instance SQL node to the blank area on the canvas.
Right-click the node. In the shortcut menu that appears, click Rename to rename the node to Migrate Data.
Double-click the node to go to the page to write SQL statements.
Search for the desired database in the database search box and enter the following SQL statements in the SQL editor:
CREATE TABLE IF NOT EXISTS `test_${yearmonth6_name}` LIKE test;
INSERT INTO `test_${yearmonth6_name}` SELECT * FROM `test` WHERE gmt_create < '${yearmonth6_date}';
The preceding statements create a table named test_${yearmonth6_name} in the current database for archiving historical data, and then migrate the historical data that meets the date condition from the test table to the test_${yearmonth6_name} table.
Click Preview to check the validity of the SQL statements.
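After this node runs, you can optionally confirm that the archive table was populated. The following queries are a hypothetical check, assuming the run happened on June 27, 2023, so that the archive table is named test_2022-12; replace the table name and date with the values that the variables produced for your run.
-- Number of rows copied into the archive table:
SELECT COUNT(*) FROM `test_2022-12`;
-- Compare with the number of source rows that match the date condition:
SELECT COUNT(*) FROM `test` WHERE gmt_create < '2022-12-01';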
Create a node to clear historical data
From the left-side node type list of the Task Type section on the Task Orchestration page, drag the Single Instance SQL node to the blank area on the canvas.
Right-click the node. In the shortcut menu that appears, click Rename to rename the node to Clear Historical Data.
Double-click the node to go to the page to write SQL statements.
Search for the desired database in the database search box and enter the following SQL statement in the SQL editor:
Warning: If you need to delete a large amount of data, we recommend that you use the Single Instance SQL node to execute the DELETE statement during off-peak hours. This helps prevent business interruptions caused by prolonged table locks. A batched variant is sketched after this procedure.
DELETE FROM `test` WHERE gmt_create < '${yearmonth6_date}';
The preceding statement is used to delete the migrated data from the test table.
Click Preview to check the validity of the SQL statement.
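As noted in the preceding warning, deleting many rows in a single statement can hold locks for a long time. One alternative, shown in the following sketch, is a batched DELETE that removes the historical data in fixed-size chunks so that each statement commits quickly. This is a suggestion rather than a DMS requirement; the batch size of 10000 is an assumption that you would tune for your workload, and the statement must be executed repeatedly until it affects zero rows.
-- Assumed batch size of 10000 rows; repeat until no rows are affected.
DELETE FROM `test` WHERE gmt_create < '${yearmonth6_date}' LIMIT 10000;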
Step 4: Connect the nodes and configure scheduling information
On the canvas, move the pointer over the Migrate Data node, click the hollow circle on the right side, and then drag the connection line to the Clear Historical Data node.
Note: You can specify the execution order of the nodes by dragging a connection line. In this example, the Migrate Data node runs before the Clear Historical Data node.
In the lower part of the page, click the Task Flow Information tab. In the Scheduling Settings section, configure the following parameters.
Enable Scheduling: Specifies whether to enable the scheduling feature. Turn on Enable Scheduling.
Scheduling Type: The scheduling type. Select Timed Scheduling/Cyclic scheduling.
Effective Time: The period during which the scheduling settings take effect. The default value is 1970-01-01 to 9999-01-01, which indicates that the scheduling settings permanently take effect.
Scheduling Cycle: The scheduling cycle. Select Month.
Specified Time: The day on which the task flow runs in each scheduling cycle. Select 1st of each month.
Specific Point in Time: The point in time at which the task flow runs on the specified day. Set this parameter to 01:00.
Cron Expression: The CRON expression that is automatically generated based on the specified scheduling cycle and time settings.
Note: In this example, the task flow is scheduled to run at 01:00 on the first day of each month. You can schedule the task flow to run at other times as needed. For more information, see Overview.
Step 5: Run the task flow and verify the results
Click Try Run in the upper-left corner of the canvas.
If status SUCCEEDED appears in the last line of the logs, the test run is successful. If status FAILED appears in the last line of the logs, the test run fails.
Note: If the test run fails, view the logs to identify the node on which the failure occurred and the cause of the failure. Then, modify the configurations of the node and try again.
Go to the SQLConsole tab to check whether historical data in the test table is migrated to the destination table and whether the migrated data is deleted from the test table. For more information, see the "Query data" section of the Manage a database on the SQLConsole tab topic.
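The following queries are one hypothetical way to perform this check, again assuming that the run happened on June 27, 2023; adjust the table name and date to the values produced by your run.
-- The source table should no longer contain rows older than the cutoff date:
SELECT COUNT(*) FROM `test` WHERE gmt_create < '2022-12-01';
-- The archived rows should now be in the destination table:
SELECT * FROM `test_2022-12`;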