This topic describes how to migrate data across MaxCompute projects in the same region by using DataWorks.
Prerequisites
All the steps in the tutorial DataWorks for MaxCompute Workshop are completed. For more information, see the topics in the DataWorks for MaxCompute Workshop directory.
Background information
This topic uses the WorkShop2023 workspace that is created in the DataWorks for MaxCompute Workshop tutorial as the source DataWorks workspace. The workspace is associated with the source MaxCompute project. You need to create a destination DataWorks workspace and associate it with a destination MaxCompute project. Then, you can migrate tables, resources, configurations, and data across the projects by using DataWorks.
Procedure
Create a destination workspace.
Log on to the DataWorks console, create a workspace, add a MaxCompute data source, and associate the MaxCompute data source with DataStudio. For more information, see Create a workspace, Add a MaxCompute data source, and Preparations before data development: Associate a data source or a cluster with DataStudio.
Note: The WorkShop2023 workspace is in standard mode. In this example, a destination workspace named clone_test_doc is created in standard mode.
Clone node configurations and resources across workspaces.
You can use the cross-workspace cloning feature of DataWorks to clone the node configurations and resources from the WorkShop2023 workspace to the clone_test_doc workspace. For more information, see Clone nodes across workspaces.
Note: The cross-workspace cloning feature cannot clone table schemas or data.
The cross-workspace cloning feature also cannot clone combined nodes. If the destination workspace requires combined nodes that exist in the source workspace, you must manually recreate the combined nodes in the destination workspace.
Go to the DataStudio page for the WorkShop2023 workspace and click Cross-project cloning in the top navigation bar. The Cross-project cloning page appears.
Set Target Workspace to clone_test_doc and Workflow to Workshop. Select all the nodes in the workflow and click Add to List. Click To-Be-Cloned Node List in the upper-right corner.
In the pane that appears, click Clone All. In the dialog box that appears, click Clone. The selected nodes are cloned to the clone_test_doc workspace.
Go to the destination workspace and check whether the nodes are cloned.
Create tables.
The cross-workspace cloning feature cannot clone table schemas. Therefore, you need to manually create required tables in the destination workspace.
For non-partitioned tables, we recommend that you use the following SQL statement to synchronize the table schema from the source workspace:
create table <table_name> as select * from <source_project_name>.<source_table_name>;
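For example, the following statement clones both the schema and the data of a non-partitioned table. The project and table names are placeholders; replace them with your actual source MaxCompute project and table names.

```sql
-- Hypothetical names: workshop_source is the source MaxCompute project,
-- src_table is the non-partitioned table to clone.
-- CREATE TABLE ... AS copies the column definitions and the data in one step.
create table src_table_copy as
select * from workshop_source.src_table;
```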
For partitioned tables, a CREATE TABLE ... AS statement does not retain partition columns. We recommend that you create the table and explicitly declare the regular columns and the partition key columns:
create table <table_name> (<column_name> <data_type>, ...) partitioned by (<partition_key_column> string);
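As an illustration, the following statement creates the ods_user_info_d table with the dt partition key column that is used later in this topic. The regular column list shown here is a sketch based on the workshop tutorial; verify it against the schema of your actual source table before you run the statement.

```sql
-- Column list assumed from the DataWorks for MaxCompute Workshop tutorial;
-- confirm it matches your source table (e.g., via DESC in the source project).
create table ods_user_info_d (
    uid       string comment 'User ID',
    gender    string comment 'Gender',
    age_range string comment 'Age range',
    zodiac    string comment 'Zodiac sign'
) partitioned by (dt string);
```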
After you create tables, commit the tables to the production environment. For more information about table creation, see Create and manage MaxCompute tables.
Synchronize data.
The cross-workspace cloning feature cannot clone data from the source workspace to the destination workspace. You need to manually synchronize the required data to the destination workspace. To synchronize the data of the ods_user_info_d table from the source workspace to the destination workspace, perform the following steps:
Add a data source.
Go to the Data Integration page and click Data Source in the left-side navigation pane.
On the Data Sources page, click Add Data Source. In the Add Data Source dialog box, click MaxCompute.
In the Add MaxCompute Data Source dialog box, configure the parameters such as Data Source Name, Creation Method, and MaxCompute Project Name, and click Complete. For more information, see Add a MaxCompute data source.
Create a batch synchronization task.
For more information, see Configure a batch synchronization task by using the codeless UI.
In the Scheduled Workflow pane of the DataStudio page, find the Workshop workflow that you cloned and click its name. Right-click Data Integration and choose the command for creating a batch synchronization task.
On the configuration tab of the batch synchronization task, configure the required parameters. In this example, set Data Source Name under Source to WorkShop2023 and Data Source Name under Destination to odps_source. Use ods_user_info_d as the table from which you want to read data. After the configuration is complete, click the Properties tab in the right-side navigation pane.
Click Use Root Node in the Dependencies section and commit the offline synchronization task.
Backfill data for the offline synchronization task.
On the DataStudio page, click the DataWorks icon in the upper-left corner and go to Operation Center.
In the left-side navigation pane, go to the list of auto triggered tasks. Find the offline synchronization task that you created and click the node name. On the canvas that appears on the right, right-click the offline synchronization task and select the data backfill command.
In the Backfill Data panel, configure the required parameters. In this example, set Data Timestamp to June 11, 2019 to June 17, 2019 to synchronize data from multiple partitions. Click OK.
Note: You can configure the data timestamp based on your business requirements.
In the left-side navigation pane, go to the page that displays data backfill instances. Check the running status of the data backfill instances that are generated. If Successful is displayed for a data backfill instance, the related data is synchronized.
Verify the data synchronization result.
On the DataStudio page, create an ODPS SQL node. On the configuration tab of the ODPS SQL node, execute the following SQL statement to check whether the data is synchronized to the destination workspace:
select * from ods_user_info_d where dt between '20190611' and '20190617';