DataWorks allows you to create a Data Lake Analytics node to build an online extract, transform, and load (ETL) process.
Background information
Data Lake Analytics nodes are used to connect to Alibaba Cloud Data Lake Analytics (DLA). For more information about DLA, see What is DLA?
Tasks on Data Lake Analytics nodes can be run on serverless resource groups or old-version exclusive resource groups for scheduling. We recommend that you run tasks on serverless resource groups. For more information about how to purchase and use a serverless resource group, see Create and use a serverless resource group.
Procedure
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, choose the DataStudio entry in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.
On the DataStudio page, move the pointer over the Create icon and choose the Data Lake Analytics node type. Alternatively, you can find the desired workflow, click the workflow name, right-click UserDefined, and then choose the Data Lake Analytics node type.
In the Create Node dialog box, configure the Name and Path parameters.
Note: The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
Click Confirm.
Configure the Data Lake Analytics node.
Select a data source.
Select a data source for the node. If you cannot find the data source that you want to use from the drop-down list, click Add Data Source to the right of Select Data Source and add a data source on the Data Sources page. For more information, see Add a Data Lake Analytics data source.
Write SQL statements for the node.
After you select a data source, write SQL statements based on the syntax that is supported by DLA. You can write data manipulation language (DML) or data definition language (DDL) statements.
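For example, a common pattern in a Data Lake Analytics node is to define an external table over files that are stored in Object Storage Service (OSS) and then query or transform that data. The following statements are a minimal sketch: the database, table, column, and OSS path names are placeholders, and the exact DDL options depend on the file format and the DLA syntax that you use.

-- DDL: map an external table to CSV files in an OSS directory (placeholder names).
CREATE EXTERNAL TABLE IF NOT EXISTS demo_orders (
  order_id BIGINT,
  customer_id BIGINT,
  amount DOUBLE,
  order_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'oss://your-bucket/path/to/orders/';

-- DML: aggregate the raw data as one step of the ETL process.
SELECT order_date, COUNT(*) AS order_cnt, SUM(amount) AS total_amount
FROM demo_orders
GROUP BY order_date;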
Click the save icon in the top toolbar to save the SQL statements.
Click the run icon in the top toolbar to execute the SQL statements.
If you want to use another resource group to test the Data Lake Analytics node on the DataStudio page, click the icon in the top toolbar and select a serverless resource group that you want to use.
Note: A serverless resource group is required to access a data source that is deployed in a virtual private cloud (VPC). In this case, you must select a serverless resource group that is connected to the data source.
On the configuration tab of the node, click Properties in the right-side navigation pane. On the Properties tab, configure scheduling properties for the node. For more information, see Configure basic properties.
To periodically schedule tasks on the Data Lake Analytics node, you must select a serverless resource group that is connected to the data source of the node.
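If you configure scheduling parameters on the Properties tab, you can reference them in the SQL statements of the node so that each scheduled instance processes the data of its own data timestamp. The following statement is a minimal sketch that assumes a scheduling parameter named bizdate (for example, bizdate=$bizdate) is defined for the node; the table and column names are placeholders, and whether the target table supports writes depends on its storage format.

-- Write the daily aggregate for the data timestamp of the current instance.
INSERT INTO demo_orders_daily
SELECT order_date, COUNT(*) AS order_cnt, SUM(amount) AS total_amount
FROM demo_orders
WHERE order_date = '${bizdate}'
GROUP BY order_date;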
Save and commit the node.
Note: You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the node.
Click the save icon in the top toolbar to save the node.
Click the submit icon in the top toolbar.
In the Submit dialog box, configure the Change description parameter.
Click Confirm.
If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner of the configuration tab to deploy the node after you commit it. For more information, see Perform basic O&M operations on auto triggered nodes.
Perform O&M operations on the node. For more information, see Perform basic O&M operations on auto triggered nodes.