
DataWorks: Create and use a Data Lake Analytics node

Last Updated: Sep 20, 2024

DataWorks allows you to create a Data Lake Analytics node to build an online extract, transform, and load (ETL) process.

Background information

Data Lake Analytics nodes are used to connect to Alibaba Cloud Data Lake Analytics (DLA). For more information about DLA, see What is DLA?

Note

Tasks on Data Lake Analytics nodes can be run on serverless resource groups or old-version exclusive resource groups for scheduling. We recommend that you run tasks on serverless resource groups. For more information about how to purchase and use a serverless resource group, see Create and use a serverless resource group.

Procedure

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, choose Data Modeling and Development > DataStudio in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

  2. On the DataStudio page, move the pointer over the Create icon and choose Create Node > Custom > Data Lake Analytics.

    Alternatively, you can find the desired workflow, click the workflow name, right-click UserDefined, and then choose Create Node > Data Lake Analytics.

  3. In the Create Node dialog box, configure the Name and Path parameters.
    Note: The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Confirm.
  5. Configure the Data Lake Analytics node.

    1. Select a data source.

      Select a data source for the node. If you cannot find the desired data source in the drop-down list, click Add Data Source to the right of Select Data Source and add a data source on the Data Sources page. For more information, see Add a Data Lake Analytics data source.

    2. Write SQL statements for the node.

      After you select a data source, write SQL statements based on the syntax that is supported by DLA. You can write data manipulation language (DML) or data definition language (DDL) statements.
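
      For example, an ETL step might pair a DDL statement with a DML statement, as in the following sketch. The table and column names (orders and orders_summary) are hypothetical; adjust the statements to the tables in your DLA database and to the syntax that your DLA engine supports.

          -- Minimal sketch; database, table, and column names are hypothetical.
          -- DDL: create a summary table. Depending on your DLA setup, a CREATE
          -- TABLE statement may also need storage options such as a LOCATION
          -- clause; they are omitted here for brevity.
          CREATE TABLE IF NOT EXISTS orders_summary (
              order_date DATE,
              region VARCHAR(64),
              total_amount DECIMAL(18, 2)
          );

          -- DML: aggregate raw orders into the summary table.
          INSERT INTO orders_summary
          SELECT order_date, region, SUM(amount) AS total_amount
          FROM orders
          GROUP BY order_date, region;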

    3. Click the Save icon in the top toolbar.

    4. Click the Run icon in the top toolbar to execute the SQL statements.

    If you want to use another resource group to test the Data Lake Analytics node on the DataStudio page, click the Advanced Run icon in the top toolbar and select the serverless resource group that you want to use.

    Note

    A serverless resource group is required to access a data source that is deployed in a virtual private cloud (VPC). In this case, you must select a serverless resource group that is connected to the data source.

  6. On the configuration tab of the node, click Properties in the right-side navigation pane. On the Properties tab, configure scheduling properties for the node. For more information, see Configure basic properties.

    To periodically schedule tasks on the Data Lake Analytics node, you must select a serverless resource group that is connected to the node.

  7. Save and commit the node.

    Note

    You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the node.

    1. Click the Save icon in the top toolbar to save the node.

    2. Click the Submit icon in the top toolbar.

    3. In the Submit dialog box, configure the Change description parameter.

    4. Click Confirm.

    If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner of the configuration tab to deploy the node after you commit it. For more information, see Perform basic O&M operations on auto triggered nodes.

  8. Perform O&M operations on the node. For more information, see Perform basic O&M operations on auto triggered nodes.