
DataWorks:Create a Shell node

Last Updated:Nov 13, 2024

DataWorks provides Shell nodes, which you can use to run shell scripts as scheduled tasks. This topic describes how to create and use a Shell node.

Limits

  • Shell nodes support the standard shell syntax but not the interactive syntax.

  • Tasks on Shell nodes can be run on serverless resource groups or old-version exclusive resource groups for scheduling. We recommend that you run tasks on serverless resource groups. For more information about how to purchase a serverless resource group, see Create and use a serverless resource group.

  • A Shell node that is run on a serverless resource group may need to access a data source for which a whitelist is configured. In this case, you must add the required elastic IP address (EIP) or CIDR block to the whitelist of the data source.

  • Do not start a large number of subprocesses in a Shell node. If you start a large number of subprocesses in a Shell node that is run on an exclusive resource group for scheduling, other nodes that are run on the resource group may be affected because DataWorks does not impose a limit on the resource usage for running Shell nodes.

Note

If you want to use a specific development environment to develop a task, you can create a custom image in the DataWorks console. For more information, see Manage images.

Prerequisites

A workflow is created. Development operations in different types of compute engines are performed based on workflows in DataStudio. Therefore, before you create a node, you must create a workflow. For more information, see Create a workflow.

Create a common Shell node

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. Move the pointer over the Create icon and choose Create Node > General > Shell. In the Create Node dialog box, configure the Name and Path parameters.

  3. Click Confirm to create the node.

Enable a Shell node to use resources

Before a node can use a resource in DataWorks, you must upload the resource to DataWorks and reference the resource in the runtime environment of the node. This section describes the procedure.

Upload a resource

DataWorks allows you to create a resource in the console or upload an existing resource. You can select a method based on the type of resource that you want to use.

You can create MaxCompute and EMR resources in the DataWorks console. For more information, see Create and use MaxCompute resources and Create and use an EMR resource.

Note

Resources must be committed before they can be referenced in a node. If nodes in the production environment need to use a resource, you must also deploy the resource to the production environment. For more information, see Deploy nodes.

Reference the resource in the node

To enable the node to use the resource, you must reference the resource in the node. After the resource is referenced, the @resource_reference{"Resource name"} comment is displayed in the upper part of the node code.

Procedure:

  1. Open the Shell node that you created and go to the node editing page.

  2. In the Scheduled Workflow pane of the DataStudio page, find the resource that you uploaded.

  3. Right-click the resource and select Insert Resource Path to reference the resource in the current node.

    On the node editing page, you can write code to run the resource.

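The following is a minimal sketch of what the node code might look like after a resource is referenced. The resource name data_process.sh is a hypothetical example; replace it with the name of the resource that you uploaded.

##@resource_reference{"data_process.sh"}
# The preceding comment is generated when you select Insert Resource Path.
# The referenced resource is made available to the node at run time, so the
# script can typically be run by its file name. data_process.sh is hypothetical.
sh ./data_process.sh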

Scheduling parameters used by Shell nodes

You are not allowed to customize variable names for common Shell nodes. The variables must be named based on their ordinal numbers, such as $1, $2, and $3. If the number of parameters reaches or exceeds 10, use ${Number} to declare the excess variables. For example, use ${10} to declare the tenth variable. For more information about how to configure and use scheduling parameters, see Configure and use scheduling parameters. For more information about the methods to assign values to scheduling parameters, see Supported formats of scheduling parameters.

For example, you can assign scheduling parameters to the custom variables $1, $2, and $3 in the Parameters section and reference the variables in the code editor. Examples:

  • $1: Specify $bizdate as $1. This variable is used to obtain the data timestamp. $bizdate is a built-in parameter.

  • $2: Specify ${yyyymmdd} as $2. This variable is used to obtain the data timestamp.

  • $3: Specify $[yyyymmdd] as $3. This variable is used to obtain the data timestamp.

Note

For common Shell nodes, values are assigned to the custom variables only by position. Separate the parameter values with spaces. The values are assigned to the variables in the order in which they are entered in the Parameters section. For example, the first parameter $bizdate that you enter is assigned to the first variable $1. A minimal sketch that reads these values in the node code follows this note.
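The following is a minimal sketch of node code that reads the scheduling parameter values, assuming that the Parameters section contains $bizdate ${yyyymmdd} $[yyyymmdd] as described above:

#!/bin/bash
# $1, $2, and $3 hold the scheduling parameter values in the order in which
# they are entered in the Parameters section.
bizdate=$1      # value of $bizdate, for example, 20240101
ymd_curly=$2    # value of ${yyyymmdd}
ymd_bracket=$3  # value of $[yyyymmdd]
echo "data timestamps: ${bizdate} ${ymd_curly} ${ymd_bracket}"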

How do I determine whether a custom Shell script is successfully run?

The exit code of the custom Shell script determines whether the script is successfully run. DataWorks interprets exit codes as follows (a sketch that sets an exit code explicitly follows this list):

  • 0: indicates that the custom Shell script is successfully run.

  • -1: indicates that the custom Shell script is terminated.

  • 2: indicates that the custom Shell script needs to be automatically rerun.

  • Other exit codes: indicate that the custom Shell script fails to run.
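The following is a minimal sketch that sets an exit code explicitly. The marker file /tmp/data_ready is a hypothetical placeholder that is used only for illustration.

#!/bin/bash
# Exit with 2 to request an automatic rerun, or with 1 to mark the script as failed.
# /tmp/data_ready is a hypothetical marker file used only for illustration.
if [[ ! -f /tmp/data_ready ]]; then
    echo "data is not ready, request an automatic rerun"
    exit 2
fi
echo "data is ready"
exit 0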

In a Shell script, the exit code of the last command that is run determines the overall result. If an invalid command fails but a valid command is run after it, the script exits with code 0 and is considered successfully run. Example:

#!/bin/bash
# The curl command fails, but the script continues to run.
curl http://xxxxx/asdasd
# The echo command succeeds, so the script exits with code 0.
echo "nihao"

The Shell script is successfully run because the last command (echo) exits with code 0.

If you change the previous script to the following script, a different result is returned. Example:

#!/bin/bash
curl http://xxxxx/asdasd
if [[ $? == 0 ]]; then
    echo "curl success"
else
    echo "failed"
    exit 1
fi
echo "nihao"

In this case, the script fails to run because the curl command fails, the else branch is executed, and the script exits with code 1.

Use a Shell script to access OSSUtils

  • If you want to install OSSUtils, you can use the default installation path /home/admin/usertools/tools/ossutil64.

  • For more information about the common commands in OSSUtils, see Common commands.

You can configure the credentials that are used to access Object Storage Service (OSS), such as the AccessKey ID and AccessKey secret, in a configuration file based on your business requirements. Then, you can use O&M Assistant to upload the configuration file to the resource group as /home/admin/usertools/tools/myconfig. Sample configuration file:

[Credentials]
language = CH
endpoint = oss.aliyuncs.com
accessKeyID = your_accesskey_id
accessKeySecret = your_accesskey_secret
stsToken = your_sts_token
outputDir = your_output_dir
ramRoleArn = your_ram_role_arn

Sample script:

#!/bin/bash
/home/admin/usertools/tools/ossutil64 --config-file /home/admin/usertools/tools/myconfig cp oss://bucket/object object
if [[ $? == 0 ]]; then
    echo "access oss success"
else
    echo "failed"
    exit 1
fi
echo "finished"

Subsequent operations

If the Shell node needs to be periodically scheduled, you need to define the scheduling properties for the Shell node and deploy the node to the production environment. For information about how to configure scheduling properties for nodes, see Step 6: Configure scheduling properties for the batch synchronization task. For information about how to deploy nodes to the production environment, see Deploy nodes.

References

For information about how to run Python scripts on Shell nodes by using Python 2 or Python 3 commands, see Use a Shell node to run Python scripts.