DataWorks provides Shell nodes that you can use to run Python scripts. This topic describes how to use a common Shell node or an E-MapReduce (EMR) Shell node to run Python 2 or Python 3 scripts.
Background information
DataWorks allows you to upload Python scripts as resources. To run an uploaded script, reference the resource in a common Shell node or an EMR Shell node and specify the Python 2 or Python 3 execution path in the command.
Prerequisites
For information about the prerequisites of using an EMR Shell node, see Create an EMR Shell node.
For information about the prerequisites of using a common Shell node, see Create a Shell node.
Any third-party package that your Python scripts reference is installed on the DataWorks resource group that runs the scripts. The installation method depends on the type of resource group that you use.
If you use a serverless resource group (recommended), you can use the image management feature to install the third-party package. For more information, see Manage images.
If you use an exclusive resource group for scheduling, you can use the O&M Assistant feature to install the third-party package. For more information, see Use the O&M Assistant feature.
Note: The third-party package that you want to install must support Python 2 and Python 3.
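After a package is installed, a short script run from the Shell node can confirm that the interpreter actually sees it. The following is a minimal sketch rather than a DataWorks feature: check_packages is a hypothetical helper, and "requests" is only a placeholder for whatever package you installed. It uses only the standard library and is written to run under both Python 2.7 and Python 3.

```python
import importlib


def check_packages(names):
    """Map each package name to True if it can be imported, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# "json" ships with Python; "requests" is a placeholder for your own package.
for package, available in sorted(check_packages(["json", "requests"]).items()):
    print("{0}: {1}".format(package, "available" if available else "missing"))
```

Running the same script once with each interpreter shows whether the package is available to both Python versions.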
Limits
For information about the limits imposed when you use an EMR Shell node, see Create an EMR Shell node.
For information about the limits imposed when you use a common Shell node, see Create a Shell node.
Use a Shell node to run Python scripts
DataWorks allows you to use a common Shell node or an EMR Shell node to run Python scripts by referencing resources. The command that runs a script depends on the Python version.
Python 2:
python xx.py
Python 3:
/home/tops/bin/python3 xx.py
The following sections describe how to use the two types of access paths in detail. You can select a method to run Python scripts based on your business requirements.
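To confirm which interpreter a given command resolves to, a script can report its own version and location. This is an illustrative sketch only: describe_interpreter is a hypothetical name, and the values printed depend on the resource group and image in use.

```python
import sys


def describe_interpreter():
    """Return the interpreter version and path as one line of text."""
    version = "{0}.{1}.{2}".format(*sys.version_info[:3])
    return "Python {0} ({1})".format(version, sys.executable)


print(describe_interpreter())
```

Uploading this as a resource and running it once with each of the two commands above should make the difference between the two execution paths visible in the run log.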
Use a common Shell node to run Python scripts
Create a resource.
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select a desired region. In the left-side navigation pane, choose . On the page that appears, select a desired workspace from the drop-down list and click Go to Data Development.
Create a MaxCompute Python resource.
On the DataStudio page, find the desired workflow, right-click the workflow name, and then choose Create Resource > MaxCompute > Python. In the Create Resource dialog box, set the Name parameter to mc.py and click Create.
Note: mc.py is a sample resource name. You can specify the resource name based on your business requirements.
Edit the MaxCompute Python resource.
On the configuration tab of the MaxCompute Python resource, write the script code. Sample code:
Python 3
print('This is a test text')
Python 2
print "This is a test text"
Click the save icon and then the commit icon in the top toolbar of the configuration tab of the resource to save and commit the resource.
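If one resource needs to run under both interpreters, the script can avoid the syntax that differs between them. The sketch below relies on the fact that print with a single parenthesized string behaves the same in Python 2 and Python 3, and formats several values into one string first, because print("a", "b") prints a tuple in Python 2. The build_message helper and its arguments are hypothetical.

```python
def build_message(label, value):
    """Format the parts into one string so both interpreters print the same text."""
    return "{0}: {1}".format(label, value)


# One parenthesized string prints identically in Python 2 and Python 3.
print("This is a test text")
print(build_message("rows processed", 3))
```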
Reference the resource.
Create a common Shell node.
On the DataStudio page, find the desired workflow, right-click the workflow name, and then choose Create Node > General > Shell. In the Create Node dialog box, configure the Name parameter and click Confirm.
Reference the resource.
On the configuration tab of the common Shell node, find the mc.py resource that you want to reference under Resource in the MaxCompute folder, right-click the resource name, and then select Insert Resource Path.
If the information that is shown in the following figure appears on the configuration tab of the common Shell node, the resource is referenced by the common Shell node.
Verify the result.
Use Python 3 to run the referenced resource in the common Shell node
Configure the common Shell node.
Add the following Python 3 command execution path to the configuration tab of the common Shell node:
##@resource_reference{"mc.py"}
/home/tops/bin/python3 mc.py
Click the run icon in the top toolbar. In the Warning message, click Continue to Run. In the Runtime Parameters dialog box, select a resource group, specify a custom image, and then click OK. The information that is shown in the following figure is returned.
Use Python 2 to run the referenced resource in the common Shell node
Configure the common Shell node.
Add the following Python 2 command execution path to the configuration tab of the common Shell node:
##@resource_reference{"mc.py"}
python mc.py
Click the run icon in the top toolbar. In the Warning message, click Continue to Run. In the Runtime Parameters dialog box, select a resource group, specify a custom image, and then click OK. The information that is shown in the following figure is returned.
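Before you upload a script, you may want to check its output locally. The following sketch writes a script to a temporary file, runs it with an interpreter, and returns what it printed. It is an assumption-laden local check, not part of DataWorks: run_script is a hypothetical helper, and sys.executable stands in for the interpreter path, which on a DataWorks resource group would be one of the commands shown above.

```python
import os
import subprocess
import sys
import tempfile


def run_script(source, interpreter=sys.executable):
    """Write source to a temporary .py file, run it, and return its stdout."""
    handle, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(handle, "w") as script:
            script.write(source)
        # check_output raises CalledProcessError if the script exits non-zero.
        output = subprocess.check_output([interpreter, path])
        return output.decode("utf-8").strip()
    finally:
        os.remove(path)


print(run_script("print('This is a test text')"))
```

Passing a different interpreter path as the second argument runs the same source under another Python version, if one is installed locally.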
Use an EMR Shell node to run Python scripts
Create a resource.
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select a desired region. In the left-side navigation pane, choose . On the page that appears, select a desired workspace from the drop-down list and click Go to Data Development.
Create an EMR file resource.
On the DataStudio page, find the desired workflow, right-click the workflow name, and then choose Create Resource > EMR > EMR File. In the Create Resource dialog box, select Local for the File Source parameter and click Upload to upload the emr.py script. Then, click Create. Sample script content:
Python 3
print('This is a test text')
Python 2
print "This is a test text"
Note: emr.py is a sample resource name. You can specify the resource name based on your business requirements.
Click the commit icon in the top toolbar of the configuration tab of the resource to commit the resource.
Reference the resource.
Create an EMR Shell node.
On the DataStudio page, find the desired workflow, right-click the workflow name, and then choose Create Node > EMR > EMR Shell. In the Create Node dialog box, configure the Name parameter and click Confirm.
Reference the EMR file resource.
Find the emr.py resource that you want to reference under Resource in the EMR folder, right-click the resource name, and then select Insert Resource Path.
If the information that is shown in the following figure appears on the configuration tab of the EMR Shell node, the resource is referenced by the EMR Shell node.
Verify the result.
Use Python 3 to run the referenced resource in the EMR Shell node
Configure the EMR Shell node.
Add the Python 3 command execution path /home/tops/bin/python3 to the configuration tab of the EMR Shell node.
##@resource_reference{"emr.py"}
/home/tops/bin/python3 emr.py
Click the run icon in the top toolbar. In the Parameters dialog box, select a resource group, specify a custom image, and then click Run. The information that is shown in the following figure is returned.
Use Python 2 to run the referenced resource in the EMR Shell node
Configure the EMR Shell node.
Add the Python 2 command execution path python to the configuration tab of the EMR Shell node.
##@resource_reference{"emr.py"}
python emr.py
Click the run icon in the top toolbar. In the Parameters dialog box, select a resource group, specify a custom image, and then click Run. The information that is shown in the following figure is returned.
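The node content in all four cases follows one pattern: a resource reference annotation followed by an interpreter command. The sketch below composes that text for a given resource name and Python version. It assumes that the annotation sits on its own line, because a shell line that begins with # is a comment and a command placed on that same line would not run. The shell_node_body helper and the INTERPRETERS table are hypothetical names used only for illustration.

```python
# Interpreter commands taken from this topic, keyed by Python major version.
INTERPRETERS = {2: "python", 3: "/home/tops/bin/python3"}


def shell_node_body(resource_name, python_major):
    """Return the two-line Shell node content that runs the referenced resource."""
    if python_major not in INTERPRETERS:
        raise ValueError("python_major must be 2 or 3")
    annotation = '##@resource_reference{"%s"}' % resource_name
    command = "%s %s" % (INTERPRETERS[python_major], resource_name)
    return annotation + "\n" + command


print(shell_node_body("mc.py", 3))
print(shell_node_body("emr.py", 2))
```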