A workflow consists of a series of jobs with defined dependencies and a specified run order. To run jobs at specific points in time, create a workflow, add nodes to the workflow, and then configure a scheduling policy for the workflow. This topic describes how to create and run a workflow.
Prerequisites
A workspace is created. For more information, see Manage workspaces.
Jobs are developed and published.
Create a workflow
Go to the Workflows page.
Log on to the E-MapReduce (EMR) console.
In the left-side navigation pane, open the Spark page.
On the Spark page, find the desired workspace and click the name of the workspace.
In the left-side navigation pane of the EMR Serverless Spark page, choose Operation Center > Workflows.
On the Workflows tab, click Create Workflow.
In the Create Workflow panel, configure the parameters and click Next. The following table describes the parameters.
Parameter
Description
Name
The name of the workflow. The name must be unique in a workspace.
Resource Queue
The default resource queue for the workflow.
Note: The resource queue specified for workflow nodes can override the default resource queue.
Other Settings
Scheduling Type
The mode in which the workflow is run in the production environment. Valid values:
None (Manual): The workflow is manually run. This is the default value.
Scheduler: The workflow runs based on the settings of the scheduler. The workflow can be scheduled to run by minute, hour, or day.
If you set the Scheduling Type parameter to Scheduler, you must configure the Scheduling Time and Scheduling Started At parameters.
Scheduling Time
The scheduling cycle of the workflow. This parameter determines the scheduling frequency of the workflow in the production environment. Workflow runs are generated based on the scheduling frequency of a workflow. This parameter is required only if the Scheduling Type parameter is set to Scheduler.
Valid values:
Days: Nodes run once a day at the specified point in time.
Hours: Nodes run once every N hours within the specified period every day.
Minutes: Nodes run once every N minutes within the specified period every day.
Scheduling Started At
The date and time when the workflow is scheduled to run. The default value is the current time. This parameter is required only if the Scheduling Type parameter is set to Scheduler.
Important: If you create a workflow whose Scheduling Type is set to Scheduler, you must turn on the Scheduling Status switch for the workflow on the Workflows tab of the Workflows page. Otherwise, the workflow cannot be triggered at the scheduling time.
Retries After Failure
The number of retries after a workflow node fails to run. By default, no retry is performed.
Note: The number of retries specified for a workflow node can override the value of this parameter.
Failure Notification
The email address to which a notification is sent after the workflow fails to run.
Tags
The tags that are used to identify the workflow. You can specify the key and value of each tag.
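To make the Scheduling Time options more concrete, the following sketch computes the trigger times for a single day under each option. It is an illustration only and does not describe how the scheduler is implemented. The function names are hypothetical, and treating both ends of the specified period as inclusive is an assumption.

```python
from datetime import datetime, time, timedelta


def daily_run(day: datetime, at: time) -> list[datetime]:
    """Days: one run per day at the specified point in time."""
    return [datetime.combine(day.date(), at)]


def interval_runs(day: datetime, start: time, end: time,
                  step: timedelta) -> list[datetime]:
    """Hours or Minutes: one run every `step` within the period on `day`.

    Assumption: the period [start, end] is inclusive at both ends.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    runs = []
    current = datetime.combine(day.date(), start)
    last = datetime.combine(day.date(), end)
    while current <= last:
        runs.append(current)
        current += step
    return runs


day = datetime(2024, 1, 1)
# Hours: every 4 hours between 08:00 and 20:00.
every_4_hours = interval_runs(day, time(8, 0), time(20, 0), timedelta(hours=4))
# Minutes: every 30 minutes between 09:00 and 10:00.
every_30_minutes = interval_runs(day, time(9, 0), time(10, 0), timedelta(minutes=30))

print([t.strftime("%H:%M") for t in every_4_hours])
print([t.strftime("%H:%M") for t in every_30_minutes])
```

Under these assumptions, a smaller N or a wider period produces more workflow runs per day.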
Add a node to the workflow.
On the page that appears, click Add Node in the lower part of the canvas.
In the Add Node panel, configure the parameters. The following table describes the parameters.
Parameter
Description
Source File Path
The job path that corresponds to the node. The job in the path must be published.
Node Type
The type of the node. By default, the system infers the type of the node based on the job in the corresponding path.
Node Name
The name of the node. The system automatically enters a node name based on the value of Source File Path. You can also specify a name based on your business requirements.
Upstream Node
The upstream node of the current node. The upstream node must be a node that is created in the current workflow.
You do not need to specify an upstream node for the first node in the workflow.
Number of Retries
The number of retries after the node fails to run. If you do not specify a value, the number of retries defined for the workflow is used. By default, no retry is performed.
Timeout (Seconds)
The timeout period for a single run of the node. By default, no limit is imposed.
Subscription
The email address to which a notification is sent when the node is in the specified state.
Tags
The tags of the node. By default, the workflow_name and task_name tags are provided for each node.
Resource Queue
The resource queue that is used to run the node. By default, the resource queue that you specify for the workflow is used. You can configure a resource queue for the node to override the resource queue that you specified for the workflow.
Important: After you specify a resource queue for the workflow node, the specified resource queue prevails even if you modify the resource queue configured for the workflow.
Note: If you use an SQL job, you can configure the parameters in the Task Configuration section based on your business requirements. By default, the values of the parameters in the Task Configuration section are the same as the values of the parameters that you configure for the job. For more information, see Manage default configurations.
Click Save.
You can continue to click Add Node to add nodes based on your business requirements.
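The way node-level values override workflow-level defaults for Resource Queue and the number of retries can be sketched as follows. This is a minimal model of the rule described in the tables above, not the product's implementation. The class names, field names, and queue names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowDefaults:
    resource_queue: str
    retries: int = 0  # by default, no retry is performed


@dataclass
class NodeSettings:
    name: str
    resource_queue: Optional[str] = None  # None means "not set on the node"
    retries: Optional[int] = None         # None means "not set on the node"


def effective_settings(workflow: WorkflowDefaults, node: NodeSettings) -> dict:
    """Use the node value when it is set, otherwise the workflow default."""
    queue = node.resource_queue
    if queue is None:
        queue = workflow.resource_queue
    retries = node.retries
    if retries is None:
        retries = workflow.retries
    return {"resource_queue": queue, "retries": retries}


workflow = WorkflowDefaults(resource_queue="default_queue", retries=1)
inherits = NodeSettings(name="load_orders")
pinned = NodeSettings(name="build_report", resource_queue="report_queue", retries=3)

print(effective_settings(workflow, inherits))
print(effective_settings(workflow, pinned))

# Changing the workflow queue later does not affect a node with its own queue.
workflow.resource_queue = "new_default_queue"
print(effective_settings(workflow, pinned)["resource_queue"])
```

In this sketch, the node named build_report keeps report_queue after the workflow queue changes, which mirrors the Important note for the Resource Queue parameter.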
Save and publish the workflow.
In the upper-right corner, click Publish Workflow.
In the Publish dialog box, configure the Remarks parameter and click OK.
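The Upstream Node settings of the nodes form a dependency graph. The following sketch shows one way to derive a valid run order from such a graph by using the topological sorter in the Python standard library. It is an illustration under stated assumptions: the node names are hypothetical, and the actual execution strategy of the service is not described in this topic.

```python
from graphlib import TopologicalSorter

# Each node maps to its upstream nodes. The first node has no upstream node.
upstream = {
    "extract": [],
    "clean": ["extract"],
    "aggregate": ["clean"],
    "report": ["clean", "aggregate"],
}


def run_order(upstream_of: dict) -> list:
    """Return an order in which every node comes after its upstream nodes."""
    referenced = {u for deps in upstream_of.values() for u in deps}
    unknown = referenced - set(upstream_of)
    if unknown:
        # An upstream node must be a node created in the same workflow.
        raise ValueError(f"unknown upstream nodes: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(upstream_of).static_order())


print(run_order(upstream))
```

Nodes that do not depend on each other can appear in any relative order in the result, so only the upstream-before-downstream relationship is guaranteed.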
Run a workflow
Each time a workflow runs, a workflow run is generated. You can view workflow runs on the Workflow Runs tab of the workflow details page.
Debugging
When you edit a workflow, you can debug the latest version of the workflow.
Find the desired workflow and click Edit in the Actions column. On the page that appears, move the pointer over the icon and select Edit Execution Parameters.
In the Edit Execution Parameters dialog box, select a resource queue used in the development environment and click Save.
Click Debug.
Scheduled run
If you set the Scheduling Type parameter to Scheduler when you create the workflow and turn on the switch in the Scheduling Status column after the workflow is created, the workflow is scheduled to run at the specified point in time.
Manual run
On the Workflows tab, click the name of the workflow that you want to run. In the upper-right corner of the page that appears, click Manually Run.
Check the status of workflow runs and workflow nodes
You can check the status of workflow runs in the Workflow Runs Status column and the status of workflow nodes in the Workflow Node Runs Status column.
Status of workflow runs
Status: Description
Blue: Running
Green: Succeeded
Red: Failed
Purple: Pending
Status of workflow nodes
Status: Description
Blue: Running
Green: Succeeded
Red: Failed
Yellow: Retrying
Purple: Pending
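The node statuses and the retry setting can be related with a small simulation. The sketch below is an assumption about how the statuses might follow one another (Pending, then Running, then Retrying after each failed attempt that still has a retry left). The order of transitions is not stated in this topic, and the names are hypothetical.

```python
from enum import Enum
from typing import Callable


class NodeStatus(Enum):
    # The value is the color shown in the console for the status.
    PENDING = "Purple"
    RUNNING = "Blue"
    RETRYING = "Yellow"
    SUCCEEDED = "Green"
    FAILED = "Red"


def run_node(attempt: Callable[[], bool], retries: int = 0) -> list:
    """Run `attempt` up to 1 + retries times and record the status history."""
    history = [NodeStatus.PENDING, NodeStatus.RUNNING]
    for used in range(retries + 1):
        if attempt():
            history.append(NodeStatus.SUCCEEDED)
            return history
        if used < retries:
            # A retry is still available, so the node is retried.
            history.append(NodeStatus.RETRYING)
    history.append(NodeStatus.FAILED)
    return history


# The first attempt fails and the single retry succeeds.
outcomes = iter([False, True])
recovered = run_node(lambda: next(outcomes), retries=1)
print([s.name for s in recovered])

# With the default of no retries, one failure ends the node run.
failed = run_node(lambda: False)
print([s.name for s in failed])
```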
References
For more information about workflows, see Terms.
For information about how to view information such as workflow runs and workflow node runs, see Manage workflow runs and node runs.