A workflow is an orderly process that consists of a series of related jobs with clear dependencies and a specified running order. If you want to run jobs at specific points in time, you can create a workflow and configure scheduling nodes and policies in the workflow. This topic describes how to create and run a workflow.
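The idea of jobs with dependencies and a running order can be sketched as a small dependency graph. The example below is a minimal illustration that uses Python's standard `graphlib` module, not the scheduler that EMR Serverless Spark uses. The node names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical node names mapped to the set of nodes they depend on.
upstream = {
    "load_orders": set(),                  # first node: no upstream node
    "clean_orders": {"load_orders"},
    "load_users": set(),
    "join_report": {"clean_orders", "load_users"},
}


def running_order(upstream_by_node):
    """Return one valid order in which every node follows all of its upstream nodes."""
    return list(TopologicalSorter(upstream_by_node).static_order())


order = running_order(upstream)
print(order)
```

More than one order can be valid for the same graph. The only guarantee is that each node appears after every node it depends on.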
Prerequisites
A workspace is created. For more information, see Manage workspaces.
Jobs are developed and published.
Create a workflow
Go to the Workflows page.
Log on to the E-MapReduce (EMR) console.
In the left-side navigation pane, open the Spark page.
On the Spark page, click the name of the workspace in which you want to create a workflow.
In the left-side navigation pane of the EMR Serverless Spark page, click Workflows.
On the Workflows page, click Create Workflow.
In the Create Workflow panel, configure the parameters described in the following table and click Next Step.
Parameter
Description
Workflow Name
The workflow name. The name must be unique in the current workspace.
Resource Queue
The default resource queue for the workflow.
Note: The resource queue specified for a workflow node overrides the default resource queue.
Other Configurations
Schedule Type
The mode in which workflow nodes run in the production environment. Valid values:
None (Manual) (default): The workflow can be triggered only manually. It runs once after it is triggered.
Scheduler: The workflow runs based on the specified scheduling rules. It can be scheduled by minute, hour, or day.
If you set Schedule Type to Scheduler, you must also specify Schedule Period and Schedule StartTime.
Schedule Period
The interval at which jobs are scheduled to run automatically. It determines how often the code logic of workflow nodes is executed in the production environment. Workflow runs are generated based on Schedule Type and Schedule Period. Jobs are scheduled to run automatically in workflow runs. This parameter is required only if Schedule Type is set to Scheduler.
Valid values:
Day: Jobs run once a day at the specified point in time.
Hour: Jobs run once every N hours within the specified period every day.
Minute: Jobs run once every N minutes within the specified period every day.
Schedule StartTime
The date and time when the workflow is scheduled to run. The default value is the current time. This parameter is required only if Schedule Type is set to Scheduler.
Important: If you create a workflow whose Schedule Type is set to Scheduler, you must turn on the Schedule Status switch on the Workflows page. Otherwise, the workflow cannot be triggered at the scheduled time.
Fail Retry Count
The number of retries after a workflow node fails to run. By default, no retry is performed.
Note: The number of retries specified for a workflow node overrides the default value of this parameter.
Failure Notification
The email address to which a notification is sent after the workflow fails to run.
Tags
The tags that are used to identify the workflow. You can specify the key and value of each tag.
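The scheduling parameters in the table can be pictured with a small model. The sketch below is only one reading of the Day, Hour, and Minute options and does not reflect how the service computes trigger times. The function `run_times`, its parameters, and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def run_times(period, start, day_end, every=1, schedule_status_on=True):
    """List the trigger times on one day for a simplified schedule model.

    period: "day", "hour", or "minute" (Schedule Period)
    start: first trigger time (Schedule StartTime)
    day_end: end of the daily window (assumed to be on the same date as start)
    every: N in "every N hours" or "every N minutes"
    schedule_status_on: models the Schedule Status switch
    """
    if not schedule_status_on:
        return []  # the workflow is not triggered while the switch is off
    if every < 1:
        raise ValueError("every must be at least 1")
    if period == "day":
        return [start]
    step = {"hour": timedelta(hours=every), "minute": timedelta(minutes=every)}[period]
    times = []
    current = start
    while current <= day_end:
        times.append(current)
        current += step
    return times


start = datetime(2024, 1, 1, 8, 0)
end = datetime(2024, 1, 1, 12, 0)
hourly = run_times("hour", start, end, every=2)
print([t.strftime("%H:%M") for t in hourly])  # ['08:00', '10:00', '12:00']
```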
Edit a workflow node.
On the Edit Workflow page, double-click a node or click Add Node below the canvas.
In the Edit Node panel, configure the parameters.
Parameter
Description
Source Path
The job path that corresponds to the workflow node. The job in the path must be published.
Node type
The type of the workflow node. By default, the system infers the type of the workflow node based on the job in the corresponding path.
Node Name
The node name. By default, the system fills in a node name based on the value of Source Path. You can specify a custom name.
Upstream Nodes
The upstream node of the workflow node. The upstream node must be a node that is created in the current workflow.
You do not need to specify an upstream node for the first node in the workflow.
Failure Retries
The number of retries after the workflow node fails to run. If you do not set this parameter, the number of retries specified in the workflow configuration applies. If neither is set, no retry is performed.
Timeout (seconds)
The timeout period for a single run of the workflow node. By default, no limit is imposed.
Subscriptions
The email address to which a notification is sent when the workflow node is in the specified state.
Tags
The tags of the workflow node. By default, the following built-in tags are automatically added to each workflow node: workflow_name and task_name.
Resource Queue
The resource queue that is used for the workflow node to run. By default, the resource queue configured for the workflow applies. You can specify a resource queue for the workflow node to override the resource queue that you configured when you created the workflow.
Important: If you specify a resource queue for the workflow node, the specified resource queue prevails even after you modify the resource queue configured for the workflow.
Note: If the job source is SQL development, you must also configure task parameters. By default, the task parameters are inherited from the task template, and you can change the default values by modifying the template. For more information about the parameters, see Manage templates.
Click Save.
After the first node is configured, you can click Add Node in the lower part of the page to add more nodes.
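The override rules for Resource Queue and Failure Retries can be summarized in a few lines. The sketch below is a plain illustration of those rules, not code from the product. The class names, function name, and queue names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class WorkflowDefaults:
    resource_queue: str
    fail_retry_count: int = 0  # by default, no retry is performed


@dataclass
class NodeSettings:
    name: str
    resource_queue: Optional[str] = None   # None: not set on the node
    failure_retries: Optional[int] = None  # None: not set on the node


def effective_settings(workflow: WorkflowDefaults, node: NodeSettings) -> Tuple[str, int]:
    """Return the (resource queue, retry count) that applies to a node."""
    queue = node.resource_queue if node.resource_queue is not None else workflow.resource_queue
    retries = node.failure_retries if node.failure_retries is not None else workflow.fail_retry_count
    return queue, retries


workflow = WorkflowDefaults(resource_queue="dev_queue", fail_retry_count=1)
extract = NodeSettings(name="extract")
transform = NodeSettings(name="transform", resource_queue="etl_queue", failure_retries=3)

print(effective_settings(workflow, extract))    # ('dev_queue', 1)
print(effective_settings(workflow, transform))  # ('etl_queue', 3)

# Changing the workflow default affects only nodes without their own queue.
workflow.resource_queue = "root_queue"
print(effective_settings(workflow, extract))    # ('root_queue', 1)
print(effective_settings(workflow, transform))  # ('etl_queue', 3)
```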
Save and publish the workflow.
Click Publish Workflow in the upper-right corner of the page.
In the Publish dialog box, you can enter information in the Publish Message field. Then, click Confirm.
Run a workflow
Each time a workflow runs, a workflow run is generated. You can view workflow runs on the Workflow Runs tab of the workflow details page.
Debugging
When you edit a workflow, you can put the latest version of the workflow into a test run for debugging.
Open the Edit Running Parameters dialog box.
In the Edit Running Parameters dialog box, select a resource queue used in the development environment and click Save.
Click Debug.
Scheduled run
If you set Schedule Type to Scheduler when you configure the workflow and turn on the Schedule Status switch after the configurations are complete, the workflow can be triggered at the scheduled time.
Manual run
On the Workflows page, click the name of the workflow that you want to run. In the upper-right corner of the page that appears, click Manual.
Check the running state
You can check the states of all the runs and nodes of a workflow in the Workflow Runs Status and Workflow Job Runs Status columns of the workflow.
Workflow Runs Status
State (indicator color): Description
Blue: RUNNING
Green: SUCCESS
Red: FAILED
Purple: PENDING
Workflow Job Runs Status
State (indicator color): Description
Blue: RUNNING
Green: SUCCESS
Red: FAILED
Yellow: RETRYING
Purple: PENDING
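To see how the retry count relates to the job run states, consider the simplified model below. It is a sketch under assumptions: the exact state transitions of the service are not documented here, and the function `run_node` and its callback are hypothetical.

```python
from typing import Callable, List


def run_node(attempt_job: Callable[[int], bool], max_retries: int = 0) -> List[str]:
    """Return the sequence of states a node passes through in this simplified model.

    attempt_job receives the attempt index (0 for the first run) and
    returns True if that attempt succeeds.
    """
    states = ["PENDING", "RUNNING"]
    attempt = 0
    while True:
        if attempt_job(attempt):
            states.append("SUCCESS")
            return states
        if attempt >= max_retries:
            states.append("FAILED")  # no retries left
            return states
        attempt += 1
        states.append("RETRYING")


# A job that fails twice and succeeds on the third attempt.
flaky = lambda attempt: attempt >= 2
print(run_node(flaky, max_retries=3))
# ['PENDING', 'RUNNING', 'RETRYING', 'RETRYING', 'SUCCESS']

# With the default of no retries, the first failure is final.
print(run_node(lambda attempt: False))
# ['PENDING', 'RUNNING', 'FAILED']
```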
References
For information about the concepts related to workflows, see Terms.
For information about how to view information such as workflow runs and node runs, see Manage workflow runs and nodes.