E-MapReduce: Manage workflows

Last Updated: Jul 16, 2024

A workflow is an orderly process that consists of a series of related jobs with clear dependencies and a specified running order. If you want to run jobs at specific points in time, you can create a workflow and configure scheduling nodes and policies in the workflow. This topic describes how to create and run a workflow.

Prerequisites

  • A workspace is created. For more information, see Manage workspaces.

  • Jobs are developed and published.

Create a workflow

  1. Go to the Workflows page.

    1. Log on to the E-MapReduce (EMR) console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. On the Spark page, click the name of the workspace in which you want to create a workflow.

    4. In the left-side navigation pane of the EMR Serverless Spark page, click Workflows.

  2. On the Workflows page, click Create Workflow.

  3. In the Create Workflow panel, configure the parameters described in the following table and click Next Step.

    Parameter

    Description

    Workflow Name

    The workflow name. The name must be unique in the current workspace.

    Resource Queue

    The default resource queue for the workflow.

    Note

    The resource queue specified for workflow nodes can override the default resource queue.

    Other Configurations

    Schedule Type

    The mode in which workflow nodes run in the production environment. Valid values:

    • None (Manual) (default): The workflow can be triggered only manually. It runs once after it is triggered.

    • Scheduler: The workflow runs based on the specified scheduling rules. It can be scheduled by minute, hour, or day.

      If you set Schedule Type to Scheduler, you must also specify Schedule Period and Schedule StartTime.

    Schedule Period

    The interval at which the workflow is scheduled to run automatically. It determines how often the code logic of workflow nodes is executed in the production environment. Each scheduled trigger generates a workflow run based on Schedule Type and Schedule Period, and the jobs run automatically within that workflow run. This parameter is required only if Schedule Type is set to Scheduler.

    Valid values:

    • Day: Jobs run once a day at the specified point in time.

    • Hour: Jobs run once every N hours within the specified period every day.

    • Minute: Jobs run once every N minutes within the specified period every day.

    Schedule StartTime

    The date and time when the workflow is scheduled to run. The default value is the current time. This parameter is required only if Schedule Type is set to Scheduler.

    Important

    If you create a workflow whose Schedule Type is set to Scheduler, you must turn on the Schedule Status switch on the Workflows page. Otherwise, the workflow cannot be triggered at the scheduled time.

    Fail Retry Count

    The number of retries after a workflow node fails to run. By default, no retry is performed.

    Note

    The number of retries specified for a workflow node can override the default value of this parameter.

    Failure Notification

    The email address to which a notification is sent after the workflow fails to run.

    Tags

    The tags that are used to identify the workflow. You can specify the key and value of each tag.

  4. Edit a workflow node.

    1. On the Edit Workflow page, double-click a node or click Add Node below the canvas.

    2. In the Edit Node panel, configure the parameters.

      Parameter

      Description

      Source Path

      The job path that corresponds to the workflow node. The job in the path must be published.

      Node Type

      The type of the workflow node. By default, the system infers the type of the workflow node based on the job in the corresponding path.

      Node Name

      The node name. The system automatically enters a node name based on the value of Source Path. You can also specify a custom name.

      Upstream Nodes

      The upstream node of the workflow node. The upstream node must be a node that is created in the current workflow.

      You do not need to specify an upstream node for the first node in the workflow.

      Failure Retries

      The number of retries after the workflow node fails to run. If you do not specify this parameter, the number of retries specified in the workflow configurations (Fail Retry Count) applies, which is no retry by default.

      Timeout (seconds)

      The timeout period for a single run of the workflow node. By default, no limit is imposed.

      Subscriptions

      The email address to which a notification is sent when the workflow node is in the specified state.

      Tags

      The tags of the workflow node. By default, the following built-in tags are automatically added to each workflow node: workflow_name and task_name.

      Resource Queue

      The resource queue that is used for the workflow node to run. By default, the resource queue configured for the workflow applies. You can specify a resource queue for the workflow node to override the resource queue that you configured when you created the workflow.

      Important

      If you specify a resource queue for the workflow node, the specified resource queue prevails even after you modify the resource queue configured for the workflow.

      Note

      If your task source is SQL development, you also need to configure task parameters. The task parameters inherit the task template by default, and you can adjust the default values by modifying the task template. For parameter details, see Manage templates.

    3. Click Save.

      After the first node is configured, you can click Add Node in the lower part of the page to add more nodes.

  5. Save and publish the workflow.

    1. Click Publish Workflow in the upper-right corner of the page.

    2. In the Publish dialog box, you can enter information in the Publish Message field. Then, click Confirm.
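
The relationship between workflow-level defaults and node-level settings, together with the ordering implied by Upstream Nodes, can be sketched in a few lines of Python. This is only an illustrative model, not an EMR Serverless Spark API: the class names, field names, queue names, and node names are all hypothetical. It assumes that a node value left unset falls back to the workflow default, as described for Resource Queue and Failure Retries, and it uses the standard-library graphlib module (Python 3.9 or later) to derive a valid running order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Optional


@dataclass
class Workflow:
    """Workflow-level defaults (hypothetical model, not an EMR API)."""
    name: str
    resource_queue: str
    fail_retry_count: int = 0  # by default, no retry is performed


@dataclass
class Node:
    """A workflow node. None means "inherit the workflow default"."""
    name: str
    upstream: list[str] = field(default_factory=list)
    resource_queue: Optional[str] = None
    failure_retries: Optional[int] = None


def effective_queue(wf: Workflow, node: Node) -> str:
    # A queue set on the node prevails over the workflow default.
    if node.resource_queue is not None:
        return node.resource_queue
    return wf.resource_queue


def effective_retries(wf: Workflow, node: Node) -> int:
    # A retry count set on the node overrides the workflow default.
    if node.failure_retries is not None:
        return node.failure_retries
    return wf.fail_retry_count


def run_order(nodes: list[Node]) -> list[str]:
    # Each node maps to the set of upstream nodes that must finish first.
    graph = {n.name: set(n.upstream) for n in nodes}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical sample data for illustration only.
wf = Workflow(name="daily_etl", resource_queue="root_queue", fail_retry_count=1)
nodes = [
    Node("extract"),  # first node: no upstream node is required
    Node("transform", upstream=["extract"], resource_queue="dev_queue"),
    Node("load", upstream=["transform"], failure_retries=3),
]

for name in run_order(nodes):
    node = next(n for n in nodes if n.name == name)
    print(name, effective_queue(wf, node), effective_retries(wf, node))
```

In this model, changing wf.resource_queue afterwards affects only the nodes that did not set their own queue, which mirrors the note that a queue specified for a node prevails even after the workflow queue is modified.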

Run a workflow

Each time a workflow runs, a workflow run is generated. You can view workflow runs on the Workflow Runs tab of the workflow details page.

  • Debugging

    When you edit a workflow, you can put the latest version of the workflow into a test run for debugging.

    1. Choose [icon] > Edit Running Parameters.

    2. In the Edit Running Parameters dialog box, select a resource queue used in the development environment and click Save.

    3. Click Debug.

  • Scheduled run

    If you set Schedule Type to Scheduler when you configure the workflow and turn on the Schedule Status switch after the configurations are complete, the workflow can be triggered at the scheduled time.

  • Manual run

    On the Workflows page, click the name of the workflow that you want to run. In the upper-right corner of the page that appears, click Manual.
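
To make the Schedule Period options (Day, Hour, Minute) more concrete, the following Python sketch computes the trigger times for a single day. It is an approximation under stated assumptions rather than the scheduler's actual logic: the function names are hypothetical, and treating the end of the daily period as inclusive is an assumption that the console documentation does not confirm.

```python
from datetime import datetime, time, timedelta


def daily_run(day: datetime, at: time) -> list[datetime]:
    """Day: one run per day at the specified point in time."""
    return [datetime.combine(day.date(), at)]


def interval_runs(day: datetime, start: time, end: time,
                  every: timedelta) -> list[datetime]:
    """Hour/Minute: one run every N hours or minutes within the daily period.

    Assumption: both ends of the period are inclusive.
    """
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    current = datetime.combine(day.date(), start)
    last = datetime.combine(day.date(), end)
    runs = []
    while current <= last:
        runs.append(current)
        current += every
    return runs


# Hypothetical example: every 2 hours between 08:00 and 14:00.
day = datetime(2024, 7, 16)
for run in interval_runs(day, time(8, 0), time(14, 0), timedelta(hours=2)):
    print(run.isoformat())
```

Under these assumptions the example prints four trigger times: 08:00, 10:00, 12:00, and 14:00 on the given day. A scheduled workflow produces such runs only while the Schedule Status switch is turned on.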

Check the running state

You can check the states of all the runs and nodes of a workflow in the Workflow Runs Status and Workflow Job Runs Status columns of the workflow.

  • Workflow Runs Status

    Color     State
    Blue      RUNNING
    Green     SUCCESS
    Red       FAILED
    Purple    PENDING

  • Workflow Job Runs Status

    Color     State
    Blue      RUNNING
    Green     SUCCESS
    Red       FAILED
    Yellow    RETRYING
    Purple    PENDING
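
If you record these states outside the console, for example in your own notes or reports, the color mapping above can be captured as a small lookup. The sketch below is illustrative only: the enum and helper names are hypothetical, and it encodes nothing beyond the color and state pairs listed in this topic.

```python
from collections import Counter
from enum import Enum


class RunState(Enum):
    """States and their display colors, as listed in the tables above."""
    RUNNING = "Blue"
    SUCCESS = "Green"
    FAILED = "Red"
    RETRYING = "Yellow"  # listed for workflow job runs only
    PENDING = "Purple"


def state_for_color(color: str) -> RunState:
    # Look up a state by its color name, ignoring case and outer spaces.
    return RunState(color.strip().capitalize())


def summarize(states: list[RunState]) -> Counter:
    # Count how many runs are in each state.
    return Counter(state.name for state in states)


# Hypothetical observations copied from the status columns.
observed = [state_for_color(c) for c in ["green", "Green", "red", "yellow"]]
print(summarize(observed))
```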

References

  • For information about the concepts related to workflows, see Terms.

  • For information about how to view information such as workflow runs and node runs, see Manage workflow runs and nodes.