E-MapReduce: Manage a workflow

Last Updated: Aug 16, 2023

This topic describes how to create and manage a workflow.

Prerequisites

A project is created. For more information, see Create a project.

Create a workflow

  1. Go to the Project tab.

    1. Log on to the E-MapReduce (EMR) console.

    2. In the left-side navigation pane, choose EMR Studio > Workflow.

    3. Click the Project tab.

    4. On the Project tab, click the name of an existing project.

  2. On the project details page, choose Workflow > Workflow Definition in the left-side navigation pane.

  3. On the Workflow Definition page, click Create Workflow.

  4. On the Create Workflow page, drag HIVECLI to the canvas. In the Current node settings dialog box, configure the parameters and click Confirm.

    In this example, a HIVECLI node is created. For more information, see HIVECLI. For more information about other node types, see Node types.

  5. Optional. Configure dependencies between nodes.

    EMR Workflow allows you to configure custom dependencies between the nodes of a workflow.

    • Move the pointer over the icon on the right side of a node, and drag the connection line to connect the node to another node.

    • Click a connection line or a node, and click the delete icon in the upper-right corner of the canvas to delete the node dependency or the node.

  6. Save the workflow.

    1. Click Save in the upper-right corner of the canvas.

    2. In the Basic Information dialog box, configure the parameters that are described in the following table and click Confirm.

      | Parameter | Description |
      | --- | --- |
      | Workflow Name | The name of the workflow. |
      | Description | The description of the workflow. |
      | Timeout Alert | By default, Timeout Alert is turned off. If you turn on Timeout Alert, you must specify a timeout period. If the execution time of a node exceeds the timeout period, an alert is triggered. |
      | Process execute type | The mode in which the instances of the workflow are run. Valid values: parallel (if multiple workflow instances are generated by the same workflow, the instances are run concurrently) and Serial wait (if multiple workflow instances are generated by the same workflow, the instances are run in sequence). |
      | Global Variables | A global variable is valid for all nodes of the workflow. |
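
Global variables apply to every node, and the Startup Parameter described in the Run a workflow section of this topic can define or overwrite them for a single workflow instance. The following Python sketch only illustrates that precedence. The function and variable names are hypothetical and are not part of any EMR API.

```python
from typing import Dict


def effective_variables(global_variables: Dict[str, str],
                        startup_parameters: Dict[str, str]) -> Dict[str, str]:
    """Merge variables so that startup parameters win over global variables.

    Hypothetical helper: it models the documented behavior (a startup
    parameter defines a new global variable or overwrites an existing one),
    not how EMR Workflow implements it.
    """
    merged = dict(global_variables)    # start from the workflow-level values
    merged.update(startup_parameters)  # per-instance values take precedence
    return merged


# Example with made-up variable names.
variables = effective_variables(
    {"biz_date": "2023-08-15", "env": "prod"},
    {"biz_date": "2023-08-16"},
)
print(variables)
```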

Operations on a workflow

| Operation | Description |
| --- | --- |
| Edit | You can edit only a workflow that is in the Offline state. |
| Start | You can start only a workflow that is in the Online state. However, you cannot edit a workflow that is in the Online state. For more information, see the Run a workflow section of this topic. |
| Timing | You can configure scheduling settings only for a workflow that is in the Online state. The system automatically schedules the workflow based on the scheduling settings. After you configure scheduling settings for the workflow, the scheduled workflow is in the Offline state. To make the scheduled workflow take effect, you must change the state of the scheduled workflow to Online on the Cron manage page. For more information, see the Configure a scheduled workflow section of this topic. |
| Online | If a workflow is in the Offline state, you can change the state of the workflow to Online. |
| Offline | If a workflow is in the Online state, you can change the state of the workflow to Offline. You can edit a workflow that is in the Offline state, but you cannot start the workflow. |
| Copy Workflow | You can generate a new workflow by copying an existing workflow. |
| Cron manage | On the Cron manage page of a scheduled workflow, you can edit or delete the scheduled workflow, or change the state of the scheduled workflow to Offline or Online. |
| Delete | You can delete a workflow. Before you delete a workflow, you must change the state of the workflow to Offline. In a project, you can delete only the workflows that you created. You cannot delete workflows that were created by other users. |
| Tree View | You can view the types and status of the nodes of a workflow in a tree structure. |
| Export | You can export a workflow to your computer. The exported workflow is a JSON file. |
| Version Info | You can view the version information of a workflow. |
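
The state rules in the preceding operation descriptions can be summarized as a small lookup. The following Python sketch is only an illustration of those rules. The names State, REQUIRED_STATE, and is_allowed are hypothetical, and operations for which this topic states no state requirement are treated as always allowed, which is an assumption.

```python
from enum import Enum


class State(Enum):
    OFFLINE = "Offline"
    ONLINE = "Online"


# Only operations whose required state is stated in this topic are listed.
REQUIRED_STATE = {
    "Edit": State.OFFLINE,
    "Delete": State.OFFLINE,
    "Online": State.OFFLINE,   # Offline -> Online
    "Start": State.ONLINE,
    "Timing": State.ONLINE,
    "Offline": State.ONLINE,   # Online -> Offline
}


def is_allowed(operation: str, state: State) -> bool:
    """Return True if `operation` is permitted for a workflow in `state`.

    Assumption: operations without a documented state requirement
    (for example Export or Tree View) are allowed in either state.
    """
    required = REQUIRED_STATE.get(operation)
    return required is None or required is state


print(is_allowed("Edit", State.ONLINE))    # False: edit requires Offline
print(is_allowed("Start", State.ONLINE))   # True
```

The sketch does not model the ownership rule for Delete, which additionally requires that you created the workflow.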

Run a workflow

Each time a workflow is run, a workflow instance is generated and displayed on the Workflow Instance page.

  1. On the Workflow Definition page, find the workflow that you want to run and click the Online icon in the Operation column.

  2. Click the Start icon in the Operation column.

  3. In the dialog box that appears, configure the parameters that are described in the following table and click Confirm.

    Parameter

    Description

    Failure Strategy

    The policy that determines how other nodes that run in parallel are handled if a node in the workflow fails. Valid values:

    • Continue: If a node fails, other nodes run as expected.

    • End: If a node fails, the downstream nodes of the node are terminated.

    Notification Strategy

    The workflow status based on which the system sends a notification about the workflow execution information when the workflow ends. Valid values: None, Success, Failure, and All.

    Workflow Priority

    The priority of the node in the workflow. Default value: MEDIUM. Valid values:

    • HIGHEST

    • HIGH

    • MEDIUM

    • LOW

    • LOWEST

    Execution Cluster

    The cluster that is used to run the workflow. You can select a cluster that is associated on the Cluster Manage page of the Security tab from the drop-down list.

    Alarm Group

    The alert group. You can select an alert group that is created on the Alarm Group Manage page of the Security tab from the drop-down list.

    Complement Data

    Specifies whether to generate retroactive data based on the data backfill settings when the workflow is run within the specified time range.

    If you select Whether it is a complement process?, you must configure the following parameters:

    • Mode of dependent: specifies whether to generate retroactive data for the workflows that depend on the current workflow. Valid values: Close and Open. Default value: Close.

      Retroactive data is generated for the workflows that depend on the current workflow only if the current workflow is in the Online state and scheduling settings are configured for the current workflow.

    • Mode of execution: the mode in which retroactive data is generated. Valid values:

      • Serial execution: The system generates retroactive data for each day contained in the specified time range in chronological order, and multiple workflow instances are generated in sequence.

      • Parallel execution: The system generates retroactive data for multiple days contained in the specified time range at the same time, and multiple workflow instances are generated at the same time.

        In this mode, you must configure the Custom Parallelism parameter to specify the maximum number of workflow instances for which the system generates retroactive data at the same time.

        Note

        If you set Process execute type to parallel when you create the workflow, you must select Parallel execution. If you set Process execute type to Serial wait when you create the workflow, you must select Serial execution.

    • Scheduling Date: the time range during which the workflow is run.

    Startup Parameter

    A startup parameter and its value. The value is used to define a global variable or overwrite the existing value of the global variable when a new workflow instance is started.

    Whether Dry-Run

    Specifies whether to perform a dry run for the workflow. If you perform a dry run for the workflow, a success log is recorded.

  4. On the project details page, choose Workflow > Workflow Instance in the left-side navigation pane to view the status of the workflow instances.
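
To make the difference between the Serial execution and Parallel execution modes of Complement Data more concrete, the following Python sketch splits a scheduling date range into groups of days that would be processed together. It is a simplified model under stated assumptions: one workflow instance per day, and fixed-size groups as one simple way to respect the Custom Parallelism limit. The function names are hypothetical, and the actual scheduler may dispatch instances differently.

```python
from datetime import date, timedelta
from typing import List


def backfill_days(start: date, end: date) -> List[date]:
    """Return every day from start to end (inclusive) in chronological order."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def plan_batches(days: List[date], mode: str, parallelism: int = 1) -> List[List[date]]:
    """Group days into batches whose instances would run at the same time.

    mode="serial":   one day per batch, so instances run one after another.
    mode="parallel": up to `parallelism` days per batch (hypothetical model
                     of the Custom Parallelism parameter).
    """
    if mode == "serial":
        size = 1
    elif mode == "parallel":
        size = max(1, parallelism)
    else:
        raise ValueError("mode must be 'serial' or 'parallel'")
    return [days[i:i + size] for i in range(0, len(days), size)]


days = backfill_days(date(2023, 8, 1), date(2023, 8, 5))
print([len(batch) for batch in plan_batches(days, "serial")])       # [1, 1, 1, 1, 1]
print([len(batch) for batch in plan_batches(days, "parallel", 2)])  # [2, 2, 1]
```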

Import a workflow

  1. On the Workflow Definition page of the project details page, click Import Workflow.

  2. In the Upload dialog box, click Upload and select the JSON file of a workflow that was exported to your computer.

  3. Click Confirm.
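
Because an exported workflow is a JSON file, you may want to confirm that a file is well-formed JSON before you upload it. The following Python sketch performs only that syntax check. It makes no assumption about the structure of the exported workflow, which this topic does not describe, and the function name is hypothetical.

```python
import json
from pathlib import Path


def is_valid_json_file(path: str) -> bool:
    """Return True if the file at `path` can be read and parsed as JSON."""
    try:
        json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # OSError: the file is missing or unreadable.
        # ValueError: covers json.JSONDecodeError and UnicodeDecodeError.
        return False
    return True
```

A call such as is_valid_json_file("my_workflow.json") returns False for a truncated or non-JSON file. The file name is a placeholder.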

Configure a scheduled workflow

  1. On the Workflow Definition page, find the workflow that you want to manage and click the Timing icon in the Operation column.

  2. In the dialog box that appears, configure the Start and stop time, Timing, and Execution Cluster parameters, and click Confirm.

    • Start and stop time: the time range within which the workflow is scheduled to run. Scheduled workflow instances are generated only within this time range.

    • Timing: the interval at which the workflow is scheduled to run.

  3. Change the state of the scheduled workflow to Online.

    After you configure scheduling settings for the workflow, the scheduled workflow is in the Offline state. To make the scheduled workflow take effect, you must perform the following operations to change the state of the scheduled workflow to Online:

    1. On the Workflow Definition page, find the workflow and click the Cron manage icon in the Operation column.

    2. On the Cron manage page, find the scheduled workflow and click the Online icon in the Operation column.
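
The interaction between the Start and stop time range, the Timing interval, and the Online state of the schedule can be sketched as follows. This Python example is only a model of the behavior described in this section. It assumes that the first run is triggered at the start time and that the stop time is inclusive, neither of which is confirmed by this topic, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def scheduled_runs(start: datetime, stop: datetime, interval: timedelta,
                   online: bool) -> List[datetime]:
    """List the trigger times that fall inside the start/stop time range.

    Assumptions (not confirmed by the source): the first trigger is at
    `start`, and `stop` is inclusive. A schedule that is still Offline
    generates no workflow instances.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if not online:
        return []
    runs = []
    current = start
    while current <= stop:
        runs.append(current)
        current += interval
    return runs


window_start = datetime(2023, 8, 16, 0, 0)
window_stop = datetime(2023, 8, 16, 3, 0)
print(len(scheduled_runs(window_start, window_stop, timedelta(hours=1), online=True)))   # 4
print(len(scheduled_runs(window_start, window_stop, timedelta(hours=1), online=False)))  # 0
```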