E-MapReduce: Use EMR Workflow

Last Updated: Aug 15, 2023

This topic describes how to use E-MapReduce (EMR) Workflow. A HIVECLI node is used as the example.

Prerequisites

  • Authorization is complete for EMR Workflow. For more information, see Assign a RAM role to EMR Workflow.

  • A cluster is created on the EMR on ECS page. For more information, see Create a cluster.

    The cluster must be an EMR data lake cluster, a Hadoop cluster, or a custom cluster.

Procedure

Step 1: Associate an EMR cluster

  1. Log on to the EMR console.

  2. In the left-side navigation pane, choose EMR Studio > Workflow.

  3. On the page that appears, click the Security tab.

  4. On the Cluster Manage page, click Bind Cluster.

  5. In the Bind Cluster dialog box, configure the Cluster Type, Cluster ID, and vSwitch ID parameters and click Confirm.

    You can refresh the Cluster Manage page to view the association progress. If Associated is displayed in the State column, the cluster is associated.

    Note

    The association process takes about 5 to 10 minutes. Wait until the association is complete.
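If you prefer to script the wait instead of refreshing the page by hand, the check can be sketched as a polling loop with a timeout. This is only a sketch: `fetch_state` is a hypothetical callable that you would have to supply yourself (it is not an EMR API), and the only state name taken from this topic is Associated. The "Pending" value in the usage lines is a placeholder.

```python
import time


def wait_until_associated(fetch_state, timeout_s=900, interval_s=30, sleep=time.sleep):
    """Poll fetch_state() until it returns "Associated" or the timeout elapses.

    fetch_state is a hypothetical, caller-supplied function that returns the
    current value of the State column as a string. It is not part of any EMR SDK.
    """
    waited = 0
    while True:
        if fetch_state() == "Associated":
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Usage with a fake state source; "Pending" is a placeholder, not a documented state.
states = iter(["Pending", "Pending", "Associated"])
result = wait_until_associated(lambda: next(states), sleep=lambda seconds: None)
print(result)
```

The default timeout of 900 seconds leaves some margin over the 5 to 10 minutes mentioned in the note.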

Step 2: Create a project

  1. Click the Project tab.

  2. On the Project tab, click Create Project.

  3. In the Create Project dialog box, specify a name for the project and click Confirm.

    In this example, the project is named project_test.

Step 3: Edit a workflow

  1. On the Project tab, click project_test.

  2. On the project details page, choose Workflow > Workflow Definition in the left-side navigation pane.

  3. On the Workflow Definition page, click Create Workflow.

  4. On the Create Workflow page, drag the HIVECLI node to the canvas.

    In this example, a HIVECLI node is used. For more information about HIVECLI, see Node types.

  5. In the Current node settings dialog box, configure the Node Name and Script parameters and click Confirm.

    The following table describes the settings of the Node Name and Script parameters. Keep the default values for the other parameters. For more information, see HIVECLI.

    Parameter    Example
    Node Name    hivecli
    Script       create table if not exists mytable(a string, b int);
                 insert into mytable values ('abc', 1), ('def', 2);
                 select a, sum(b) from mytable group by a;

  6. Save the workflow.

    1. Click Save in the upper-right corner of the canvas.

    2. In the Basic Information dialog box, configure the Workflow Name parameter and click Confirm.

      In this example, the Workflow Name parameter is set to workflow_test.
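To get a feel for what the example Script value returns, the three statements can be tried locally. The sketch below runs them against an in-memory SQLite database, which is only a stand-in for Hive (an assumption made for illustration; types and execution differ on a real cluster). It also runs the script twice, because `insert into` appends rows and `create table if not exists` does not reset the table, so repeated workflow runs change the aggregated result.

```python
import sqlite3

# Statements copied from the Script example. SQLite stands in for Hive here
# (assumption), so this only illustrates the logic of the query.
SETUP = (
    "create table if not exists mytable(a string, b int);"
    "insert into mytable values ('abc', 1), ('def', 2);"
)
QUERY = "select a, sum(b) from mytable group by a;"


def run_script(conn):
    """Run the example statements once and return the aggregated rows, sorted."""
    conn.executescript(SETUP)
    return sorted(conn.execute(QUERY).fetchall())


conn = sqlite3.connect(":memory:")
first_run = run_script(conn)
second_run = run_script(conn)  # same table, so the insert appends again
print(first_run)
print(second_run)
```

If the sums should stay the same across runs, the script would need to clear or overwrite the table before inserting.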

Step 4: Run the workflow

  1. On the Workflow Definition page, find the workflow_test workflow and click [icon image] in the Operation column.

  2. Click [icon image].

  3. In the Please set the parameters before starting dialog box, select the cluster that is associated in Step 1 from the Execution Cluster drop-down list and click Confirm.

Step 5: View the logs of a task instance

  1. On the project details page, choose Workflow > Workflow Instance in the left-side navigation pane.

  2. On the project details page, choose Task > Task Instance in the left-side navigation pane.

  3. On the Task Instance page, find the task instance whose logs you want to view and click the log icon in the Operation column to view the run logs of the task.

Step 6: (Optional) Change the state of a workflow to Offline

On the Workflow Definition page, find the workflow that you want to take offline and click the Offline icon in the Operation column.
