This topic describes how to use E-MapReduce (EMR) Workflow. In this topic, a HIVECLI node is used.
Prerequisites
Authorization is complete for EMR Workflow. For more information, see Assign a RAM role to EMR Workflow.
A cluster is created on the EMR on ECS page. For more information, see Create a cluster.
The created cluster is an EMR data lake cluster, a Hadoop cluster, or a custom cluster.
Procedure
Step 1: Associate an EMR cluster
Log on to the EMR console.
In the left-side navigation pane, choose EMR Studio > Workflow.
On the page that appears, click the Security tab.
On the Cluster Manage page, click Bind Cluster.
In the Bind Cluster dialog box, configure the Cluster Type, Cluster ID, and vSwitch ID parameters and click Confirm.
You can refresh the Cluster Manage page to view the association progress. If Associated is displayed in the State column, the cluster is associated.
Note: The association process takes about 5 to 10 minutes. Wait until the association is complete.
Step 2: Create a project
Click the Project tab.
On the Project tab, click Create Project.
In the Create Project dialog box, specify a name for the project and click Confirm.
In this example, the project is named project_test.
Step 3: Edit a workflow
On the Project tab, click project_test.
On the project details page, open the Workflow Definition page from the left-side navigation pane.
On the Workflow Definition page, click Create Workflow.
On the Create Workflow page, drag the HIVECLI node to the canvas.
In this example, a HIVECLI node is used. For more information about HIVECLI, see Node types.
In the Current node settings dialog box, configure the Node Name and Script parameters and click Confirm.
The following table describes the settings of the Node Name and Script parameters. Retain the default values for the other parameters. For more information, see HIVECLI.
Parameter | Example
Node Name | hivecli
Script | create table if not exists mytable(a string, b int); insert into mytable values ('abc', 1), ('def', 2); select a, sum(b) from mytable group by a;
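To preview what the example script should produce before you run it on the cluster, the following minimal sketch runs the same three statements locally. It assumes SQLite (through the Python standard library) as a stand-in for Hive, which is sufficient for these simple statements but is not how the HIVECLI node executes them. The names STATEMENTS and run_script are hypothetical and are not part of EMR Workflow.

```python
import sqlite3

# The three statements from the Script parameter, one per entry.
# Assumption: SQLite is only a local stand-in; the HIVECLI node runs them in Hive.
STATEMENTS = [
    "create table if not exists mytable(a string, b int)",
    "insert into mytable values ('abc', 1), ('def', 2)",
    "select a, sum(b) from mytable group by a",
]


def run_script(statements):
    """Run each statement in order and return the sorted rows of the last one."""
    conn = sqlite3.connect(":memory:")  # fresh, empty in-memory database
    try:
        rows = []
        for statement in statements:
            rows = conn.execute(statement).fetchall()
        return sorted(rows)
    finally:
        conn.close()


if __name__ == "__main__":
    for a, total in run_script(STATEMENTS):
        print(a, total)
```

On an empty table, the final query returns one row per value of a, with the sum of b for that value. Because the script uses create table if not exists followed by insert into, each additional run of the workflow against the same Hive table appends the two rows again, so the sums grow with every run.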
Save the workflow.
Click Save in the upper-right corner of the canvas.
In the Basic Information dialog box, configure the Workflow Name parameter and click Confirm.
In this example, the Workflow Name parameter is set to workflow_test.
Step 4: Run the workflow
On the Workflow Definition page, find the workflow_test workflow and click the icon in the Operation column that brings the workflow online.
Click the icon that runs the workflow.
In the Please set the parameters before starting dialog box, select the cluster that is associated in Step 1 from the Execution Cluster drop-down list and click Confirm.
Step 5: View the logs of a task instance
On the project details page, choose Workflow > Workflow Instance in the left-side navigation pane.
On the project details page, choose Task > Task Instance in the left-side navigation pane.
On the Task Instance page, find the task instance whose logs you want to view and click the icon in the Operation column to view the run logs of the task.
Step 6: (Optional) Change the state of a workflow to Offline
On the Workflow Definition page, find the workflow that you want to manage and click the icon in the Operation column that takes the workflow offline.
References
For more information about node types, see Node types.
For more information about how to manage a workflow, see Manage a workflow.