This topic describes how to create an E-MapReduce (EMR) function.
Prerequisites
An Alibaba Cloud EMR cluster is created, and an inbound rule with the following settings is added to the security group of the cluster:
Action: Allow
Protocol type: Custom TCP
Port range: 8898/8898
Authorization object: 100.104.0.0/16
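The inbound rule admits TCP traffic to port 8898 only from source addresses in 100.104.0.0/16, that is, 100.104.0.0 through 100.104.255.255. The following Java sketch illustrates how a rule with these settings matches a connection. The class name InboundRuleCheck and its methods are hypothetical and are not part of any Alibaba Cloud SDK. The security group performs the actual matching.

```java
/**
 * Hypothetical helper that mimics how the inbound rule matches a connection:
 * the source IPv4 address must fall in the authorization object (a CIDR block)
 * and the destination port must fall in the port range "start/end".
 * Illustration only; this is not an Alibaba Cloud API.
 */
public class InboundRuleCheck {

    // Parses a dotted-quad IPv4 address into an unsigned 32-bit value stored in a long.
    static long toLong(String ipv4) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        long value = 0;
        for (String octet : octets) {
            int n = Integer.parseInt(octet);
            if (n < 0 || n > 255) {
                throw new IllegalArgumentException("Octet out of range: " + octet);
            }
            value = (value << 8) | n;
        }
        return value;
    }

    // Returns true if ip lies inside cidr, for example "100.104.0.0/16".
    static boolean inCidr(String ip, String cidr) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        // Keep the top `prefix` bits of the address; a /0 block matches every address.
        long mask = prefix == 0 ? 0L : (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        return (toLong(ip) & mask) == (toLong(parts[0]) & mask);
    }

    // Returns true if port lies inside a range written as "start/end", for example "8898/8898".
    static boolean inPortRange(int port, String range) {
        String[] parts = range.split("/");
        int start = Integer.parseInt(parts[0]);
        int end = Integer.parseInt(parts[1]);
        return port >= start && port <= end;
    }

    // A connection is allowed only if both the source address and the port match the rule.
    static boolean allowed(String sourceIp, int port) {
        return inCidr(sourceIp, "100.104.0.0/16") && inPortRange(port, "8898/8898");
    }

    public static void main(String[] args) {
        System.out.println(allowed("100.104.12.34", 8898)); // true
        System.out.println(allowed("100.105.0.1", 8898));   // false: outside the CIDR block
        System.out.println(allowed("100.104.12.34", 8080)); // false: port not in range
    }
}
```

A port range whose start and end are the same value, such as 8898/8898, admits exactly one port.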
An EMR compute engine instance is associated with your workspace. The EMR folder is displayed only after you associate an EMR compute engine instance with the workspace on the Workspace Management page. For more information, see Create and manage workspaces.
The resources that the function requires, such as the JAR package that contains the function code, are uploaded.
Procedure
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Create a workflow. For more information, see Create an auto triggered workflow.
Write the function code in Java in a local development environment, package the compiled code into a JAR file, and then upload the JAR file to DataWorks as a JAR resource. For more information, see Create and use an EMR resource.
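The following Java sketch shows the shape of a simple function class that you might package into the JAR file. It assumes a Hive-style UDF, for which Hive locates a public method named evaluate through reflection. The class name LowerUdf and its behavior are hypothetical. A real Hive UDF would also extend org.apache.hadoop.hive.ql.exec.UDF from the hive-exec library. That dependency is omitted here so that the sketch compiles with the JDK alone.

```java
import java.util.Locale;

/**
 * Hypothetical example function that converts a string to lower case.
 * In a real Hive UDF, this class would also extend
 * org.apache.hadoop.hive.ql.exec.UDF (from hive-exec); that dependency
 * is omitted so the sketch compiles with only the JDK.
 */
public class LowerUdf {

    // Hive's reflection-based UDF bridge calls a public method named evaluate.
    public String evaluate(String input) {
        if (input == null) {
            return null; // Propagate SQL NULL unchanged.
        }
        return input.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        LowerUdf udf = new LowerUdf();
        System.out.println(udf.evaluate("Hello EMR")); // hello emr
    }
}
```

After you compile the class against the libraries that your EMR cluster provides, you can package it with the JDK jar tool, for example jar cf lower-udf.jar LowerUdf.class. The fully qualified name of the class (LowerUdf here, or a name such as the hypothetical com.example.LowerUdf if you declare a package) is typically the value that you enter for the Class Name parameter.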
Create a function.
Click the workflow in the Business Flow section, right-click EMR, and then choose Create > Function.
In the Create Function dialog box, set the Name, Engine Instance, and Path parameters.
Click Create.
In the Function information section of the configuration tab that appears, set the parameters.
Function Type: The type of the function. Valid values: Mathematical Operation Functions, Aggregate Functions, String Processing Functions, Date Functions, Window Functions, and Other Functions.
Engine Instance: The EMR cluster that is associated with the current workspace. This parameter cannot be changed.
Engine Type: The type of the compute engine. This parameter cannot be changed.
EMR database: The database in which the function is created. Select a database from the drop-down list. To create a database, click New Library, set the parameters in the New Library dialog box, and then click OK.
Function Name: The name that you use to reference the function in SQL statements. The name must be globally unique and cannot be changed after the function is created.
Owner: This parameter is automatically set.
Class Name: Required. The name of the class that implements the function.
Resource: Required. The resource that the function uses. Select a resource that has been created in the current workspace from the drop-down list. To create a resource, click Create Resource, set the parameters in the Create Resource dialog box, and then click Create.
Description: The description of the function.
Expression Syntax: The syntax of the function. Example: test.
Parameter Description: The description of the input and output parameters that the function supports.
Return Value: Optional. The return value. Example: 1.
Example: Optional. An example of how to use the function.
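A quick local check can catch a mistyped Class Name before you commit the function. The following Java sketch loads a class by name and confirms that it has a public method named evaluate, assuming a Hive-style UDF. The class ClassNameCheck, its nested SampleUdf class, and the method hasEvaluate are hypothetical and are not part of DataWorks or EMR.

```java
import java.lang.reflect.Method;

/**
 * Hypothetical local check: verifies that a class name resolves on the classpath
 * and has a public method named "evaluate" (the method a Hive-style UDF exposes).
 * Not part of DataWorks or EMR.
 */
public class ClassNameCheck {

    // Hypothetical sample function class, used only to demonstrate the check.
    public static class SampleUdf {
        public Integer evaluate(Integer x) {
            return x == null ? null : x + 1;
        }
    }

    // Returns true if className loads and has at least one public method named "evaluate".
    static boolean hasEvaluate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            for (Method m : cls.getMethods()) {
                if (m.getName().equals("evaluate")) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class is missing, or a class it depends on is not on the classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length > 0) {
            // Check the class name passed on the command line.
            System.out.println(args[0] + ": " + hasEvaluate(args[0]));
            return;
        }
        // getName() returns the binary name that Class.forName expects (nested classes use '$').
        System.out.println(hasEvaluate(SampleUdf.class.getName())); // true
        System.out.println(hasEvaluate("java.lang.String"));        // false: no evaluate method
    }
}
```

To check a class in your own JAR file, put the JAR file and any libraries that it depends on, such as hive-exec, on the classpath, and pass the fully qualified class name as the argument, for example java -cp your-udf.jar:. ClassNameCheck com.example.LowerUdf on Linux or macOS. The JAR file name and class name in this command are placeholders.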
Click the icon in the top toolbar.
Commit the function.
Click the icon in the top toolbar.
Note: You must select a resource group for scheduling when you commit the EMR function. We recommend that you use an exclusive resource group for scheduling. If no exclusive resource group for scheduling is available, you can purchase and configure one. For more information, see Create and use an exclusive resource group for scheduling.
In the Commit Node dialog box, enter your comments in the Change description field.
Click OK.