
DataWorks: Create an EMR function

Last Updated: Nov 13, 2024

This topic describes how to create an E-MapReduce (EMR) function.

Prerequisites

  • An Alibaba Cloud EMR cluster is created, and an inbound rule with the following settings is added to the security group of the cluster:

    • Action: Allow

    • Protocol type: Custom TCP

    • Port range: 8898/8898

    • Authorization object: 100.104.0.0/16

  • An EMR compute engine instance is associated with your workspace. The EMR folder is displayed only after you associate an EMR compute engine instance with the workspace on the Workspace Management page. For more information, see Create and manage workspaces.

  • The required resources are uploaded.
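The inbound rule in the first prerequisite authorizes the CIDR block 100.104.0.0/16. A /16 prefix fixes the first 16 bits of the address, so the block covers 100.104.0.0 through 100.104.255.255 (2^16 = 65,536 addresses). The following minimal Java sketch shows that prefix arithmetic by checking whether an IPv4 address falls inside the block. The class name `CidrCheck` and its methods are hypothetical helpers for illustration only. They are not part of DataWorks or EMR, and the sample addresses are arbitrary.

```java
public class CidrCheck {

    // Parses a dotted-quad IPv4 address (for example "100.104.12.7") into an
    // unsigned 32-bit value held in a long.
    static long ipv4ToLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Returns true if ip lies inside the CIDR block, for example "100.104.0.0/16".
    static boolean inCidr(String ip, String cidr) {
        String[] pieces = cidr.split("/");
        long network = ipv4ToLong(pieces[0]);
        int prefix = Integer.parseInt(pieces[1]);
        // Keep only the top `prefix` bits; for /16 the mask is 0xFFFF0000.
        long mask = prefix == 0 ? 0L : (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        return (ipv4ToLong(ip) & mask) == (network & mask);
    }

    public static void main(String[] args) {
        String rule = "100.104.0.0/16";
        System.out.println(inCidr("100.104.37.5", rule)); // true
        System.out.println(inCidr("100.105.0.1", rule));  // false
    }
}
```

Only the first two octets are compared, so any address that starts with 100.104 matches the rule and any other address does not.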

Procedure

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. Create a workflow. For more information, see Create an auto triggered workflow.

  3. Write the function code in Java in a local development environment, package it as a JAR file, and then upload the JAR file to DataWorks as a JAR resource. For more information, see Create and use an EMR resource.

  4. Create a function.

    1. Click the workflow in the Business Flow section, right-click EMR, and then choose Create Function.

    2. In the Create Function dialog box, set the Name, Engine Instance, and Path parameters.

    3. Click Create.

    4. In the Function information section of the configuration tab that appears, set the parameters.

      Function information section

      • Function Type: The type of the function. Valid values: Mathematical Operation Functions, Aggregate Functions, String Processing Functions, Date Functions, Window Functions, and Other Functions.

      • Engine Instance: The EMR cluster that is associated with the current workspace. By default, this value cannot be changed.

      • Engine Type: The type of the compute engine. By default, this value cannot be changed.

      • EMR database: The EMR database to which the function belongs. Select a database from the drop-down list. To create a database, click New Library. In the New Library dialog box, set the parameters and click OK.

      • Function Name: The name of the function, which you use to reference the function in SQL statements. The name must be globally unique and cannot be changed after the function is created.

      • Owner: Set automatically.

      • Class Name: Required. The name of the class that implements the function.

      • Resource: Required. The resource that the function uses. From the drop-down list, select one of the resources created in the current workspace. To create a resource, click Create Resource. In the Create Resource dialog box, set the parameters and click Create.

      • Description: A description of the function.

      • Expression Syntax: The syntax of the function. Example: test.

      • Parameter Description: A description of the supported input and output parameters.

      • Return Value: Optional. The return value. Example: 1.

      • Example: Optional. An example of how to use the function.

  5. Click the Save icon in the top toolbar.

  6. Commit the function.

    1. Click the Submit icon in the top toolbar.

      Note

      You must select a resource group for scheduling when you commit the EMR function. We recommend that you use an exclusive resource group for scheduling. If no exclusive resource groups for scheduling are available, you can purchase and configure one. For more information, see Create and use an exclusive resource group for scheduling.

    2. In the Commit Node dialog box, enter your comments in the Change description field.

    3. Click OK.

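Steps 3 and 4 can be sketched together in Java. This is a hedged illustration rather than the documented DataWorks behavior. It assumes that the EMR function is a Hive UDF, which this topic does not state. Under that assumption, the class entered in Class Name would typically extend Hive's `org.apache.hadoop.hive.ql.exec.UDF` (or implement `GenericUDF`) and expose an `evaluate` method. The assumption continues that the EMR database, Function Name, Class Name, and Resource fields correspond roughly to Hive's `CREATE FUNCTION db.name AS 'class' USING JAR 'uri'` statement. The sketch leaves out the Hive base class so that it compiles with the JDK alone. The class `ToUpperUdf`, the helper `createFunctionDdl`, and the database name, function name, and JAR URI in `main` are all hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch. A real Hive UDF would also declare
// "extends org.apache.hadoop.hive.ql.exec.UDF", which requires hive-exec.
public class ToUpperUdf {

    // Hive locates the method named "evaluate" by reflection. A null input
    // returns null, which matches SQL NULL semantics.
    public String evaluate(String input) {
        return input == null ? null : input.toUpperCase(Locale.ROOT);
    }

    // Builds a Hive-style CREATE FUNCTION statement from values like those in
    // the Function information section (EMR database, Function Name, Class Name)
    // and the URI of the uploaded JAR resource. Whether DataWorks issues this
    // exact statement is an assumption.
    static String createFunctionDdl(String database, String functionName,
                                    String className, String jarUri) {
        return String.format("CREATE FUNCTION %s.%s AS '%s' USING JAR '%s';",
                database, functionName, className, jarUri);
    }

    public static void main(String[] args) {
        ToUpperUdf udf = new ToUpperUdf();
        System.out.println(udf.evaluate("dataworks")); // DATAWORKS

        // Hypothetical values; substitute your own database, name, and JAR URI.
        System.out.println(createFunctionDdl("demo_db", "to_upper",
                "ToUpperUdf", "hdfs:///tmp/udf/to_upper.jar"));
    }
}
```

With these hypothetical values, the second line printed is `CREATE FUNCTION demo_db.to_upper AS 'ToUpperUdf' USING JAR 'hdfs:///tmp/udf/to_upper.jar';`. A function registered this way is then called by name in SQL, for example `SELECT demo_db.to_upper('abc');`. In a packaged JAR, the class usually sits in a package, and Class Name then takes the fully qualified name.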