
Function Compute: DataWorks

Last Updated: Nov 13, 2024

DataWorks provides Function Compute nodes that allow you to use custom code to implement different business requirements. Function Compute nodes support periodic scheduling, which makes it easy to run scheduled tasks. In addition, Function Compute nodes can work together with other types of nodes to build a complete data processing pipeline. This topic describes how to create and use a Function Compute node.

Background information

DataWorks is an end-to-end big data development and governance platform that provides data warehouse, data lake, and data lakehouse solutions based on big data compute engines such as MaxCompute, Hologres, E-MapReduce (EMR), AnalyticDB, and CDH. The DataStudio service of DataWorks allows you to define the development and scheduling properties of auto triggered nodes. DataStudio works with Operation Center to provide a visualized development interface for nodes of the supported compute engine types. On this interface, you can perform intelligent code development, orchestrate multi-engine nodes in workflows, and deploy nodes in a standardized manner. This way, you can build offline data warehouses, real-time data warehouses, and ad hoc analysis systems to ensure efficient and stable data production.

DataStudio of DataWorks can invoke event functions of Function Compute to process requests. You can implement automatic scheduling by configuring the periodic scheduling properties of a node and publishing the node to the production environment.

Before you start

Limits

  • Limits on features

    DataWorks allows you to invoke only event functions. If you want to periodically schedule an event processing function in DataWorks, you must create an event function rather than an HTTP function in Function Compute. For more information about function types, see Function type selection.

  • Limits on regions

    You can use the features provided by Function Compute only in the workspaces that are created in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Singapore, UK (London), US (Silicon Valley), US (Virginia), India (Mumbai) Closing Down, Germany (Frankfurt), and Australia (Sydney) Closing Down.

Precautions

  • When you use a Function Compute node, you must invoke the event function to be executed based on a service that you created. The list of created services may fail to load when you try to select a service. This issue may occur for one of the following reasons:

    • Your current account has overdue payments. In this case, top up your account and refresh the node configuration page to try again.

    • Your account does not have the required permissions to obtain the service list. In this case, ask the Alibaba Cloud account owner to grant you the fc:ListServices permission or attach the AliyunFCFullAccess policy to your account. After the authorization is complete, refresh the node configuration page to try again. For more information about authorization, see Grant permissions to a RAM user.

  • If a Function Compute node in DataWorks runs for more than one hour, set the Invocation Method parameter of the node to Asynchronous Invocation. For more information about asynchronous invocation, see Overview.

  • If you develop a Function Compute node as a RAM user, the following system policies or custom policies must be attached to the RAM user.

    • System policy

      Attach either the AliyunFCFullAccess policy, or both the AliyunFCReadOnlyAccess and AliyunFCInvocationAccess policies, to the RAM user. For more information, see System policies.

    • Custom policy

      Grant the RAM user all of the following permissions by using custom policies. For more information, see Custom policies.

      • fc:GetService

      • fc:ListServices

      • fc:GetFunction

      • fc:InvokeFunction

      • fc:ListFunctions

      • fc:GetFunctionAsyncInvokeConfig

      • fc:ListServiceVersions

      • fc:ListAliases

      • fc:GetAlias

      • fc:ListFunctionAsyncInvokeConfigs

      • fc:GetStatefulAsyncInvocation

      • fc:StopStatefulAsyncInvocation
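The custom permissions listed above can be combined into a single RAM policy document. The following is a minimal sketch; the `"Resource": "*"` scope is an assumption for illustration, and you should narrow it to your own services in production:

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "fc:GetService",
        "fc:ListServices",
        "fc:GetFunction",
        "fc:InvokeFunction",
        "fc:ListFunctions",
        "fc:GetFunctionAsyncInvokeConfig",
        "fc:ListServiceVersions",
        "fc:ListAliases",
        "fc:GetAlias",
        "fc:ListFunctionAsyncInvokeConfigs",
        "fc:GetStatefulAsyncInvocation",
        "fc:StopStatefulAsyncInvocation"
      ],
      "Resource": "*"
    }
  ]
}
```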

Step 1: Go to the entry point for creating a Function Compute node

  1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. Go to the entry point for creating a Function Compute node.

    On the DataStudio page, you can use one of the methods shown in the following figure to go to the entry point for creating a Function Compute node. (Figure: entry point for creating a Function Compute node)

Step 2: Create and configure a Function Compute node

  1. Create a Function Compute node.

    After you go to the entry point for creating a Function Compute node, configure basic information such as the node path and node name as prompted to create a Function Compute node.

  2. Configure parameters for the Function Compute node.

    On the configuration tab of the Function Compute node, select the function that you want to invoke to run the node and specify the invocation method and variables based on your business requirements. (Figure: configure node parameters)

    • Select Service

      Select a service to provide resources for function invocation. All functions in a service share the same settings, such as service authorization and log configurations. If no service is available, create a service. For more information, see Create a service.

    • Select Version Or Alias

      Select the version or alias of the service that you want to use for subsequent function invocation. The default version is LATEST.

      • Service version

        Function Compute provides the service-level versioning feature, which allows you to release one or more versions of a service. A version is similar to a service snapshot that contains information such as the service settings and the code and settings of the functions that belong to the service. A version does not contain trigger information. When you release a version, the system generates a snapshot of the service and assigns a version number to the snapshot for future use. For more information about how to release a version, see Manage versions.

      • Version alias

        Function Compute allows you to create an alias that points to a specific version of a service. You can use an alias to perform version releases, rollbacks, or canary releases with ease. An alias depends on a service or a version. When you use an alias to access a service or function, Function Compute resolves the alias into the version to which the alias points. This way, the invoker does not need to know the specific version. For more information about how to create an alias, see Manage aliases.

    • Select Function

      Select the function that you want to invoke to run the Function Compute node. If no function is available, create a function. For more information, see Create a function.

      Note

      DataWorks allows you to invoke only event functions. If you want to periodically schedule an event processing function in DataWorks, you must create an event function rather than an HTTP function in Function Compute. For more information about function types, see Function type selection.

      In this example, the para_service_01_by_time_triggers function is selected. When you create such a function, use the sample code for triggering a function at a scheduled time. Code logic:

    import json
    import logging
    
    logger = logging.getLogger()
    
    def handler(event, context):
        logger.info('event: %s', event)
    
        # Parse the json
        evt = json.loads(event)
        triggerName = evt["triggerName"]
        triggerTime = evt["triggerTime"]
        payload = evt["payload"]
    
        logger.info('triggerName: %s', triggerName)
        logger.info("triggerTime: %s", triggerTime)
        logger.info("payload: %s", payload)
    
        return 'Timer Payload: ' + payload

    For more information about the sample code of other functions, see Sample code.
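You can verify this handler logic locally before you deploy the function. The following sketch reproduces the handler shown above and feeds it the kind of JSON event that DataWorks passes at run time; the field values are illustrative, and the context argument is replaced with None because this handler does not use it:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()

def handler(event, context):
    # Same logic as the sample function above.
    evt = json.loads(event)
    logger.info("triggerName: %s", evt["triggerName"])
    logger.info("triggerTime: %s", evt["triggerTime"])
    logger.info("payload: %s", evt["payload"])
    return "Timer Payload: " + evt["payload"]

# Simulate the event that the Function Compute node passes to the handler.
sample_event = json.dumps({
    "triggerName": "triggerName1",
    "triggerTime": "20240101000000",
    "payload": "payload1",
})
print(handler(sample_event, None))  # Timer Payload: payload1
```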

    • Invocation Method

      The method that is used to invoke the function. Valid values:

      • Synchronous Invocation: An event directly triggers the function, and Function Compute executes the function and waits for a response. After the function is invoked, Function Compute returns the execution result of the function.

      • Asynchronous Invocation: Function Compute returns a response immediately after the request is persisted, instead of returning a response only after the request execution is complete.

        • If your function contains logic that is time-consuming, resource-consuming, or error-prone, you can use this method to allow your programs to respond to traffic spikes in an efficient and reliable manner.

        • We recommend that you use this method for Function Compute tasks whose running duration exceeds one hour.

    • Variable

      Assign values to the variables in the function based on your business requirements. The data in this field corresponds to the content on the Create New Test Event tab of the Configure Test Parameters panel for the function in the Function Compute console. To open this panel, go to the details page of the function in the Function Compute console and choose Test Function > Configure Test Parameters on the Code tab.

      In this example, the following values are assigned to the variables in the para_service_01_by_time_triggers function. The ${} format is used to define the bizdate variable, to which you assign a value in Step 4.

    {
        "payload": "payload1",
        "triggerTime": "${bizdate}",
        "triggerName": "triggerName1"
    }
  3. Optional. Debug and run the Function Compute node.

    After the Function Compute node is configured, you can click the Run icon to specify the resource group for running the node, assign constants to the variables in the code, and then run the node to test whether its code logic is correct. The parameters that you configure to run the node are in the key=value format. If you configure multiple parameters, separate them with commas (,), for example, key1=value1,key2=value2.

    Note

    For more information about node debugging, see Debugging procedure.

  4. Configure scheduling properties for the node to schedule and run the node on a regular basis.

    DataWorks provides scheduling parameters, which are used to implement dynamic parameter passing in node code in scheduling scenarios. After you define variables for the selected function on the configuration tab of the Function Compute node, go to the Properties tab to assign values to the variables. In this example, $[yyyymmdd-1] is assigned to the bizdate variable. This way, when the node runs, the bizdate variable resolves to the date one day before the scheduling time of the node. For more information about the formats of scheduling parameters, see Supported formats of scheduling parameters. (Figure: configure periodic scheduling for the node) For more information about scheduling properties, see Overview.
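To see what the function actually receives, the substitution can be sketched locally. The helper below only imitates how DataWorks replaces ${bizdate} in the variable JSON when bizdate is assigned $[yyyymmdd-1]; it is an illustrative stand-in, not a DataWorks API:

```python
import json
from datetime import date, timedelta

# Imitation of DataWorks scheduling-parameter substitution (illustrative only).
# bizdate = $[yyyymmdd-1]: the day before the scheduling time, as yyyymmdd.
def resolve_variables(template: str, scheduling_time: date) -> dict:
    bizdate = (scheduling_time - timedelta(days=1)).strftime("%Y%m%d")
    return json.loads(template.replace("${bizdate}", bizdate))

template = '{"payload": "payload1", "triggerTime": "${bizdate}", "triggerName": "triggerName1"}'
event = resolve_variables(template, date(2024, 11, 13))
print(event["triggerTime"])  # 20241112
```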

Step 3: Commit and deploy the Function Compute node

Function Compute nodes can be automatically scheduled only after they are committed and deployed to the production environment.

  1. Save and commit the Function Compute node.

    Click the Save and Commit icons in the top toolbar on the configuration tab of the Function Compute node to save and commit the node. When you commit the node, enter a change description as prompted and specify whether to perform code review and smoke testing.

    Note
    • You can commit the node only after you configure the Rerun and Parent Nodes parameters on the Properties tab.

    • If the code review feature is enabled, a node can be deployed only after the code of the node is approved by a specified reviewer. For more information, see Code review.

    • To ensure that the node you created can be run as expected, we recommend that you perform smoke testing before you deploy the node. For more information, see Perform smoke testing.

  2. Optional. Deploy the Function Compute node.

    If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner to deploy the node after you commit it. For more information, see Differences between workspaces in basic mode and workspaces in standard mode and Deploy nodes.

What to do next