DataWorks: Dynamically add watermarks to PDF files by using a Function Compute node in DataWorks

Last Updated: Dec 03, 2024

This topic describes how to use a Function Compute node in DataWorks to call a Function Compute service and periodically add watermarks to incremental PDF files in Object Storage Service (OSS).

Background information

DataWorks allows you to use a Function Compute node to call a Function Compute service. You can perform custom configurations for various features in a Function Compute service and then use a Function Compute node in DataWorks to call the service.

Prerequisites

  • DataWorks is activated. For more information, see Activate DataWorks.

  • Function Compute is activated. For more information, see Quickly create a function.

  • OSS is activated. For more information, see Activate OSS. A bucket is created, and a PDF file to which you want to add a watermark is uploaded to the bucket. In this example, a directory named 2023-08-15 is created in the bucket-test222 bucket, and a file named example.pdf is uploaded to the directory.

Limits

  • Limits on features

    DataWorks allows you to invoke only event functions. If you want to periodically schedule an event processing function in DataWorks, you must create an event function rather than an HTTP function to process event requests in Function Compute. For information about more function types, see Function type selection.

  • Limits on regions

    You can use the features provided by Function Compute only in the workspaces that are created in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Singapore, UK (London), US (Silicon Valley), US (Virginia) and Germany (Frankfurt).
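As a rough illustration of the event-function requirement in the limits above, the following Python sketch shows the general shape of an event handler that accepts a JSON event such as the test event used in this topic. The handler body, the returned structure, and the assumption that the event arrives as bytes or a string are illustrative only. This is not the code of the start-pdf-watermark template.

```python
import json


def handler(event, context):
    """Hypothetical event-function entry point (illustrative only)."""
    # Assumption: the runtime passes the event as bytes or str containing JSON.
    payload = json.loads(event)

    # "pdf_file" and "mark_text" are the fields this sketch expects,
    # mirroring the test event used in this topic.
    missing = [key for key in ("pdf_file", "mark_text") if key not in payload]
    if missing:
        raise ValueError("missing field(s): " + ", ".join(missing))

    # A real function would read the PDF from OSS and add the watermark here.
    # This sketch only echoes the parsed fields back to the caller.
    return json.dumps(
        {"pdf_file": payload["pdf_file"], "mark_text": payload["mark_text"]}
    )
```

An HTTP function would instead receive an HTTP request, which is why it cannot be selected for a Function Compute node in DataWorks.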

Step 1: Create a Function Compute application

  1. Log on to the Function Compute console. In the left-side navigation pane, click Applications.

  2. On the Applications page, click Create Application. On the Create Application page, select Use a Template to Create an Application. In the search box, enter start-pdf-watermark and click the search icon. Move the pointer over the start-pdf-watermark application template that is displayed after the search and click Create Now.

    Note

    You can visit GitHub to view the source code of the start-pdf-watermark application template. The implementation logic of the application is to add a specified watermark to a PDF file in OSS and write the PDF file with the watermark back to the same OSS path.

  3. On the Create Application page, configure the parameters.

    Parameter

    Description

    Deployment Type

    Set the value to Directly Deploy.

    Application Name

    The system automatically generates a name that meets the related requirements. You can change the name based on your business requirements.

    Role Name

    AliyunFCServerlessDevsRole is selected by default. You can change the policies attached to the role based on your business requirements.

    • When you deploy applications in Serverless Application Center, make sure that Function Compute is granted the required permissions. For example, permissions are required to deploy specific service and function resources and to access other Alibaba Cloud services, such as Virtual Private Cloud (VPC), File Storage NAS (NAS), and Simple Log Service. You must first associate a RAM role with the application or environment and set Function Compute as the trusted service. Serverless Application Center can then call the AssumeRole operation to obtain a Security Token Service (STS) token and assume the RAM role to access Alibaba Cloud services.

    • To simplify authorization, Serverless Application Center provides the default role AliyunFCServerlessDevsRole. This role has permissions on some of the Alibaba Cloud resources that Serverless Application Center accesses. You can log on to the Resource Access Management (RAM) console to view the permissions of the AliyunFCServerlessDevsRole role.

    Region

    The region in which you want to create the application. After you select a region, only OSS buckets in the selected region are available for the OSS bucket name parameter.

    Function name

    The system automatically generates a name that meets the related requirements. You can change the name based on your business requirements.

    Time Zone

    The time zone to which the selected region belongs is selected by default. You can change the value based on your business requirements.

    OSS bucket name

    The name of the OSS bucket that you want to use. Only OSS buckets that reside in the region specified by the Region parameter are available.

    RAM's ARN

    AliyunFCDefaultRole is selected by default. You can change the value based on your business requirements.

    To simplify authorization, Function Compute provides a default service role named AliyunFCDefaultRole. This role has permissions on specific Alibaba Cloud resources that Function Compute needs to access. For information about how to create and assign the AliyunFCDefaultRole default service role, see the Step 1: Activate Function Compute section in the "Quickly create a function" topic.

    Note

    If the system prompts that additional permissions are required for the application when you create the application, you can click Authorize to obtain the required permissions.

  4. Click Create and Deploy Default Environment. If Deployed is displayed next to Deployment Status on the right side of the details page that appears, the application is created and deployed.

  5. On the Applications page, find the created application and click the application name. The Environment Details tab appears.

  6. In the Resource Information section of the Environment Details tab, click the value of Function to go to the details page of the function.

  7. On the details page of the function, click the Test Function tab. On the Test Function tab, expand Configure test events and configure the following parameters.

    • Event name: Enter a name for the test event in the Event Name field.

    • Event content: Enter JSON-formatted code in the code editor. In this example, the following code is entered.

      Important

      If you directly copy the following code, you must delete the double forward slashes (//) and the comments that follow them. Otherwise, the code fails JSON format verification.

      // The following code provides an example on how to add the watermark DataWorks to a PDF file named example.pdf in the 2023-08-15/ path. The font of the watermark is Helvetica, and the font size is 20. For information about the parameters in the code, see the comments of the parameters.
      {
          "pdf_file": "2023-08-15/example.pdf",  // The path of the PDF file in the OSS bucket.
          "mark_text": "DataWorks",    // The watermark text. If you want to add a watermark to a PDF file, this parameter is required.
          "pagesize": [595.275590551181, 841.8897637795275], // Optional. The default value is the A4 paper size (21 cm, 29.7 cm). 1 cm is equivalent to 28.346456692913385 points.
          "font": "Helvetica",     // Optional. The font of the watermark. The default value is Helvetica. If you want to add a watermark in Chinese to the PDF file, you can set this parameter to zenhei or microhei.
          "font_size": 20,         // Optional. The font size of the watermark. The default value is 30.
          "font_color": [0, 0, 0], // The font color of the watermark, in the RGB format. The default color is black.
          "rotate": 30,            // Optional. The rotation angle of the watermark. The default value is 0.
          "opacity": 0.1,          // Optional. The transparency of the watermark. The default value is 0.1. The value 1 indicates that the watermark is not transparent.
          "density": [198.4251968503937, 283.46456692913387] // The density of the watermark, in points. The value in this example indicates an interval of 7 cm on the X-axis and an interval of 10 cm on the Y-axis between watermark texts. The default value is [141.73228346456693, 141.73228346456693], which indicates an interval of 5 cm on both axes.
      }
  8. Click Test Function. If the code is successfully run, you can view the PDF file to which the watermark is added in the specified OSS path. In this example, the example-out.pdf file is generated.

    You can view the source PDF file and the generated PDF file in OSS.
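One way to avoid the JSON format verification problem described in the Important note is to generate the event content programmatically instead of copying the commented example. The following Python sketch builds the same test event and converts centimeters to points for the pagesize and density parameters. The helper names cm_to_points and build_event are hypothetical, and the conversion assumes the standard definition of 72 points per inch, which matches the factor given in the comments of the test event.

```python
import json

# 1 inch = 72 points = 2.54 cm, so 1 cm is about 28.3465 points.
POINTS_PER_CM = 72 / 2.54


def cm_to_points(cm):
    """Convert a length in centimeters to points."""
    return cm * POINTS_PER_CM


def build_event(pdf_file, mark_text, page_cm=(21, 29.7), density_cm=(7, 10)):
    """Build the test event as a plain dict (hypothetical helper)."""
    return {
        "pdf_file": pdf_file,
        "mark_text": mark_text,
        "pagesize": [cm_to_points(v) for v in page_cm],
        "font": "Helvetica",
        "font_size": 20,
        "font_color": [0, 0, 0],
        "rotate": 30,
        "opacity": 0.1,
        "density": [cm_to_points(v) for v in density_cm],
    }


if __name__ == "__main__":
    event = build_event("2023-08-15/example.pdf", "DataWorks")
    # json.dumps emits strict JSON with no comments, ready to paste
    # into the Event content editor.
    print(json.dumps(event, indent=4))
```

Because the output contains no comments, it can be pasted into the code editor without manual cleanup.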

Step 2: Create and configure a Function Compute node in the DataWorks console

  1. Log on to the DataWorks console.

  2. In the top navigation bar, select the region that you specify in Step 1: Create a Function Compute application.

  3. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the Data Development page, select the desired workspace from the drop-down list and click Go to Data Development.

  4. In the Scheduled Workflow pane of the DataStudio page, find the desired workflow, click its name, right-click General, and then choose Create Node > Function Compute. In the Create Node dialog box, configure the Name parameter and click Confirm. A Function Compute node is created.

  5. On the configuration tab of the Function Compute node, configure the parameters.

    Parameter

    Description

    Select Function

    Select the function name that you specify in Substep 3 in Step 1. For information about how to create a function, see Manage functions.

    Note

    DataWorks allows you to invoke only event functions. If you want to periodically schedule an event processing function in DataWorks, you must create an event function rather than an HTTP function to process event requests in Function Compute. For information about more function types, see Function type selection.

    Select Version Or Alias

    Select the version or alias of the service that you want to use for subsequent function invocation. If you select Default Version, the Version parameter is displayed, and the value of the Version parameter is fixed as LATEST. In this example, Default Version is selected.

    • Service version

      Function Compute provides the service-level versioning feature, which allows you to release one or more versions for a service. A version is similar to a service snapshot that contains the information such as the service settings, and the code and settings of functions that belong to the service. A version does not contain trigger information. When you release a version, the system generates a snapshot for the service and assigns a version number that is associated with the snapshot for future use. For more information about how to release a version, see Manage versions.

    • Version alias

      Function Compute allows you to create an alias for a service version. An alias points to a specific version of a service. You can use an alias to perform version release, rollback, or canary release with ease. An alias is dependent on a service or a version. When you use an alias to access a service or function, Function Compute parses the alias into the version to which the alias points. This way, the invoker does not need to know the specific version to which the alias points. For information about how to create an alias, see Manage aliases.

    Invocation Method

    In this example, Synchronous Invocation is selected. For more information about invocation methods, see Synchronous invocations and the topics in the Asynchronous invocation directory.

    • Synchronous Invocation: When you synchronously invoke a function, an event directly triggers the function, and Function Compute executes the function and waits for a response. After the function is invoked, Function Compute returns the execution results of the function.

    • Asynchronous Invocation: When you asynchronously invoke a function, Function Compute immediately returns a response after the request is persisted instead of returning a response only after the request execution is complete.

      • If your function has the logic that is time-consuming, resource-consuming, or error-prone, you can use this method to allow your programs to respond to traffic spikes in an efficient and reliable manner.

      • We recommend that you use this method for Function Compute tasks of which the running duration exceeds one hour.

    Variable

    The values that are passed to the function when it is invoked, in JSON format. In this example, the JSON content that you configure in Substep 7 in Step 1 is modified so that watermarks are added to incremental PDF files in OSS on a daily basis.

    // The following code provides an example on how to add a watermark to a PDF file named example.pdf in a path that is in the ${current_date}/ format.
    {
        "pdf_file": "${current_date}/example.pdf",  // The path of the PDF file in the OSS bucket.
        "mark_text": "DataWorks",    // The watermark text. If you want to add a watermark to a PDF file, this parameter is required.
        "pagesize": [595.275590551181, 841.8897637795275], // Optional. The default value is the A4 paper size (21 cm, 29.7 cm). 1 cm is equivalent to 28.346456692913385 points.
        "font": "Helvetica",     // Optional. The font of the watermark. The default value is Helvetica. If you want to add a watermark in Chinese to the PDF file, you can set this parameter to zenhei or microhei.
        "font_size": 20,         // Optional. The font size of the watermark. The default value is 30.
        "font_color": [0, 0, 0], // The font color of the watermark, in the RGB format. The default color is black.
        "rotate": 30,            // Optional. The rotation angle of the watermark. The default value is 0.
        "opacity": 0.1,          // Optional. The transparency of the watermark. The default value is 0.1. The value 1 indicates that the watermark is not transparent.
        "density": [198.4251968503937, 283.46456692913387] // The density of the watermark, in points. The value in this example indicates an interval of 7 cm on the X-axis and an interval of 10 cm on the Y-axis between watermark texts. The default value is [141.73228346456693, 141.73228346456693], which indicates an interval of 5 cm on both axes.
    }
    Note
    • The value of pdf_file is in the ${current_date}/example.pdf format. ${current_date} indicates that a variable named current_date is used.

    • When DataWorks runs a task on the Function Compute node, DataWorks replaces ${current_date} with an actual value. You can configure the variable when you configure scheduling parameters for the Function Compute node. For example, if DataWorks runs a task on the Function Compute node on August 15, 2023, the value of pdf_file is 2023-08-15/example.pdf. If DataWorks runs a task on the Function Compute node on August 16, 2023, the value of pdf_file is 2023-08-16/example.pdf.

    • DataWorks can run a task on the Function Compute node to add watermarks to incremental PDF files every day only if the business system generates incremental PDF files in the specified OSS path every day based on specific time-related rules before the scheduling time of the Function Compute node.

    • For this example, you must upload a PDF file to a path that is in the ${current_date}/ format in OSS before DataWorks starts to run a task on the Function Compute node. For example, you can upload a PDF file named example.pdf to the 2023-08-15/ path.

  6. Optional. Debug and run a task on the Function Compute node. After the configuration is complete, click the Run icon in the top toolbar of the configuration tab of the Function Compute node. In the Runtime Parameters dialog box, select the resource group that you want to use to run the task, assign constant values to the variables, and then click Confirmation to test whether the code logic of the Function Compute node is correct. For example, if you assign 2023-08-15 to the ${current_date} variable, DataWorks runs a task on the Function Compute node to add a watermark to the example.pdf file stored in the 2023-08-15/ path.

  7. Configure scheduling properties for the Function Compute node to periodically schedule and run a task on the node. DataWorks provides scheduling parameters, which are used to implement dynamic parameter passing in node code in scheduling scenarios. You can click Properties in the right-side navigation pane of the configuration tab of the Function Compute node. In the Scheduling Parameter section of the Properties tab, you can configure scheduling parameters for the Function Compute node. In this example, the current_date scheduling parameter is added, and $[yyyy-mm-dd] is assigned to the scheduling parameter as the value. yyyy-mm-dd indicates the year, month, and day when a task is run on the Function Compute node. For more information about scheduling parameter configurations, see Supported formats of scheduling parameters. For more information about scheduling properties, see Overview.
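To make the variable replacement described above concrete, the following Python sketch mimics it with the standard library's string.Template, which happens to use the same ${name} placeholder syntax. It only illustrates the effect of the current_date scheduling parameter on pdf_file. The render_event helper is hypothetical, the event is shortened to two fields, and the sketch does not represent how DataWorks performs the replacement internally.

```python
import json
from datetime import date
from string import Template

# The Variable content of the node, shortened to two fields and
# stored as a JSON string that still contains the placeholder.
VARIABLE_TEMPLATE = Template(
    json.dumps(
        {"pdf_file": "${current_date}/example.pdf", "mark_text": "DataWorks"}
    )
)


def render_event(run_date):
    """Return the event for a given run date (hypothetical helper)."""
    # Assumption: $[yyyy-mm-dd] resolves to the run date in YYYY-MM-DD form.
    current_date = run_date.strftime("%Y-%m-%d")
    return json.loads(VARIABLE_TEMPLATE.substitute(current_date=current_date))


if __name__ == "__main__":
    for day in (date(2023, 8, 15), date(2023, 8, 16)):
        print(render_event(day)["pdf_file"])
```

Each scheduled run therefore targets a different date-named directory, which is why the PDF file for a given day must already exist in OSS before the node runs.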

Step 3: Commit and deploy the Function Compute node

Function Compute nodes can be automatically scheduled only after they are committed and deployed to the production environment.

  1. Save and commit the Function Compute node.

    Click the Save and Commit icons in the top toolbar on the configuration tab of the Function Compute node to save and commit the Function Compute node. When you commit a node, enter a change description as prompted and specify whether to perform code review and smoke testing.

    Note
    • You can commit the node only after you configure the Rerun and Parent Nodes parameters on the Properties tab.

    • If the code review feature is enabled, a node can be deployed only after the code of the node is approved by a specified reviewer. For more information, see Code review.

    • To ensure that the node you created can be run as expected, we recommend that you perform smoke testing before you deploy the node. For more information, see Perform smoke testing.

  2. Optional. Deploy the Function Compute node.

    If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner to deploy the node after you commit it. For more information, see Differences between workspaces in basic mode and workspaces in standard mode and Deploy nodes.
