
DataWorks:Best practice for configuring data push nodes in a workflow

Last Updated: Nov 05, 2024

DataWorks DataStudio allows you to create data push nodes in a workflow to push data to specific destinations based on different push methods. DataStudio provides the following push methods: simple data push, combined data push, script data push, conditional data push, and MaxCompute data push. This topic describes how to configure data push nodes in a workflow to push data based on the selected push method.

Background information

In DataStudio, you can create data push nodes in a workflow. The data push nodes obtain the results of simple queries on data processed by other nodes in the workflow and push the obtained data to DingTalk groups, Lark groups, WeCom groups, or Microsoft Teams based on specific scheduling settings in a timely manner.

Process

  1. Create a node to prepare test data for different data push flows.

  2. Create data query nodes, assignment nodes, and other data processing nodes to process and query test data.

    Note

    You can create different types of nodes to prepare test data, process data, and query data based on your business requirements. In the example in this topic, MySQL nodes are used.

  3. Create data push nodes to receive the output parameters of data query nodes. Then, use the input parameters of the data push nodes to push the obtained data to DingTalk groups, Lark groups, WeCom groups, or Microsoft Teams.

Push methods

This topic provides the following push methods to allow you to configure data push nodes in a workflow and push data: simple data push, combined data push, script data push, conditional data push, and MaxCompute data push.

  • Simple data push: In a simple data push flow, a node queries data and uses the output parameters to pass the query results to a data push node. Then, the data push node pushes the data to specific destinations.

  • Combined data push: In a combined data push flow, multiple nodes query data and use the output parameters to pass the query results to the same data push node. Then, the data push node pushes the data to specific destinations.

  • Script data push: In a script data push flow, an assignment node processes data and uses the output parameters to pass the results to a data push node. Then, the data push node pushes the data to specific destinations. For more information about assignment nodes, see Configure an assignment node.

  • Conditional data push: In a conditional data push flow, a branch node performs a logical judgment and passes data that meets specific conditions and data that does not meet the conditions to different data query nodes. The data query nodes query the data and use the output parameters to pass the results to different data push nodes. Then, the data push nodes push the data to specific destinations. For more information about branch nodes, see Branch node.

  • MaxCompute data push: In a MaxCompute data push flow, an assignment node queries data from a MaxCompute data source and passes the query results to a data push node. Then, the data push node pushes data to specific destinations.

Prerequisites

Make sure that the following requirements are met:

  • DataWorks is activated. For more information, see Activate DataWorks.

  • A DataWorks workspace is created. For more information, see Create a workspace.

  • An ApsaraDB RDS for MySQL instance is created in the ApsaraDB RDS console, and a MySQL data source is added to the workspace. For more information, see Add and manage data sources.

  • A MaxCompute data source is added to the workspace. For more information, see Add a MaxCompute data source.

    Note

    In this topic, a MySQL data source and a MaxCompute data source are used. You can add different types of data sources based on your business requirements.

Limits

  • Limits on the data size:

    • If you want to push data to DingTalk, the data size cannot exceed 20 KB.

    • If you want to push data to Lark, the data size cannot exceed 30 KB, and the size of an image must be less than 10 MB.

    • If you want to push data to WeCom, each chatbot can send a maximum of 20 messages per minute.

    • If you want to push data to Microsoft Teams, the data size cannot exceed 28 KB.

    Note

    Tables that are added by using Markdown are not displayed as expected on the mobile clients of DingTalk and WeCom. We recommend that you select Table in the Body section to add the tables that you want to push. Tables that are added by using Markdown are displayed as expected on the mobile clients of Lark and Microsoft Teams.

  • The data push feature is available only in DataWorks workspaces in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), US (Silicon Valley), US (Virginia), and Germany (Frankfurt).

  • If you want to push the data of a MaxCompute data source to specific destinations, you must purchase an exclusive resource group for scheduling. Serverless resource groups are not supported.

Preparations

Create a workflow

  1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. In the Scheduled Workflow pane, right-click Business Flow and select Create Workflow. In the Create Workflow dialog box, configure the Workflow Name parameter based on your business requirements. In this example, DataPushDemo is used.

Create nodes in the workflow

Double-click the name of the DataPushDemo workflow. On the configuration tab of the workflow, create the nodes that are required by the push method that you use. The following list describes the nodes that you must create for each push method. To help you follow this practice, we recommend that you use the node names described in the list.

  • Conditional data push

    • SalesAmountPreMonth (MySQL node): Queries the total sales amount of the previous month in the test data and uses the output parameters to pass the query results to the branch node.

    • Condition (Branch node): Receives the output parameters of the MySQL node, performs a logical judgment, and uses its output parameters to pass the data that meets the conditions and the data that does not to the corresponding MySQL nodes.

    • CompliantData (MySQL node): Receives the output parameters of the branch node that meet the conditions, queries the compliant data, and uses its output parameters to pass the query results to a data push node.

    • NonCompliantData (MySQL node): Receives the output parameters of the branch node that do not meet the conditions, queries the non-compliant data, and uses its output parameters to pass the query results to a data push node.

    • Top3Categories (Data push node): Receives the output parameters of the CompliantData node and uses its input parameters to push the compliant data to specific destinations.

    • Bottom3Categories (Data push node): Receives the output parameters of the NonCompliantData node and uses its input parameters to push the non-compliant data to specific destinations.

  • Script data push

    • SalesAmountPreWeek (MySQL node): Queries the total sales amount of the top three categories of the previous week in the test data and uses the output parameters to pass the query results to the assignment node.

    • Top3CategoryList (Assignment node): Receives the output parameters of the MySQL node, formats the results as a list, and uses its output parameters to pass the results to a data push node. Select Python for Language when you configure the assignment node.

    • Top3CategoriesPreWeek (Data push node): Receives the output parameters of the assignment node and uses its input parameters to push data to specific destinations.

  • Combined data push

    • SalesAmountPreDay (MySQL node): Queries the total sales amount of the previous day in the test data and uses the output parameters to pass the query results to the data push node. This node is shared with the simple data push method.

    • SalesGrowthPreDay (MySQL node): Queries the sales amount growth of the previous day in the test data and uses the output parameters to pass the query results to the data push node.

    • CombinedPush (Data push node): Receives the output parameters of both MySQL nodes and uses its input parameters to push data to specific destinations.

  • Simple data push

    • SalesAmountPreDay (MySQL node): Queries the total sales amount of the previous day in the test data and uses the output parameters to pass the query results to a data push node. This node is shared with the combined data push method.

    • PushSalesAmountPreDay (Data push node): Receives the output parameters of the MySQL node and uses its input parameters to push data to specific destinations.

  • MaxCompute data push

    • MaxComputeDataSync (Batch synchronization node): Synchronizes the data in the MySQL database to the MaxCompute data source.

    • MaxComputeDataQuery (Assignment node): Queries data from the MaxCompute data source and uses its output parameters to pass the query results to a data push node.

    • MaxComputeDataPush (Data push node): Receives the output parameters of the assignment node and uses its input parameters to push data to specific destinations.

Prepare data

In this topic, test data comes from order tables. This section describes how to create a test table and write data to the table. If you do not need test data, you can skip the operations in this section.

Create a test table

  1. Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. In the left-side navigation pane of the DataStudio page, click the Ad Hoc Query icon. In the Ad Hoc Query pane, move the pointer over the Create icon and choose Create > MySQL. In the Create Node dialog box, configure the following parameters:

    • Node Type: MySQL.

    • Path: Ad Hoc Query.

    • Name: TableCreation.

  3. Create a test table named orders. Sample code:

CREATE TABLE orders (
     order_id INT NOT NULL AUTO_INCREMENT,
     category VARCHAR(100) NOT NULL, -- The category.
     sales DOUBLE NOT NULL, -- The sales amount of orders.
     datetime DATETIME NOT NULL, -- The time when an order is paid.
     PRIMARY KEY (order_id),
     INDEX (category)
);

Create a stored procedure

The following code provides an example of how to create a stored procedure that generates order data for the previous two months.

Note

You must create the stored procedure by using a MySQL client.

DELIMITER $$

CREATE PROCEDURE InsertOrders(IN num_orders INT)
BEGIN
  DECLARE v_category VARCHAR(100);
  DECLARE v_sales DOUBLE;
  DECLARE v_datetime DATETIME;
  DECLARE v_category_list VARCHAR(255);
  DECLARE v_index INT;
  DECLARE i INT DEFAULT 0;
  
  -- Define the categories. Separate multiple categories with commas.
  SET v_category_list = 'Electronics,Books,Home & Kitchen,Fashion,Toys,Baby,Computers,Electronics,Games,Garden,Clothing,Grocery,Health,Jewelry,Kids';
  -- Initialize the category index. The value is recomputed for each order inside the loop.
  SET v_index = ROUND((RAND() * (CHAR_LENGTH(v_category_list) - CHAR_LENGTH(REPLACE(v_category_list, ',', '')) + 1)));
  
  WHILE i < num_orders DO
    -- Generate a random index to select categories.
    SET v_index = FLOOR(1 + (RAND() * (CHAR_LENGTH(v_category_list) - CHAR_LENGTH(REPLACE(v_category_list, ',', '')) + 1)));
    -- Extract random categories from the list of categories.
    SET v_category = SUBSTRING_INDEX(SUBSTRING_INDEX(v_category_list, ',', v_index), ',', -1);
    
    -- Generate a random sales amount that ranges from 1,000 to 30,000.
    SET v_sales = 1000 + FLOOR(RAND() * 29000);
    
    -- Generate random dates in the previous two months.
    SET v_datetime = NOW() - INTERVAL FLOOR(RAND() * 61) DAY;
    
    -- Insert data of a new random order into the orders table.
    INSERT INTO orders (category, sales, datetime) VALUES (v_category, v_sales, v_datetime);
    
    SET i = i + 1;
  END WHILE;
END$$

DELIMITER ;
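
The nested SUBSTRING_INDEX calls are what extract a single category from the comma-separated list: the inner call keeps the first v_index items, and the outer call keeps the last of those. The following standalone query (an illustration, not part of the stored procedure) shows the technique:

-- Extract the third item from a comma-separated list.
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('Electronics,Books,Home & Kitchen,Fashion', ',', 3), ',', -1) AS category;
-- Returns: Home & Kitchen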

Write test data to the orders table

After the stored procedure is created, you can execute the CALL statement to call the stored procedure to insert random order data into the orders table.

-- Call the stored procedure to insert a specific number of order data records into the orders table.
CALL InsertOrders(1000); -- Insert 1,000 order data records into the orders table.
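
To confirm that the test data was generated as expected, you can run a quick sanity check. The following query is a suggested verification step, not part of the original setup:

-- Check the number of inserted orders and the covered date range.
SELECT COUNT(*) AS order_count, MIN(datetime) AS earliest_order, MAX(datetime) AS latest_order FROM orders;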

Configure data push flows in the workflow

This section describes how to configure data push flows in the workflow to push data to specific destinations by using the following push methods: conditional data push, script data push, simple data push, combined data push, and MaxCompute data push.

Conditional data push

Step 1: Configure a data push flow

Double-click the DataPushDemo workflow. On the workflow canvas, connect the SalesAmountPreMonth, Condition, CompliantData, Top3Categories, NonCompliantData, and Bottom3Categories nodes to form a data push flow.

Step 2: Configure the SQL query node

You can configure the SQL query node to query test data and add the outputs parameter in the Input and Output Parameters section to pass the query results to the branch node.

  1. Double-click the SalesAmountPreMonth node. On the configuration tab of the node, write code for the node. Sample code:

    -- Query the total sales amount of the previous month.
    SELECT SUM(sales) AS sales_amount
    FROM orders
    WHERE datetime BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00') AND DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%Y-%m-%d 23:59:59');
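
    To see the exact date range that this WHERE clause covers, you can run the two boundary expressions on their own. For example, if the query runs on 2024-11-05 (an illustrative date):

    -- Start of the previous month:
    SELECT DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00');
    -- Returns: 2024-10-01 00:00:00
    -- End of the previous month:
    SELECT DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%Y-%m-%d 23:59:59');
    -- Returns: 2024-10-31 23:59:59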
  2. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.

    • Scheduled time: Set the value to 08:00.

    • Resource Group: Select an existing serverless resource group.

    • Parent Nodes: Select Add Root Node.

    • Output Parameters: In the Input and Output Parameters section, click Add assignment parameter on the right side of Output Parameters to add the outputs parameter of the SQL query node. The outputs parameter is used as the input parameter of the branch node.

Step 3: Configure the branch node

You can configure the outputs parameter of the SQL query node as the input parameter of the branch node in the Input and Output Parameters section. After the branch node performs a logical judgment, the branch node uses its outputs parameter to pass the query results to the child nodes of the branch node.

  1. Double-click the Condition node. In the Definition section on the configuration tab of the node, click Add Branch. In the Branch Definition dialog box, configure the following parameters for each branch:

    • Branch that receives compliant data:

      • Condition: ${inputs[0][0]}>=500000

      • Associated Node Output: Compliant

      • Description: Sales amount that meets the condition.

    • Branch that receives non-compliant data:

      • Condition: ${inputs[0][0]}<500000

      • Associated Node Output: Non-compliant

      • Description: Sales amount that does not meet the condition.

    Note

    In the condition, ${inputs[0][0]} references the data in the outputs parameter of the ancestor node of the branch node.

    • If the ancestor node is an SQL query node, the outputs parameter is a two-dimensional array. ${inputs[0][0]} references the value in the first row and first column of the query results.

    • If the ancestor node is a Python node, the outputs parameter is a one-dimensional array.
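
    To make the indexing concrete: the SalesAmountPreMonth query returns a single row with a single column, so the branch node receives a result set like the following (values are illustrative):

    -- Upstream query: SELECT SUM(sales) AS sales_amount FROM orders WHERE ...
    -- Example result set (one row, one column):
    --   sales_amount
    --   623450
    -- ${inputs[0][0]} resolves to the value in row 0, column 0, which is 623450 here,
    -- so the condition ${inputs[0][0]}>=500000 evaluates to true in this example.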

  2. In the right-side navigation pane of the configuration tab of the branch node, click the Properties tab. On the Properties tab, configure the following parameters:

    • Schedule:

      • Scheduling Cycle: Set the value to Day.

      • Scheduled time: Set the value to 08:00.

      • Rerun: Set the value to Allow Regardless of Running Status.

    • Resource Group: Select an existing resource group for scheduling.

      Note

      The first time you use a data push node, you must submit a ticket to upgrade your resource group for scheduling.

    • Output Name of Current Node: After you configure the branches, the output names are automatically parsed and displayed. The output name is the same as the value that you set for the Associated Node Output parameter when you added the branch.

    • Input and Output Parameters:

      • Input Parameters: Click Create. Set Parameter Name to inputs and set Value Source to the outputs parameter of the SalesAmountPreMonth node.

      • Output Parameters: By default, the system adds the outputs parameter.

  3. In the top toolbar, click the Save icon to save the configurations.

Step 4: Configure the child nodes of the branch node

In this example, the CompliantData and NonCompliantData nodes are child nodes of the branch node. After the branch node passes the query results to its child nodes by using the outputs parameter, you can add the outputs parameter in the Input and Output Parameters section for each child node to pass the query results to the corresponding data push node.

  1. Double-click the CompliantData and NonCompliantData nodes respectively. On the configuration tab of each node, write code for the node.

    Sample code for the CompliantData node

    SET @all_cat_sales_volume_month := 0.0;
    SELECT SUM(sales) INTO @all_cat_sales_volume_month FROM orders WHERE datetime BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00') AND DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%Y-%m-%d 23:59:59');
    
    -- Create a temporary table.
    CREATE TEMPORARY TABLE IF NOT EXISTS temp_array (
      category VARCHAR(255),
      sales DOUBLE,
      all_cat_sales_volume_month DOUBLE
    );
    -- Query data and write the data to the temporary table.
    INSERT INTO temp_array (category, sales, all_cat_sales_volume_month) SELECT category, SUM(sales) AS amount, @all_cat_sales_volume_month FROM orders WHERE datetime BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00') AND DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%Y-%m-%d 23:59:59') GROUP BY category ORDER BY amount DESC limit 3;
    -- Query the top three categories whose sales amount meets specific conditions.
    SELECT category, sales, all_cat_sales_volume_month FROM temp_array;

    Sample code for the NonCompliantData node

    SET @all_cat_sales_volume_month := 0.0;
    SELECT SUM(sales) INTO @all_cat_sales_volume_month FROM orders WHERE datetime BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00') AND DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%Y-%m-%d 23:59:59');
    
    -- Create a temporary table.
    CREATE TEMPORARY TABLE IF NOT EXISTS temp_array (
      category VARCHAR(255),
      sales DOUBLE,
      all_cat_sales_volume_month DOUBLE
    );
    
    -- Query data and write the data to the temporary table.
    INSERT INTO temp_array (category, sales, all_cat_sales_volume_month) SELECT category, SUM(sales) AS amount, @all_cat_sales_volume_month FROM orders WHERE datetime BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00') AND DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%Y-%m-%d 23:59:59') GROUP BY category ORDER BY amount ASC limit 3;
    
    -- Query the bottom three categories whose sales amount does not meet specific conditions.
    SELECT category, sales, all_cat_sales_volume_month FROM temp_array;
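
    Both queries end with a SELECT statement that returns up to three rows with three columns. This final result set is what the outputs parameter of each node carries to the downstream data push node. For example (illustrative values):

    -- Example outputs of the CompliantData node:
    --   category      sales     all_cat_sales_volume_month
    --   Electronics   152300    623450
    --   Books         128900    623450
    --   Toys           98700    623450
    -- In the Body section of the data push node, these values can be referenced by
    -- their field names: category, sales, and all_cat_sales_volume_month.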
  2. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.

    • Scheduled time: Set the value to 08:00.

    • Resource Group: Select an existing serverless resource group.

    • Parent Nodes: Check whether the dependencies between the branch node and its child nodes are correct. The dependencies are configured when you configure the data push flow in Step 1.

      • CompliantData: In the Dependencies section, check whether the value in the Output Name of Ancestor Node column is Compliant.

      • NonCompliantData: In the Dependencies section, check whether the value in the Output Name of Ancestor Node column is Non-compliant.

    • Output Parameters: In the Input and Output Parameters section, click Add assignment parameter on the right side of Output Parameters to add the outputs parameter of the SQL query node. The outputs parameter is used as the input parameter of the data push node.

  3. In the top toolbar, click the Save icon to save the configurations.

Step 5: Configure the data push nodes

You can add an input parameter to Input Parameters in the Input and Output Parameters section for each data push node to obtain the outputs parameters of the CompliantData and NonCompliantData nodes. This way, the data push nodes can use the input parameters to push data to specific destinations.

  1. Double-click the Top3Categories and Bottom3Categories nodes respectively. In the right-side navigation pane of the configuration tab of each node, click the Properties tab. On the Properties tab, configure the following parameters:

    • Scheduling Parameter:

      • Parameter Name: Set the value to curdate.

      • Parameter Value: Set the value to $[yyyymmddhh:mi:ss].

    • Schedule:

      • Scheduling Cycle: Set the value to Day.

      • Scheduled time: Set the value to 08:00.

        Note

        In this example, data is pushed to the destinations at 08:00 every day. You can configure this parameter based on your business requirements.

      • Rerun: Set the value to Allow Regardless of Running Status.

    • Resource Group: Select an existing resource group for scheduling.

      Note

      The first time you use a data push node, you must submit a ticket to upgrade your resource group for scheduling.

    • Input and Output Parameters:

      • Input Parameters: Click Create to add an input parameter named inputs for each node. Set Value Source as follows:

        • Top3Categories: Select the outputs parameter of the CompliantData node from the drop-down list.

        • Bottom3Categories: Select the outputs parameter of the NonCompliantData node from the drop-down list.

  2. Configure the data to push.

    • Destination: Select a destination from the Destination drop-down list. If no destination is available, click Create Destination to create a destination. Configure the following parameters:

      • Type: Select a push channel. DingTalk, Lark, WeCom, and Teams are supported.

      • Name: Enter a name based on your business requirements.

      • WebHook: Enter the webhook URL of the push channel that you selected. You can obtain the URL from the platform of the push channel.

    • Title: Enter Categories whose sales amount ranks the top three for the Top3Categories node and Categories whose sales amount ranks the bottom three for the Bottom3Categories node.

    • Body: Configure the content based on your business requirements. For more information, see the Configure the content to push section of the Data push topic.

      Note

      In the Body section, you can directly use the field names that are queried by the SQL query node as placeholders to obtain the input parameters of the data push node.

  3. In the top toolbar, click the Save icon to save the configurations.

Step 6: Test the data push flow

After you configure the conditional data push flow, you must test the flow before you commit the workflow and deploy nodes in the workflow.

  1. Double-click the DataPushDemo workflow.

  2. On the configuration tab of the workflow, right-click the SalesAmountPreMonth node and select Run Current Node and Its Descendant Nodes.

    Note

    If a node in the data push flow fails, right-click the node and select View Log to view the logs.


Script data push

Step 1: Configure a data push flow

Double-click the DataPushDemo workflow. On the workflow canvas, connect the SalesAmountPreWeek, Top3CategoryList, and Top3CategoriesPreWeek nodes to form a data push flow.

Step 2: Configure the SQL query node

You can configure the SQL query node to query test data and add the outputs parameter in the Input and Output Parameters section to pass the query results of the SQL query node to the assignment node.

  1. Double-click the SalesAmountPreWeek node. On the configuration tab of the node, write code for the node. Sample code:

    -- Query the total sales amount of the previous week.
    SELECT category, SUM(sales) AS amount
    FROM orders
    WHERE datetime BETWEEN DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 WEEK), '%Y-%m-%d 00:00:00') AND DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), '%Y-%m-%d 23:59:59')
    GROUP BY category ORDER BY amount DESC limit 3;
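
    This query returns up to three rows with two columns: category and amount. The downstream assignment node references these values positionally, so ${inputs[0][0]} is the top category and ${inputs[0][1]} is its sales amount. For example (illustrative values):

    -- Example result set passed to the Top3CategoryList node:
    --   category      amount
    --   Electronics   81234     -- ${inputs[0][0]}, ${inputs[0][1]}
    --   Books         65210     -- ${inputs[1][0]}, ${inputs[1][1]}
    --   Toys          43001     -- ${inputs[2][0]}, ${inputs[2][1]}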
  2. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.

    • Scheduled time: Set the value to 08:00.

    • Resource Group: Select an existing serverless resource group.

    • Parent Nodes: Select Add Root Node.

    • Output Parameters: In the Input and Output Parameters section, click Add assignment parameter on the right side of Output Parameters to add the outputs parameter of the SQL query node. The outputs parameter is used as the input parameter of the assignment node.

  3. In the top toolbar, click the Save icon to save the configurations.

Step 3: Configure the assignment node

You can configure the outputs parameter of the SQL query node as the input parameter of the assignment node in the Input and Output Parameters section. Then, reassign a value to the input parameter to generate the outputs parameter of the assignment node. The assignment node uses the outputs parameter to pass data to the data push node.

  1. Double-click the Top3CategoryList node. On the configuration tab of the node, select Python for Language and write code for the node.

    from datetime import date
    
    
    def main():
        today = date.today()
        formatted_date = today.strftime('%Y-%m-%d')
        
        # DataWorks replaces the ${inputs[row][column]} placeholders with the values
        # of the node's input parameter before the script is run.
        msg = 'Stat date: ' + formatted_date + ' \\n\\n ' \
        '- 1: ${inputs[0][0]}, sales: ${inputs[0][1]} \\n\\n ' \
        '- 2: ${inputs[1][0]}, sales: ${inputs[1][1]} \\n\\n ' \
        '- 3: ${inputs[2][0]}, sales: ${inputs[2][1]} \\n\\n '
        
        # The printed result becomes the outputs parameter of the assignment node.
        print(msg)
    
    
    if __name__ == "__main__":
        main()
  2. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.

    • Scheduled time: Set the value to 08:00.

    • Resource Group: Select an existing resource group for scheduling.

    • Input and Output Parameters

      • Input Parameters

        • Click Create. Parameter Name: inputs.

        • Value Source: Select the outputs parameter of the SalesAmountPreWeek node from the drop-down list.

      • Output Parameters: By default, the system adds the outputs parameter.

  3. In the top toolbar, click the Save icon to save the configurations.

Step 4: Configure the data push node

You can configure the outputs parameter of the assignment node as the input parameter of the data push node in the Input and Output Parameters section. Then, the data push node uses the input parameter to push data to specific destinations.

  1. Double-click the Top3CategoriesPreWeek node. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure the following parameters:

    • Scheduling Parameter:

      • Parameter Name: Set the value to curdate.

      • Parameter Value: Set the value to $[yyyymmddhh:mi:ss].

    • Schedule:

      • Scheduling Cycle: Set the value to Day.

      • Scheduled time: Set the value to 08:00.

        Note

        In this example, data is pushed to the destinations at 08:00 every day. You can configure this parameter based on your business requirements.

      • Rerun: Set the value to Allow Regardless of Running Status.

    • Resource Group: Select an existing resource group for scheduling.

      Note

      The first time you use a data push node, you must submit a ticket to upgrade your resource group for scheduling.

    • Input and Output Parameters:

      • Input Parameters: Click Create to add an input parameter. Set Parameter Name to inputs and set Value Source to the outputs parameter of the Top3CategoryList node.

  2. Configure the data to push.

    • Destination: Select a destination from the Destination drop-down list. If no destination is available, click Create Destination to create a destination. Configure the following parameters:

      • Type: Select a push channel. DingTalk, Lark, WeCom, and Teams are supported.

      • Name: Enter a name based on your business requirements.

      • WebHook: Enter the webhook URL of the push channel that you selected. You can obtain the URL from the platform of the push channel.

    • Title: Enter Categories whose sales amount ranks the top three in the previous week.

    • Body: Configure the content based on your business requirements. For more information, see the Configure the content to push section of the Data push topic.

      Note

      In the Body section, you can directly use the field names that are queried by the SQL query node as placeholders to obtain the input parameters of the data push node.

  3. In the top toolbar, click the Save icon to save the configurations.

Step 5: Test the data push flow

After you configure the script data push flow, you must test the flow before you commit the workflow and deploy nodes in the workflow.

  1. Double-click the DataPushDemo workflow.

  2. On the configuration tab of the workflow, right-click the SalesAmountPreWeek node and select Run Current Node and Its Descendant Nodes.

    Note

    If a node in the data push flow fails, right-click the node and select View Log to view the logs.


Simple data push

Step 1: Configure a data push flow

Double-click the DataPushDemo workflow. On the workflow canvas, connect the SalesAmountPreDay and PushSalesAmountPreDay nodes to form a data push flow.

Step 2: Configure the SQL query node

You can configure the SQL query node to query test data and add the outputs parameter in the Input and Output Parameters section to pass the query results of the SQL query node to the data push node.

  1. Double-click the SalesAmountPreDay node. On the configuration tab of the node, write code for the node. Sample code:

    -- Create a temporary table named temp_array.
    CREATE TEMPORARY TABLE IF NOT EXISTS temp_array (
      total_amount DOUBLE
    );
    
    -- Write data related to the total sales amount for yesterday to the temp_array table.
    INSERT INTO temp_array (total_amount) 
    SELECT SUM(sales)
    FROM orders
    WHERE datetime BETWEEN DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), '%Y-%m-%d 00:00:00') AND DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), '%Y-%m-%d 23:59:59');
    
    -- Query the temp_array table.
    SELECT total_amount FROM temp_array;
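
    The final SELECT statement returns a single row with one column named total_amount, which the outputs parameter carries to the data push node. As a side note, the same date range can be expressed as a half-open interval, which avoids formatting the boundary strings. The following standalone query (an equivalent alternative, not part of the node code) returns the same total:

    -- Equivalent formulation with a half-open date range.
    SELECT SUM(sales) AS total_amount
    FROM orders
    WHERE datetime >= DATE_SUB(CURDATE(), INTERVAL 1 DAY)
      AND datetime < CURDATE();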
  2. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.

    • Scheduled time: Set the value to 08:00.

    • Resource Group: Select an existing serverless resource group.

    • Parent Nodes: Select Add Root Node.

    • Output Parameters: In the Input and Output Parameters section, click Add assignment parameter on the right side of Output Parameters to add the outputs parameter of the SQL query node. The outputs parameter is used as the input parameter of the data push node.

  3. In the top toolbar, click the Save icon to save the configurations.

Step 3: Configure the data push node

You can configure the outputs parameter of the SQL query node as the input parameter of the data push node in the Input and Output Parameters section. Then, the data push node uses the input parameter to push data to specific destinations.

  1. Double-click the PushSalesAmountPreDay node. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure the following parameters:

    • Scheduling Parameter:

      • Parameter Name: Set the value to curdate.

      • Parameter Value: Set the value to $[yyyymmddhh:mi:ss].

    • Schedule:

      • Scheduling Cycle: Set the value to Day.

      • Scheduled time: Set the value to 08:00.

        Note

        In this example, data is pushed to the destinations at 08:00 every day. You can configure this parameter based on your business requirements.

      • Rerun: Set the value to Allow Regardless of Running Status.

    • Resource Group: Select an existing resource group for scheduling.

      Note

      The first time you use a data push node, you must submit a ticket to upgrade your resource group for scheduling.

    • Input and Output Parameters:

      • Input Parameters: Click Create to add an input parameter. Set Parameter Name to inputs and set Value Source to the outputs parameter of the SalesAmountPreDay node.

  2. Configure the data to push.

    • Destination: Select a destination from the Destination drop-down list. If no destination is available, click Create Destination to create a destination. Configure the following parameters:

      • Type: Select a push channel. DingTalk, Lark, WeCom, and Teams are supported.

      • Name: Enter a name based on your business requirements.

      • WebHook: Enter the webhook URL of the push channel that you selected. You can obtain the URL from the platform of the push channel.

    • Title: Enter Total sales amount for yesterday.

    • Body: Configure the content based on your business requirements. For more information, see the Configure the content to push section of the Data push topic.

      Note

      In the Body section, you can directly use the field names that are queried by the SQL query node as placeholders to obtain the input parameters of the data push node.

  3. In the top toolbar, click the Save icon to save the configurations.

Step 4: Test the flow

After you configure the simple data push flow, you must test the flow before you commit the workflow and deploy nodes in the workflow.

  1. Double-click the DataPushDemo workflow.

  2. On the configuration tab of the workflow, right-click the SalesAmountPreDay node and select Run Current Node and Its Descendant Nodes.

    Note

    If a node in the data push flow fails, right-click the node and select View Log to view the logs.


Combined data push

Step 1: Configure a data push flow

Double-click the DataPushDemo workflow. On the workflow canvas, connect the SalesAmountPreDay, SalesGrowthPreDay, and CombinedPush nodes to form a data push flow.

Note

The SalesAmountPreDay node can be used in both the combined data push method and the simple data push method.

Step 2: Configure the SQL query node

You can configure each SQL query node to query test data and use the outputs parameter in the Input and Output Parameters section to pass the query results of the SQL query node to the data push node.

  1. Double-click the SalesGrowthPreDay node. On the configuration tab of the node, write code for the node. Sample code:

    -- Create a table named temp_array1 to collect the total sales amount of the day before yesterday.
    CREATE TEMPORARY TABLE IF NOT EXISTS temp_array1 (
      category VARCHAR(255),
      sales DOUBLE
    );
    -- Write data of the day before yesterday to the temp_array1 table.
    INSERT INTO temp_array1 (category, sales) SELECT category, SUM(sales)
    FROM orders
    WHERE datetime BETWEEN DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 2 DAY), '%Y-%m-%d 00:00:00') AND DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 2 DAY), '%Y-%m-%d 23:59:59')
    GROUP BY category;
    
    -- Create a table named temp_array2 to collect the total sales amount for yesterday.
    CREATE TEMPORARY TABLE IF NOT EXISTS temp_array2 (
      category VARCHAR(255),
      sales DOUBLE
    );
    -- Write data for yesterday to the temp_array2 table.
    INSERT INTO temp_array2 (category, sales) SELECT category, SUM(sales)
    FROM orders
    WHERE datetime BETWEEN DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), '%Y-%m-%d 00:00:00') AND DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), '%Y-%m-%d 23:59:59')
    GROUP BY category;
    
    -- Create a table named result to collect the sales growth amount for yesterday.
    CREATE TEMPORARY TABLE IF NOT EXISTS result (
      category VARCHAR(255),
      diff DOUBLE
    );
    -- Write sales growth data to the result table. Use yesterday's data (temp_array2) as the base
    -- table so that categories without sales on the day before yesterday are still included.
    INSERT INTO result (category, diff) SELECT temp_array2.category AS category, temp_array2.sales - COALESCE(temp_array1.sales, 0) AS diff FROM temp_array2 LEFT JOIN temp_array1 ON temp_array1.category = temp_array2.category;
    
    -- Query data from the result table.
    SELECT category, diff FROM result;

  2. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.

    • Scheduled time: Set the value to 08:00.

    • Resource Group: Select an existing serverless resource group.

    • Parent Nodes: Select Add Root Node.

    • Output Parameters: In the Input and Output Parameters section, click Add assignment parameter on the right side of Output Parameters to add the outputs parameter of the SQL query node. The outputs parameter is used as the input parameter of the data push node.

  3. In the top toolbar, click the Save icon to save the configurations.

Step 3: Configure the data push node

You can configure the outputs parameters of the SalesAmountPreDay and SalesGrowthPreDay nodes as the input parameters of the data push node in the Input and Output Parameters section. Then, the data push node uses the input parameters to push data to specific destinations.

  1. Double-click the CombinedPush node. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure the following parameters:

    • Scheduling Parameter:

      • Parameter Name: Set the value to curdate.

      • Parameter Value: Set the value to $[yyyymmddhh:mi:ss].

    • Schedule:

      • Scheduling Cycle: Set the value to Day.

      • Scheduled time: Set the value to 08:00.

        Note

        In this example, data is pushed to the destinations at 08:00 every day. You can configure this parameter based on your business requirements.

      • Rerun: Set the value to Allow Regardless of Running Status.

    • Resource Group: Select an existing resource group for scheduling.

      Note

      The first time you use a data push node, you must submit a ticket to upgrade your resource group for scheduling.

    • Input and Output Parameters: Click Create to add two input parameters.

      • Parameter 1: Set Parameter Name to inputs1 and set Value Source to the outputs parameter of the SalesAmountPreDay node.

      • Parameter 2: Set Parameter Name to inputs2 and set Value Source to the outputs parameter of the SalesGrowthPreDay node.

  2. Configure the data to push.

    • Destination: Select a destination from the Destination drop-down list. If no destination is available, click Create Destination to create a destination. Configure the following parameters:

      • Type: Select a push channel. DingTalk, Lark, WeCom, and Teams are supported.

      • Name: Enter a name based on your business requirements.

      • WebHook: Enter the webhook URL of the push channel that you selected. You can obtain the URL from the platform of the push channel.

    • Title: Enter Sales amount and sales amount growth for yesterday.

    • Body: Configure the content based on your business requirements. For more information, see the Configure the content to push section of the Data push topic.

      Note

      In the Body section, you can directly use the field names that are queried by the SQL query nodes as placeholders to obtain the input parameters of the data push node.

  3. In the top toolbar, click the Save icon to save the configurations.

Step 4: Test the data push flow

After you configure the combined data push flow and the simple data push flow, you must test the combined data push flow before you commit the workflow and deploy nodes in the workflow.

  1. Double-click the DataPushDemo workflow.

  2. On the configuration tab of the workflow, right-click the CombinedPush node and select Run Current Node and Its Descendant Nodes.

    Note

    If a node in the data push flow fails, right-click the node and select View Log to view the logs.


MaxCompute data push

Step 1: Configure a data push flow

Double-click the DataPushDemo workflow. On the workflow canvas, connect the MaxComputeDataSync, MaxComputeDataQuery, and MaxComputeDataPush nodes to form a data push flow.

Step 2: Configure the batch synchronization node

You can configure the MaxComputeDataSync node to synchronize the test data that was written to MySQL in the Prepare data section to the MaxCompute data source that you added.

  1. Double-click the MaxComputeDataSync node. On the configuration tab of the node, configure the following parameters:

    • Source:

      • Source: Select MySQL.

      • Data Source Name: Select the MySQL data source that you added.

    • Resource group: Select a serverless resource group.

    • Destination:

      • Destination: Select MaxCompute(ODPS).

      • Data Source Name: Select the MaxCompute data source that is added to the workspace.

    After the configuration is complete, the system tests the connectivity between the data sources and the resource group. After the data sources pass the network connectivity test, click Next to configure the source and destination.

  2. Configure the source and destination.

    • Source:

      • Data source: Retain the default value MySQL and select the MySQL data source that you added.

      • Table: Select the orders table.

      • Data filtering: Configure this parameter based on your business requirements. In this example, this parameter is left empty.

      • Split key: You can use a column in the source table as the shard key. We recommend that you use the primary key column or an indexed column.

      • Data preview: You can click Data preview to check whether the data obtained from the MySQL data source meets your expectations.

    • Destination:

      • Data source: Retain the default value MaxCompute and select the MaxCompute data source that is added to the workspace.

      • Tunnel Resource Group: Retain the default value Shared transmission resources. The specified resources are used as the Tunnel quota.

      • Table: Click Generate Destination Table Schema to generate a destination table.

      • Partition information: If you want daily incremental data to be stored in the partition of the corresponding date, configure the pt parameter. For example, set pt to ${bizdate}.

      • Write Mode: Select Clean up existing data before writing (Insert Overwrite).

  3. Map each source field to the destination field that has the same name.

  4. Configure channel control policies.

    • Task Expected Maximum Concurrency: The actual number of parallel threads that are used during data synchronization may be less than or equal to the specified threshold due to the specific characteristics of resources or tasks. You are charged for the resource group that you select based on the actual number of parallel threads that you use. In this example, set this parameter to 2.

    • Synchronization rate: Throttling can reduce the read and write workloads of the source and destination. If you do not enable throttling, the system processes nodes at the maximum transmission performance available in the existing hardware environment. In this example, select No current limit.

    • Policy for Dirty Data Records: Select Disallow Dirty Data Records.

    • Distributed Execution: By default, this switch is turned off. You can turn on the switch only if the maximum number of parallel threads that you specified is greater than or equal to 8.


  5. Configure the scheduling properties.

    In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.

    • Scheduling Parameter

      • Parameter Name: Set the value to bizdate.

      • Parameter Value: Set the value to $[yyyymmdd-1].

    • Scheduled time: Set the value to 08:00.

    • Rerun: Set this parameter to Allow Regardless of Running Status.

    • Resource Group: Select an existing resource group for scheduling.

    • Dependencies: Select Add Root Node to use the root node as the ancestor node of the batch synchronization node.
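
    To make the parameter substitution concrete: for the node instance scheduled on November 5, 2024 (an illustrative date), the expressions resolve as follows:

    -- Scheduling-parameter substitution (illustrative):
    --   bizdate = $[yyyymmdd-1]  ->  20241104
    -- The synchronization task then writes the data into the MaxCompute partition
    -- pt=20241104, which the downstream query filters on with WHERE pt = '${bizdate}'.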

  6. In the top toolbar, click the Save icon to save the configurations.

Step 3: Configure the assignment node

You cannot use an ODPS SQL node to query data from a MaxCompute data source and pass the query results to a data push node by using output parameters. Instead, you can configure an assignment node to query the MaxCompute data and add the outputs parameter in the Input and Output Parameters section to pass the query results to the data push node.

  1. Double-click the MaxComputeDataQuery node.

    1. On the configuration tab of the node, select ODPS SQL from the Language drop-down list.

    2. Write code for the node. Sample code:

      -- Create a subquery to rank the sales amounts in each partition in descending order. 
      -- Call the DENSE_RANK() function to assign a rank to each row in a partition. If multiple rows in a partition have identical sales amounts, the DENSE_RANK() function assigns the same rank to the rows. The ranking numbers are consecutive. 
      --
      -- Subquery:
      -- 1. Select the following columns from the orders table: order_id, category, sales, datetime, and pt. 
      -- 2. Call the DENSE_RANK() OVER (PARTITION BY pt ORDER BY sales DESC) function.
         -- PARTITION BY pt: Partitions data records based on the pt field. The subquery ranks data records in each partition specified by the pt field. 
         -- ORDER BY sales DESC: Ranks data records in each partition based on the value of the sales field in descending order. 
         -- The rank column stores the rank of the sales amount for each row in each partition. 
      --
      -- Main query:
      -- Query the data records whose sales amount ranks the top three from the query results of the subquery. 
      -- This means that the data records whose sales amount ranks the top three for each data timestamp are queried. 
      
      SELECT
        order_id,          -- The order ID.
        category,          -- The product category.
        sales,             -- The sales amount.
        datetime,          -- The payment time of the order.
        pt                 -- The data timestamp.
      FROM (
          SELECT
            order_id,
            category,
            sales,
            datetime,
            pt,
            DENSE_RANK() OVER (PARTITION BY pt ORDER BY sales DESC) AS rank  -- Calculate the ranking of sales amounts in each partition.
          FROM orders
          WHERE pt = '${bizdate}'  -- Query data records based on the data timestamp, which is specified by ${bizdate}.
      ) AS ranked_orders
      WHERE rank <= 3  -- Retain the top three data records in each partition.
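
      Note that DENSE_RANK() assigns the same rank to rows with identical sales amounts, so the filter rank <= 3 can return more than three rows when sales amounts tie. For example:

      -- DENSE_RANK() with tied values in one partition (illustrative):
      --   sales: 900, 800, 800, 700  ->  rank: 1, 2, 2, 3
      -- With WHERE rank <= 3, all four rows are returned, so a tie can push more
      -- than three records for a single data timestamp.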
  2. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.

    • Scheduled time: Set the value to 08:00.

    • Resource Group: Select an existing serverless resource group.

    • Dependent Upstream Node: Check whether the value of the Ancestor Node Name parameter is MaxComputeDataSync.

    • Output Parameters: In the Input and Output Parameters section, click Create on the right side of Output Parameters to add the outputs parameter of the assignment node. The outputs parameter is used as the input parameter of the data push node.


  3. In the top toolbar, click the Save icon to save the configurations.

Step 4: Configure the data push node

You can configure the outputs parameter of the assignment node as the input parameter of the data push node in the Input and Output Parameters section. Then, the data push node uses the input parameter to push data to specific destinations.

  1. Double-click the MaxComputeDataPush node. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure the following parameters:

    • Scheduling Parameter:

      • Parameter Name: Set the value to curdate.

      • Parameter Value: Set the value to $[yyyymmddhh:mi:ss].

    • Schedule:

      • Scheduling Cycle: Set the value to Day.

      • Scheduled time: Set the value to 08:00.

        Note

        In this example, data is pushed to the destinations at 08:00 every day. You can configure this parameter based on your business requirements.

      • Rerun: Set the value to Allow Regardless of Running Status.

    • Resource Group: You must select a serverless resource group that was created after June 28, 2024, which is the release date of the data push feature. If the resource group was created before June 28, 2024, submit a ticket to upgrade the resource group.

      Note

      For more information about release notes, see Release records.

    • Input and Output Parameters:

      • Input Parameters: Click Create to add an input parameter. Set Parameter Name to inputs and set Value Source to the outputs parameter of the MaxComputeDataQuery node.

  2. Configure the data to push.

    • Destination: Select a destination from the Destination drop-down list. If no destination is available, click Create Destination to create a destination. Configure the following parameters:

      • Type: Select a push channel. DingTalk, Lark, WeCom, and Teams are supported.

      • Name: Enter a name based on your business requirements.

      • WebHook: Enter the webhook URL of the push channel that you selected. You can obtain the URL from the platform of the push channel.

    • Title: Enter MaxCompute data.

    • Body: Configure the content based on your business requirements. For more information, see the Configure the content to push section of the Data push topic.

      Note

      In the Body section, you can directly use the field names that are queried by the assignment node as placeholders to obtain the input parameters of the data push node.

  3. In the top toolbar, click the Save icon to save the configurations.

Step 5: Test the flow

After you configure the data push flow, you must test the flow before you commit the workflow and deploy nodes in the workflow.

  1. Double-click the DataPushDemo workflow.

  2. Right-click the MaxComputeDataSync node and select Run Current Node and Its Descendant Nodes.

    Note

    If a node in the data push flow fails, right-click the node and select View Log to view the logs.


Commit the workflow and deploy nodes in the workflow

After you configure all data push flows in the workflow, double-click the DataPushDemo workflow. On the configuration tab of the workflow, test all the data push flows. If all the data push flows are run as expected, commit the workflow and deploy nodes in the workflow.

  1. On the configuration tab of the DataPushDemo workflow, click the Run icon to run the workflow.

  2. After the success icon appears next to all nodes in the workflow, click the Commit icon to commit the DataPushDemo workflow.

  3. In the Commit dialog box, select the nodes that you want to commit, enter a description, and then select Ignore I/O Inconsistency Alerts.

  4. Click Confirm.

  5. Deploy nodes. For more information, see Deploy nodes.

What to do next

The workflow is run based on the specified scheduling cycle. You can perform various O&M operations on the deployed nodes in Operation Center. For more information, see Perform basic O&M operations on auto triggered nodes.