DataWorks DataStudio allows you to create data push nodes in a workflow to push data to specific destinations based on different push methods. DataStudio provides the following push methods: simple data push, combined data push, script data push, conditional data push, and MaxCompute data push. This topic describes how to configure data push nodes in a workflow to push data based on the selected push method.
Background information
In DataStudio, you can create data push nodes in a workflow. The data push nodes obtain the results of simple queries on data processed by other nodes in the workflow and push the obtained data to DingTalk groups, Lark groups, WeCom groups, or Microsoft Teams based on specific scheduling settings in a timely manner.
Process
Create a node to prepare test data for different data push flows.
Create data query nodes, assignment nodes, and other data processing nodes to process and query test data.
Note: You can create different types of nodes to prepare test data, process data, and query data based on your business requirements. In the example in this topic, MySQL nodes are used.
Create data push nodes to receive the output parameters of data query nodes. Then, use the input parameters of the data push nodes to push the obtained data to DingTalk groups, Lark groups, WeCom groups, or Microsoft Teams.
Push methods
This topic describes the following push methods that you can use to configure data push nodes in a workflow: simple data push, combined data push, script data push, conditional data push, and MaxCompute data push.
Simple data push: In a simple data push flow, a node queries data and uses the output parameters to pass the query results to a data push node. Then, the data push node pushes the data to specific destinations.
Combined data push: In a combined data push flow, multiple nodes query data and use the output parameters to pass the query results to the same data push node. Then, the data push node pushes the data to specific destinations.
Script data push: In a script data push flow, an assignment node processes data and uses the output parameters to pass the results to a data push node. Then, the data push node pushes the data to specific destinations. For more information about assignment nodes, see Configure an assignment node.
Conditional data push: In a conditional data push flow, a branch node performs a logical judgment and passes data that meets specific conditions and data that does not meet the conditions to different data query nodes. The data query nodes query the data and use the output parameters to pass the results to different data push nodes. Then, the data push nodes push the data to specific destinations. For more information about branch nodes, see Branch node.
MaxCompute data push: In a MaxCompute data push flow, an assignment node queries data from a MaxCompute data source and passes the query results to a data push node. Then, the data push node pushes data to specific destinations.
Prerequisites
Make sure that the following requirements are met:
DataWorks is activated. For more information, see Activate DataWorks.
A DataWorks workspace is created. For more information, see Create a workspace.
An ApsaraDB RDS for MySQL instance is created in the ApsaraDB RDS console, and a MySQL data source is added to the workspace. For more information, see Add and manage data sources.
A MaxCompute data source is added to the workspace. For more information, see Add a MaxCompute data source.
Note: In this topic, a MySQL data source and a MaxCompute data source are used. You can add different types of data sources based on your business requirements.
Limits
Limits on the data size:

- DingTalk: the pushed data cannot exceed 20 KB.
- Lark: the pushed data cannot exceed 30 KB, and the size of an image must be less than 10 MB.
- WeCom: each chatbot can send a maximum of 20 messages per minute.
- Microsoft Teams: the pushed data cannot exceed 28 KB.

A client-side check against these size limits is sketched after this list.

Note: Tables that you add by using Markdown are not displayed as expected on the mobile clients of DingTalk and WeCom. To push tables to these channels, we recommend that you select Table in the Body section instead. Tables that you add by using Markdown are displayed as expected on the mobile clients of Lark and Microsoft Teams.
The data push feature is available only in DataWorks workspaces in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Singapore, Malaysia (Kuala Lumpur), US (Silicon Valley), US (Virginia), and Germany (Frankfurt).
If you want to push the data of a MaxCompute data source to specific destinations, you must purchase an exclusive resource group for scheduling. Serverless resource groups are not supported.
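The size limits above apply to the rendered message content. If you assemble the content yourself, for example in an assignment node, a quick client-side check can catch oversized payloads before a push fails. The following minimal sketch encodes the limits listed in this topic; the channel keys and the helper function are illustrative, not part of DataWorks.

```python
# Hypothetical pre-push size check based on the limits listed above.
LIMITS_BYTES = {
    "dingtalk": 20 * 1024,  # 20 KB
    "lark": 30 * 1024,      # 30 KB
    "teams": 28 * 1024,     # 28 KB
    # WeCom is rate-limited (20 messages per chatbot per minute), not size-limited here.
}

def check_payload(channel: str, body: str) -> None:
    limit = LIMITS_BYTES.get(channel)
    if limit is not None and len(body.encode("utf-8")) > limit:
        raise ValueError(f"{channel} payload exceeds {limit} bytes")

check_payload("dingtalk", "- 1: Electronics, sales: 1234.56\n")
```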
Preparations
Create a workflow
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
In the Scheduled Workflow pane, right-click Business Flow and select Create Workflow. In the Create Workflow dialog box, configure the Workflow Name parameter based on your business requirements. In this example, DataPushDemo is used.
Create nodes in the workflow
Double-click the name of the DataPushDemo workflow. On the configuration tab of the workflow, click the icon to create nodes based on the push method that you use. The following table describes the nodes that you must create for different push methods. To help you complete the practice, we recommend that you use the node names described in the following table.
| Push method | Node name | Node type | Node description |
| --- | --- | --- | --- |
| Conditional data push | SalesAmountPreMonth | MySQL node | Queries the total sales amount of the previous month in the test data and uses the output parameters to pass the query results to the branch node. |
| | Condition | Branch node | Receives the output parameters of the MySQL node, performs a logical judgment, and uses its output parameters to pass data that meets and does not meet specific conditions to the corresponding MySQL nodes. |
| | CompliantData | MySQL node | Receives the output parameters of the branch node that meet specific conditions, queries data that meets the conditions, and then uses its output parameters to pass the query results to a data push node. |
| | NonCompliantData | MySQL node | Receives the output parameters of the branch node that do not meet specific conditions, queries data that does not meet the conditions, and then uses its output parameters to pass the query results to a data push node. |
| | Top3Categories | Data push node | Receives the output parameters of the CompliantData node and uses its input parameters to push the compliant data to specific destinations. |
| | Bottom3Categories | Data push node | Receives the output parameters of the NonCompliantData node and uses its input parameters to push the non-compliant data to specific destinations. |
| Script data push | SalesAmountPreWeek | MySQL node | Queries the total sales amount of the top three categories of the previous week in the test data and uses the output parameters to pass the query results to the assignment node. |
| | Top3CategoryList | Assignment node | Receives the output parameters of the MySQL node, formats the results as a list, and then uses its output parameters to pass the results to a data push node. Select Python for Language when you configure the assignment node. |
| | Top3CategoriesPreWeek | Data push node | Receives the output parameters of the assignment node and uses its input parameters to push data to specific destinations. |
| Combined data push | SalesAmountPreDay | MySQL node | Queries the total sales amount of the previous day in the test data and uses the output parameters to pass the query results to the data push node. This node is shared with the simple data push flow. |
| | SalesGrowthPreDay | MySQL node | Queries the sales growth of the previous day over the day before by category and uses the output parameters to pass the query results to the data push node. |
| | CombinedPush | Data push node | Receives the output parameters of both MySQL nodes and uses its input parameters to push data to specific destinations. |
| Simple data push | SalesAmountPreDay | MySQL node | Queries the total sales amount of the previous day in the test data and uses the output parameters to pass the query results to a data push node. |
| | PushSalesAmountPreDay | Data push node | Receives the output parameters of the MySQL node and uses its input parameters to push data to specific destinations. |
| MaxCompute data push | MaxComputeDataSync | Batch synchronization node | Synchronizes the data in a MySQL database to the MaxCompute data source. |
| | MaxComputeDataQuery | Assignment node | Queries data from the MaxCompute data source and uses its output parameters to pass the query results to a data push node. |
| | MaxComputeDataPush | Data push node | Receives the output parameters of the assignment node and uses its input parameters to push data to specific destinations. |
Prepare data
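The queries in this topic assume that the MySQL data source contains a test table named orders with at least the order_id, category, sales, and datetime columns. If you still need to seed such test data, the following minimal sketch creates the table and writes random rows; the connection settings, categories, and row count are placeholders, not values from this topic.

```python
import random
from datetime import datetime, timedelta

import pymysql  # assumed available: pip install pymysql

# Placeholder connection settings; use your ApsaraDB RDS for MySQL instance.
conn = pymysql.connect(host="<rds-host>", user="<user>",
                       password="<password>", database="<db>")
try:
    with conn.cursor() as cur:
        # Schema inferred from the queries used throughout this topic.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS orders (
                order_id INT AUTO_INCREMENT PRIMARY KEY,
                category VARCHAR(255),
                sales    DOUBLE,
                datetime DATETIME
            )
        """)
        categories = ["Electronics", "Clothing", "Food", "Books", "Toys"]
        now = datetime.now()
        rows = [(random.choice(categories),
                 round(random.uniform(10, 5000), 2),
                 now - timedelta(days=random.randint(0, 40)))
                for _ in range(200)]
        cur.executemany(
            "INSERT INTO orders (category, sales, datetime) VALUES (%s, %s, %s)",
            rows)
    conn.commit()
finally:
    conn.close()
```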
Configure data push flows in the workflow
This section describes how to configure data push flows in the workflow to push data to specific destinations by using the following push methods: conditional data push, script data push, simple data push, combined data push, and MaxCompute data push.
Conditional data push
Step 1: Configure a data push flow
Double-click the DataPushDemo workflow. On the workflow canvas, connect the SalesAmountPreMonth, Condition, CompliantData, Top3Categories, NonCompliantData, and Bottom3Categories nodes to form a data push flow. The following figure shows the data push flow.
Step 2: Configure the SQL query node
You can configure the SQL query node to query test data and add the outputs parameter in the Input and Output Parameters section to pass the query results to the branch node.
Double-click the SalesAmountPreMonth node. On the configuration tab of the node, write code for the node. Sample code:

```sql
-- Query the total sales amount of the previous month.
SELECT SUM(sales) AS sales_amount
FROM orders
WHERE datetime BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00')
  AND DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%Y-%m-%d 23:59:59');
```
In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.
- Scheduled time: Set the value to 08:00.
- Resource Group: Select an existing serverless resource group.
- Parent Nodes: Select Add Root Node.
- Output Parameters: In the Input and Output Parameters section, click Add assignment parameter on the right side of Output Parameters to add the outputs parameter of the SQL query node. The outputs parameter is used as the input parameter of the branch node.
Step 3: Configure the branch node
You can configure the outputs parameter of the SQL query node as the input parameter of the branch node in the Input and Output Parameters section. After the branch node performs a logical judgment, it uses its outputs parameter to pass the results to its child nodes.
Double-click the Condition node. In the Definition section on the configuration tab of the node, click Add Branch. In the Branch Definition dialog box, configure the parameters. The following table describes the parameters that you must configure.

| Parameter | Child node that receives compliant data | Child node that receives non-compliant data |
| --- | --- | --- |
| Condition | ${inputs[0][0]}>=500000 | ${inputs[0][0]}<500000 |
| Associated Node Output | Compliant | Non-compliant |
| Description | Sales amount that meets specific conditions | Sales amount that does not meet specific conditions |

Note: inputs[0][0] is a two-dimensional index that references the data passed by the outputs parameter of the ancestor node of the branch node.
- If the ancestor node is an SQL query node, the outputs parameter is a two-dimensional array (the rows and columns of the query result).
- If the ancestor node is a Python node, the outputs parameter is a one-dimensional array.
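For intuition, the following minimal sketch mimics how the condition resolves. The substitution is actually performed by DataWorks at run time, and the value below is hypothetical: the SalesAmountPreMonth node returns one row with one column (the SUM(sales) result), so inputs[0][0] is the first column of the first row.

```python
# Hypothetical outputs passed by SalesAmountPreMonth: one row, one column.
inputs = [["612345.78"]]

# ${inputs[0][0]}>=500000 routes the run to the "Compliant" branch output;
# otherwise the "Non-compliant" branch output is used.
meets_condition = float(inputs[0][0]) >= 500000
print("Compliant" if meets_condition else "Non-compliant")
```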
In the right-side navigation pane of the configuration tab of the branch node, click the Properties tab. On the Properties tab, configure parameters. The following table describes the parameters that you must configure.

| Section and parameter | Description |
| --- | --- |
| Schedule > Scheduling Cycle | Set the value to Day. |
| Schedule > Scheduled time | Set the value to 08:00. |
| Schedule > Rerun | Set the value to Allow Regardless of Running Status. |
| Resource Group > Resource Group | Select an existing resource group for scheduling. Note: The first time you use a data push node, you must submit a ticket to upgrade your resource group for scheduling. |
| Output Name of Current Node | After you configure the branches, the output names are automatically parsed and displayed. Each output name is the same as the value that you set for the Associated Node Output parameter when you added the branch. |
| Input and Output Parameters > Input Parameters | Click Create. Parameter Name: inputs. Value Source: Select the outputs parameter of the SalesAmountPreMonth node. |
| Input and Output Parameters > Output Parameters | By default, the system adds the outputs parameter. |

In the top toolbar, click the icon to save the configurations.
Step 4: Configure the child nodes of the branch node
In this example, the CompliantData and NonCompliantData nodes are child nodes of the branch node. After the branch node passes the query results to its child nodes by using the outputs parameter, you can add the outputs parameter in the Input and Output Parameters section of each child node to pass the query results to the corresponding data push node.
Double-click the CompliantData and NonCompliantData nodes separately. On the configuration tab of each node, write code for the node.

Sample code for the CompliantData node:

```sql
SET @all_cat_sales_volume_month := 0.0;

SELECT SUM(sales)
INTO @all_cat_sales_volume_month
FROM orders
WHERE datetime BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00')
  AND DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%Y-%m-%d 23:59:59');

-- Create a temporary table.
CREATE TEMPORARY TABLE IF NOT EXISTS temp_array (
    category VARCHAR(255),
    sales DOUBLE,
    all_cat_sales_volume_month DOUBLE
);

-- Query data and write the data to the temporary table.
INSERT INTO temp_array (category, sales, all_cat_sales_volume_month)
SELECT category, SUM(sales) AS amount, @all_cat_sales_volume_month
FROM orders
WHERE datetime BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00')
  AND DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%Y-%m-%d 23:59:59')
GROUP BY category
ORDER BY amount DESC
LIMIT 3;

-- Query the top three categories whose sales amount meets specific conditions.
SELECT category, sales, all_cat_sales_volume_month FROM temp_array;
```

Sample code for the NonCompliantData node (identical except that it keeps the bottom three categories):

```sql
SET @all_cat_sales_volume_month := 0.0;

SELECT SUM(sales)
INTO @all_cat_sales_volume_month
FROM orders
WHERE datetime BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00')
  AND DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%Y-%m-%d 23:59:59');

-- Create a temporary table.
CREATE TEMPORARY TABLE IF NOT EXISTS temp_array (
    category VARCHAR(255),
    sales DOUBLE,
    all_cat_sales_volume_month DOUBLE
);

-- Query data and write the data to the temporary table.
INSERT INTO temp_array (category, sales, all_cat_sales_volume_month)
SELECT category, SUM(sales) AS amount, @all_cat_sales_volume_month
FROM orders
WHERE datetime BETWEEN DATE_FORMAT(CURRENT_DATE - INTERVAL 1 MONTH, '%Y-%m-01 00:00:00')
  AND DATE_FORMAT(LAST_DAY(CURRENT_DATE - INTERVAL 1 MONTH), '%Y-%m-%d 23:59:59')
GROUP BY category
ORDER BY amount ASC
LIMIT 3;

-- Query the bottom three categories whose sales amount does not meet specific conditions.
SELECT category, sales, all_cat_sales_volume_month FROM temp_array;
```
In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.
- Scheduled time: Set the value to 08:00.
- Resource Group: Select an existing serverless resource group.
- Parent Nodes: Check whether the dependencies between the branch node and its child nodes are correct. The dependencies are configured when you configure the data push flow in Step 1.
  - CompliantData: In the Dependencies section, check whether the value in the Output Name of Ancestor Node column is Compliant.
  - NonCompliantData: In the Dependencies section, check whether the value in the Output Name of Ancestor Node column is Non-compliant.
- Output Parameters: In the Input and Output Parameters section, click Add assignment parameter on the right side of Output Parameters to add the outputs parameter of each node. The outputs parameter is used as the input parameter of the corresponding data push node.
In the top toolbar, click the icon to save the configurations.
Step 5: Configure the data push nodes
You can add an input parameter to Input Parameters in the Input and Output Parameters section of each data push node to obtain the outputs parameters of the CompliantData and NonCompliantData nodes. This way, the data push nodes can use the input parameters to push data to specific destinations.
Double-click the Top3Categories and Bottom3Categories nodes separately. In the right-side navigation pane of the configuration tab of each node, click the Properties tab. On the Properties tab, configure parameters. The following table describes the parameters that you must configure.

| Section and parameter | Description |
| --- | --- |
| Scheduling Parameter > Parameter Name | Set the value to curdate. |
| Scheduling Parameter > Parameter Value | Set the value to $[yyyymmddhh:mi:ss]. |
| Schedule > Scheduling Cycle | Set the value to Day. |
| Schedule > Scheduled time | Set the value to 08:00. Note: In this example, data is pushed to the destinations at 08:00 every day. You can configure this parameter based on your business requirements. |
| Schedule > Rerun | Set the value to Allow Regardless of Running Status. |
| Resource Group > Resource Group | Select an existing resource group for scheduling. Note: The first time you use a data push node, you must submit a ticket to upgrade your resource group for scheduling. |
| Input and Output Parameters > Input Parameters | Click Create to add an input parameter. Parameter Name: inputs. Value Source: For Top3Categories, select the outputs parameter of the CompliantData node from the drop-down list. For Bottom3Categories, select the outputs parameter of the NonCompliantData node from the drop-down list. |
Configure the data to push.

- Destination: Select a destination from the Destination drop-down list. If no destination is available, click Create Destination to create one. The following table describes the parameters for creating a destination.

| Parameter | Description |
| --- | --- |
| Type | Select a push channel. DingTalk, Lark, WeCom, and Teams are supported. |
| Name | Enter a name based on your business requirements. |
| WebHook | Enter the webhook URL of the push channel that you selected. You must obtain the URL from the corresponding platform. Note: For more information about how to obtain the webhook URL of Lark, see Use the webhook trigger. For more information about how to obtain the webhook URL of Teams, see Creating Incoming Webhooks. |

- Title: Enter Categories whose sales amount ranks the top three for the Top3Categories node and Categories whose sales amount ranks the bottom three for the Bottom3Categories node.
- Body: Configure the content based on your business requirements. For more information, see the Configure the content to push section of the Data push topic.

Note: In the Body section, you can directly use the field names that are queried by the SQL query node as placeholders to obtain the input parameters of the data push node.
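Each destination is essentially an incoming-webhook endpoint of the selected channel, and the data push node posts the rendered Title and Body to it for you. For reference only, the following minimal sketch shows what such a call looks like for a DingTalk custom-robot webhook; the token and message values are placeholders.

```python
import requests

# Placeholder URL; copy the real webhook from your DingTalk group robot settings.
WEBHOOK = "https://oapi.dingtalk.com/robot/send?access_token=<your-token>"

payload = {
    "msgtype": "markdown",
    "markdown": {
        "title": "Categories whose sales amount ranks the top three",
        "text": "- 1: Electronics, sales: 1234.56\n- 2: Books, sales: 987.65\n",
    },
}

resp = requests.post(WEBHOOK, json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # DingTalk returns {"errcode": 0, "errmsg": "ok"} on success
```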
In the top toolbar, click the icon to save the configurations.
Step 6: Test the data push flow
After you configure the conditional data push flow, you must test the flow before you commit the workflow and deploy nodes in the workflow.
Double-click the DataPushDemo workflow. On the configuration tab of the workflow, right-click the SalesAmountPreMonth node and select Run Current Node and Its Descendant Nodes.

Note: If a node in the data push flow fails, right-click the node and select View Log to view the logs.
Script data push
Step 1: Configure a data push flow
Double-click the DataPushDemo workflow. On the workflow canvas, connect the SalesAmountPreWeek, Top3CategoryList, and Top3CategoriesPreWeek nodes to form a data push flow. The following figure shows the data push flow.
Step 2: Configure the SQL query node
You can configure the SQL query node to query test data and add the outputs parameter in the Input and Output Parameters section to pass the query results of the SQL query node to the assignment node.
Double-click the SalesAmountPreWeek node. On the configuration tab of the node, write code for the node. Sample code:

```sql
-- Query the total sales amount of the top three categories of the previous week.
SELECT category, SUM(sales) AS amount
FROM orders
WHERE datetime BETWEEN DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 WEEK), '%Y-%m-%d 00:00:00')
  AND DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), '%Y-%m-%d 23:59:59')
GROUP BY category
ORDER BY amount DESC
LIMIT 3;
```
In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.
- Scheduled time: Set the value to 08:00.
- Resource Group: Select an existing serverless resource group.
- Parent Nodes: Select Add Root Node.
- Output Parameters: In the Input and Output Parameters section, click Add assignment parameter on the right side of Output Parameters to add the outputs parameter of the SQL query node. The outputs parameter is used as the input parameter of the assignment node.
In the top toolbar, click the icon to save the configurations.
Step 3: Configure the assignment node
You can configure the outputs
parameter of the SQL query node as the input parameter of the assignment node in the Input and Output Parameters section. Then, reassign a value to the input parameter to generate the outputs parameter of the assignment node. The assignment node uses the outputs parameter to pass data to the data push node.
Double-click the Top3CategoryList node. On the configuration tab of the node, select Python for Language and write code for the node. Sample code:

```python
def main():
    from datetime import date

    today = date.today()
    formatted_date = today.strftime('%Y-%m-%d')
    # The ${inputs[i][j]} placeholders are replaced by DataWorks with the values
    # passed from the SalesAmountPreWeek node before the script runs.
    msg = 'Stat date: ' + formatted_date + ' \\n\\n ' \
          '- 1: ${inputs[0][0]}, sales: ${inputs[0][1]} \\n\\n ' \
          '- 2: ${inputs[1][0]}, sales: ${inputs[1][1]} \\n\\n ' \
          '- 3: ${inputs[2][0]}, sales: ${inputs[2][1]} \\n\\n '
    print(msg)

if __name__ == "__main__":
    main()
```
In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.
- Scheduled time: Set the value to 08:00.
- Resource Group: Select an existing resource group for scheduling.
- Input and Output Parameters:
  - Input Parameters: Click Create. Parameter Name: inputs. Value Source: Select the outputs parameter of the SalesAmountPreWeek node from the drop-down list.
  - Output Parameters: By default, the system adds the outputs parameter.
In the top toolbar, click the icon to save the configurations.
Step 4: Configure the data push node
You can configure the outputs parameter of the assignment node as the input parameter of the data push node in the Input and Output Parameters section. Then, the data push node uses the input parameter to push data to specific destinations.
Double-click the Top3CategoriesPreWeek node. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters. The following table describes the parameters that you must configure.

| Section and parameter | Description |
| --- | --- |
| Scheduling Parameter > Parameter Name | Set the value to curdate. |
| Scheduling Parameter > Parameter Value | Set the value to $[yyyymmddhh:mi:ss]. |
| Schedule > Scheduling Cycle | Set the value to Day. |
| Schedule > Scheduled time | Set the value to 08:00. Note: In this example, data is pushed to the destinations at 08:00 every day. You can configure this parameter based on your business requirements. |
| Schedule > Rerun | Set the value to Allow Regardless of Running Status. |
| Resource Group > Resource Group | Select an existing resource group for scheduling. Note: The first time you use a data push node, you must submit a ticket to upgrade your resource group for scheduling. |
| Input and Output Parameters > Input Parameters | Click Create to add an input parameter. Parameter Name: inputs. Value Source: Select the outputs parameter of the Top3CategoryList node from the drop-down list. |

Configure the data to push.

- Destination: Select a destination from the Destination drop-down list. If no destination is available, click Create Destination to create one. The following table describes the parameters for creating a destination.

| Parameter | Description |
| --- | --- |
| Type | Select a push channel. DingTalk, Lark, WeCom, and Teams are supported. |
| Name | Enter a name based on your business requirements. |
| WebHook | Enter the webhook URL of the push channel that you selected. You must obtain the URL from the corresponding platform. Note: For more information about how to obtain the webhook URL of Lark, see Use the webhook trigger. For more information about how to obtain the webhook URL of Teams, see Creating Incoming Webhooks. |

- Title: Enter Categories whose sales amount ranks the top three in the previous week.
- Body: Configure the content based on your business requirements. For more information, see the Configure the content to push section of the Data push topic.

Note: In the Body section, you can directly use the field names that are queried by the SQL query node as placeholders to obtain the input parameters of the data push node.
In the top toolbar, click the icon to save the configurations.
Step 5: Test the data push flow
After you configure the script data push flow, you must test the flow before you commit the workflow and deploy nodes in the workflow.
Double-click the DataPushDemo workflow. On the configuration tab of the workflow, right-click the SalesAmountPreWeek node and select Run Current Node and Its Descendant Nodes.

Note: If a node in the data push flow fails, right-click the node and select View Log to view the logs.
Simple data push
Step 1: Configure a data push flow
Double-click the DataPushDemo workflow. On the workflow canvas, connect the SalesAmountPreDay and PushSalesAmountPreDay nodes to form a data push flow. The following figure shows the data push flow.
Step 2: Configure the SQL query node
You can configure the SQL query node to query test data and add the outputs parameter in the Input and Output Parameters section to pass the query results of the SQL query node to the data push node.
Double-click the SalesAmountPreDay node. On the configuration tab of the node, write code for the node. Sample code:

```sql
-- Create a temporary table named temp_array.
CREATE TEMPORARY TABLE IF NOT EXISTS temp_array (
    total_amount DOUBLE
);

-- Write the total sales amount for yesterday to the temp_array table.
INSERT INTO temp_array (total_amount)
SELECT SUM(sales)
FROM orders
WHERE datetime BETWEEN DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), '%Y-%m-%d 00:00:00')
  AND DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), '%Y-%m-%d 23:59:59');

-- Query the temp_array table.
SELECT total_amount FROM temp_array;
```
In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.
- Scheduled time: Set the value to 08:00.
- Resource Group: Select an existing serverless resource group.
- Parent Nodes: Select Add Root Node.
- Output Parameters: In the Input and Output Parameters section, click Add assignment parameter on the right side of Output Parameters to add the outputs parameter of the SQL query node. The outputs parameter is used as the input parameter of the data push node.
In the top toolbar, click the icon to save the configurations.
Step 3: Configure the data push node
You can configure the outputs parameter of the SQL query node as the input parameter of the data push node in the Input and Output Parameters section. Then, the data push node uses the input parameter to push data to specific destinations.
Double-click the PushSalesAmountPreDay node. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters. The following table describes the parameters that you must configure.

| Section and parameter | Description |
| --- | --- |
| Scheduling Parameter > Parameter Name | Set the value to curdate. |
| Scheduling Parameter > Parameter Value | Set the value to $[yyyymmddhh:mi:ss]. |
| Schedule > Scheduling Cycle | Set the value to Day. |
| Schedule > Scheduled time | Set the value to 08:00. Note: In this example, data is pushed to the destinations at 08:00 every day. You can configure this parameter based on your business requirements. |
| Schedule > Rerun | Set the value to Allow Regardless of Running Status. |
| Resource Group > Resource Group | Select an existing resource group for scheduling. Note: The first time you use a data push node, you must submit a ticket to upgrade your resource group for scheduling. |
| Input and Output Parameters > Input Parameters | Click Create to add an input parameter. Parameter Name: inputs. Value Source: Select the outputs parameter of the SalesAmountPreDay node from the drop-down list. |

Configure the data to push.

- Destination: Select a destination from the Destination drop-down list. If no destination is available, click Create Destination to create one. The following table describes the parameters for creating a destination.

| Parameter | Description |
| --- | --- |
| Type | Select a push channel. DingTalk, Lark, WeCom, and Teams are supported. |
| Name | Enter a name based on your business requirements. |
| WebHook | Enter the webhook URL of the push channel that you selected. You must obtain the URL from the corresponding platform. Note: For more information about how to obtain the webhook URL of Lark, see Use the webhook trigger. For more information about how to obtain the webhook URL of Teams, see Creating Incoming Webhooks. |

- Title: Enter Total sales amount for yesterday.
- Body: Configure the content based on your business requirements. For more information, see the Configure the content to push section of the Data push topic.

Note: In the Body section, you can directly use the field names that are queried by the SQL query node as placeholders to obtain the input parameters of the data push node.
In the top toolbar, click the icon to save the configurations.
Step 4: Test the flow
After you configure the simple data push flow, you must test the flow before you commit the workflow and deploy nodes in the workflow.
Double-click the DataPushDemo workflow. On the configuration tab of the workflow, right-click the SalesAmountPreDay node and select Run Current Node and Its Descendant Nodes.

Note: If a node in the data push flow fails, right-click the node and select View Log to view the logs.
Combined data push
Step 1: Configure a data push flow
Double-click the DataPushDemo workflow. On the workflow canvas, connect the SalesAmountPreDay, SalesGrowthPreDay, and CombinedPush nodes to form a data push flow. The following figure shows the data push flow.
The SalesAmountPreDay node can be used in both the combined data push method and the simple data push method.
Step 2: Configure the SQL query node
You can configure each SQL query node to query test data and use the outputs parameter in the Input and Output Parameters section to pass the query results of the SQL query node to the data push node.
Double-click the SalesGrowthPreDay node. On the configuration tab of the node, write code for the node. Sample code:

```sql
-- Create a table named temp_array1 to collect the total sales amount of the day before yesterday.
CREATE TEMPORARY TABLE IF NOT EXISTS temp_array1 (
    category VARCHAR(255),
    sales DOUBLE
);

-- Write data of the day before yesterday to the temp_array1 table.
INSERT INTO temp_array1 (category, sales)
SELECT category, SUM(sales)
FROM orders
WHERE datetime BETWEEN DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 2 DAY), '%Y-%m-%d 00:00:00')
  AND DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 2 DAY), '%Y-%m-%d 23:59:59')
GROUP BY category;

-- Create a table named temp_array2 to collect the total sales amount for yesterday.
CREATE TEMPORARY TABLE IF NOT EXISTS temp_array2 (
    category VARCHAR(255),
    sales DOUBLE
);

-- Write data for yesterday to the temp_array2 table.
INSERT INTO temp_array2 (category, sales)
SELECT category, SUM(sales)
FROM orders
WHERE datetime BETWEEN DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), '%Y-%m-%d 00:00:00')
  AND DATE_FORMAT(DATE_SUB(CURDATE(), INTERVAL 1 DAY), '%Y-%m-%d 23:59:59')
GROUP BY category;

-- Create a table named result to collect the sales growth amount for yesterday.
CREATE TEMPORARY TABLE IF NOT EXISTS result (
    category VARCHAR(255),
    diff DOUBLE
);

-- Write sales growth data to the result table.
INSERT INTO result (category, diff)
SELECT temp_array2.category AS category,
       temp_array2.sales - temp_array1.sales AS diff
FROM temp_array1
LEFT JOIN temp_array2 ON temp_array1.category = temp_array2.category;

-- Query data from the result table.
SELECT category, diff FROM result;
```
In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.
- Scheduled time: Set the value to 08:00.
- Resource Group: Select an existing serverless resource group.
- Parent Nodes: Select Add Root Node.
- Output Parameters: In the Input and Output Parameters section, click Add assignment parameter on the right side of Output Parameters to add the outputs parameter of the SQL query node. The outputs parameter is used as the input parameter of the data push node.
In the top toolbar, click the icon to save the configurations.
Step 3: Configure the data push node
You can configure the outputs parameters of the SalesAmountPreDay and SalesGrowthPreDay nodes as the input parameters of the data push node in the Input and Output Parameters section. Then, the data push node uses the input parameters to push data to specific destinations.
Double-click the CombinedPush node. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters. The following table describes the parameters that you must configure.

| Section and parameter | Description |
| --- | --- |
| Scheduling Parameter > Parameter Name | Set the value to curdate. |
| Scheduling Parameter > Parameter Value | Set the value to $[yyyymmddhh:mi:ss]. |
| Schedule > Scheduling Cycle | Set the value to Day. |
| Schedule > Scheduled time | Set the value to 08:00. Note: In this example, data is pushed to the destinations at 08:00 every day. You can configure this parameter based on your business requirements. |
| Schedule > Rerun | Set the value to Allow Regardless of Running Status. |
| Resource Group > Resource Group | Select an existing resource group for scheduling. Note: The first time you use a data push node, you must submit a ticket to upgrade your resource group for scheduling. |
| Input and Output Parameters > Input Parameters | Click Create to add two input parameters. Parameter 1: set Parameter Name to inputs1 and select the outputs parameter of the SalesAmountPreDay node from the drop-down list as the Value Source. Parameter 2: set Parameter Name to inputs2 and select the outputs parameter of the SalesGrowthPreDay node from the drop-down list as the Value Source. |
Configure the data to push.
- Destination: Select a destination from the Destination drop-down list. If no destination is available, click Create Destination to create one. The following table describes the parameters for creating a destination.

| Parameter | Description |
| --- | --- |
| Type | Select a push channel. DingTalk, Lark, WeCom, and Teams are supported. |
| Name | Enter a name based on your business requirements. |
| WebHook | Enter the webhook URL of the push channel that you selected. You must obtain the URL from the corresponding platform. Note: For more information about how to obtain the webhook URL of Lark, see Use the webhook trigger. For more information about how to obtain the webhook URL of Teams, see Creating Incoming Webhooks. |

- Title: Enter Sales amount and the sales amount growth for yesterday.
- Body: Configure the content based on your business requirements. For more information, see the Configure the content to push section of the Data push topic.

Note: In the Body section, you can directly use the field names that are queried by the SQL query nodes as placeholders to obtain the input parameters of the data push node.
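To see how the two input parameters combine, the following minimal sketch assembles one message body from both values the way placeholders in the Body section would. The shapes match the queries above, but the figures are hypothetical.

```python
# Hypothetical inputs: inputs1 from SalesAmountPreDay (total amount),
# inputs2 from SalesGrowthPreDay (category, growth) rows.
inputs1 = [["54321.00"]]
inputs2 = [["Electronics", "1200.50"], ["Books", "-80.25"]]

lines = ["Total sales yesterday: " + inputs1[0][0], "Sales growth by category:"]
lines += ["- {}: {}".format(category, diff) for category, diff in inputs2]
print("\n".join(lines))
```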
In the top toolbar, click the icon to save the configurations.
Step 4: Test the data push flow
After you configure the combined data push flow and the simple data push flow, you must test the combined data push flow before you commit the workflow and deploy nodes in the workflow.
Double-click the DataPushDemo workflow. On the configuration tab of the workflow, right-click the CombinedPush node and select Run Current Node and Its Descendant Nodes.

Note: If a node in the data push flow fails, right-click the node and select View Log to view the logs.
MaxCompute data push
Step 1: Configure a data push flow
Double-click the DataPushDemo workflow. On the workflow canvas, connect the MaxComputeDataSync, MaxComputeDataQuery, and MaxComputeDataPush nodes to form a data push flow. The following figure shows the data push flow.
Step 2: Configure the batch synchronization node
You can configure the MaxComputeDataSync node to synchronize test data that is written to MySQL in the Prepare data section to the created MaxCompute data source for future use.
Double-click the MaxComputeDataSync node. On the configuration tab of the node, configure the parameters. The following table describes the parameters.

| Item | Description |
| --- | --- |
| Source > Source | Select MySQL. |
| Source > Data Source Name | Select the MySQL data source that you added. |
| Resource group | Select a serverless resource group. |
| Destination > Destination | Select MaxCompute(ODPS). |
| Destination > Data Source Name | Select the MaxCompute data source that is added to the workspace. |
After the configuration is complete, the system tests the connectivity between the data sources and the resource group. After the data sources pass the network connectivity test, click Next to configure the source and destination.
Configure the source and destination.
| Section and parameter | Description |
| --- | --- |
| Source > Data source | Retain the default value MySQL and select the MySQL data source that you added. |
| Source > Table | Select the orders table. |
| Source > Data filtering | Configure this parameter based on your business requirements. In this example, this parameter is left empty. |
| Source > Split key | You can use a column in the source table as the shard key. We recommend that you use the primary key column or an indexed column. |
| Source > Data preview | You can click Data preview to check whether the data obtained from the MySQL data source meets your expectations. |
| Destination > Data source | Retain the default value MaxCompute and select the MaxCompute data source that is added to the workspace. |
| Destination > Tunnel Resource Group | Retain the default value Shared transmission resources. The specified resources are used as the Tunnel quota. |
| Destination > Table | Click Generate Destination Table Schema to generate a destination table. |
| Destination > Partition information | If you want daily incremental data to be stored in the partition of the corresponding date, configure the pt parameter. For example, set pt to ${bizdate}. |
| Destination > Write Mode | Select Clean up existing data before writing (Insert Overwrite). |

Map source fields to destination fields whose names are the same as those of the source fields.
Configure channel control policies.

- Task Expected Maximum Concurrency: The actual number of parallel threads used during data synchronization may be less than or equal to the specified threshold due to the characteristics of the resources or tasks. You are charged for the resource group that you select based on the actual number of parallel threads used. In this example, set this parameter to 2.
- Synchronization rate: Throttling can reduce the read and write workloads of the source and destination. If you do not enable throttling, the system processes nodes at the maximum transmission performance available in the existing hardware environment. In this example, select No current limit.
- Policy for Dirty Data Records: Select Disallow Dirty Data Records.
- Distributed Execution: By default, this switch is turned off. You can turn on the switch only if the specified maximum number of parallel threads is greater than or equal to 8.
Configure the scheduling properties. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.

- Scheduling Parameter: Set Parameter Name to bizdate and Parameter Value to $[yyyymmdd-1].
- Scheduled time: Set the value to 08:00.
- Rerun: Set the value to Allow Regardless of Running Status.
- Resource Group: Select an existing resource group for scheduling.
- Dependencies: Select Add Root Node to use the root node as the ancestor node of the batch synchronization node.

In the top toolbar, click the icon to save the configurations.
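As a rough illustration of what the custom parameter above resolves to: $[yyyymmdd-1] is evaluated by the DataWorks scheduler against the scheduled run time and yields the date one day earlier, which is why it suits daily incremental partitions such as pt=${bizdate}. The sketch below approximates the substitution; the scheduled time is a made-up example.

```python
from datetime import datetime, timedelta

# Hypothetical scheduled run time; the scheduler supplies the real one.
scheduled = datetime(2024, 7, 2, 8, 0, 0)

bizdate = (scheduled - timedelta(days=1)).strftime("%Y%m%d")
print(bizdate)  # 20240701 -> yesterday's data lands in partition pt=20240701
```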
Step 3: Configure the assignment node
MaxCompute data sources do not allow you to use an ODPS SQL node to query data and use output parameters to send data to a data push node. You can configure the assignment node to query MaxCompute data and add the outputs parameter in the Input and Output Parameters section to pass the query results to the data push node.
Double-click the MaxComputeDataQuery node.
On the configuration tab of the node, select ODPS SQL from the Language drop-down list.
Write code for the node. Sample code:
```sql
-- Use a subquery to rank the sales amounts in each partition in descending order.
-- DENSE_RANK() assigns a rank to each row in a partition. Rows with identical
-- sales amounts share the same rank, and the ranking numbers are consecutive.
--
-- Subquery:
-- 1. Select the order_id, category, sales, datetime, and pt columns from the orders table.
-- 2. DENSE_RANK() OVER (PARTITION BY pt ORDER BY sales DESC):
--    PARTITION BY pt: partitions the rows by the pt field, so rows are ranked within each partition.
--    ORDER BY sales DESC: ranks the rows in each partition by the sales field in descending order.
--    The rank column stores the rank of the sales amount for each row in each partition.
--
-- Main query:
-- Keep the rows whose sales amount ranks in the top three of each partition,
-- that is, the top three rows for each data timestamp.
SELECT
    order_id,  -- The order ID.
    category,  -- The product category.
    sales,     -- The sales amount.
    datetime,  -- The payment time of the order.
    pt         -- The date timestamp.
FROM (
    SELECT
        order_id,
        category,
        sales,
        datetime,
        pt,
        DENSE_RANK() OVER (PARTITION BY pt ORDER BY sales DESC) AS rank  -- Rank the sales amounts in each partition.
    FROM orders
    WHERE pt = '${bizdate}'  -- Filter by the data timestamp, which is specified by ${bizdate}.
) AS ranked_orders
WHERE rank <= 3  -- Retain the top three rows in each partition.
```
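If DENSE_RANK is unfamiliar, the following minimal sketch reproduces its semantics on made-up sales figures: tied amounts share a rank and the ranks stay consecutive, so the rank <= 3 filter can keep more than three rows when values tie.

```python
# Dense-rank sales amounts in descending order (illustrative values only).
sales = [900.0, 900.0, 750.0, 600.0, 600.0, 120.0]

distinct_desc = sorted(set(sales), reverse=True)
rank_of = {amount: i + 1 for i, amount in enumerate(distinct_desc)}
for amount in sales:
    print(amount, rank_of[amount])
# 900.0 -> 1, 900.0 -> 1, 750.0 -> 2, 600.0 -> 3, 600.0 -> 3, 120.0 -> 4
```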
In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters.
- Scheduled time: Set the value to 08:00.
- Resource Group: Select an existing serverless resource group.
- Dependent Upstream Node: Check whether the value of the Ancestor Node Name parameter is MaxComputeDataSync.
- Output Parameters: In the Input and Output Parameters section, click Create on the right side of Output Parameters to add the outputs parameter of the assignment node. The outputs parameter is used as the input parameter of the data push node.
In the top toolbar, click the icon to save the configurations.
Step 4: Configure the data push node
You can configure the outputs parameter of the assignment node as the input parameter of the data push node in the Input and Output Parameters section. Then, the data push node uses the input parameter to push data to specific destinations.
Double-click the MaxComputeDataPush node. In the right-side navigation pane of the configuration tab of the node, click the Properties tab. On the Properties tab, configure parameters. The following table describes the parameters that you must configure.

| Section and parameter | Description |
| --- | --- |
| Scheduling Parameter > Parameter Name | Set the value to curdate. |
| Scheduling Parameter > Parameter Value | Set the value to $[yyyymmddhh:mi:ss]. |
| Schedule > Scheduling Cycle | Set the value to Day. |
| Schedule > Scheduled time | Set the value to 08:00. Note: In this example, data is pushed to the destinations at 08:00 every day. You can configure this parameter based on your business requirements. |
| Schedule > Rerun | Set the value to Allow Regardless of Running Status. |
| Resource Group > Resource Group | Select a serverless resource group that was created after June 28, 2024, which is the release date of the data push feature. If the resource group was created before June 28, 2024, submit a ticket to upgrade the resource group. Note: For more information about release notes, see Release records. |
| Input and Output Parameters > Input Parameters | Click Create to add an input parameter. Parameter Name: inputs. Value Source: Select the outputs parameter of the MaxComputeDataQuery node from the drop-down list. |
Configure the data to push.
- Destination: Select a destination from the Destination drop-down list. If no destination is available, click Create Destination to create one. The following table describes the parameters for creating a destination.

| Parameter | Description |
| --- | --- |
| Type | Select a push channel. DingTalk, Lark, WeCom, and Teams are supported. |
| Name | Enter a name based on your business requirements. |
| WebHook | Enter the webhook URL of the push channel that you selected. You must obtain the URL from the corresponding platform. Note: For more information about how to obtain the webhook URL of Lark, see Use the webhook trigger. For more information about how to obtain the webhook URL of Teams, see Creating Incoming Webhooks. |

- Title: Enter MaxCompute data.
- Body: Configure the content based on your business requirements. For more information, see the Configure the content to push section of the Data push topic.

Note: In the Body section, you can directly use the field names that are queried by the assignment node as placeholders to obtain the input parameters of the data push node.
In the top toolbar, click the icon to save the configurations.
Step 5: Test the flow
After you configure the data push flow, you must test the flow before you commit the workflow and deploy nodes in the workflow.
Double-click the DataPushDemo workflow. Right-click the MaxComputeDataSync node and select Run Current Node and Its Descendant Nodes.

Note: If a node in the data push flow fails, right-click the node and select View Log to view the logs.
Commit the workflow and deploy nodes in the workflow
After you configure all data push flows in the workflow, double-click the DataPushDemo workflow. On the configuration tab of the workflow, test all the data push flows. If all the data push flows run as expected, commit the workflow and deploy the nodes in the workflow.
On the configuration tab of the DataPushDemo workflow, click the icon to run the workflow.
After the icon that indicates a successful run appears next to all nodes in the workflow, click the icon to commit the DataPushDemo workflow.
In the Commit dialog box, select the nodes that you want to commit, enter a description, and then select Ignore I/O Inconsistency Alerts.
Click Confirm.
Deploy nodes. For more information, see Deploy nodes.
What to do next
The workflow is run based on the specified scheduling cycle. You can perform various O&M operations on the deployed nodes in Operation Center. For more information, see Perform basic O&M operations on auto triggered nodes.