DataWorks DataService Studio provides the data push feature. This feature allows you to execute SQL statements to query data from a specific data source and push data to a webhook URL. You can perform simple configurations to periodically push business data to different webhook URLs. This topic describes how to configure and use the data push feature.
Overview
The data push feature allows you to create data push tasks. You can write SQL statements for single- or multi-table queries in a data push task to define the data that you want to push and organize the data by using rich text or tables. You can configure a scheduling cycle to periodically push data to destination webhook URLs.
Supported data source types and push channels
Supported data source types:
MySQL (compatible with StarRocks and Doris)
PostgreSQL (compatible with Snowflake and Redshift)
Hologres
MaxCompute
ClickHouse
Push channels: DingTalk, Lark, WeCom, and Microsoft Teams.
Limits
Limits on the data size and push frequency:
If you want to push data to DingTalk, the data size cannot exceed 20 KB.
If you want to push data to Lark, the data size cannot exceed 30 KB, and the size of an image must be less than 10 MB.
If you want to push data to WeCom, each chatbot can send a maximum of 20 messages every minute.
If you want to push data to Microsoft Teams, the data size cannot exceed 28 KB.
Note: Tables that are defined by using Markdown in the pushed content cannot be displayed as expected on the mobile clients of DingTalk and WeCom. We recommend that you select Table in the Body section to add tables to the content that you want to push. Tables that are defined by using Markdown can be displayed as expected on the mobile clients of Lark and Microsoft Teams.
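The per-channel size limits above can be enforced before a push is attempted. The following is a minimal sketch, assuming the limits stated in this section; the helper name and dictionary are illustrative, not part of DataWorks.

```python
# Sketch: pre-check a message payload against per-channel size limits
# before pushing. The byte limits come from the section above; the
# helper name and structure are illustrative, not a DataWorks API.
CHANNEL_LIMITS_BYTES = {
    "DingTalk": 20 * 1024,
    "Lark": 30 * 1024,
    "Microsoft Teams": 28 * 1024,
}

def fits_channel_limit(channel: str, payload: str) -> bool:
    """Return True if the UTF-8 payload is within the channel's size limit."""
    limit = CHANNEL_LIMITS_BYTES.get(channel)
    if limit is None:  # e.g. WeCom limits message rate, not message size
        return True
    return len(payload.encode("utf-8")) <= limit

print(fits_channel_limit("DingTalk", "x" * 10000))
```

Note that WeCom is absent from the table because its limit is a message rate (20 messages per minute per chatbot), not a payload size.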
The data push feature is available only in DataWorks workspaces in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), Singapore, US (Silicon Valley), and US (Virginia).
Prerequisites
A data source is added in a supported region. For information about how to add a data source to DataWorks, see Add and manage data sources.
Procedure
Step 1: Create a data push task
Step 2: Configure the data push task
Step 3: Configure data push settings
Step 1: Create a data push task
Go to the DataService Studio page.
Log on to the DataWorks console. In the top navigation bar, select the region in which the desired data source resides. In the left-side navigation pane, choose Data Development and Governance > DataService Studio. On the DataService Studio page, select a workspace from the drop-down list and click Go to DataService Studio.
Create a data push task.
In the Service Development pane of the DataService Studio page, click Data Push. Move the pointer over the icon and select Create Data Push Task. In the Data Push dialog box, configure the Name parameter and click OK.
Step 2: Configure the data push task
(Optional) Preparations
The following sections describe how to query data from a MaxCompute table and push the queried data to a specific destination. In this example, a table named sales is used. You can run the data push task to push the daily sales amount of each department, together with the difference between the sales amount of the current day and that of the previous day, to a destination by using a specific push channel. Before you perform subsequent operations, you must create the sales table in the desired environment. The following sample code provides an example on how to create the table and insert data into it. For more information about how to create a MaxCompute table, see Create and manage MaxCompute tables.
CREATE TABLE IF NOT EXISTS sales (
id BIGINT COMMENT 'Unique identifier',
department STRING COMMENT 'Department name',
revenue DOUBLE COMMENT 'Sales amount'
) PARTITIONED BY (ds STRING);
-- Insert data into partitions.
INSERT INTO TABLE sales PARTITION (ds='20240101') (id, department, revenue) VALUES (1, 'Department 1', 12000.00);
INSERT INTO TABLE sales PARTITION (ds='20240101') (id, department, revenue) VALUES (2, 'Department 2', 21000.00);
INSERT INTO TABLE sales PARTITION (ds='20240101') (id, department, revenue) VALUES (3, 'Department 3', 5000.00);
INSERT INTO TABLE sales PARTITION (ds='20240102') (id, department, revenue) VALUES (1, 'Department 1', 11000.00);
INSERT INTO TABLE sales PARTITION (ds='20240102') (id, department, revenue) VALUES (2, 'Department 2', 20000.00);
INSERT INTO TABLE sales PARTITION (ds='20240102') (id, department, revenue) VALUES (3, 'Department 3', 10000.00);
Select a data source
Configure the Datasource Type, Datasource Name, and Datasource Env. parameters. You can configure the Datasource Env. parameter based on the environment of the table whose data you want to push. Before you perform subsequent operations, confirm the environment of the sales table that you created in the (Optional) Preparations section.
For information about the supported data source types, see Supported data source types and push channels.
Write SQL statements
Define the data that you want to push and query data.
In the Edit Query SQL section, write SQL statements for single- or multi-table queries to define the data that you want to push. Example:
-- Query the sales amount of each department on January 2, 2024.
SELECT id, department, revenue FROM sales WHERE ds='20240102';
-- Query the difference between the sales amount of the current day and that of the previous day.
SELECT a.revenue - b.revenue AS diff
FROM sales a
LEFT JOIN sales b ON a.id = b.id AND a.ds > b.ds
WHERE a.ds = '20240102' AND b.ds = '20240101';
After you write the SQL statements, the response parameters are automatically displayed on the Output Parameters tab of the Parameters tab. If the parameters fail to be displayed or are invalid, you can turn off Automatically Parse Parameters and manually add parameters.
You can configure variables in the ${Variable name} format in the SQL statements and use assignment parameters to assign values to the variables. The assignment parameters can be set to date and time expressions or constants. This way, the values of the variables are dynamically replaced in the code of the data push task based on the configurations of the assignment parameters. For more information, see Configure the content to push. Example:
-- Dynamically assign values to the time variables based on scheduling parameters.
-- Query the latest sales amount of each department every day.
SELECT id, department, revenue FROM sales WHERE ds='${date}';
-- Query the difference between the sales amount of the current day and that of the previous day.
SELECT a.revenue - b.revenue AS diff
FROM sales a
LEFT JOIN sales b ON a.id = b.id AND a.ds > b.ds
WHERE a.ds = '${date}' AND b.ds = '${previous_date}';
Perform segmented queries.
If a table contains a large amount of data, the data push feature allows you to perform segmented queries by using the Next Token method. To use the Next Token method, perform the following steps: In the upper-right corner of the Edit Query SQL section, select Code Template from the Code Help drop-down list. In the Code Template dialog box, click Insert Template Code in the Next Token section.
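The Next Token method fetches a large result set in segments, each call returning one segment plus a token that marks where the next call should resume. The following sketch illustrates the pattern only; DataWorks generates its own template code, and the fetch_page function here is a stand-in for a real segmented query.

```python
# Sketch of segmented (paginated) querying with a next-token pattern,
# similar in spirit to the Next Token code template described above.
# fetch_page is an illustrative stand-in for a real query call.
def fetch_page(rows, token, page_size):
    """Return one segment of rows plus the token for the next segment,
    or None as the token when no data remains."""
    start = token or 0
    segment = rows[start:start + page_size]
    next_token = start + page_size if start + page_size < len(rows) else None
    return segment, next_token

rows = list(range(10))
token, collected = None, []
while True:
    segment, token = fetch_page(rows, token, page_size=4)
    collected.extend(segment)
    if token is None:
        break
print(collected)
```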
Configure the content to push
In the Content to Push section, configure the content that you want to push to specific webhook URLs by using Markdown or tables.
Configure the Title parameter. Click Add to the right of the Body parameter, select Markdown or Table, and then configure the content. After you configure the content that you want to push, you can click Preview to preview the content. Examples:
Markdown
Configure variables: Add assignment parameters and output parameters as variables in the ${Parameter name} format to the rich text. When the data push task is run, the variables in the rich text are replaced with the values of the assignment parameters and the SQL query output.
Assignment parameters: In the right-side navigation pane, click Parameters. On the Assignment Parameters tab, assign constants or date and time expressions (scheduling parameters) to the variables.
Output parameters: Output parameters indicate the queried data. For example, A and B in the SELECT A, B … FROM TABLE statement are the output parameters.
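The substitution of ${Parameter name} placeholders at run time can be pictured with a small sketch. The substitution rules below are an assumption for illustration, not DataWorks' exact implementation.

```python
import re

# Illustrative sketch: replace ${Parameter name} placeholders in the
# push body with assignment-parameter values and SQL output values.
# Unknown placeholders are left untouched (an assumption of this sketch).
def render(template: str, params: dict) -> str:
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: str(params.get(m.group(1), m.group(0))),
        template,
    )

body = "Daily sales for ${date}: department ${department}, revenue ${revenue}"
print(render(body, {"date": "20240102", "department": "Department 1", "revenue": 11000.0}))
```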
Automatically @ specified users: If you want to push content to webhook URLs of Lark, you can perform related configurations to allow the system to automatically @ specified users.
By default, if you select Markdown, the content is configured by using rich text. If you want to push the content to webhook URLs of Lark, you can click the icon and use <at id="all" /> or <at email="username@example.com" /> to @ specified users.
In Markdown mode, you can also perform other operations, such as inserting images and DingTalk emojis.
Table
Click Add Column to add a column. Then, select a parameter from the Parameters drop-down list to associate the parameter with the column.
If you want to push content to webhook URLs of Lark, you can perform the following steps to configure the display effect of the content that you want to push: Find the desired column and click the icon to the right of the column. In the Modify Field dialog box, configure the Field, Display Name, Display Style, and Condition parameters.
Field: You can select another field from the drop-down list.
Display Name: the name to be displayed in the table header when you push the content to office software.
Display Style: If you specify a prefix or a suffix, the prefix or suffix is added to the return values of the field.
Condition: You can configure this parameter to compare the return value of a field with a specified value. You can specify the color of the value that meets or does not meet the specified condition and select an additional unicode from the drop-down list.
Note: Different push channels support different tables and display effects.
DingTalk: You can select Markdown or Table to add a table. If you select Table, the configuration of the Display Style and Condition parameters in the Modify Field dialog box does not take effect. Tables cannot be displayed on the mobile client of DingTalk.
Lark: You can select Markdown or Table to add a table. Tables that are defined by using Markdown or Table can be displayed as expected.
WeCom: You can select Markdown to add a table. However, tables that are defined by using Markdown cannot be displayed as expected.
Microsoft Teams: You can select Markdown to add a table. Tables that are defined by using Markdown can be displayed as expected on the mobile client based on the configurations.
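The Display Style and Condition options amount to decorating a field value with a prefix or suffix and comparing it against a threshold. A minimal sketch of that behavior follows; the function name, the threshold logic, and the "(up)/(down)" markers are assumptions for illustration, not the DataWorks rendering rules.

```python
# Illustrative sketch of Display Style (prefix/suffix) and Condition
# (compare against a value) for a table field. All names and the
# up/down markers are hypothetical, for illustration only.
def format_field(value, prefix="", suffix="", threshold=None):
    """Decorate a field value and optionally flag it against a threshold."""
    text = f"{prefix}{value}{suffix}"
    if threshold is not None:
        text += " (up)" if value >= threshold else " (down)"
    return text

print(format_field(11000.0, prefix="$", threshold=10000))
```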
Step 3: Configure data push settings
Create a destination
Before you configure data push settings, you must perform the following steps to create a destination: In the lower-left corner of the DataService Studio page, click the icon. On the page that appears, click the Destination Management tab. Then, click Create Destination.
In the Create Destination dialog box, configure the following parameters:
Type: Select a push channel.
Destination Name: Enter a name for the destination.
Webhook: Enter the webhook URL of the push channel that you selected.
For more information about how to obtain the webhook URL of Lark, see Use the webhook trigger.
For more information about how to obtain the webhook URL of Teams, see Creating Incoming Webhooks.
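A destination is ultimately just a webhook URL that accepts a channel-specific JSON payload. The sketch below builds a payload in DingTalk's markdown message shape (msgtype/markdown, per DingTalk's custom chatbot API); other channels such as Lark, WeCom, and Teams expect their own payload formats, and the URL shown in the comments is a placeholder, not a real endpoint.

```python
import json

# Sketch: build a Markdown message payload for a DingTalk chatbot
# webhook. The JSON shape follows DingTalk's markdown message format;
# other push channels use different payload shapes.
def build_dingtalk_markdown(title: str, text: str) -> bytes:
    return json.dumps(
        {"msgtype": "markdown", "markdown": {"title": title, "text": text}}
    ).encode("utf-8")

payload = build_dingtalk_markdown("Daily sales", "**Department 1**: 11000.0")

# Uncomment to actually send (requires a valid webhook URL and token):
# from urllib import request
# req = request.Request(
#     "https://oapi.dingtalk.com/robot/send?access_token=...",
#     data=payload, headers={"Content-Type": "application/json"})
# request.urlopen(req)
print(payload.decode("utf-8"))
```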
Configure data push settings
In the right-side navigation pane of the data push task, click the Push Settings tab. On this tab, you can specify the scheduling cycle, the resource group for scheduling, and the destinations. This way, the system periodically runs the data push task to push data to specified destinations. Parameters:
Parameters related to the scheduling cycle: specify the scheduling cycle of the data push task and the point in time at which the data push task is scheduled.

| Scheduling cycle | Specified time | Scheduling time | Example |
| --- | --- | --- | --- |
| Month | The days on which the data push task is run in each month. | The point in time at which the data push task is scheduled to run. | Scheduling period: Month. Specified time: Monthly 1. Scheduling time: 08:00. The data push task runs at 08:00 on the first day of each month. |
| Week | The days on which the data push task is run in each week. | The point in time at which the data push task is scheduled to run. | Scheduling period: Week. Specified time: Monday. Scheduling time: 09:00. The data push task runs at 09:00 every Monday. |
| Day | Note: The data push task runs every day. | The point in time at which the data push task is scheduled to run. | Scheduling period: Day. Scheduling time: 08:00. The data push task runs at 08:00 every day. |
| Hours | Note: You can either specify an interval at which the data push task runs, or specify the hours and minutes at which the data push task runs. | The interval, or the hours and minutes. | Specify an interval: Start Time: 02:00, Time interval: 1 hour, End Time: 23:59. The data push task runs every hour from 02:00 to 23:59 every day. Specify the hours and minutes: Hours: 0 and 1, Minutes: 10. The data push task runs at 00:10 and 01:10 every day. |
Timeout Definition: specifies the timeout period for the data push task. If the data push task fails to finish running within the specified timeout period, the data push task is terminated. Valid values:
System Default: By default, the system dynamically adjusts the timeout period for data push tasks based on the load of the resource group that you use. The timeout period ranges from 3 to 7 days. If the running duration of a data push task exceeds the specified timeout period, the data push task is terminated.
Custom: You can specify the timeout period for data push tasks based on your business requirements. For example, Custom is selected, and the timeout period of the data push task is 1 hour. If the running duration of the data push task exceeds 1 hour, the data push task is terminated.
Effective Date: specifies the effective period of the data push task. Data push tasks are automatically run only during the specified effective period. Valid values:
Permanent: The data push task is permanently valid.
Specified time: You can specify the effective period for data push tasks based on your business requirements. For example, Specified time is selected, and the time period ranges from 2024-01-01 to 2024-12-31. The data push task is automatically run during the specified time period.
Resource Group for Scheduling: specifies the resource group for scheduling used to run the data push task. The resource group for scheduling provides resources to periodically run data push tasks.
Destination: specifies the destinations. You can select destinations from the drop-down list. You can manage destinations on the Destination Management tab.
Note: If you want to push data to DingTalk webhook URLs, you must select Custom Keywords in the Security Settings section and click Add to add keywords when you add a DingTalk chatbot in a DingTalk group. The content that you push must contain the keywords that you add.
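A DingTalk chatbot configured with custom keywords accepts a message only if the message contains at least one of those keywords, so it can be worth checking the push body before sending. A minimal sketch, with an illustrative function name:

```python
# Sketch: mirror DingTalk's custom-keyword security check on the client
# side. A message is accepted only if it contains at least one of the
# keywords configured for the chatbot. The function name is illustrative.
def contains_keyword(content: str, keywords: list) -> bool:
    return any(keyword in content for keyword in keywords)

print(contains_keyword("Daily sales report for 20240102", ["sales"]))
```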
Step 4: Test the data push task
After you create the data push task, click Save in the toolbar to save the configurations. Then, click Test in the toolbar. In the Test Data Push Task dialog box, assign constants to the variables in the Request Parameters section and click Test.
Before you submit and publish a data push task, you must test the push task in the development state.
Step 5: Publish the data push task
Manage data push task versions
After the data push task passes the test, click Submit in the toolbar to submit the data push task. If you do not submit the data push task, the data push task is in the draft state, and no version is generated.
In the right-side navigation pane, click the Version tab. On this tab, find the desired version and click Publish in the Actions column. After the data push task is published, the content that you want to push can be pushed to specified destinations based on the push settings.
On the Version tab, you can manage the versions of the data push task.
| Status | Option | Description |
| --- | --- | --- |
| Release | Data Push Task Management | Click Data Push Task Management in the Actions column to view the details of the published data push task. For more information, see the Manage data push tasks section in this topic. |
| Can Be Published | Publish | Click Publish in the Actions column to publish the data push task of a specific version. |
| Can Be Published | Deprecate | Click Deprecate in the Actions column to deprecate a specific version of a data push task. After you perform this operation, the status of the data push task of the specific version changes to Deprecated. |
| Off-Line and Deprecated | Version Details | Click Version Details to view the configurations and the pushed content of the data push task of a specific version. |
| Off-Line and Deprecated | Roll Back | Click Roll Back to roll back the data push task to a specific version. |

Note: The Version Details and Roll Back options are available for data push tasks in any state.
Manage data push tasks
After the data push task is published, you can perform the following steps to go to the Data Push Task Management page: In the right-side navigation pane, click the Version tab. On this tab, find the desired version and click Data Push Task Management in the Actions column. Alternatively, you can click Service Management. In the left-side navigation pane of the page that appears, click Data Push Task Management.
On the Data Push Tasks tab of the Data Push Task Management page, you can view all published data push tasks and the following information about each data push task: ID, name, data source name, data source environment, node mode, resource group for scheduling, owner, publisher, and last publish time. You can also perform operations on a published data push task. The following table describes the operations.
| Operation | Description |
| --- | --- |
| Unpublish | Click Unpublish in the Actions column to unpublish a data push task. |
| Test | Click Test in the Actions column to test a data push task. |
You can click the icon in the Name column of a data push task to view the configurations of the data push task.
Test a published data push task
After a data push task is published, the data push task pushes the content that you configure to specified destinations based on the configured push settings. You can go to the Test Data Push Task page by using one of the following methods:
Method 1: Click Service Management. In the left-side navigation pane, click Test Data Push Task.
Method 2: Click Service Management. In the left-side navigation pane, click Data Push Task Management. On the Data Push Tasks tab of the Data Push Task Management page, find the desired data push task and click Test in the Actions column.
You can test a published data push task to check whether the data push task pushes data as expected and whether the specified destinations can receive data.