
DataWorks: Data push

Last Updated: Nov 15, 2024

DataWorks DataService Studio provides the data push feature. This feature allows you to execute SQL statements to query data from a specific data source and push the query results to webhook URLs. You can perform simple configurations to periodically push business data to different webhook URLs. This topic describes how to configure and use the data push feature.

Overview

The data push feature allows you to create data push tasks. You can write SQL statements for single- or multi-table queries in a data push task to define the data that you want to push and organize the data by using rich text or tables. You can configure a scheduling cycle to periodically push data to destination webhook URLs.


Supported data source types and push channels

  • Supported data source types:

    • MySQL (compatible with StarRocks and Doris)

    • PostgreSQL (compatible with Snowflake and Redshift)

    • Hologres

    • MaxCompute

    • ClickHouse

  • Push channels: DingTalk, Lark, WeCom, and Microsoft Teams.

Limits

  • Limits on the data size and push frequency:

    • If you want to push data to DingTalk, the data size cannot exceed 20 KB.

    • If you want to push data to Lark, the data size cannot exceed 30 KB, and the size of an image must be less than 10 MB.

    • If you want to push data to WeCom, each chatbot can send a maximum of 20 messages every minute.

    • If you want to push data to Microsoft Teams, the data size cannot exceed 28 KB.

    Note

    Tables that are defined by using Markdown in the content that you want to push cannot be displayed as expected on the mobile clients of DingTalk and WeCom. In this case, we recommend that you select Table in the Body section to add tables to the content that you want to push. Tables that are defined by using Markdown can be displayed as expected on the mobile clients of Lark and Microsoft Teams.

  • The data push feature is available only in DataWorks workspaces in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), Singapore, US (Silicon Valley), and US (Virginia).

Prerequisites

A data source is added in a supported region. For information about how to add a data source to DataWorks, see Add and manage data sources.

Procedure

Step 1: Create a data push task

Step 2: Configure the data push task

Step 3: Configure data push settings

Step 4: Test the data push task

Step 5: Publish the data push task

Step 1: Create a data push task

  1. Go to the DataService Studio page.

    Log on to the DataWorks console. In the top navigation bar, select the region in which the desired data source resides. In the left-side navigation pane, choose Data Development and Governance > DataService Studio. On the DataService Studio page, select a workspace from the drop-down list and click Go to DataService Studio.

  2. Create a data push task.

    In the Service Development pane of the DataService Studio page, click Data Push. Move the pointer over the image icon and select Create Data Push Task. In the Data Push dialog box, configure the Name parameter and click OK.


Step 2: Configure the data push task

(Optional) Preparations

The following sections describe how to query data from a MaxCompute table and push the queried data to a specific destination. In this example, a table named sales is used. The data push task pushes the daily sales amount of each department, together with the difference between the sales amount of the current day and that of the previous day, to a destination over a specific push channel. Before you perform the subsequent operations, you must create the sales table in the desired environment. The following sample code shows how to create the sales table and insert data into the table. For more information about how to create a MaxCompute table, see Create and manage MaxCompute tables.

CREATE TABLE IF NOT EXISTS sales (
    id BIGINT COMMENT 'Unique identifier',
    department STRING COMMENT 'Department name',
    revenue DOUBLE COMMENT 'Sales amount'
) PARTITIONED BY (ds STRING);

-- Insert data into partitions.
INSERT INTO TABLE sales PARTITION (ds='20240101') (id, department, revenue) VALUES (1, 'Department 1', 12000.00);
INSERT INTO TABLE sales PARTITION (ds='20240101') (id, department, revenue) VALUES (2, 'Department 2', 21000.00);
INSERT INTO TABLE sales PARTITION (ds='20240101') (id, department, revenue) VALUES (3, 'Department 3', 5000.00);

INSERT INTO TABLE sales PARTITION (ds='20240102') (id, department, revenue) VALUES (1, 'Department 1', 11000.00);
INSERT INTO TABLE sales PARTITION (ds='20240102') (id, department, revenue) VALUES (2, 'Department 2', 20000.00);
INSERT INTO TABLE sales PARTITION (ds='20240102') (id, department, revenue) VALUES (3, 'Department 3', 10000.00);
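
To confirm that the sample data is inserted as expected, you can run the following optional check. The query assumes only the sales table that is created above.

-- Optional check: verify that both sample partitions contain the expected rows.
SELECT ds, COUNT(*) AS row_cnt, SUM(revenue) AS total_revenue
FROM sales
WHERE ds IN ('20240101', '20240102')
GROUP BY ds
LIMIT 10;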

Select a data source

Configure the Datasource Type, Datasource Name, and Datasource Env. parameters. You can configure the Datasource Env. parameter based on the environment of the table whose data you want to push. Before you perform subsequent operations, confirm the environment of the sales table that you created in the (Optional) Preparations section.


Note

For information about the supported data source types, see Supported data source types and push channels.

Write SQL statements

  • Define the data that you want to push and query data.

    In the Edit Query SQL section, write SQL statements for single- or multi-table queries to define the data that you want to push. Example:

    -- Query the sales amount of each department on January 2, 2024.
    SELECT id, department, revenue FROM sales WHERE ds='20240102';
    -- Query the difference between the sales amount of the current day and that of the previous day.
    SELECT a.revenue - b.revenue AS diff FROM sales a LEFT JOIN sales b ON a.id = b.id AND a.ds > b.ds WHERE a.ds = '20240102' AND b.ds = '20240101';

    After you write the SQL statements, the response parameters are automatically displayed on the Output Parameters tab of the Parameters tab. If the parameters fail to be displayed or are invalid, you can turn off Automatically Parse Parameters and manually add parameters.


    You can configure variables in the ${Variable name} format in the SQL statements and use assignment parameters to assign values to the variables. The assignment parameters can be set to date and time expressions or constants. This way, the values of variables are dynamically replaced in the code of the data push task based on the configurations of the assignment parameters. For more information, see Configure the content to push.

    -- Dynamically assign values to the time variables based on scheduling parameters.
    -- Query the latest sales amount of each department every day.
    SELECT id, department, revenue FROM sales WHERE ds='${date}';
    -- Query the difference between the sales amount of the current day and that of the previous day.
    SELECT a.revenue - b.revenue AS diff FROM sales a LEFT JOIN sales b ON a.id = b.id AND a.ds > b.ds WHERE a.ds = '${date}' AND b.ds = '${previous_date}';


  • Perform segmented queries.

    If a table contains a large amount of data, the data push feature allows you to perform segmented queries by using the Next Token method. To use the Next Token method, perform the following steps: In the upper-right corner of the Edit Query SQL section, select Code Template from the Code Help drop-down list. In the Code Template dialog box, click Insert Template Code in the Next Token section.
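
    The exact code that Insert Template Code generates is not reproduced in this topic. The following statement is only a minimal sketch of the segmented-query idea, in which a hypothetical ${next_token} variable carries the largest id that was returned by the previous segment. Adapt the sketch to the template that DataService Studio inserts.

    -- Minimal keyset-style sketch of a segmented query. This is illustrative only.
    -- ${next_token} is a hypothetical variable that holds the largest id returned
    -- by the previous segment. The first segment can start from 0.
    SELECT id, department, revenue
    FROM sales
    WHERE ds = '${date}'
      AND id > ${next_token}
    ORDER BY id
    LIMIT 1000;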

Configure the content to push

In the Content to Push section, configure the content that you want to push to specific webhook URLs by using Markdown or tables.

Configure the Title parameter. Click Add to the right of the Body parameter, select Markdown or Table, and then configure the content. After you configure the content that you want to push, you can click Preview to preview the content. Examples:

Markdown

  • Configure variables: Add assignment parameters and output parameters as variables in the ${Parameter name} format to the rich text. When the data push task is run, the variables in the rich text are replaced with the values of the assignment parameters and the SQL query output.

    • Assignment parameters: In the right-side navigation pane, click Parameters. On the Assignment Parameters tab, assign constants or date and time expressions (scheduling parameters) to the variables.

    • Output parameters: Output parameters indicate the queried data. For example, A and B in the SELECT A,B… FROM TABLE statement are the output parameters. For an example that is based on the sales table in this topic, see the sketch after this list.

  • Automatically @ specified users: If you want to push content to webhook URLs of Lark, you can perform related configurations to allow the system to automatically @ specified users.

    • By default, if you select Markdown, the content is configured by using rich text. If you want to push the content to webhook URLs of Lark, you can click the image icon and use <at id="all" /> or <at email="username@example.com" /> to @ specified users.

  • In Markdown mode, you can also perform other operations, such as inserting images and DingTalk emojis.
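
  The following sketch, which is based on the sales table from the (Optional) Preparations section, shows how the column aliases of a query become output parameters that you can reference in the Markdown body. The ${department} and ${revenue} references in the comments only illustrate the ${Parameter name} format.

    -- The column aliases in the SELECT list become the output parameters.
    -- In this sketch, department and revenue can be referenced in the Markdown
    -- body as ${department} and ${revenue}, for example:
    --   Daily revenue of ${department}: ${revenue}
    SELECT department, revenue FROM sales WHERE ds = '${date}';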

Table

  • Click Add Column to add a column. Then, select a parameter from the Parameters drop-down list to associate the parameter with the column.

  • If you want to push content to webhook URLs of Lark, you can perform the following steps to configure the display effect of the content that you want to push: Find the desired column and click the image icon to the right of the column. In the Modify Field dialog box, configure the Field, Display Name, Display Style, and Condition parameters.

    • Field: You can select another field from the drop-down list.

    • Display Name: the name to be displayed in the table header when you push the content to office software.

    • Display Style: If you specify a prefix or a suffix, the prefix or suffix is added to the return values of the field.

    • Condition: You can configure this parameter to compare the return value of a field with a specified value. You can specify the color of values that meet or do not meet the condition and select an additional Unicode symbol from the drop-down list.

    Note
    • Different push channels support different tables and display effects.

      • DingTalk: You can select Markdown or Table to add a table. If you select Table, the configurations of the Display Style and Condition parameters in the Modify Field dialog box do not take effect. Tables cannot be displayed on the mobile client of DingTalk.

      • Lark: You can select Markdown or Table to add a table. The tables that are defined by using Markdown or Table can be displayed as expected.

      • WeCom: You can select Markdown to add a table. However, the tables that are defined by using Markdown cannot be displayed as expected.

      • Mobile client of Teams: You can select Markdown to add a table. The tables that are defined by using Markdown can be displayed as expected based on the configurations.

Step 3: Configure data push settings

Create a destination

Before you configure data push settings, you must perform the following steps to create a destination: In the lower-left corner of the DataService Studio page, click the image icon. On the page that appears, click the Destination Management tab. Then, click Create Destination.

In the Create Destination dialog box, configure the following parameters:

  • Type: Select a push channel.

  • Destination Name: Enter a name for the destination.

  • Webhook: Enter the webhook URL of the push channel that you selected.


Configure data push settings

In the right-side navigation pane of the data push task, click the Push Settings tab. On this tab, you can specify the scheduling cycle, the resource group for scheduling, and the destinations. This way, the system periodically runs the data push task to push data to specified destinations. Parameters:

  • Parameters related to the scheduling cycle: specify the scheduling cycle of the data push task and the specific point in time at which the data push task is scheduled. The following list describes the settings for each scheduling cycle:

    • Month
      • Specified time: the days on which the data push task is run in each month.
      • Scheduling time: the point in time at which the data push task is scheduled to run.
      • Example: Scheduling period: Month; Specified time: Monthly 1; Scheduling time: 08:00. The data push task runs at 08:00 on the first day of each month.

    • Week
      • Specified time: the days on which the data push task is run in each week.
      • Scheduling time: the point in time at which the data push task is scheduled to run.
      • Example: Scheduling period: Week; Specified time: Monday; Scheduling time: 09:00. The data push task runs at 09:00 every Monday.

    • Day
      • Specified time: none. The data push task runs every day.
      • Scheduling time: the point in time at which the data push task is scheduled to run.
      • Example: Scheduling period: Day; Scheduling time: 08:00. The data push task runs at 08:00 every day.

    • Hours
      • Specified time: select one of the following methods to specify the time at which you want to run the data push task:
        • Specify an interval at which the data push task runs.
        • Specify the hours and minutes at which the data push task runs.
      • Example of specifying an interval: Start Time: 02:00; Time interval: 1; End Time: 23:59. The data push task runs every hour from 02:00 to 23:59 every day.
      • Example of specifying the hours and minutes: Specify hours: 0 and 1; Specify minutes: 10. The data push task runs at 00:10 and 01:10 every day.

  • Timeout Definition: specifies the timeout period for the data push task. If the data push task fails to finish running within the specified timeout period, the data push task is terminated. Valid values:

    • System Default: By default, the system dynamically adjusts the timeout period for data push tasks based on the load of the resource group that you use. The timeout period ranges from 3 to 7 days. If the running duration of a data push task exceeds the specified timeout period, the data push task is terminated.

    • Custom: You can specify the timeout period for data push tasks based on your business requirements. For example, if you select Custom and set the timeout period to 1 hour, the data push task is terminated when its running duration exceeds 1 hour.

  • Effective Date: specifies the effective period of the data push task. Data push tasks are automatically run only during the specified effective period. Valid values:

    • Permanent: The data push task is permanently valid.

    • Specified time: You can specify the effective period for data push tasks based on your business requirements. For example, if you select Specified time and set the period to 2024-01-01 through 2024-12-31, the data push task is automatically run only within that period.

  • Resource Group for Scheduling: specifies the resource group for scheduling used to run the data push task. The resource group for scheduling provides resources to periodically run data push tasks.

  • Destination: specifies the destinations. You can select destinations from the drop-down list. You can manage destinations on the Destination Management tab.

    Note

    If you want to push data to DingTalk webhook URLs, you must select Custom Keywords in the Security Settings section and click Add to add keywords when you add a DingTalk chatbot in a DingTalk group. The content that you want to push must also contain the keywords that you add.


Step 4: Test the data push task

After you create the data push task, click Save in the toolbar to save the configurations. Then, click Test in the toolbar. In the Test Data Push Task dialog box, assign constants to the variables in the Request Parameters section and click Test. For example, for the sample queries in this topic, you can assign 20240102 to the date variable and 20240101 to the previous_date variable.


Note

Before you submit and publish a data push task, you must test the push task in the development state.

Step 5: Publish the data push task

Manage data push task versions

  1. After the data push task passes the test, click Submit in the toolbar to submit the data push task. If you do not submit the data push task, the data push task is in the draft state, and no version is generated.

  2. In the right-side navigation pane, click the Version tab. On this tab, find the desired version and click Publish in the Actions column. After the data push task is published, the content that you want to push can be pushed to specified destinations based on the push settings.

    On the Version tab, you can manage the versions of the data push task.

    The options that are available in the Actions column vary based on the status of a version:

    • Release
      • Data Push Task Management: Click Data Push Task Management in the Actions column to view the details of the published data push task. For more information, see the Manage data push tasks section in this topic.

    • Can Be Published
      • Publish: Click Publish in the Actions column to publish the data push task of a specific version.
      • Deprecate: Click Deprecate in the Actions column to deprecate a specific version of a data push task. After you perform this operation, the status of the data push task of the specific version changes to Deprecated.

    • Off-Line and Deprecated
      • Version Details: Click Version Details to view the configurations and the pushed content of the data push task of a specific version.
      • Roll Back: Click Roll Back to roll back the data push task to a specific version.

    Note

    The Version Details and Roll Back options are available for data push tasks in any state.


Manage data push tasks

After the data push task is published, you can perform the following steps to go to the Data Push Task Management page: In the right-side navigation pane, click the Version tab. On this tab, find the desired version and click Data Push Task Management in the Actions column. Alternatively, you can click Service Management. In the left-side navigation pane of the page that appears, click Data Push Task Management.

On the Data Push Tasks tab of the Data Push Task Management page, you can view all published data push tasks and the following information about each data push task: ID, name, data source name, data source environment, node mode, resource group for scheduling, owner, publisher, and last publish time. You can also perform the following operations on a published data push task:

  • Unpublish: Click Unpublish in the Actions column to unpublish a data push task.

  • Test: Click Test in the Actions column to test a data push task.

Note

You can click the image icon in the Name column of a data push task to view the configurations of the data push task.


Test a published data push task

After a data push task is published, the data push task pushes the content that you configure to specified destinations based on the configured push settings. You can go to the Test Data Push Task page by using one of the following methods:

  • Method 1: Click Service Management. In the left-side navigation pane, click Test Data Push Task.

  • Method 2: Click Service Management. In the left-side navigation pane, click Data Push Task Management. On the Data Push Tasks tab of the Data Push Task Management page, find the desired data push task and click Test in the Actions column.

You can test a published data push task to check whether the data push task pushes data as expected and whether the specified destinations can receive data.
