DataWorks: Overview

Last Updated: Oct 21, 2024

DataWorks allows you to develop and deploy an extension based on a self-managed service or Function Compute. An extension applies your custom processing logic to the event messages generated by the operations you perform in DataWorks, for example, to block an operation that does not meet your requirements. This topic provides an overview of extensions.

Limits

  • Only users of DataWorks Enterprise Edition can use the Extensions module.

  • The Extensions module is available in the following regions: China (Beijing), China (Hangzhou), China (Shanghai), China (Zhangjiakou), China (Shenzhen), China (Chengdu), US (Silicon Valley), US (Virginia), Germany (Frankfurt), Japan (Tokyo), China (Hong Kong), and Singapore.

Precautions

  • Only the Open Platform administrator, tenant administrator, Alibaba Cloud accounts, and RAM users to which the AliyunDataWorksFullAccess policy is attached have read and write permissions on the developer backend. For more information about permission management, see Manage permissions on global-level services and Manage permissions on the DataWorks services and the entities in the DataWorks console by using RAM policies.

  • If DataWorks Enterprise Edition expires, extensions become invalid and cannot be triggered to check extension point events. If an extension is triggered to check an event and has not completed the check when DataWorks Enterprise Edition expires, the check is terminated and the result Check Passed is returned.

  • If you develop and deploy an extension based on Function Compute, you can add only specific extension point events to the extension: the pre-events for data download, data upload, asset publishing, and asset unpublishing.

Features

DataWorks Open Platform provides a variety of extension points, and the Extensions module is built on them. The Extensions module is a plug-in that works with the OpenAPI and OpenEvent modules so that you can apply custom processing logic to the operations you perform in DataWorks, for example, to block an operation that does not meet your business requirements.

Business scenarios in which extensions can be used for operation management:

  • Management of table or task naming conventions

  • Management of duplicate data synchronization tasks

  • Management of fees generated by tasks

  • Management of dependencies between tasks

    Note

    For more information about the event types that can be used in various business scenarios, see Development reference: Event lists and event message formats.

Custom process control: If an operation that you perform in DataWorks generates an extension point event that is specified in an extension, the operation cannot continue until the extension returns a processing result for the event message of that event.

Note

For example, in a workspace in standard mode, you can add an extension that checks the functions used in code to the basic task development and deployment process. After you enable the extension, the process changes from develop, commit, and deploy to develop, check before commit, commit, check before deploy, and deploy. For more information, see Best practices for prohibiting the use of the MAX_PT function (advanced feature).
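The check in this example comes down to scanning the submitted code for a forbidden function call. The following Python sketch is a minimal illustration that assumes the node code is already available as a string; the names check_code and CheckResult are hypothetical, and extracting the code from the event message and sending the result back to DataWorks are left out. The comment stripping is deliberately naive and does not understand string literals.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: flag any call to MAX_PT(...), case-insensitively.
# Requiring "(" avoids flagging identifiers such as max_pt_backup.
_MAX_PT_CALL = re.compile(r"\bmax_pt\s*\(", re.IGNORECASE)


@dataclass
class CheckResult:
    """Hypothetical result type returned by the check."""
    passed: bool
    reason: str = ""


def check_code(sql: str) -> CheckResult:
    """Return a failing result if the code calls MAX_PT outside comments."""
    # Naively drop single-line (--) and block (/* */) comments so that
    # commented-out calls are ignored. String literals are not handled.
    code = re.sub(r"--[^\n]*", "", sql)
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)
    if _MAX_PT_CALL.search(code):
        return CheckResult(False, "MAX_PT is not allowed in this workspace")
    return CheckResult(True)
```

In a real extension, a failing result would be translated into the processing result that blocks the commit or deploy step.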

Extension development process

DataWorks allows you to develop and deploy an extension based on a self-managed service or Function Compute to implement custom process control.

  • Configure the required settings in OpenEvent to push event messages: Choose a development and deployment method based on the extension point events that you want to receive.

    • Develop and deploy an extension based on a self-managed service: This method relies on the message distribution capability of EventBridge. In the event distribution channel that you configure, you must specify the event bus to which DataWorks event messages are sent. In the event bus, you must also specify the type of service to which the event messages are delivered.

    • Develop and deploy an extension based on Function Compute: By default, DataWorks event messages are sent to the Function Compute service that you specify in the extension that you register. You do not need to configure an event bus in OpenEvent.

  • Develop and deploy an extension: Develop and deploy an extension to receive and parse event messages pushed by DataWorks, process the event messages based on custom processing logic, and return the processing result to DataWorks.

  • Register the extension: Specify the types of events that you want the extension to receive and process in DataWorks.

  • Test the extension: Test the extension in the test workspace and check whether the extension works as expected.

  • Submit and publish the extension: After you confirm that the extension configurations are correct, submit the extension to the platform for review. After the extension is approved, you can publish the extension in all workspaces.

    Note
    • In most cases, an extension review can be completed within T+3 business days after you commit the extension. T is the point in time when you commit the extension for review.

    • DataWorks event messages are valid for three days. If the validity period is exceeded, the event messages are considered expired and are not processed by an extension.
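The develop-and-deploy step above (receive and parse an event message, apply custom logic, and return a result) together with the three-day validity rule can be sketched as follows. This is a minimal Python sketch under stated assumptions: the message is JSON, and the field names id, timestamp (epoch milliseconds), and data, as well as the shape of the returned result, are hypothetical placeholders rather than the real DataWorks event message format, which is described in Development reference: Event lists and event message formats. Message delivery (EventBridge or Function Compute) and the API call that sends the result are omitted.

```python
import json
import time
from typing import Callable, Optional

# Event messages are valid for three days; expired messages are not processed.
VALIDITY_MS = 3 * 24 * 60 * 60 * 1000


def handle_event(raw: str,
                 decide: Callable[[dict], bool],
                 now_ms: Optional[int] = None) -> Optional[dict]:
    """Parse an event message, apply custom logic, and build a result.

    The fields "id", "timestamp", and "data" are hypothetical placeholders.
    Returns None for an expired message.
    """
    msg = json.loads(raw)
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Skip messages older than the validity period.
    if now_ms - int(msg["timestamp"]) > VALIDITY_MS:
        return None
    passed = decide(msg.get("data", {}))
    # Hypothetical result shape; the real parameters are defined by the API
    # operation used to send results (for example, UpdateIDEEventResult).
    return {"messageId": msg["id"], "checkResult": "OK" if passed else "FAIL"}
```

The decide callback holds the business rule, for example a naming-convention check such as `lambda d: d["name"].startswith("ods_")`, so the parsing and expiry handling stay the same across rules.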

Supported extension point events

The following table describes the types and details of extension point events that can be processed by extensions for each DataWorks service.

Each entry below lists a DataWorks service and the API operation used to send event processing results to DataWorks, followed by each extension point type and its extension points.

Application scope: Workspace

  • DataStudio: Call the UpdateIDEEventResult operation to send event processing results to DataWorks.

    • Node change: Add a node

    • File change (nodes, resources, and functions): Delete a file, Commit a file, Deploy a file, Run the code

    • Table change: Commit a table to the development environment, Deploy a table to the production environment

  • Operation Center: Call the UpdateWorkbenchEventResult operation to send event processing results to DataWorks.

    • Node O&M: Undeploy a node, Freeze a node, Unfreeze a node

    • Data backfilling for a node: Backfill data for a node

    • Instance O&M: Freeze an instance, Unfreeze an instance, Terminate an instance, Rerun an instance (rerun an instance and the descendant instances of the instance), Set the status of an instance to Succeeded

    • Task status change: Change the status of an auto triggered node

    • Workflow status change: Change the status of a data backfill instance or a manually triggered workflow

    • Monitoring and alerting: Perform monitoring and alerting

  • Security Center: Call the CallbackExtension operation to send event processing results to DataWorks.

    • Approval Center: Create a request order

    • Security Center: Request permissions on tables

  • Data Quality: Call the CallbackExtension operation to send event processing results to DataWorks.

    • Data quality check: Send feedback on data quality check results, Report that a data quality check is complete

Application scope: Tenant

  • DataWorks console: Call the CallbackExtension operation to send event processing results to DataWorks.

    • Workspace management: Delete a workspace (pre-event), Delete a workspace (post-event)

  • Data Upload and Download: Call the CallbackExtension operation to send event processing results to DataWorks.

    • Data download: Download data (pre-event) (file generation), Download data (pre-event) (file download)

    • Data upload: Upload data (pre-event)
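Because the API operation for sending processing results depends on the DataWorks service that generated the event, an extension that handles events from several services needs a small routing step. The following Python sketch encodes the service-to-operation mapping from the table above as a lookup. The service-name strings are illustrative keys rather than identifiers from a DataWorks API, and the entries for Data Quality and the tenant-level services assume that CallbackExtension applies to every service listed after Security Center.

```python
# Service -> API operation used to send event processing results.
# Keys are illustrative labels (hypothetical), not DataWorks API identifiers.
RESULT_API_BY_SERVICE = {
    "DataStudio": "UpdateIDEEventResult",
    "Operation Center": "UpdateWorkbenchEventResult",
    "Security Center": "CallbackExtension",
    "Data Quality": "CallbackExtension",
    "DataWorks console": "CallbackExtension",
    "Data Upload and Download": "CallbackExtension",
}


def result_api_for(service: str) -> str:
    """Return the API operation name for results of events from a service."""
    try:
        return RESULT_API_BY_SERVICE[service]
    except KeyError:
        # Fail loudly instead of sending a result through the wrong operation.
        raise ValueError(f"no extension points listed for service: {service}") from None
```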

Appendix: Comparison between the two extension deployment methods

  • Operation difficulty

    • Self-managed service: The procedure is complex, involves deploying servers and applications, and is prone to network and O&M issues.

    • Function Compute: The procedure is simple. You can develop and deploy an extension by using a single function.

  • Fees

    • Self-managed service: None.

    • Function Compute: You are charged for using Function Compute. For more information, see Billing overview.

  • Supported events

    • Self-managed service: Various extension point events are supported. For more information, see Development reference: Event lists and event message formats.

    • Function Compute: Only the pre-event for data download, the pre-event for data upload, the pre-event for asset publishing, and the pre-event for asset unpublishing are supported.
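Because the Function Compute-based method supports only four pre-events, the choice between the two methods follows from the set of events that an extension must handle. The following Python sketch encodes that rule. The event labels are hypothetical shorthand, not the event codes that DataWorks uses.

```python
from typing import Iterable

# Hypothetical labels for the four pre-events that the Function Compute-based
# method supports; real event codes are listed in the development reference.
FC_SUPPORTED = frozenset({
    "data-download-pre",
    "data-upload-pre",
    "asset-publish-pre",
    "asset-unpublish-pre",
})


def choose_deployment(events: Iterable[str]) -> str:
    """Pick a deployment method for the events an extension must handle.

    Function Compute is simpler to operate, so prefer it whenever every
    required event is supported; otherwise use a self-managed service.
    """
    wanted = set(events)
    if not wanted:
        raise ValueError("an extension must handle at least one event")
    if wanted <= FC_SUPPORTED:
        return "Function Compute"
    return "self-managed service"
```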