This topic describes the features and basic use scenarios of DataWorks modules.
Data processing procedure and main modules
Data processing procedure
DataWorks is an end-to-end data development and governance platform. The data processing procedure includes the phases that are shown in the following figure.
DataWorks modules
| Feature directory | Module | Description |
| --- | --- | --- |
| Data integration | Data Integration | Data Integration provides comprehensive data synchronization solutions and supports batch synchronization, real-time synchronization, and full or incremental data synchronization. Data Integration provides the following benefits: |
| Data modeling and development | Data Modeling | Data Modeling consists of the following sub-modules: Data Warehouse Planning, Data Standard, Dimensional Modeling, and Data Metric. |
| | DataStudio | DataStudio supports various compute engines. DataStudio provides an intelligent code editor, visualization tools, an independent development environment, and reliable management features to ensure efficient task management and standardized data development processes. |
| | Operation Center | Operation Center allows you to perform the following O&M operations on auto triggered tasks, manually triggered tasks, and real-time tasks that are deployed in DataStudio: |
| | Data Map | Data Map is built around table search and provides features such as table usage instructions, data categories, data lineage, and field lineage. These features help users and owners of data tables manage data efficiently and facilitate collaborative development. |
| Data analysis | SQL Query | SQL Query helps you perform SQL-based analysis online, gain insight into business requirements, and modify and share data. SQL Query allows you to save query results as chart cards and quickly generate visualized data reports based on the chart cards for daily reporting. |
| | Data Insight | Data Insight supports data exploration and visualization. You can use Data Insight to understand data distribution, create data cards, and combine data cards into a data report. Data Insight results can also be shared as long images. |
| Data governance | Data Quality | Data Quality can check the data quality of common big data storage systems, such as MaxCompute, EMR, Hologres, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, and CDH. Data Quality allows you to configure monitoring rules that cover multiple dimensions of data, such as integrity, accuracy, validity, consistency, uniqueness, and timeliness. You can configure a monitoring rule for a specific table and associate the rule with the scheduling node that generates the table data. After a task on the node is run, a check is automatically triggered, so data anomalies are reported and can be handled at the earliest opportunity. You can also configure a monitoring rule as a strong rule or a weak rule to determine whether to terminate the associated node when Data Quality detects anomalies. This prevents dirty data from spreading downstream and minimizes the time and money spent on data restoration. |
| | Data Asset Governance | Data Asset Governance can detect issues that need to be handled in the data storage, task computing, code development, data quality, and security dimensions based on governance plans. Data Asset Governance provides health scores to assess the effectiveness of data governance and visualizes the governance results by providing governance reports and leaderboards of governance issues from the global, workspace, and individual dimensions. This helps you achieve governance objectives efficiently. Data Asset Governance also provides features such as business asset management, asset analysis, resource consumption details of tasks, and cost estimation to help you better understand the usage details of various resources and optimize resource configurations. |
| Data service | DataService Studio | DataService Studio provides a service bus to help enterprises create and manage private and public APIs in a centralized manner. DataService Studio also addresses the last-mile issue among data warehouses, databases, and data applications, and facilitates data forwarding and sharing. |
| Others | Security Center | Security Center provides the following core features: |
| | Data Security Guard | In Data Security Guard, you can configure sensitive data identification rules, identify sensitive data based on rules, view identification results, and process sensitive data. You can identify and manage sensitive data before, during, and after the event that generates sensitive data to ensure data security. |
| | Migration Assistant | Migration Assistant allows you to export data objects in your workspace, including auto triggered tasks, manually triggered tasks, resources, functions, data sources, table metadata, ad hoc queries, and script templates. You can also create full export tasks, incremental export tasks, or custom export tasks to export your data objects in DataWorks based on your business requirements. |
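
The strong and weak rule behavior described for Data Quality in the table above can be illustrated with a short sketch. This is not the DataWorks API. The names `Strength`, `Rule`, `CheckOutcome`, and `run_checks`, and the two sample rules, are hypothetical and exist only for this example. The sketch assumes that a failed strong rule blocks the associated node and its downstream tasks, while a failed weak rule only raises an alert.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Strength(Enum):
    STRONG = "strong"  # assumed: a failure terminates the node and blocks downstream tasks
    WEAK = "weak"      # assumed: a failure is reported, but the node continues


@dataclass
class Rule:
    name: str
    strength: Strength
    check: Callable[[List[Dict]], bool]  # returns True when the data passes


@dataclass
class CheckOutcome:
    alerts: List[str]
    block_downstream: bool


def run_checks(rows: List[Dict], rules: List[Rule]) -> CheckOutcome:
    """Evaluate every rule against the table data produced by a node."""
    alerts: List[str] = []
    block = False
    for rule in rules:
        if rule.check(rows):
            continue  # rule passed, nothing to report
        alerts.append(f"{rule.strength.value} rule failed: {rule.name}")
        if rule.strength is Strength.STRONG:
            block = True  # only strong rules stop dirty data from flowing on
    return CheckOutcome(alerts, block)


# Hypothetical rules: one integrity check (strong) and one uniqueness check (weak).
not_empty = Rule("table is not empty", Strength.STRONG, lambda rows: len(rows) > 0)
unique_ids = Rule(
    "id is unique",
    Strength.WEAK,
    lambda rows: len({r["id"] for r in rows}) == len(rows),
)

rows = [{"id": 1}, {"id": 1}]  # duplicate id: violates only the weak rule
outcome = run_checks(rows, [not_empty, unique_ids])
print(outcome.alerts, outcome.block_downstream)
```

In this sketch the duplicate `id` triggers an alert without blocking the node, whereas an empty table would fail the strong rule and set `block_downstream` to `True`. In DataWorks itself, rules are configured in the Data Quality module rather than written as code.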