DataWorks: Features on the DataStudio page

Last Updated: Oct 11, 2024

This topic describes the overall layout of the DataStudio page and the features that are related to workflows and nodes for data development. This helps you understand and get started with DataStudio with ease.

Go to the DataStudio page

Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, choose Data Modeling and Development > DataStudio in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.

On the DataStudio page, you can create workflows and different types of nodes for data development based on your business requirements. For more information, see Create a workflow and Create a node.

The available features vary by data development operation. The following sections describe these features.

Overall layout of the DataStudio page

The following figure shows the overall layout of the DataStudio page.

[Figure: Interface description]

Area

Description

1

2

In this area, you can click the Switch Module (切换模块) icon to show or hide the names of the functional icons in the left-side navigation pane.

  • Scheduled Workflow: allows you to create auto triggered nodes that use different compute engines for data development. The created nodes can be deployed to the production environment for O&M.

    Note

    Before you can use a specific compute engine for data development, you must add a data source of the compute engine type to DataWorks and associate the data source with DataStudio.

  • Manually Triggered Workflows: allows you to create manually triggered nodes. The created nodes can be deployed to the production environment for O&M.

  • Runtime Logs: allows you to view the records of the nodes that are run within the previous three days in DataStudio.

  • Ad Hoc Query: allows you to perform a simple ad hoc query to test your code. However, the code of an ad hoc query cannot be deployed to the production environment for O&M.

  • Tenant Tables: allows you to view all production tables of the current Alibaba Cloud account.

  • Workspace Tables: allows you to perform operations on a table in a visualized manner. The operations that you can perform on a table must be supported by the compute engine used to create the table.

  • Built-In Functions: allows you to view the descriptions of all built-in MaxCompute functions.

  • Recycle Bin: allows you to manage the nodes, resources, and functions that are deleted from Scheduled Workflow or Manually Triggered Workflows.

  • Snippets: allows you to manage script templates. A script template defines an SQL code process that includes multiple input and output parameters. Each SQL code process references one or more source tables. You can use an SQL code process to filter source table data, join source tables, and aggregate the source tables to generate a new result table required for business.

  • Operation History: allows you to filter and view historical operation records in the current workspace by operation type, operator, and operation time.

  • Operation Check: allows you to filter and view operations by operation type and check status.

  • MaxCompute:

    • MaxCompute Resources: allows you to manage existing MaxCompute resources and view the operation records of a specific MaxCompute resource. In addition, you can add MaxCompute resources that are not uploaded to DataWorks for management.

    • MaxCompute Functions: allows you to manage existing MaxCompute functions and view the operation records of a specific MaxCompute function. In addition, you can add MaxCompute functions that are not registered with DataWorks for management.

Note

If a specific functional icon is not displayed in the left-side navigation pane, you can click the Settings (设置) icon in Area 4 and add the functional icon to the left-side navigation pane on the Settings page. For more information, see Personal settings.
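The Snippets entry above describes a script template as an SQL code process with input and output parameters. The following Python sketch illustrates that idea only. The table names, column names, and the `render_snippet` helper are hypothetical, and the `$name` placeholders use the syntax of Python's `string.Template`, not the placeholder syntax of DataWorks.

```python
from string import Template

# Hypothetical script template: filter a source table, aggregate it,
# and write the result to an output table. Placeholders follow Python's
# string.Template syntax, not DataWorks syntax.
TOP_N_TEMPLATE = Template(
    "INSERT OVERWRITE TABLE $output_table\n"
    "SELECT $group_col, SUM($value_col) AS total\n"
    "FROM $input_table\n"
    "WHERE $value_col > 0\n"
    "GROUP BY $group_col\n"
    "ORDER BY total DESC\n"
    "LIMIT $top_n;"
)


def render_snippet(template: Template, **params: str) -> str:
    """Fill in every placeholder; raises KeyError if one is missing."""
    return template.substitute(**params)


sql = render_snippet(
    TOP_N_TEMPLATE,
    input_table="sales_detail",   # input parameter (source table)
    output_table="sales_top",     # output parameter (result table)
    group_col="region",
    value_col="amount",
    top_n="10",
)
print(sql)
```

The same template can be rendered again with other table names, which is the reuse that a script template is meant to provide.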

3

Shortcuts to other services:

  • Cross-project cloning: You can click Cross-project cloning to clone and migrate nodes such as compute nodes and synchronization nodes between workspaces.

  • Operation Center: You can click Operation Center to perform O&M operations on nodes in Operation Center. In Operation Center, you can switch between the development environment and the production environment. You can perform O&M operations on deployed nodes in Operation Center in the production environment.

Common features of DataWorks services:

Note

DataWorks services share common features. The following descriptions introduce the common features that are provided by DataWorks on the DataStudio page.

  • Notification Center (消息中心): You can click this icon to obtain the latest updates of DataWorks at the earliest opportunity.

  • Helps (互动学习): You can click this icon to obtain information about how to use a specific feature based on your business requirements.

  • Workspace Manage (工作空间管理): You can click this icon to go to the Workspace page. On this page, you can view the basic information, scheduling properties, security settings, and associated data sources and open source clusters. For more information, see Create and manage workspaces.

  • Language switch: You can click the current language and switch to another language. For example, you can switch from English to Chinese.

  • Account information: You can click the account to view the personal information about the account and view the status statistics about nodes in the Workbench section.

4

After you click the Settings icon in Area 4, you can perform system configurations on the following tabs of the Settings page:

  • Personal Settings: On this tab, you can manage DataStudio modules, editor settings, and general settings such as the DataWorks theme.

  • Template Management: On this tab, you can modify a code template based on your business requirements.

  • Scheduling Settings: On this tab, you can enable the periodic scheduling feature and configure the default scheduling settings for auto triggered nodes. Auto triggered nodes can be run as scheduled only after the periodic scheduling feature is enabled.

  • Security Settings and Others:

    • Data Security: In this section, you can specify whether to mask sensitive information in the returned results of queries that you perform in DataStudio in the current workspace.

    • Code Review: In this section, you can enable forcible code review for workspaces and specify code reviewers. This helps ensure the code quality of nodes.

5

This area displays the shortcut keys that are commonly used in the DataStudio editor. For more information, see Editor shortcuts.

Features related to workflows

By default, the Scheduled Workflow pane appears after you go to the DataStudio page. In the Scheduled Workflow pane, you must create a workflow before you can organize your data development operations. For more information, see Create a workflow. The following figure shows the features related to workflows.

[Figure: Workflow]

Area

Description

1

  • Solution: You can create a solution to manage multiple workflows. A workflow can be added to one or more solutions. Solutions can be displayed by using lists and cards.

  • Business Flow: A workflow is an abstract business entity that you can use to organize code development operations based on your business requirements.

Click the All (全部) icon to show all solutions or workflows in a workspace.

2

  • Refresh (刷新): After you modify a workflow or solution, you can click this icon to refresh the Scheduled Workflow pane.

  • Locate (定位): You can click this icon to find the node whose configuration tab is displayed on the right side of the current page.

  • Search Code (代码搜索): You can click this icon to search for a code snippet by using keywords. This way, you can find all nodes that contain the code snippet in the Scheduled Workflow, Manually Triggered Workflows, Ad Hoc Query, and Recycle Bin panes and view the details of the code snippet in a centralized manner. You can also use this feature to identify the node that causes changes to a table.

  • Batch Operation (批量操作): You can click this icon to modify the configurations of multiple nodes, resources, or functions at a time. The configurations include the owner, compute engine instance, resource group for scheduling, rerun properties, scheduling type, scheduling cycle, and scheduling timeout period.

  • Import Data (导入): You can click this icon to upload the data in an on-premises file to a table in DataWorks. You can import data in an on-premises file only to a MaxCompute table.

  • Create (快捷新建): You can click Create to quickly create a workflow, node, table, resource, or function.

  • Solution and workflow directory trees:

    • All: This directory tree displays all created objects, including nodes, resources, and functions, in the current workspace by solution and workflow.

    • Owned by Me: This directory tree displays the objects, including nodes, resources, and functions, that are owned by the current account by solution and workflow.

    • My Favorites: This directory tree displays the objects, including nodes, resources, and functions, that are added to favorites by the current account by solution and workflow.

  • Node search:

    • Exact search: You can enter the name of a node or the identifier of a node creator in the search box and click the Search (查找) icon to search for the node.

    • Search by node type: You can click the Filter (筛选) icon to specify the types of nodes that you want to search for. After you specify a node type, the directory tree displays only nodes of the specified type in the current workspace.

      Note

      You can determine whether to hide compute engine instances or node folders based on your business requirements. After you select Hide Engine Instances or Hide Node Folders, compute engine instances or node folders are not displayed in the directory tree.

      • Hide Engine Instances and Hide Node Folders are applicable only to workflows of the latest version.

      • In most cases, if a compute engine contains only one compute engine instance, we recommend that you hide the compute engine instance.

      • If you do not need to use node folders, such as Data Analytics, Table, Resource, and Function, you can hide them.

Note

Before you perform data development operations in a new workspace, you must create a workflow and a node in the workflow. For more information about how to create a workflow, see Create a workflow.

3

In this area, you can use a directory tree to manage nodes, tables, resources, and functions in each workflow.

  • Workflow: the unit for business development.

  • Node: the smallest unit for code development. You can develop code by node type, such as engine nodes, algorithm nodes, Data Integration nodes, database nodes, or general nodes.

  • Table: You can manage tables in DataStudio in a visualized manner.

  • Resource: You can upload resources in DataStudio in a visualized manner.

    Note

    You can upload resources of only the MaxCompute, E-MapReduce (EMR), and Cloudera's Distribution including Apache Hadoop (CDH) compute engines in a visualized manner.

  • Function: You can register functions in a visualized manner.

    Note

    You can register functions of only the MaxCompute, EMR, and CDH compute engines in a visualized manner.

The icon before the name of a node indicates the status of the node:

  • Not Committed (未提交) icon: indicates that the current version of the node is not committed. You can click this icon to commit the node.

  • Not Deployed (未发布) icon: indicates that the node is not deployed. You can click this icon to deploy the node.

The time when the node was last edited is displayed after the node name.

You can double-click the name of a workflow to go to the configuration tab of the workflow, as shown in Area 5 to Area 8. On this tab, you can perform data development operations.

4

Resource Group Orchestration (资源组编排): You can click this icon to change the resource groups for scheduling used by multiple nodes in a workflow during data development. If multiple resource groups for scheduling are used in your workspace, you can use this feature to change the resource groups for scheduling for the nodes in the workspace based on your business requirements. This helps you improve resource utilization. After you change the resource groups for scheduling used by multiple nodes, you must deploy the nodes to the production environment so that the change can take effect in the production environment.

5

  • Common Nodes: This section displays the common types of nodes in the current workspace. This helps you quickly select a node type and create a node.

  • Node Group: You can use this feature to reference a set of nodes across workflows. You can add nodes that are frequently used in a workflow to a node group and reuse the node group in other workflows.

  • Quick node creation: You can drag nodes in sections, such as Data Integration, MaxCompute, and EMR, to the right-side canvas of a workflow to create the nodes in the workflow.

6

Tools on the canvas:

  • Switch Layout (切换布局): You can click this icon to switch the layout of the canvas to Vertical, Horizontal, or Grid.

  • Box (框选): You can click this icon to select nodes to form a node group and perform operations on the node group to manage selected nodes.

  • Refresh (刷新): After you modify a workflow, you can click this icon to refresh the workflow.

  • Format (格式化): You can click this icon to horizontally align the nodes on the canvas.

  • Adapt (适配窗口): You can click this icon to adapt the current workflow layout to the size of the canvas.

  • Center (居中): You can click this icon to center nodes on the canvas.

  • 1:1 (1:1): You can click this icon to change the scale of the directed acyclic graph (DAG) of nodes to 100%.

  • Zoom In (放大): You can click this icon to zoom in on the nodes in the current workflow.

  • Zoom Out (缩小): You can click this icon to zoom out on the nodes in the current workflow.

  • Search (查找): You can click this icon and enter a keyword in the search box to search for a node whose name contains the keyword.

    Note

    Fuzzy match is supported. After you enter a keyword, DataWorks displays all nodes whose names contain the keyword in the current workflow.

  • Toggle Full Screen View (全屏): You can click this icon to view the current workflow in full screen.

  • Hide Engine Information (隐藏引擎信息): You can click this icon to show or hide the engine information of each node.
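The fuzzy match described for the Search tool above amounts to a substring test on node names. The following Python sketch shows that behavior under stated assumptions: the node names and the `search_nodes` helper are hypothetical, and the `ignore_case` switch is an illustrative addition because the source does not say whether matching is case-sensitive.

```python
from typing import Iterable, List


def search_nodes(
    node_names: Iterable[str], keyword: str, ignore_case: bool = True
) -> List[str]:
    """Return the node names that contain the keyword (substring match)."""
    if ignore_case:
        keyword = keyword.lower()
        return [name for name in node_names if keyword in name.lower()]
    return [name for name in node_names if keyword in name]


# Hypothetical nodes in one workflow.
names = ["ods_orders_load", "dwd_orders_clean", "ads_user_report"]
matches = search_nodes(names, "orders")
print(matches)
```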

7

Tabs in the right-side navigation pane:

  • Workflow Parameters: You can click this tab and assign a value to a variable in the code for all ODPS SQL nodes in the current workflow at a time.

  • Change History: You can click this tab and view the operation records of nodes in the current workflow.

  • Versions: Each time nodes in the workflow are committed, a new version is generated for the workflow. You can click this tab and view all versions and the details of each version.

8

Tools in the toolbar and tools above the configuration tab:

  • Submit (提交): You can click this icon to commit one or more updated nodes in the current workflow to the Deploy page.

  • Run (运行): You can click this icon to run all nodes in the current workflow.

  • Stop (停止运行): If the current workflow is running, you can click this icon to stop the nodes from running in the workflow.

  • Deploy (发布): You can click this icon to go to the Deploy page and view the nodes to be deployed in the current workflow. Then, you can deploy nodes based on your business requirements.

  • Operation Center (前往运维): You can click this icon to go to Operation Center in the production environment to view the O&M details of nodes in the current workflow.

  • View opened configuration tabs: If you have opened multiple configuration tabs on the DataStudio page, you can click the Search (搜索) icon to view all open configuration tabs in a drop-down list.

  • Close opened configuration tabs: You can click the Close Tab (关闭页签) icon to close one or more configuration tabs.

Shortcut menu related to workflows

You can move the pointer over a workflow and right-click the name of the workflow. The following figure shows the shortcut menu that appears, and the following table describes the features supported by the shortcut menu.

[Figure: Workflow shortcut menu]

Feature

Description

Create Node

This feature allows you to quickly create nodes of different types.

When you create a node, the system displays the recently used node types. If you click one of them, the system sets the Engine Instance and Node Type parameters based on the most recently used node of that type. This way, you can quickly create another node of a recently used type.

[Figure: Create Node]

Create Table

This feature allows you to quickly create tables of different types.

Create Resource

This feature allows you to quickly create resources of different compute engines.

Note

You can use this feature to create resources of only the MaxCompute, CDH, and EMR compute engines.

Create Function

This feature allows you to quickly create functions of different compute engines.

Note

You can use this feature to create functions of only the MaxCompute, CDH, and EMR compute engines.

Board

This feature navigates you to the canvas of a workflow.

Change

This feature allows you to modify the name, owner, and description of a workflow.

Delete

This feature allows you to delete the current workflow.

Note

If you perform this operation, all objects in the workflow are deleted. Proceed with caution.

The following options are available to cope with situations where an object cannot be deleted:

  • Terminate the Delete Operation: By default, this option is selected. If an object cannot be deleted, the delete operation is terminated. This operation does not affect the deleted objects.

  • Skip Current Object and Continue to Delete Other Objects: If an object cannot be deleted, the system skips the object and continues to delete other objects.
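The two options above differ only in what happens after the first object that cannot be deleted. The following Python sketch models that difference. The `OnFailure` enum, the `delete_objects` function, and the object names are hypothetical and do not represent a DataWorks API.

```python
from enum import Enum
from typing import Callable, Iterable, List, Tuple


class OnFailure(Enum):
    TERMINATE = "terminate"  # stop at the first object that cannot be deleted
    SKIP = "skip"            # skip that object and keep deleting the rest


def delete_objects(
    names: Iterable[str],
    try_delete: Callable[[str], bool],
    on_failure: OnFailure = OnFailure.TERMINATE,
) -> Tuple[List[str], List[str]]:
    """Delete objects in order and return (deleted, failed).

    Objects deleted before a failure stay deleted in both modes.
    """
    deleted: List[str] = []
    failed: List[str] = []
    for name in names:
        if try_delete(name):
            deleted.append(name)
            continue
        failed.append(name)
        if on_failure is OnFailure.TERMINATE:
            break
    return deleted, failed


# Hypothetical workflow contents; "table_b" cannot be deleted.
objects = ["node_a", "table_b", "func_c"]
can_delete = lambda name: name != "table_b"

terminated = delete_objects(objects, can_delete, OnFailure.TERMINATE)
skipped = delete_objects(objects, can_delete, OnFailure.SKIP)
print(terminated)
print(skipped)
```

With the default option, `func_c` is never attempted. With the skip option, only `table_b` remains.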

[Figure: Delete workflow]

Batch Operation

This feature allows you to modify the configurations of multiple nodes, resources, or functions at a time. For example, you can modify the owners, compute engine instances, and scheduling properties of multiple objects at a time. This feature also allows you to commit and deploy multiple modified objects to the production environment at a time.

Features related to nodes

After you create a workflow, you can create different types of nodes for data development based on your requirements. For more information, see Overview. Different types of nodes provide similar features. This section describes the features that are provided by DataWorks on the configuration tab of an ODPS SQL node.

[Figure: Node interface features]

Area

Description

1

Node development-related features in the top toolbar:

  • Save (保存): You can click this icon to save the code and configurations of the current node.

  • Save as Ad-Hoc Query Node (另存为临时查询文件): You can click this icon to save the current code as an ad hoc query. Then, you can view the ad hoc query in the Ad Hoc Query pane. For more information, see Create an ad hoc query.

  • Submit (提交): You can click this icon to commit the current node.

  • Unlock (提交并允许他人编辑该文件): You can click this icon to commit the current node and allow other users to modify the code of the node.

  • Steal Lock (偷锁编辑): If you are not the owner of the node but you want to modify the node, click this icon.

  • Run (运行): You can click this icon to run the code of the current node. You need to assign values to the variables in the SQL statements only the first time you run the node. If you later modify the code, the variables keep the values that you initially assigned.

    Note

    If no resource group for scheduling is specified for the node, DataWorks prompts you to select a resource group for scheduling after you click this icon.

  • Run with Parameters (高级运行(带参数运行)): You can click this icon to run the code of the current node based on the configured parameters. Each time you click the Run with Parameters icon, you must assign values to variables in SQL statements. DataWorks obtains the initial values when you click the icon. After you assign values to custom parameters, DataWorks replaces the initial values with the values assigned to the custom parameters.

    Note

    If no resource group for scheduling is specified for the node, DataWorks prompts you to select a resource group for scheduling after you click this icon.

  • Stop (停止运行): You can click this icon to stop the node that is running.

  • Reload (重新加载): You can click this icon to refresh the configuration tab of the current node and restore the most recently saved version of the node configuration.

  • Perform Smoke Testing in Development Environment (在开发环境执行冒烟测试): You can click this icon to test the code of the current node in the development environment. Smoke testing in the development environment simulates how scheduling parameters are replaced with values in the production environment. After you select a data timestamp, DataWorks replaces the scheduling parameters in the code with the values that correspond to that data timestamp, which lets you check the result of the value replacement.

    Note

    Each time you modify the scheduling parameters, you must save and commit the modification before you perform smoke testing in the development environment. Otherwise, the new values of scheduling parameters do not take effect.

  • View Log of Smoke Testing in Development Environment (查看开发环境的冒烟测试日志): You can click this icon to view the operation logs of a node that runs in the development environment.

  • Access Scheduling System in Development Environment (前往开发环境的调度系统): You can click this icon to go to Operation Center in the development environment and perform O&M operations. For more information, see View auto triggered instances.

  • Format Code (格式化): You can click this icon to format the code of the current node. This prevents a single line of code from being excessively long.

  • Share (分享): You can click this icon to share the current node with other users.
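The difference between Run and Run with Parameters in the list above can be modeled as follows. This Python sketch is an interpretation of the described behavior, not the DataWorks implementation: the `NodeSession` class and its methods are hypothetical, and the `${name}` variable syntax is assumed for illustration.

```python
import re
from typing import Callable, Dict

# Assumed variable syntax for this sketch: ${name}
VAR_PATTERN = re.compile(r"\$\{(\w+)\}")


class NodeSession:
    """Remembers the variable values that were assigned for one node."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}

    def _render(self, code: str) -> str:
        return VAR_PATTERN.sub(lambda m: self._values[m.group(1)], code)

    def run(self, code: str, prompt: Callable[[str], str]) -> str:
        """Run: ask for a value only when a variable has none yet."""
        for name in VAR_PATTERN.findall(code):
            if name not in self._values:
                self._values[name] = prompt(name)
        return self._render(code)

    def run_with_parameters(self, code: str, values: Dict[str, str]) -> str:
        """Run with Parameters: values are required on every call and
        replace the values that were assigned earlier."""
        missing = [n for n in VAR_PATTERN.findall(code) if n not in values]
        if missing:
            raise ValueError(f"missing values for: {missing}")
        self._values.update(values)
        return self._render(code)


session = NodeSession()
prompts = []


def ask(name: str) -> str:
    prompts.append(name)  # record each time the user would be prompted
    return "20241010"


first = session.run("SELECT * FROM t WHERE ds='${ds}';", ask)
second = session.run("SELECT id FROM t WHERE ds='${ds}';", ask)  # code changed
third = session.run_with_parameters(
    "SELECT id FROM t WHERE ds='${ds}';", {"ds": "20241011"}
)
print(first, second, third, sep="\n")
```

In this model, the second `run` call does not prompt again even though the code changed, whereas `run_with_parameters` always takes fresh values.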

2

Properties tab:

  • General: In this section, you can view the name, ID, and type of the node and configure the Owner and Description parameters for the node.

  • Scheduling Parameter: In this section, you can add scheduling parameters for the node and dynamically assign values to the parameters.

  • Schedule: In this section, you can configure time properties for scheduling the node after the node is deployed in the production environment. The time properties include the instance generation mode, scheduling cycle and time of instances, rerun properties, and timeout period for the node.

  • Resource Group: In this section, you can specify a resource group for scheduling for the node.

  • Dependencies: In this section, you can configure node dependencies. For more information, see Configure same-cycle scheduling dependencies and Configure cross-cycle scheduling dependencies.

  • Input and Output Parameters: In this section, you can use context-based parameters to pass the value of the output parameter of an ancestor node to a descendant node based on the assignment feature.
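The Scheduling Parameter section above assigns values to parameters dynamically. The following Python sketch shows one way to picture that replacement for a given data timestamp. It is a simplified model under stated assumptions: the `resolve_parameters` and `render` helpers are hypothetical, only the `$bizdate` expression is handled (treated here as the data timestamp in `yyyymmdd` format), and every other expression is treated as a constant.

```python
import re
from datetime import date
from typing import Dict

# Assumed variable syntax in node code for this sketch: ${name}
VAR_PATTERN = re.compile(r"\$\{(\w+)\}")


def resolve_parameters(
    assignments: Dict[str, str], data_timestamp: date
) -> Dict[str, str]:
    """Turn parameter expressions into concrete values for one run."""
    builtins = {"$bizdate": data_timestamp.strftime("%Y%m%d")}
    # Expressions that are not built-in are kept as constants.
    return {name: builtins.get(expr, expr) for name, expr in assignments.items()}


def render(code: str, assignments: Dict[str, str], data_timestamp: date) -> str:
    """Replace each ${name} in the code with its resolved value."""
    values = resolve_parameters(assignments, data_timestamp)
    return VAR_PATTERN.sub(lambda m: values[m.group(1)], code)


sql = render(
    "SELECT * FROM orders WHERE ds='${ds}' AND region='${region}';",
    {"ds": "$bizdate", "region": "cn"},  # hypothetical parameter assignments
    date(2024, 10, 10),                  # selected data timestamp
)
print(sql)
```

Rendering the same code with a different data timestamp changes only the value of `ds`, which is the effect that a dynamically assigned parameter is meant to have.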

Lineage tab: This tab displays the dependencies and auto-captured lineage between the current node and other nodes.

Versions tab: A version is generated each time a node is committed and deployed. On this tab, you can view the historical versions and information about each version of the node. The information includes the user that committed the node, the time when the node was committed, change type, status, and remarks. The following descriptions provide the different states of a node version:

  • Yes: The node is committed to the development environment, but a deployment package is not created for the node on the Create Deploy Task page.

  • Deployed: The node is deployed in the production environment. You can view the node on the Auto Triggered Tasks page in Operation Center in the production environment. For more information, see View and manage auto triggered tasks.

  • Not Deployed: The node is committed to the development environment but not deployed to the production environment. If you commit the node again, the previously committed version becomes a pending version.

  • The deployment is cancelled: If you commit a node but cancel the deployment of the committed node on the Create Deploy Task page, the committed version is in this state.

Code Structure tab: This tab uses SQL operators to display the code structure of the node.

3

SQL editor: You can write SQL statements in the editor based on your business requirements.

  • You can click the 跳转至首行 icon to return to the first line of the SQL editor.

  • You can click the 全屏展示 icon to view the SQL editor in full screen.

  • You can click the 快捷运行 icon to quickly run a code snippet to test whether the code snippet is correctly written. For more information, see Debug a code snippet: Quickly run a code snippet.

    Note

    This icon is displayed only when you click a line of code.

4

Features in the upper-right corner:

  • Deploy: You can click Deploy to go to the Create Deploy Task page. On this page, you can view the deployment details of the node or perform O&M operations in the production environment after the node is deployed.

  • Operation Center: You can click this icon to go to Operation Center in the production environment and perform O&M operations.

Shortcut menu related to nodes

You can move the pointer over a node and right-click the name of the node. The following figure shows the shortcut menu that appears, and the following table describes the features supported by the shortcut menu.

[Figure: Node shortcut menu]

Feature

Description

Rename

This feature allows you to change the name of the node.

Add to Favorites

This feature allows you to add the node to favorites. After you add the node to favorites, you can click My Favorites in the upper-right corner of the Scheduled Workflow pane to view the node. If you want to remove the node from favorites, right-click the node name and select Remove from Favorites.

Move

This feature allows you to move the node to another workflow.

Clone

This feature allows you to clone the node. The new node is of the same type and has the same owner and resource properties as the original node.

Note

The new node has a name that is different from that of the original node.

View Earlier Versions

This feature allows you to view the historical versions and information about each version of the node. The information includes the user that committed the node, the time when the node was committed, change type, status, and remarks.

View in Operation Center

This feature navigates you to Operation Center so that you can view information about the node. If the node is committed to both the development and production environments, you can select View in Operation Center (Production Environment) or View in Operation Center (Development Environment).

Submit for Code Review

This feature allows you to commit the code of the node for review. A node that is committed by a developer must pass the code review before the node can be deployed.

Delete

This feature allows you to delete the node and the dependency configurations of its ancestor and descendant nodes. After you click Delete to delete a node that has been deployed to the production environment, you must go to the Create Deploy Task page, create a deployment package for the node, and then deploy the node. This way, the node is deleted from the production environment. For more information, see Undeploy nodes.