
Getting started with Operation Center

Updated at: 2025-01-07 06:12

After a node is committed and deployed in the production environment, you can view the node and perform O&M operations on the node in Operation Center. For example, you can test the node and backfill data for the node. This topic describes the basic operations that you can perform on an auto triggered node in Operation Center. You can check whether the configurations of the node meet your requirements, backfill data in a historical period of time for the node, and configure an alert rule for the node to ensure that the node can be scheduled as expected in the future.

Prerequisites

A node named result_table is created and deployed by performing the operations described in Data development: Developers.

Note

This topic uses the result_table node to describe O&M operations. You can perform O&M operations on your node in the same manner.

Background information

In Operation Center, you can perform O&M operations on different types of nodes, such as auto triggered nodes, manually triggered nodes, and real-time synchronization nodes. You can also use different monitoring methods to monitor various objects such as nodes and resources that are used by the nodes. This helps you identify and handle exceptions at the earliest opportunity based on alerts and ensures efficient and stable data generation.

This topic describes only the basic operations that you can perform in Operation Center. You can also perform advanced O&M operations based on your business requirements.

For more information about Operation Center, see Overview.

Go to the Operation Center page

Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, choose Data Development and Governance > Operation Center in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.

Procedure

Phase 1: Test and verify the node

  1. Step 1: View the configurations of the node

    After you commit and deploy the node in the production environment, we recommend that you go to the Operation Center page to check whether the configurations of the node meet your requirements. The configurations include the scheduling parameters and the resource group for scheduling. If the configurations do not meet your requirements, modify the configurations and deploy the node again.

  2. Step 2: Test the node

    Check whether the node is run as expected in the production environment by using the smoke testing feature. If an error occurs during the node execution, handle the error at the earliest opportunity to ensure that the node is run as expected.

  3. Step 3: Backfill data in a historical period of time for the node

    You can backfill data in a historical period of time for the node.

  4. Step 4: View the auto triggered node instances generated for the node

    After you commit and deploy the node on a day, auto triggered node instances are generated for the node based on the scheduling cycle that you specified for the node. If you set the Instance Generation Mode parameter to Next Day for the node, the auto triggered node instances start to be scheduled on the next day. If you set the Instance Generation Mode parameter to Immediately After Deployment, the auto triggered node instances start to be scheduled on the current day. You can view the auto triggered node instances generated for the node and the status of the instances to check whether the node is scheduled as expected.

  5. Step 5: View data write results

    After you test the node or backfill data for the node, you can view data write results.

Phase 2: Monitor the node

  1. Step 6: Create a custom alert rule

    You can use the intelligent monitoring feature to configure an alert rule for the node based on your business requirements. The alert rule monitors the scheduling status of the node and helps ensure that the node is scheduled as expected.

  2. Step 7: Create a baseline (advanced feature)

    To ensure that the node with a higher priority generates data at the specified time, you can create and configure a baseline and add the node to the baseline. This way, if the system detects that the node may fail to finish running before the specified time, the system sends you a notification that describes the exception about the node. This helps you identify and handle the exception at the earliest opportunity.

  3. Step 8: Create a custom alert rule for a resource group and associate an automated O&M rule with the custom alert rule

    You can create a custom alert rule for an exclusive resource group. When you configure the custom alert rule, you can specify the alert conditions such as the resource usage of the exclusive resource group and the maximum number of instances that are waiting for resources in the resource group. If the custom alert rule is triggered, the system performs O&M operations based on the automated O&M rule that you associate with the custom alert rule.

Step 1: View the configurations of the node

After you commit and deploy the node in the production environment, we recommend that you go to the Operation Center page to check whether the configurations of the node meet your requirements. The configurations include the scheduling parameters and dependencies.

  1. Go to the Operation Center page.

  2. Find the desired node.

    1. In the left-side navigation pane, choose Auto Triggered Node O&M > Auto Triggered Nodes.

    2. On the page that appears, search for the node.

  3. View the node details.

    1. Click the node name. The directed acyclic graph (DAG) of the node appears.

    2. Click Show Details in the lower-right corner of the DAG of the node to view the node details.

Note

In this example, you can find the deployed result_table node in the node list and check whether the scheduling parameters and the resource group for scheduling are correctly configured for the node.
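If you prefer to script this check, the DataWorks OpenAPI exposes node metadata through the GetNode operation. The following Python sketch uses the generic CommonRequest interface of the Alibaba Cloud core SDK (aliyun-python-sdk-core). The credentials, region, and node ID are placeholders, and you should verify the operation's parameters against the DataWorks API reference for your SDK version.

    # A minimal sketch: query a deployed node's scheduling configuration through
    # the DataWorks OpenAPI (version 2020-05-18). The credentials, region, and
    # node ID below are placeholders for illustration.
    from aliyunsdkcore.client import AcsClient
    from aliyunsdkcore.request import CommonRequest

    client = AcsClient('<access_key_id>', '<access_key_secret>', 'cn-shanghai')

    request = CommonRequest()
    request.set_domain('dataworks.cn-shanghai.aliyuncs.com')
    request.set_version('2020-05-18')
    request.set_action_name('GetNode')
    request.set_method('POST')
    request.add_query_param('NodeId', '700000000001')  # hypothetical node ID of result_table
    request.add_query_param('ProjectEnv', 'PROD')      # query the production environment

    # The JSON response includes fields such as the cron expression and the
    # resource group, which you can compare with your expected settings.
    print(client.do_action_with_exception(request))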

Step 2: Test the node

You can use the smoke testing feature to check whether the node is run as expected in the production environment. The code of the node is run during the test.

  1. Open the Test dialog box.

    You can use one of the following methods to open the Test dialog box:

    • Method 1: In the node list, find the desired node and click Test in the Actions column.

    • Method 2: In the DAG of the node, right-click the node name and select Test.

  2. In the Test dialog box, configure the data timestamp and the time at which the node is run and click OK.

    When you test the node, a test instance is generated for the node. You can view the status of the test instance on the Test Instances page.

    Note

In this example, the smoke testing feature is used to check whether the result_table node is run as expected in the production environment. You can test the node and view the execution status of the test instance generated for the node by performing the operations shown in the following figure.
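Smoke testing can also be triggered outside the console through the RunSmokeTest operation of the DataWorks OpenAPI. The following Python sketch is a minimal example under assumptions: placeholder credentials, the cn-shanghai region, and a hypothetical node ID. Verify the parameter names against the current API reference.

    # A minimal sketch: trigger smoke testing for a deployed node through the
    # DataWorks OpenAPI. Credentials, region, and IDs are placeholders.
    from aliyunsdkcore.client import AcsClient
    from aliyunsdkcore.request import CommonRequest

    client = AcsClient('<access_key_id>', '<access_key_secret>', 'cn-shanghai')

    request = CommonRequest()
    request.set_domain('dataworks.cn-shanghai.aliyuncs.com')
    request.set_version('2020-05-18')
    request.set_action_name('RunSmokeTest')
    request.set_method('POST')
    request.add_query_param('ProjectEnv', 'PROD')           # run the test in production
    request.add_query_param('NodeId', '700000000001')       # hypothetical node ID
    request.add_query_param('Bizdate', '2024-09-18')        # data timestamp for the test
    request.add_query_param('Name', 'smoke_test_result_table')  # hypothetical test name

    # The response identifies the generated test run; you can view the test
    # instance on the Test Instances page in Operation Center.
    print(client.do_action_with_exception(request))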

Step 3: Backfill data in a historical period of time for the node

After you develop, commit, and deploy the node in the production environment, the node is scheduled based on the scheduling settings. If you want to calculate data in a historical period of time for the node again, you can use the data backfill feature to backfill the data for the node.

  1. Go to the Backfill Data panel.

    You can use one of the following methods to go to the Backfill Data panel:

    • Method 1: In the node list, find the desired node and click Backfill Data in the Actions column.

    • Method 2: In the DAG of the node, right-click the node name and select Run.

  2. Select a mode in which you want to backfill data for the node.

    You can select a data backfill mode based on your business requirements.

    • Manually Select

      Description: Select one or more nodes as root nodes. Then, manually select the specific descendant nodes of the root nodes for which you want to backfill data.

      Note
      • The original plans of backfilling data for the current node, backfilling data for the current node and its descendant nodes, and backfilling data in advanced mode are compatible with this method.
      • You can select up to 500 root nodes and up to 2,000 nodes in total. The total number includes the root nodes and their descendant nodes.

      Scenarios:
      • Backfill data for the current node and its descendant nodes at a time.
      • Backfill data for multiple nodes that may not have dependencies with each other at a time.

    • Select by Link

      Description: Select a start node as the root node and one or more end nodes. The system then automatically determines that all nodes on the links from the start node to the end nodes require data backfilling.

      Scenario: Perform end-to-end data backfilling for nodes for which complex dependencies are configured.

    • Select by Workspace

      Description: Select a node as the root node, and determine the nodes for which you want to backfill data based on the workspaces to which the descendant nodes of the root node belong.

      Note
      • The original plan of backfilling data for massive nodes is compatible with this method. You can select up to 20,000 nodes.
      • You cannot configure a node blacklist.

      Scenario: The descendant nodes of the current node belong to different workspaces, and you want to backfill data for these descendant nodes.

    • Specify Task and All Descendant Tasks

      Description: Select a root node. The system then automatically determines that the root node and all its descendant nodes require data backfilling.

      Important: You can view the nodes that are triggered to run only after the data backfill task starts running. Proceed with caution.

      Scenario: Backfill data for a root node and all its descendant nodes.

  3. Configure the data backfill parameters.

    For example, you can configure the data timestamp and the node for which you want to backfill data based on your business requirements. The data backfill parameters vary based on the data backfill mode. For more information, see Backfill data and view data backfill instances (new version).

In this example, the data backfill mode Backfill Data for Current Node is selected. Data generated in the time period from 00:00 to 01:00 every day from 2024-09-17 to 2024-09-19 is backfilled for the result_table node. You can backfill data for the node by performing the operations shown in the following figure.

Note

When the data backfill task runs, the scheduling system replaces the variables in the code of the node with actual values based on the scheduling parameters and the data timestamp that you specified.

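The console steps above can also be scripted. The DataWorks OpenAPI provides the RunCycleDagNodes operation for creating data backfill tasks. The following Python sketch is a hedged example: the node ID, task name, and date format are placeholder assumptions, so confirm the parameter set and value formats against the API reference before use.

    # A minimal sketch: start a data backfill task for a node through the
    # DataWorks OpenAPI. IDs, dates, and credentials are placeholders.
    from aliyunsdkcore.client import AcsClient
    from aliyunsdkcore.request import CommonRequest

    client = AcsClient('<access_key_id>', '<access_key_secret>', 'cn-shanghai')

    request = CommonRequest()
    request.set_domain('dataworks.cn-shanghai.aliyuncs.com')
    request.set_version('2020-05-18')
    request.set_action_name('RunCycleDagNodes')
    request.set_method('POST')
    request.add_query_param('ProjectEnv', 'PROD')
    request.add_query_param('RootNodeId', '700000000001')        # hypothetical node ID
    request.add_query_param('StartBizDate', '2024-09-17 00:00:00')  # first data timestamp
    request.add_query_param('EndBizDate', '2024-09-19 00:00:00')    # last data timestamp
    request.add_query_param('Name', 'backfill_result_table')    # hypothetical task name
    request.add_query_param('Parallelism', 'false')             # run data timestamps serially

    # The response contains the IDs of the generated data backfill instances.
    print(client.do_action_with_exception(request))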

Step 4: View the auto triggered node instances generated for the node

After you commit and deploy the node on a day, auto triggered node instances are generated for the node based on the scheduling cycle that you configured for the node. If you set the Instance Generation Mode parameter to Next Day for the node, the auto triggered node instances start to be scheduled on the next day. If you set the Instance Generation Mode parameter to Immediately After Deployment, the auto triggered node instances start to be scheduled on the current day. You can view the auto triggered node instances generated for the node to check whether the node is scheduled as expected.

  1. Go to the Auto Triggered Instances page.

    In the left-side navigation pane of the Operation Center page, choose Auto Triggered Node O&M > Auto Triggered Instances.

  2. View the auto triggered node instances generated for the node.

    Check whether the auto triggered node instances are generated for the node based on the scheduling settings and check whether the instances are run as expected. For more information about auto triggered node instances, see View auto triggered instances.

    If an auto triggered node instance generated for the node is in the Pending (Ancestor) state, you can troubleshoot the issue by performing the following operations:

    1. Use the upstream analysis feature provided in the DAG of the instance to quickly identify ancestor instances that block the running of the current instance.

    2. Use the intelligent diagnosis feature to diagnose failure causes or related issues of the ancestor instances. The intelligent diagnosis feature can also be used to quickly troubleshoot issues if dependencies between the current instance and ancestor instances are complex. This improves O&M efficiency.

In this example, you can view the status of the auto triggered node instances generated for the result_table node on September 19, 2024. The result_table node is scheduled by hour.
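Instance status can also be checked programmatically with the ListInstances operation of the DataWorks OpenAPI. A minimal sketch follows, with placeholder credentials, project ID, and node ID; verify the parameter names and status values against the API reference for your SDK version.

    # A minimal sketch: list the auto triggered instances generated for a node
    # on a given data timestamp. IDs and credentials are placeholders.
    from aliyunsdkcore.client import AcsClient
    from aliyunsdkcore.request import CommonRequest

    client = AcsClient('<access_key_id>', '<access_key_secret>', 'cn-shanghai')

    request = CommonRequest()
    request.set_domain('dataworks.cn-shanghai.aliyuncs.com')
    request.set_version('2020-05-18')
    request.set_action_name('ListInstances')
    request.set_method('POST')
    request.add_query_param('ProjectEnv', 'PROD')
    request.add_query_param('ProjectId', '12345')          # hypothetical DataWorks project ID
    request.add_query_param('NodeId', '700000000001')      # hypothetical node ID
    request.add_query_param('Bizdate', '2024-09-19')       # data timestamp to inspect

    # Each returned instance carries a status field (for example, SUCCESS or
    # FAILURE) that tells you whether the node ran as expected.
    print(client.do_action_with_exception(request))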

Step 5: View data write results

After you test the node or backfill data for the node, you can use one of the following methods to view data write results:

  • View data write results in Data Map.

    You can go to the homepage of Data Map, search for the desired table, and go to the details page of the table to check whether the data written to the table is correct. For more information about how to search for a table, see Search for tables. For more information about how to view the details of a table, see View the details of a table.

  • Create an ad hoc query in the Ad Hoc Query pane of the DataStudio page to view data write results.

    If you want to query data or run SQL code only in the development environment, which is the DataStudio page, you can create an ad hoc query. For example, you can check whether the running result of test code is consistent with the expected result and whether the code is valid. This way, you do not need to deploy the code to the production environment or perform operations on compute engines in the production environment.

Note
  • By default, a RAM user does not have the required permissions to query MaxCompute tables in the production environment. If you want to query a MaxCompute table in the production environment as a RAM user, you can go to the details page of the table in Data Map to request the query permissions. For more information, see Request permissions on tables.

  • When the node is run on the DataStudio page, data is written to a project in the development environment. After the node is deployed in the production environment, data is written to a project in the production environment. When you query table data, confirm the environment of the project to which the table belongs. You can go to the Computing Resource page in DataStudio to view the information about a project.

  • MaxCompute allows you to access tables across projects. For example, you can access tables across MaxCompute projects that are associated with your workspace, and you can access tables in a project in the production environment from the development environment. Some other types of compute engines do not allow you to access tables across projects. The features of a compute engine determine whether you can access tables across projects.

In this example, the result_table node is in the workspace that corresponds to the MaxCompute project named mc_test_project in the production environment. You can create an ad hoc query node of the ODPS SQL type and execute SQL statements to query the partition data in the mc_test_project.result_table table.
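Besides the console, you can run the same verification query from any Python environment with PyODPS. The following is a minimal sketch, assuming the mc_test_project project, a hypothetical partition column named dt, and placeholder credentials and endpoint.

    # A minimal sketch: verify data written to the production table with PyODPS.
    # The credentials, endpoint, and partition column name are placeholders; a
    # RAM user needs query permissions on the production table first.
    from odps import ODPS

    o = ODPS('<access_key_id>', '<access_key_secret>',
             project='mc_test_project',
             endpoint='<maxcompute_endpoint>')

    # dt is a hypothetical partition column used for illustration.
    sql = "SELECT * FROM result_table WHERE dt = '20240919' LIMIT 10;"
    with o.execute_sql(sql).open_reader() as reader:
        for record in reader:
            print(record)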

Step 6: Create a custom alert rule

After the node is tested and verified, you can create a custom alert rule for the node to monitor the status of the node. If an exception occurs while the node is running, the system sends you an alert notification based on the alert configurations. This helps you identify and handle the exception at the earliest opportunity and ensures that the node can be scheduled as expected in the future.

  1. Go to the Operation Center page.

  2. In the left-side navigation pane, choose Alarm > Rule Management.

  3. Create a custom alert rule.

    1. On the page that appears, click Create Custom Rule.

    2. Configure the parameters for the rule.

      You can configure the custom alert rule based on your business requirements. For more information, see Create a custom alert rule.

      In this example, a custom alert rule is configured for the result_table node. An alert notification is sent if the node fails to run. You can configure the custom alert rule based on your business requirements by configuring the parameters shown in the following figure. The Test rules custom alert rule is triggered if the result_table node fails to run. An alert notification is sent to the node owner by text message. The alert notification can be sent a maximum of three times at an interval of 30 minutes.

      Note

      You must configure the information about the alert contact in advance. For more information, see Configure and view alert contacts.
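Custom alert rules can also be managed through the OpenAPI, where they are called reminds. The following Python sketch uses the CreateRemind operation to reproduce the rule configured above; the node ID is a placeholder, and the exact parameter set may differ by SDK version, so check the API reference.

    # A minimal sketch: create a custom alert rule (a "remind" in API terms)
    # that notifies the node owner by SMS when the node fails to run.
    from aliyunsdkcore.client import AcsClient
    from aliyunsdkcore.request import CommonRequest

    client = AcsClient('<access_key_id>', '<access_key_secret>', 'cn-shanghai')

    request = CommonRequest()
    request.set_domain('dataworks.cn-shanghai.aliyuncs.com')
    request.set_version('2020-05-18')
    request.set_action_name('CreateRemind')
    request.set_method('POST')
    request.add_query_param('RemindName', 'Test rules')   # rule name used in this example
    request.add_query_param('RemindUnit', 'NODE')         # monitor individual nodes
    request.add_query_param('NodeIds', '700000000001')    # hypothetical node ID
    request.add_query_param('RemindType', 'ERROR')        # trigger when the node fails
    request.add_query_param('AlertUnit', 'OWNER')         # notify the node owner
    request.add_query_param('AlertMethods', 'SMS')        # send text messages

    print(client.do_action_with_exception(request))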

Step 7: Create a baseline (advanced feature)

To ensure that the node generates data at the specified time, you can create and configure a baseline and add the node to the baseline. Then, you can configure the priority and committed completion time for the baseline. DataWorks estimates the completion time of the node based on how the node ran in a historical period of time. Nodes in a baseline with a higher priority can preferentially use scheduling resources. If the system detects that the node may fail to finish running before the committed completion time, the system sends you an alert notification. You can troubleshoot the issue based on the alert.

  1. Go to the Operation Center page.

  2. In the left-side navigation pane, click Smart Baseline.

  3. Create a baseline.

    1. On the Baselines tab, click Create Baseline.

    2. Configure the parameters for the baseline.

      You can configure the baseline based on your business requirements. For more information, see the Create a baseline section of the "Manage baselines" topic.

      In this example, an hour-level baseline is configured and the result_table node is added to the baseline. The baseline can monitor data generation of the node each hour. You can configure the baseline by configuring the parameters shown in the following figure. Descriptions of some parameters:

      • Priority: A larger value indicates a higher priority. A node with a higher priority in a baseline can preferentially use scheduling resources when resources are insufficient.

      • Estimated Finish Time: The system estimates the time at which a node finishes running based on the completion time of the node in a historical period of time.

      • Committed Finish Time: You can specify the latest time at which a node must generate data. You can configure this parameter based on your business requirements and the completion time of the node in a historical period of time.

      • Alert Margin Threshold: You can configure this parameter based on the Committed Finish Time parameter. The margin reserves time for you to handle node exceptions so that the node can still finish running by the committed completion time.

        Note

        The time difference between the alert time and committed completion time must be at least 5 minutes.

      If the instances generated for the result_table node cannot finish running by 30 minutes past each hour, the Test Baselines baseline alert is triggered, and an alert notification is sent to the node owner by text message. The alert notification can be sent a maximum of three times at an interval of 30 minutes.

Step 8: Create a custom alert rule for a resource group and associate an automated O&M rule with the custom alert rule

If you use an exclusive resource group to run nodes, you can create a custom alert rule for the resource group and associate an automated O&M rule with the custom alert rule to enable automated O&M based on your business requirements. When you configure the custom alert rule, you can specify the alert conditions such as the resource usage of the exclusive resource group and the maximum number of instances that are waiting for resources in the resource group. If the custom alert rule is triggered, the system performs O&M operations based on the automated O&M rule that you associate with the custom alert rule.

The automated O&M feature works by associating an automated O&M rule with a custom alert rule that is configured for an exclusive resource group. You can specify the alert conditions for nodes that are run on the exclusive resource group in the custom alert rule and configure the automated O&M rule based on your business logic. If an auto triggered node instance that meets the filter conditions specified in the automated O&M rule hits the alert conditions, the custom alert rule is triggered and automated O&M operations are performed.

Note
  • Only exclusive resource groups for scheduling support the automated O&M feature.

  • To prevent slow node running due to insufficient resources, you can run your nodes on an exclusive resource group for scheduling. For information about how to change the resource group used by nodes, see General reference: Change the resource groups used by tasks.

  1. Go to the Operation Center page.

  2. Create a custom alert rule for a resource group.

    1. In the left-side navigation pane, choose Alarm > Rule Management.

    2. Create and configure a custom alert rule for a resource group.

      The alert rule configurations for a resource group are similar to those for a node. The only difference is that you need to set the Object Type parameter to Exclusive Resource Group for Scheduling. For more information, see Create a custom alert rule.

      In this example, a custom alert rule is configured for the Exclusive_Scheduling_Resource resource group to monitor the resource usage in the resource group. The following figure shows the parameters that you need to configure.

      Note

      This topic provides only a configuration example for you. You can configure a custom alert rule for your resource group based on your business requirements.

      The Resource group monitoring rules alert rule is triggered when the resource usage of the Exclusive_Scheduling_Resource resource group exceeds 90% for 10 minutes. The system sends an alert notification to a specified alert contact by text message. The alert notification can be sent a maximum of three times.

  3. Configure an automated O&M rule based on the custom alert rule that is configured for the resource group.

    1. In the left-side navigation pane, choose O&M Assistant > Automatic.

    2. On the Rules tab of the page that appears, click Create Rule.

    3. Configure the parameters for the rule.

      You can configure the automated O&M rule based on your business requirements. For more information, see Automated O&M.

      In this example, an automated O&M rule named Automatic_test is created and associated with the Resource group monitoring rules custom alert rule that is configured for the exclusive resource group for scheduling named Exclusive_Scheduling_Resource. If the custom alert rule is triggered, DataWorks performs O&M operations on the instances that meet the filter conditions specified in the automated O&M rule. The following figure shows the parameters that you need to configure. Descriptions of some parameters:

      • Associated Monitoring Rule: You can associate the current automated O&M rule only with a custom alert rule that is configured for an exclusive resource group for scheduling. You must create a custom alert rule and set the Object Type parameter to Exclusive Resource Group for Scheduling in advance.

      • O&M Operation: You can set this parameter only to Terminate Running Instance. After the automated O&M rule is triggered, node instances that meet the filter conditions are terminated.

      In this example, when the resource usage in the Exclusive_Scheduling_Resource resource group is greater than 90% for 10 minutes, DataWorks terminates the instances whose priority is 1 and scheduling cycle is hour or minute from all auto triggered node instances, test instances, and data backfill instances that are run on the resource group in a specified workspace.
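For reference, the Terminate Running Instance action that an automated O&M rule performs corresponds to stopping an instance, which you can also do manually through the StopInstance operation of the DataWorks OpenAPI. A minimal sketch with placeholder IDs follows; verify the parameters against the API reference.

    # A minimal sketch: manually terminate a running instance, the same O&M
    # operation that an automated O&M rule can perform. IDs are placeholders.
    from aliyunsdkcore.client import AcsClient
    from aliyunsdkcore.request import CommonRequest

    client = AcsClient('<access_key_id>', '<access_key_secret>', 'cn-shanghai')

    request = CommonRequest()
    request.set_domain('dataworks.cn-shanghai.aliyuncs.com')
    request.set_version('2020-05-18')
    request.set_action_name('StopInstance')
    request.set_method('POST')
    request.add_query_param('InstanceId', '800000000001')  # hypothetical instance ID
    request.add_query_param('ProjectEnv', 'PROD')

    print(client.do_action_with_exception(request))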

Manage and control O&M operations (advanced feature)

In Operation Center, you can perform operations on a node. For example, you can freeze, unfreeze, backfill data for, or undeploy a node. The different types of operations can be considered as extension point events. You can use extension point events with extensions to customize the processing logic for and O&M operations on nodes. For more information, see Extension overview and Trigger event checking in Operation Center.

What to do next

You can configure data quality monitoring rules for the table data that is generated by the node to ensure that the data output meets your expectations. For more information, see Data Quality.
