
DataWorks: Operation Center

Last Updated: Dec 04, 2024

Operation Center is an end-to-end big data O&M and monitoring platform. It allows you to view the status of tasks in real time and perform O&M operations on tasks that encounter exceptions, such as running intelligent diagnostics and rerunning tasks. Operation Center also provides the intelligent baseline feature, which addresses issues such as the unpredictable output time of important tasks and the difficulty of monitoring large numbers of tasks, and helps ensure that task output is generated on time. In addition, Operation Center provides O&M capabilities for compute engines, resources, and scheduling.

Modules in Operation Center

After you develop tasks in DataStudio and commit and deploy them to the production environment, you can perform O&M operations on the tasks in Operation Center. For example, you can run tasks in the production environment, identify issues that occur while tasks run, monitor task status, and view key O&M metrics and the task list. Supported task types include auto triggered tasks, manually triggered tasks, and real-time tasks.


Precautions

Only tasks that are deployed to the production environment can be automatically scheduled to run.

Go to the Operation Center page

Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Operation Center. On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.

Task O&M

The task O&M feature allows you to perform O&M on the following types of tasks: auto triggered tasks, manually triggered tasks, and real-time tasks. You can view key task running metrics on the O&M dashboard and use the features provided by the O&M assistant, such as data backfill, intelligent diagnostics, and automated O&M, to perform various O&M operations on tasks.
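Data backfill runs a task again for historical data timestamps. The sketch below only illustrates the general idea of expanding a backfill date range into one run per day. It assumes a daily task and a yyyymmdd timestamp format; the function name `backfill_dates` is hypothetical and is not a DataWorks API.

```python
from datetime import date, timedelta
from typing import List


def backfill_dates(start: date, end: date) -> List[str]:
    """Return one data timestamp per day from start to end, inclusive, oldest first.

    Hypothetical helper: assumes a daily task and a yyyymmdd timestamp format.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]


if __name__ == "__main__":
    print(backfill_dates(date(2024, 11, 29), date(2024, 12, 2)))
    # ['20241129', '20241130', '20241201', '20241202']
```

Running the dates oldest first is one simple choice; a real backfill may also need to respect dependencies between the instances of consecutive days.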

  • O&M Dashboard: This module provides the Workbench Overview and Data Integration tabs. On the Workbench Overview tab, you can view key O&M metrics of auto triggered tasks and instances in charts and tables. On the Data Integration tab, you can perform O&M operations on batch and real-time synchronization tasks. Supported environment: this module is not available in Operation Center in the development environment.

  • Auto Triggered Node O&M. Supported environment: in Operation Center in the development environment, tasks cannot be automatically scheduled to generate auto triggered task instances.

      ◦ Auto Triggered Nodes: You can perform O&M operations on auto triggered tasks, such as viewing the directed acyclic graph (DAG) of a task, testing the task, and backfilling data for the task.

      ◦ Auto Triggered Instances: This module displays the instances that are generated for auto triggered tasks after the tasks are committed to the scheduling system. In the instance list, you can view the DAG of an instance, perform diagnostics on it, and rerun it.

      ◦ Test Instances: This module displays the test instances that are generated after you test auto triggered tasks. In the instance list, you can view the status of a test instance, view its DAG, perform diagnostics on it, and rerun it.

  • Real-time Node O&M

      ◦ Real-time Computing Nodes: You can start, stop, or undeploy a real-time computing task. You can also configure alert rules for the task so that you can identify and handle exceptions promptly.

      ◦ Real-time Synchronization Nodes: You can start, stop, undeploy, or change the owner of a real-time synchronization task. You can also configure alert rules for the task so that you can identify and handle exceptions promptly.

  • Manually Triggered Node O&M

      ◦ Manual Triggered Nodes: You can query manually triggered tasks and perform O&M operations on them. For example, you can view the DAG of a manually triggered task, run the task manually, and view the instances that are generated for it.

      ◦ Manual Triggered Instances: You can open the DAG of a manually triggered task instance to view detailed information about the instance and perform operations on it, such as viewing run logs, code, and lineage, and performing diagnostics.

  • O&M Assistant

      ◦ Data Backfill: In this module, you can manage data backfill tasks.

      ◦ Intelligent Diagnosis: This feature performs end-to-end analysis on tasks to help you identify issues efficiently. On the Intelligent Diagnosis page, you can click an instance to view information about it on the following tabs: Running Details, General, Impact baseline, and Historical instance. Supported environment: this module is not available in Operation Center in the development environment.

      ◦ Automatic: The automated O&M feature allows you to configure custom O&M rules. You can define custom metrics and create an O&M rule for the task instances that run on a resource group based on your business requirements. When the rule is triggered, the system performs O&M operations on the matching task instances automatically.
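To make the idea of an automated O&M rule concrete, the following sketch models a rule as a resource group, a custom metric, a threshold, and an action. It is a minimal illustration under stated assumptions: the classes `Instance` and `OpsRule`, the metric name `wait_minutes`, and the `stop_instance` action are all hypothetical and do not reflect how DataWorks stores or evaluates rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Instance:
    # Hypothetical model of a task instance; not a DataWorks type.
    instance_id: str
    resource_group: str
    metrics: Dict[str, float]  # custom metric values, e.g. {"wait_minutes": 45.0}


@dataclass
class OpsRule:
    # Hypothetical rule shape: scope, metric, threshold, and O&M action.
    resource_group: str                # the rule only applies to instances on this group
    metric: str                        # name of the custom metric to evaluate
    threshold: float                   # the rule is triggered when metric > threshold
    action: Callable[[Instance], str]  # O&M operation to perform on a matching instance


def apply_rule(rule: OpsRule, instances: List[Instance]) -> List[str]:
    """Run the rule's action on every in-scope instance whose metric exceeds the threshold."""
    results = []
    for inst in instances:
        if inst.resource_group != rule.resource_group:
            continue
        value = inst.metrics.get(rule.metric)
        # Instances that do not report the metric are skipped rather than acted on.
        if value is not None and value > rule.threshold:
            results.append(rule.action(inst))
    return results


def stop_instance(inst: Instance) -> str:
    # Placeholder action: a real system would call its own stop or rerun operation here.
    return f"stopped {inst.instance_id}"


if __name__ == "__main__":
    rule = OpsRule("rg_prod", "wait_minutes", 30.0, stop_instance)
    instances = [
        Instance("i-1", "rg_prod", {"wait_minutes": 45.0}),
        Instance("i-2", "rg_prod", {"wait_minutes": 10.0}),
        Instance("i-3", "rg_dev", {"wait_minutes": 90.0}),
    ]
    print(apply_rule(rule, instances))  # ['stopped i-1']
```

Passing the action as a callable keeps rule evaluation separate from the operation performed, so the same evaluation logic could drive a stop, a rerun, or a notification.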

Note

The following prerequisites must be met before an auto triggered task starts to run:

  • All the instances of the ancestor tasks on which the auto triggered task depends have run successfully.

  • The scheduled time of the auto triggered task has arrived.

  • Sufficient scheduling resources are available to run the auto triggered task.

  • The auto triggered task is not frozen.
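The four prerequisites above can be expressed as a single check that reports every condition blocking a task, which is roughly the kind of answer a diagnosis tool gives. This is a sketch under assumptions: the `AutoTriggeredTask` model, the `"SUCCESS"` status string, and the slot-based resource units are hypothetical simplifications, not DataWorks data structures.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class AutoTriggeredTask:
    # Hypothetical model of the state that must be checked; not a DataWorks type.
    ancestor_statuses: List[str]  # status of each ancestor instance, e.g. "SUCCESS"
    scheduled_time: datetime      # point in time when the task is scheduled to run
    free_slots: int               # available scheduling resources (illustrative unit)
    required_slots: int           # resources the task needs (illustrative unit)
    frozen: bool


def unmet_prerequisites(task: AutoTriggeredTask, now: datetime) -> List[str]:
    """List every prerequisite that blocks the task; an empty list means it can start."""
    blockers = []
    # all() is True for an empty list, so a task with no ancestors passes this check.
    if not all(s == "SUCCESS" for s in task.ancestor_statuses):
        blockers.append("ancestor instances not all successful")
    if now < task.scheduled_time:
        blockers.append("scheduled time not reached")
    if task.free_slots < task.required_slots:
        blockers.append("insufficient scheduling resources")
    if task.frozen:
        blockers.append("task is frozen")
    return blockers


if __name__ == "__main__":
    task = AutoTriggeredTask(["SUCCESS", "RUNNING"], datetime(2024, 12, 4, 2, 0), 0, 1, False)
    print(unmet_prerequisites(task, datetime(2024, 12, 4, 3, 0)))
    # ['ancestor instances not all successful', 'insufficient scheduling resources']
```

Returning all blockers instead of a single boolean makes it clear why an instance is still waiting, for example when both an ancestor and the resource group are holding it back.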

In Operation Center, the color of an instance varies based on the status of the instance. For more information, see the Appendix: Instance status and diagnostics section in this topic.

Task monitoring

The task monitoring module provides the intelligent baseline feature and the monitoring and alerting feature. After you configure a baseline, DataWorks can identify exceptions on the tasks in the baseline and report alerts promptly. You can also configure custom alert rules and shift schedules, and view alert details so that O&M alerts are handled promptly.
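One way to picture how a baseline detects that output will be late is to predict when the final task in the baseline will finish and compare that time with a deadline. The sketch below assumes a deadline ("committed time") and a warning window ("alert margin"), estimates each task's duration, and propagates earliest finish times through the dependency graph. The function names, the state labels `safe`, `warning`, and `broken`, and the prediction method itself are illustrative assumptions, not the algorithm DataWorks uses.

```python
from datetime import datetime, timedelta
from graphlib import TopologicalSorter
from typing import Dict, Set


def predicted_finish(start: datetime, durations: Dict[str, timedelta],
                     deps: Dict[str, Set[str]]) -> Dict[str, datetime]:
    """Earliest finish time of each task, assuming a task starts as soon as all of
    its ancestors finish and every task takes exactly its estimated duration."""
    finish: Dict[str, datetime] = {}
    # static_order() yields every task only after all of its ancestors.
    for task in TopologicalSorter(deps).static_order():
        ready = max((finish[a] for a in deps.get(task, set())), default=start)
        finish[task] = ready + durations[task]
    return finish


def baseline_state(finish: datetime, committed: datetime, margin: timedelta) -> str:
    """Classify a predicted finish time (hypothetical labels)."""
    if finish > committed:
        return "broken"   # predicted to miss the committed time
    if finish > committed - margin:
        return "warning"  # on time, but inside the alert margin
    return "safe"


if __name__ == "__main__":
    durations = {
        "ods": timedelta(minutes=30),
        "dim": timedelta(minutes=20),
        "dwd": timedelta(minutes=45),
        "ads": timedelta(minutes=15),
    }
    deps = {"dwd": {"ods"}, "ads": {"dwd", "dim"}}
    finish = predicted_finish(datetime(2024, 12, 4, 0, 0), durations, deps)
    print(finish["ads"])  # 2024-12-04 01:30:00
    print(baseline_state(finish["ads"], datetime(2024, 12, 4, 2, 0), timedelta(minutes=45)))
    # warning
```

Alerting on the predicted finish of the last task, rather than on every task, is what keeps the number of alerts small while still covering the whole chain that feeds important output.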

  • Smart Baseline: You can use the intelligent baseline feature to detect exceptions that prevent a task in a baseline from completing on time. When an exception is detected, the system reports an alert promptly. This helps ensure that important data is generated as expected, reduces configuration costs, prevents invalid alerts, and enables automatic monitoring of important tasks. Supported environment: this module is not available in Operation Center in the development environment.

  • Alarm

      ◦ Rule Management: You can configure custom alert rules to monitor the status or resource usage of specific tasks based on your business requirements, so that you can identify and handle exceptions promptly.

      ◦ Alert Management: You can view all alerts on the Alert Management page in Operation Center. These include baseline alerts and event alerts that are generated on the Smart Baseline page, alerts that are generated based on custom rules, and alerts that are generated based on global rules.

      ◦ Schedule: DataWorks provides the shift schedule feature, which allows you to create shift schedules so that on-duty engineers can respond promptly when alerts are reported or instances require O&M. DataWorks can send alert notifications to the on-duty engineers that you specify for a shift schedule, who can then identify and handle the exceptions.
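Routing an alert through a shift schedule comes down to finding whose shift covers the moment the alert fires. The sketch below assumes shifts are stored as start and end times with an exclusive end; the `Shift` class, the `alert_recipients` function, and the fallback contact used when nobody is on duty are hypothetical and only illustrate the lookup.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Shift:
    # Hypothetical shift record; not a DataWorks type.
    engineer: str
    start: datetime
    end: datetime  # exclusive, so back-to-back shifts never overlap


def on_duty(shifts: List[Shift], at: datetime) -> List[str]:
    """Engineers whose shift covers the given moment."""
    return [s.engineer for s in shifts if s.start <= at < s.end]


def alert_recipients(shifts: List[Shift], at: datetime, fallback: str) -> List[str]:
    # Hypothetical policy: notify on-duty engineers, or a fallback contact if nobody is on duty.
    return on_duty(shifts, at) or [fallback]


if __name__ == "__main__":
    shifts = [
        Shift("alice", datetime(2024, 12, 4, 8), datetime(2024, 12, 4, 20)),
        Shift("bob", datetime(2024, 12, 4, 20), datetime(2024, 12, 5, 8)),
    ]
    print(alert_recipients(shifts, datetime(2024, 12, 4, 20), "owner"))  # ['bob']
    print(alert_recipients(shifts, datetime(2024, 12, 4, 3), "owner"))   # ['owner']
```

Making the end time exclusive means an alert raised exactly at a handover goes to the incoming engineer only, instead of to both.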

Others

In addition to task O&M and intelligent monitoring features, DataWorks also allows you to view the job details of the E-MapReduce (EMR) compute engine, monitor the usage of resources in resource groups, and configure custom scheduling parameters to facilitate daily O&M.

  • Engine Maintenance: You can use the engine O&M feature provided by DataWorks to view the details of each EMR job, find jobs that fail to run, and remove them. This prevents failed jobs from affecting the DataWorks task instances to which they belong and the descendant instances of those instances. Supported environment: this module is not available in Operation Center in the development environment.

  • Resource: On the resource group details page, you can view the usage of resource groups and the execution status of tasks. The resource O&M feature supports intelligent monitoring and automated O&M of resource groups and tasks, which reduces manual work and improves O&M efficiency.

  • Tenant Schedule Setting: You can configure a scheduling calendar and use workspace-level parameters to conveniently specify how tasks are scheduled.
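Workspace-level parameters let many tasks share one value that is defined in a single place. As a rough sketch of how such values might be substituted into task code, the snippet below uses Python's standard `string.Template`. The `${name}` placeholder syntax, the `render_code` function, and the rule that task-level values override workspace-level ones are assumptions for illustration, not behavior confirmed by this topic.

```python
from string import Template
from typing import Dict, Optional


def render_code(code: str, workspace_params: Dict[str, str],
                task_params: Optional[Dict[str, str]] = None) -> str:
    """Replace ${name} placeholders in task code with parameter values (hypothetical helper)."""
    # Assumption: a task-level value overrides a workspace-level value with the same name.
    params = {**workspace_params, **(task_params or {})}
    # substitute() raises KeyError when a placeholder has no value, surfacing misconfiguration early.
    return Template(code).substitute(params)


if __name__ == "__main__":
    sql = "SELECT * FROM ${db}.orders WHERE dt='${bizdate}'"
    print(render_code(sql, {"db": "prod_dw", "bizdate": "20241203"}))
    # SELECT * FROM prod_dw.orders WHERE dt='20241203'
```

Using `substitute` instead of `safe_substitute` is a deliberate choice here: a task that references an undefined parameter fails loudly before it runs, rather than executing code that still contains a literal placeholder.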

Appendix: Instance status and diagnostics

In Operation Center, different colors and icons mark the stage and status of an instance. The following table maps each instance status to its icon. For information about the prerequisites for a task to run, see Use the Intelligent Diagnosis feature.

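No. | Status | Icon
1 | Run Successfully | [icon: run succeeded]
2 | Not run | [icon: not run]
3 | Failed To Run | [icon: run failed]
4 | Running | [icon: running]
5 | Wait time | [icon: waiting]
6 | Freeze | [icon: paused/frozen]

[Flowchart: instance running process]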
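When instance lists are exported or processed in scripts, it can help to represent the six statuses above as an enumeration and tally them, similar to the counts shown on the O&M dashboard. In this sketch only the display names come from the status table; the `InstanceStatus` enum, the `summarize` and `unfinished` helpers, and the assumption that only Run Successfully and Failed To Run are final states are illustrative.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class InstanceStatus(Enum):
    # Display names taken from the status table; the enum itself is hypothetical.
    RUN_SUCCESSFULLY = "Run Successfully"
    NOT_RUN = "Not run"
    FAILED_TO_RUN = "Failed To Run"
    RUNNING = "Running"
    WAIT_TIME = "Wait time"
    FREEZE = "Freeze"


# Assumption: only these two statuses are final; the others can still change.
FINAL = {InstanceStatus.RUN_SUCCESSFULLY, InstanceStatus.FAILED_TO_RUN}


def summarize(statuses: Iterable[InstanceStatus]) -> Dict[str, int]:
    """Count instances per status, including statuses with zero instances."""
    counts = Counter(statuses)
    return {status.value: counts[status] for status in InstanceStatus}


def unfinished(statuses: Iterable[InstanceStatus]) -> int:
    """Number of instances that have not reached a final status."""
    return sum(1 for s in statuses if s not in FINAL)


if __name__ == "__main__":
    batch = [InstanceStatus.RUN_SUCCESSFULLY, InstanceStatus.FAILED_TO_RUN, InstanceStatus.FREEZE]
    print(summarize(batch))
    print(unfinished(batch))  # 1
```

Treating a frozen instance as unfinished reflects that it will not run until it is unfrozen, so it still needs attention even though it is not currently running.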