Operation Center is an end-to-end big data O&M and monitoring platform. It allows you to view the status of tasks in real time and to perform O&M operations, such as intelligent diagnostics and reruns, on tasks that encounter exceptions. Operation Center provides the intelligent baseline feature, which helps ensure that task output is generated on time by addressing issues such as unpredictable output times for important tasks and the difficulty of monitoring large numbers of tasks. Operation Center also provides O&M capabilities for compute engines, resources, and scheduling.
Modules in Operation Center
After you develop tasks in DataStudio and commit and deploy them to the production environment, you can perform O&M operations on them in Operation Center. For example, you can run tasks in the production environment, identify issues in task runs, monitor task status, and view key task O&M metrics and the task list. Supported task types include auto triggered tasks, manually triggered tasks, and real-time tasks.
Precautions
Only tasks that are deployed to the production environment can be automatically scheduled to run.
Go to the Operation Center page
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Operation Center.
Task O&M
The task O&M feature supports three types of tasks: auto triggered tasks, manually triggered tasks, and real-time tasks. You can view key task run metrics on the O&M dashboard and use the features of the O&M assistant, such as data backfill, intelligent diagnostics, and automated O&M, to perform O&M operations on tasks.
| Module | Description | Supported environment |
| | This module provides the Workbench Overview and Data Integration tabs. On the Workbench Overview tab, you can view key O&M metrics of auto triggered tasks and instances in charts and tables. On the Data Integration tab, you can perform O&M operations on batch and real-time synchronization tasks. | This module is not available in Operation Center in the development environment. |
| Auto Triggered Node O&M | You can perform O&M operations on auto triggered tasks, such as viewing the directed acyclic graph (DAG) of a task, testing a task, and backfilling data for a task. | In the development environment, Operation Center does not automatically schedule tasks, so no auto triggered task instances are generated. |
| | This module displays the instances that are generated for auto triggered tasks after the tasks are committed to the scheduling system. In the instance list, you can view the DAG of an instance, diagnose it, and rerun it. | |
| | This module displays the test instances that are generated when you test auto triggered tasks. In the instance list, you can view the status of a test instance, view its DAG, diagnose it, and rerun it. | |
| Real-time Node O&M | You can start, stop, or undeploy a real-time computing task. You can also configure alert rules for the task so that you can detect and handle its exceptions promptly. | - |
| | You can start, stop, or undeploy a real-time synchronization task, or change its owner. You can also configure alert rules for the task so that you can detect and handle its exceptions promptly. | - |
| Manually Triggered Node O&M | You can query manually triggered tasks and perform O&M operations on them. For example, you can view the DAG of a manually triggered task, run the task manually, and view the instances that are generated for it. | - |
| | You can open the DAG of a manually triggered task instance to view detailed information about the instance and perform operations on it, such as viewing its run logs, code, and lineage, and running diagnostics. | - |
| O&M Assistant | You can manage data backfill tasks. | - |
| | The intelligent diagnosis feature performs end-to-end analysis on tasks to help you identify issues efficiently. On the Intelligent Diagnosis page, you can click an instance to view its information on the following tabs: Running Details, General, Impact baseline, and Historical instance. | This module is not available in Operation Center in the development environment. |
| | The automated O&M feature lets you configure custom O&M rules. Based on your business requirements, you can define custom metrics and create an O&M rule for the task instances that run on a resource group. When the rule is triggered, the system automatically performs O&M operations on those task instances. | - |
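An automated O&M rule can be pictured as a condition on a metric, scoped to one resource group, plus an action to take when the condition holds. The following Python sketch illustrates that idea under stated assumptions: the `RunningInstance` and `OmRule` types, the `running_minutes` metric, the `terminate` action, and the `evaluate_rule` function are all hypothetical and are not part of any DataWorks API. It only shows how such a rule could select the instances that trigger it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical model of a running task instance. Field names are
# illustrative only and are not DataWorks API fields.
@dataclass
class RunningInstance:
    instance_id: str
    resource_group: str
    metrics: Dict[str, float]  # e.g. {"running_minutes": 95.0}


# Hypothetical custom O&M rule: a metric, a threshold, and an action.
@dataclass
class OmRule:
    resource_group: str
    metric: str
    threshold: float
    action: str  # e.g. "terminate"; illustrative only


def evaluate_rule(rule: OmRule, instances: List[RunningInstance]) -> List[Tuple[str, str]]:
    """Return (instance_id, action) pairs for instances that trigger the rule."""
    triggered = []
    for inst in instances:
        # The rule applies only to instances running on its resource group.
        if inst.resource_group != rule.resource_group:
            continue
        value = inst.metrics.get(rule.metric)
        # Instances that do not report the metric are skipped.
        if value is not None and value > rule.threshold:
            triggered.append((inst.instance_id, rule.action))
    return triggered


rule = OmRule("rg_prod", "running_minutes", 60.0, "terminate")
instances = [
    RunningInstance("i1", "rg_prod", {"running_minutes": 95.0}),
    RunningInstance("i2", "rg_prod", {"running_minutes": 10.0}),
    RunningInstance("i3", "rg_dev", {"running_minutes": 120.0}),
]
print(evaluate_rule(rule, instances))  # [('i1', 'terminate')]
```

In this sketch, `i3` exceeds the threshold but is ignored because it runs on a different resource group, which mirrors the way the text scopes an O&M rule to the instances on one resource group.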
The following prerequisites must be met before an auto triggered task starts to run:
All instances of the ancestor tasks on which the auto triggered task depends have run successfully.
The scheduled run time of the auto triggered task has arrived.
The scheduling resources that are required to run the auto triggered task are sufficient.
The auto triggered task is not frozen.
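Because all four prerequisites must hold at the same time, they can be read as a single AND condition. The following Python sketch models them under stated assumptions: the `InstanceState` type, the `SUCCESS` status string, and the `blocking_reasons` and `can_run` functions are hypothetical names for illustration, not DataWorks API. Reporting every unmet prerequisite, rather than stopping at the first one, makes it easier to see all the reasons that an instance is still waiting.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


# Hypothetical snapshot of an auto triggered task instance.
@dataclass
class InstanceState:
    ancestor_statuses: List[str]  # statuses of ancestor instances; "SUCCESS" is an assumed label
    scheduled_time: datetime
    resources_available: bool
    frozen: bool


def blocking_reasons(state: InstanceState, now: datetime) -> List[str]:
    """List every prerequisite that is not yet met (empty list means the instance can run)."""
    reasons = []
    if any(status != "SUCCESS" for status in state.ancestor_statuses):
        reasons.append("ancestor instances have not all run successfully")
    if now < state.scheduled_time:
        reasons.append("scheduled time has not arrived")
    if not state.resources_available:
        reasons.append("scheduling resources are insufficient")
    if state.frozen:
        reasons.append("task is frozen")
    return reasons


def can_run(state: InstanceState, now: datetime) -> bool:
    # All prerequisites must be met at the same time.
    return not blocking_reasons(state, now)


state = InstanceState(["SUCCESS", "RUNNING"], datetime(2024, 1, 1, 2, 0), True, False)
print(blocking_reasons(state, datetime(2024, 1, 1, 1, 0)))
# ['ancestor instances have not all run successfully', 'scheduled time has not arrived']
```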
In Operation Center, the color of an instance varies based on the status of the instance. For more information, see the Appendix: Instance status and diagnostics section in this topic.
Task monitoring
The task monitoring module provides the intelligent baseline and monitoring and alerting features. After you configure a baseline, DataWorks can detect exceptions on the tasks in the baseline and report alerts about them promptly. You can also configure custom alert rules and shift schedules, and view alert details so that O&M alerts are handled promptly.
| Module | Description | Supported environment |
| | You can use the intelligent baseline feature to detect exceptions that would prevent tasks in a baseline from completing on time. When such an exception is detected, the system reports an alert promptly. This helps ensure that important data is generated as expected, reduces configuration costs, prevents invalid alerts, and automates the monitoring of important tasks. | This module is not available in Operation Center in the development environment. |
| Alarm | You can configure custom alert rules to monitor the status or resource usage of specific tasks based on your business requirements, so that you can detect and handle exceptions promptly. | |
| | The Alert Management page in Operation Center shows all alerts, including baseline alerts and event alerts that are generated on the Smart Baseline page, alerts that are generated by custom rules, and alerts that are generated by global rules. | |
| | DataWorks provides the shift schedule feature, which lets you create shift schedules so that on-duty engineers can respond promptly when alerts are reported or instances require O&M. DataWorks can send alert notifications to the on-duty engineers that you specify in a shift schedule, and the engineers can then identify and handle the exceptions. | |
Others
In addition to task O&M and intelligent monitoring features, DataWorks also allows you to view the job details of the E-MapReduce (EMR) compute engine, monitor the usage of resources in resource groups, and configure custom scheduling parameters to facilitate daily O&M.
| Module | Description | Supported environment |
| | You can use the engine O&M feature of DataWorks to view the details of each EMR job, identify jobs that failed to run, and remove them. This prevents failed jobs from affecting the DataWorks task instances to which they belong and the descendant instances of those task instances. | This module is not available in Operation Center in the development environment. |
| | On the resource group details page, you can view the usage of resource groups and the execution status of tasks. The resource O&M feature supports intelligent monitoring and automated O&M of resource groups and tasks, which reduces complex manual operations and improves O&M efficiency. | - |
| | You can configure a scheduling calendar and use workspace-level parameters to specify task scheduling methods conveniently. | - |
Appendix: Instance status and diagnostics
In Operation Center, different colors and icons mark the stage and status of an instance. The following table lists the instance statuses that are marked in this way. For information about the prerequisites for a task to run, see Use the Intelligent Diagnosis feature.
| No. | Status |
| 1 | Run Successfully |
| 2 | Not Run |
| 3 | Failed to Run |
| 4 | Running |
| 5 | Wait Time |
| 6 | Freeze |