DataWorks: Configure rules to monitor data quality

Last Updated: Nov 13, 2024

This topic describes how to configure a monitor and monitoring rules to check the data quality of the dwd_log_info_di_emr table.

Prerequisites

Data is synchronized and processed.

Procedure

Go to the Configure Rules page

  1. Go to the Data Quality page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Quality. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Quality.

  2. Go to the rule configuration page.

    In the left-side navigation pane of the Data Quality page, choose Configure Rules > Configure by Table. On the Configure by Table page, find the desired table based on the following filter conditions:

    • In the Data Sources section, select E-MapReduce.

    • In the E-MapReduce category, select the current project in the production environment.

    • On the right side of the Configure by Table page, specify filter conditions to find the dwd_log_info_di_emr table for which you want to configure monitoring rules.

  3. Find the desired table in the search results and click Configure Monitoring Rule in the Actions column. The Table Quality Details page of the table appears. The following section describes how to configure monitoring rules for the table.

Configure a monitor

You can use a monitor to check whether the quality of data in the specified range (partition) of a table meets your expectations.

In this step, set the Data Range parameter of the monitor to dt=$[yyyymmdd-1]. When the monitor runs, it searches for the data partitions that match the parameter value and checks whether the quality of the data meets your expectations.

In this example, each time the scheduling node that writes data to the dwd_log_info_di_emr table runs, the monitor is triggered, and the rules that are associated with the monitor check whether the quality of data in the specified range meets your expectations.
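The following Python sketch illustrates how a Data Range value such as dt=$[yyyymmdd-1] could resolve to a concrete partition. It assumes that $[yyyymmdd-1] denotes the date of the scheduling time minus one day. The resolve_data_range function is a hypothetical helper written for illustration only and is not part of DataWorks.

```python
import re
from datetime import datetime, timedelta


def resolve_data_range(expression: str, scheduling_time: datetime) -> str:
    """Replace $[yyyymmdd], $[yyyymmdd-N], or $[yyyymmdd+N] with a concrete date.

    Assumption: the offset N is counted in days relative to the scheduling time.
    """
    pattern = re.compile(r"\$\[yyyymmdd(?:([+-])(\d+))?\]")

    def substitute(match: re.Match) -> str:
        sign, days = match.group(1), match.group(2)
        offset = int(days) if days else 0
        if sign == "-":
            offset = -offset
        return (scheduling_time + timedelta(days=offset)).strftime("%Y%m%d")

    return pattern.sub(substitute, expression)


# A node scheduled on Nov 13, 2024 checks the partition of the previous day.
print(resolve_data_range("dt=$[yyyymmdd-1]", datetime(2024, 11, 13, 0, 30)))
```

Under this assumption, a run scheduled on Nov 13, 2024 checks the dt=20241112 partition, which is the partition that the daily node has just written.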

You need to perform the following steps:

  1. On the Monitor tab, click Create Monitor.

  2. Configure the parameters of the monitor.

    The following table describes the key parameters.

    | Parameter | Description |
    | --- | --- |
    | Data Range | dt=$[yyyymmdd-1] |
    | Trigger Method | The trigger method. Set this parameter to Triggered by Node Scheduling in Production Environment and select the dwd_log_info_di_emr node that is created during data processing. |
    | Monitoring Rule | You do not need to configure this parameter. The monitoring rules are configured in the Configure monitoring rules section. |

    Note

    For more information about how to configure a monitor, see Configure a monitoring rule for a single table.

Configure monitoring rules

The dwd_log_info_di_emr table is generated by processing the data of the ods_raw_log_d_emr table. To prevent invalid data processing and data quality issues, create and configure a strong rule that monitors whether the number of rows in the dwd_log_info_di_emr table is greater than 0. This rule helps you determine whether the ancestor node writes data to the partitions of the dwd_log_info_di_emr table.

If the number of rows in the related partitions of the dwd_log_info_di_emr table is 0, an alert is triggered, the dwd_log_info_di_emr node fails and exits, and the descendant nodes of the dwd_log_info_di_emr node are blocked from running.
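The behavior described above can be modeled with a short Python sketch. This is an illustrative model only, not the Data Quality implementation: the CheckResult type and the check_table_not_empty function are hypothetical names, and the weak-rule branch (alert without blocking) is an assumption, because this topic describes only the strong rule.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    passed: bool             # True if the partition contains at least one row
    alert: bool              # True if an alert is triggered
    block_descendants: bool  # True if descendant nodes are blocked from running


def check_table_not_empty(row_count: int, strong: bool = True) -> CheckResult:
    """Model the "Table is not empty" rule for one partition."""
    passed = row_count > 0
    return CheckResult(
        passed=passed,
        alert=not passed,
        # Assumption: only a strong rule blocks descendant nodes on failure.
        block_descendants=(not passed) and strong,
    )


# An empty partition fails the strong rule: alert and block descendant nodes.
print(check_table_not_empty(0))
# A partition that contains rows passes the check.
print(check_table_not_empty(1250))
```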

You need to perform the following steps:

  1. In the Monitor Perspective section of the Rule Management tab, select a monitor. In this example, the raw_log_number_of_table_rows_not_0 monitor is selected. Then, click Create Rule on the right side of the tab. The Create Rule panel appears.

  2. On the System Template tab of the Create Rule panel, find the Table is not empty rule and click Use. On the right side of the panel, set the Degree of Importance parameter to Strong Rule.

    Note

    In this example, the rule is defined as a strong rule. This indicates that when the number of rows in the dwd_log_info_di_emr table is found to be 0, an alert is triggered and the descendant nodes are blocked from running.

  3. Click Determine.

    Note

    For information about other parameters configured for a monitoring rule, see Configure a monitoring rule for a single table.

Perform a test run on the monitor

After you create the rules that are associated with the monitor, perform a test run on the monitor to verify that the rule configurations are correct and work as expected.

  1. Click Test Run. The Test Run dialog box appears.

  2. In the Test Run dialog box, configure the Scheduling Time parameter and click Test Run.

  3. After the test run is complete, click View Details to view the test result.

Subscribe to the monitor

Data Quality provides a monitoring and alerting feature. You can subscribe to monitors to receive alert notifications about data quality issues. This way, you can resolve the data quality issues at the earliest opportunity and ensure data security, data stability, and the timeliness of data generation.

After the subscription configuration is complete, choose Quality O&M > Monitor in the left-side navigation pane. Then, click My Subscriptions on the Monitor page to view and modify the subscribed monitors.

What to do next

After the data is processed, you can use DataAnalysis to visualize the data. For more information, see Visualize data on a dashboard.