DataWorks:Configure rules for a single table

Last Updated: Feb 27, 2026

Data Quality enables you to configure monitoring rules for data tables. These rules verify whether your table data meets specified requirements and can automatically block problematic tasks to prevent dirty data from propagating downstream. This ensures that output data conforms to expectations. This topic describes how to configure, execute, and manage quality monitoring rules for a table.

Prerequisites

You must acquire engine metadata before configuring quality monitoring rules. Quality rules are based on engine data tables and apply to the corresponding table data. For more information, see Metadata acquisition.

Limits

  • Data source limits: You can configure quality monitoring rules only for MaxCompute, E-MapReduce, Hologres, CDH Hive, AnalyticDB for PostgreSQL, AnalyticDB for MySQL, StarRocks, MySQL, SQL Server, DLF, and Lindorm data sources.

  • Network limits: After you configure a rule, the scheduling node that generates the table data must use a resource group with a stable network connection to trigger the Data Quality rule check.

  • Rule activation limits: Rules with dynamic thresholds require 21 days of sampling records to function correctly. If fewer than 21 days of records exist, the rule check will be abnormal. If you lack 21 days of sampling records, you can configure the rule, associate it with a scheduling node, and then use the data backfill feature to generate the required 21 days of records.

Core components of quality monitoring

Configuring quality monitoring rules by table is the core process for defining and instantiating data validation logic. This process creates a complete quality monitoring configuration consisting of four key parts:

  1. Monitoring scope: Specifies the target asset for data quality checks. The configuration includes:

    • Monitored object: Select one or more physical tables to check. Both partitioned and non-partitioned tables are supported.

    • Timestamp range: For partitioned tables, you must use a partition filter expression to dynamically scan partitions during each check. For example, use $[yyyymmdd-1] to check the partition data from the day before the data timestamp.

  2. Quality rules: Define the specific validation logic and standards to determine if the data meets expectations.

    • Rule definition: You can add one or more quality rules to a monitored object. Each rule is instantiated from a rule template, which can be:

      • System template: A built-in template provided by DataWorks. It covers multiple dimensions such as integrity, uniqueness, and validity. Examples include "Table Row Count Fluctuation" and "Field Unique Value Count".

      • Custom template: A reusable, personalized validation logic created by users with SQL.

    • Rule properties: Each rule requires key properties to be configured. These include a threshold (for example, fluctuation rate not exceeding 30%) and a severity level (strong rule or soft rule). If a strong rule check fails, it can block the associated scheduling node.

  3. Trigger methods: Define when the quality monitoring job runs.

    • Triggered by a scheduling node: Associate the quality monitoring job with an upstream DataWorks scheduling node—typically the node that generates the monitored table. When the scheduling node runs successfully, it automatically triggers the associated quality rules for validation. This is a best practice for automated data quality assurance.

    • Manual trigger: The validation process is not associated with any scheduling node and must be started manually from the interface. This method is suitable for temporary, one-time data exploration and validation.

  4. Alert policies: Configure the notification strategy for when data quality issues occur.

    • Alert subscription: You can configure alerts for specific rule check results, such as "Failed" or "Warning". The system supports sending notifications through various channels, including email, text message, phone call, DingTalk, Lark, WeCom group chatbots, and custom Webhooks.

After you configure these four components and save the settings, a complete quality monitoring plan is created. We recommend testing the configuration before publishing it to the production environment.
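As a rough illustration of how severity interacts with check outcomes, the blocking behavior described above can be sketched as a small decision function. This is a hypothetical sketch; the names and values are illustrative, not DataWorks APIs:

```python
# Hypothetical sketch of how a rule's severity maps a failed check to a
# pipeline action; function and value names are illustrative only.

def handle_check(severity: str, critical_threshold_exceeded: bool) -> str:
    """Return the action taken after a quality rule check."""
    if not critical_threshold_exceeded:
        return "pass"  # data output is as expected
    # A failed strong rule blocks the scheduling node (and therefore its
    # downstream nodes); a failed soft rule only raises an alert.
    return "block" if severity == "strong" else "alert"

print(handle_check("strong", True))   # block
print(handle_check("soft", True))     # alert
```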

Procedure

1. Go to the table quality details page

  1. Go to the Data Quality page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Governance > Data Quality. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Quality.

  2. Go to the page for configuring monitoring rules by table.

    In the navigation pane on the left, click Configure Rules > Configure by Table to go to the rule configuration page.

    1. In the Data Source list on the left, select the database that contains the table for which you want to configure a rule.

    2. Filter the tables by database type, database, or table name. Click the target table name or click Rule Management in the Actions column. This takes you to the table quality details page for that table.

      This page displays all configured quality monitoring jobs and rules for the current table. You can quickly filter rules based on whether they are associated with a quality monitoring job. You can also define the execution method for rules that are not yet associated with a quality monitoring job.

2. Create a quality monitoring job

  1. Create a new quality monitoring job.

    You can create a quality monitoring job in one of two ways:

    Rule management page

    On the Table Quality Details page for the table, click the Rule Management tab. Next to Monitor Perspective, click the creation icon to create a quality monitoring job.

    Quality monitoring page

    On the Table Quality Details page for the table, switch to the Monitor tab. Click Create Monitor.

  2. Configure the parameters for the quality monitoring job.

    Configuration item

    Parameter

    Description

    Basic Configurations

    Monitor Name

    Enter a custom name for the monitoring rule.

    Quality Monitoring Owner

    You can specify the owner of the monitor as needed. When you configure alert subscriptions, you can specify the monitor owner as the alert recipient by using Email, Email and SMS, or Telephone.

    Monitored Object

    The object for data quality checks. By default, this is the current table.

    Data Range

    Use a partition filter expression to define the partitions to be checked by the quality rule.

    • For a non-partitioned table, you do not need to configure this parameter. All data in the table is checked by default.

    • For a partitioned table, the expression format is partition_name=partition_value. The partition value can be a static value or a built-in partition filter expression from Appendix 2: Built-in partition filter expressions.

    Note

    This configuration does not take effect when you use a custom template or custom SQL to configure rules. For rules configured with a custom template or custom SQL, the partitions to be checked are determined by the custom SQL.

    Monitoring Rule

    Monitoring Rule

    Associate quality rules with the quality monitoring job to determine which rules will check if the data in the current timestamp range meets expectations.

    Note
    • You can create multiple quality monitoring jobs for different partitions and associate them with different quality rules. This lets you apply different validation rules to different partitions.

    • If you have not yet created a quality rule, you can skip this step for now. Create the quality monitoring job first, and then add the rule to it later. For more information about how to create a quality rule, see 3. Configure Data Quality rules.

    Running Settings

    Trigger Method

    The trigger method for the monitor.

    • Triggered by Node Scheduling in Production Environment: After the scheduling node that you associate with the monitor finishes running in Operation Center, the rules that are associated with the monitor are automatically triggered. Note that dry-run nodes do not trigger monitoring rules to run.

    • Triggered Manually: The monitoring rules that are associated with the monitor are manually triggered.

    Important

    If the table whose data quality you want to check is a non-MaxCompute table and Triggered by Node Scheduling in Production Environment is selected for Trigger Method, you cannot associate scheduling nodes that run on the shared resource group for scheduling with the monitor. Otherwise, an error may be reported when the monitor runs.

    Associated Scheduling Node

    If you set the Trigger Method parameter to Triggered by Node Scheduling in Production Environment, you can configure this parameter to select the scheduling nodes that you want to associate with the monitor. After the scheduling nodes finish running, the rules that are associated with the monitor are automatically triggered.

    Running Resources

    The computing resources required to run the quality rule checks. By default, the data source of the monitored table in the workspace is selected. If you select another data source, make sure the corresponding resources can access the table.

    Handling Policies

    Quality Issue Handling Policies

    Configure the blocking or alerting policy to be used when a data quality issue is detected.

    • Block: When a data quality issue is detected, the system identifies the production scheduling node that triggered the table's quality check. It then sets the node to failed, and downstream nodes will not run. This blocks the production pipeline to prevent the spread of problematic data.

      The default is Strong Rule - Critical Anomaly.

    • Alert: When a data quality issue is detected, an alert message is sent to the alert subscription channels of the quality monitoring job.

      The defaults are: Strong Rule - Critical Anomaly, Strong Rule - Warning Anomaly, Strong Rule - Check Failed, Soft Rule - Critical Anomaly, Soft Rule - Warning Anomaly, and Soft Rule - Check Failed.

    Alert Method Configuration

    You can send alert notifications by using Email, Email and SMS, DingTalk Chatbot, DingTalk Chatbot @ALL, Lark Group Chatbot, Enterprise WeChat Robot, Custom Webhook, or Telephone.

    Note
    • You can add a DingTalk chatbot, Lark chatbot, or WeChat chatbot and obtain a webhook URL. Then, copy the webhook URL to the Recipient field in the alert subscription dialog box.

    • The Custom Webhook notification method is supported only in DataWorks Enterprise Edition. For information about the message format of an alert notification sent by using a Custom Webhook, see Appendix: Message format of alert notifications sent by using a custom webhook URL.

    • When you select Email, Email and SMS, or Telephone as the notification method, you can specify Recipient as Monitor Owner, Shift Schedule, or Node Owner.

      • Monitor Owner: Alert notifications are sent to the Quality Monitoring Owner set in the Basic Configurations section of the current monitor.

      • Shift Schedule: When the monitoring rule associated with the monitor is triggered and an alert is generated, the system sends alert notifications to the person on duty for the current day in the shift schedule.

      • Node Owner: Alert notifications are sent to the owner of the scheduling node associated with the monitor.

  3. Click Save to create the quality monitoring job.

3. Configure Data Quality rules

Note

You can configure quality rules based on built-in table-level and field-level monitoring templates. For more information about built-in rule templates, see View built-in rule templates.

  1. On the Table Quality Details page, on the Rule Management tab, select the quality monitoring job you created. Then, click Create Rule to go to the rule configuration page.

  2. Create a Data Quality rule.

    Data Quality provides the following methods to configure quality monitoring rules. Choose one as needed.

    Method 1: Use a system template

    Data Quality has dozens of built-in quality rule templates. On the left, click + Use to quickly create a quality monitoring rule from a template. You can add multiple rules at the same time.

    You can click + System Template Rule at the top and then modify the Rule Template parameter to select the target rule template.

    System rule template parameters

    Parameter

    Description

    Rule Name

    The name of the monitoring rule.

    Template

    Define the type of rule validation that needs to be performed on the table.

    Data Quality provides many built-in table-level and field-level rule templates that are ready for use. For more information, see View built-in rule templates.

    Note

    You can configure field-level monitoring rules of the following types only for numeric fields: average value, sum of values, minimum value, and maximum value.

    Rule Scope

    The application scope of the rule. For a table-level monitoring rule, the application scope is the current table by default. For a field-level monitoring rule, the application scope is a specific field.

    Comparison Method

    The comparison method that is used by the rule to check whether the table data is as expected.

    • Manual Settings: You can configure the comparison method to compare the data output result with the expected result based on your business requirements.

      You can select different comparison methods for different rule templates. You can view the comparison methods that are supported by a rule template in the DataWorks console.

      • For numeric results, you can compare a numeric result with a fixed value, which is the expected value. The following comparison methods are supported: Greater Than, Greater Than Or Equal To, Equal To, Not Equal To, Less Than, and Less Than Or Equal To. You can configure the normal data range (normal threshold) and abnormal data range (red threshold) based on your business requirements.

      • For fluctuation results, you can compare a fluctuation result with a fluctuation range. The following comparison methods are supported: Absolute Value, Raise, and Drop. You can configure the normal data range (normal threshold) based on your business requirements. You can also define data output exceptions (orange threshold) and unexpected data outputs (red threshold) based on the degree of abnormal deviation.

    • Intelligent Dynamic Threshold: If you select this option, you do not need to manually configure the fluctuation threshold or expected value. The system automatically determines the reasonable threshold based on intelligent algorithms. If abnormal data is detected, an alert is immediately triggered or the related task is immediately blocked. When the Comparison Method parameter is set to Intelligent Dynamic Threshold, you can configure the Degree of importance parameter.

      Note

      Only monitoring rules that you configure based on a custom SQL statement, a custom range, or a dynamic threshold support the intelligent dynamic threshold comparison method.

    Monitoring Threshold

    • If you set the Comparison Method parameter to Manual Settings, you can configure the Normal Threshold and Red Threshold parameters.

      • Normal Threshold: If the data quality check result meets the specified condition, the data output is as expected.

      • Red Threshold: If the data quality check result meets the specified condition, the data output is not as expected.

    • If you set the Comparison Method parameter to Intelligent Dynamic Threshold, you must configure the Orange Threshold parameter.

      • Orange Threshold: If the data quality check result meets the specified condition, the data is abnormal but your business is not affected.

    Retain problem data

    If the monitoring rule is enabled and a data quality check based on the rule fails, the system automatically creates a table to store the problematic data that is identified during the data quality check.

    Important
    • The Retain problem data parameter is available for MaxCompute and Hologres tables.

    • The Retain problem data parameter is available only for specific monitoring rules in Data Quality.

    • If you Disable the monitoring rule, problematic data is not stored.

    Status

    Specifies whether to Enable or Disable the rule in the production environment.

    Important

    If you Disable the rule, the rule cannot be triggered to perform a test run or triggered by the associated scheduling nodes.

    Degree of importance

    The strength of the rule in your business.

    • Strong Rule: An important rule. If a strong rule exceeds the critical threshold, the scheduling node that is associated with the monitor is blocked by default.

    • Soft Rule: A regular rule. If a soft rule exceeds the critical threshold, the scheduling node that is associated with the monitor is not blocked by default.

    Configuration Source

    The source of the rule configuration. The default value is Data Quality.

    Description

    You can add additional descriptions to the rule.
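To make the threshold semantics above concrete, here is a minimal sketch of how a manually configured fluctuation rule might classify a check result. The function name and the 10%/30% defaults are assumptions for illustration, not DataWorks internals:

```python
# Illustrative classification of a fluctuation check result against
# manually configured thresholds; the default values are assumptions,
# not defaults taken from DataWorks.

def classify_fluctuation(rate: float, orange: float = 0.10, red: float = 0.30) -> str:
    """Map an absolute fluctuation rate to a check outcome.

    rate   -- observed fluctuation, e.g. 0.12 for a 12% change
    orange -- warning threshold: data is abnormal, business unaffected
    red    -- critical threshold: data output is not as expected
    """
    rate = abs(rate)
    if rate > red:
        return "critical"
    if rate > orange:
        return "warning"
    return "normal"

print(classify_fluctuation(0.05))   # normal
print(classify_fluctuation(0.15))   # warning
print(classify_fluctuation(0.35))   # critical
```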

    Method 2: Use a custom template

    Note

    Before you use this method to create a rule, you must go to Quality Assets > Rule Template Library to create a custom rule template. For more information, see Create and manage custom rule templates.

    When you reference a custom rule template, the basic configurations of the template, such as FLAG parameter and SQL, are automatically displayed. You can configure the Rule Name parameter based on your business requirements, and the Monitoring Threshold parameter based on the rule type. For example, you must define a normal threshold and a critical threshold for a numeric rule, and you must define a warning threshold in addition to a normal threshold and a critical threshold for a fluctuation-type rule.

    Custom rule template parameters

    Only the parameters that are unique to rules based on custom rule templates are described in the following table. For information about other parameters, see the parameters for configuring a rule based on a built-in rule template.

    Parameter

    Description

    FLAG parameter

    The SET statement that you want to execute before the SQL statement in the rule is executed.

    SQL

    The SQL statement that determines the complete check logic. The returned results must be numeric and consist of one row and one column.

    In the custom SQL statement, enclose the partition filter expression in brackets []. Example:

    SELECT count(*) FROM ${tableName} WHERE ds=$[yyyymmdd];
    Note
    • In this statement, the value of the ${tableName} variable is dynamically replaced with the name of the table for which you are configuring monitoring rules.

    • For information about how to configure a partition filter expression, see the Appendix 2: Built-in partition filter expressions section in this topic.

    • If you have created a monitor for the table, the table partitions that you specified in the Data Range parameter of the monitor no longer take effect after you configure this parameter. The rule determines the partitions to check based on the WHERE clause in the SQL statement.

    Method 3: Use a custom SQL statement

    This method lets you customize the data quality validation logic for the table.

    Custom SQL parameters

    Only parameters unique to custom SQL are shown here. For explanations of other parameters, see the system rule template parameter descriptions.

    Parameter

    Description

    FLAG parameter

    The SET statement that you want to execute before the SQL statement in the rule is executed.

    SQL

    The SQL statement that determines the complete check logic. The returned results must be numeric and consist of one row and one column.

    In the custom SQL statement, enclose the partition filter expression in brackets []. Example:

    SELECT count(*) FROM <table_name> WHERE ds=$[yyyymmdd];
    Note
    • You must replace <table_name> with the name of the table for which you are configuring monitoring rules. The SQL statement determines the table that needs to be monitored.

    • For information about how to configure a partition filter expression, see the Appendix 2: Built-in partition filter expressions section in this topic.

    • If you have created a monitor for the table, the table partitions that you specified in the Data Range parameter of the monitor no longer take effect after you configure this parameter. The rule determines the partitions to check based on the WHERE clause in the SQL statement.
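The one-row, one-column contract for custom SQL results can be illustrated locally. The sketch below uses SQLite and a hypothetical orders table to stand in for the monitored data source; the shape check on the query result is the point:

```python
import sqlite3

# Local illustration of the contract a custom SQL rule must satisfy:
# the statement returns a single numeric value in one row and one
# column, which the quality check then compares against thresholds.
# The table and column names here are hypothetical.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, ds TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "20240524"), (2, "20240524"), (3, "20240523")])

# The check query: count rows in the target partition.
rows = conn.execute(
    "SELECT count(*) FROM orders WHERE ds = '20240524'"
).fetchall()

# Validate the one-row, one-column, numeric contract before comparing.
assert len(rows) == 1 and len(rows[0]) == 1
value = rows[0][0]
print(value)  # 2
```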

    Method 4: Use a custom script

    Custom script rules support data validation at the hour and minute level. For information on how to write script rules, see Use a system rule template. For example:

    - assertion: change 30 minutes ago for max(id) = 15
      name: 30-minute difference in max value of id field is 15

  3. (Optional) You can add the configured rule to a quality monitoring job. For more information about quality monitoring jobs, see 2. Create a quality monitoring job.

    Note

    The configured monitoring rule can be triggered only if you add the rule to a monitor. To associate a rule with a monitor, you can select an existing monitor here, or select the rule in the Monitoring Rule section when you configure a monitor.

  4. Click OK.

4. Test the rule execution

You can test the triggering of rules in a quality monitoring job in the following ways.

Test run from the Rule Management tab

  1. On the Rule Management tab, in the Monitor Perspective, find the quality monitoring job you created and click Test Run.

  2. In the Test Run dialog box, check the configurations of parameters, such as Data Range and Scheduling Time, and click Test Run. If Started is displayed, you can click View Details to view results of the test run.

Test run from the Monitor tab

  1. On the Monitor tab, find the created monitor and choose More > Test Run in the Actions column.

  2. In the Test Run dialog box, check the configurations of parameters, such as Data Range and Scheduling Time, and click Test Run. If Started is displayed, you can click View Details to view results of the test run.

5. Modify alert subscriptions

Alert subscriptions are initially configured in 2. Create a quality monitoring job. When a rule is triggered, the system sends a notification to the corresponding alert recipients. To notify other users, modify the alert subscription in one of the following ways.

Subscriptions on the Rule Management tab

  1. On the Rule Management tab, in the Monitor Perspective, find the quality monitoring job you created and open the Alert Subscription dialog box.

  2. In the Alert Subscription dialog box, add a Notification Method and a Recipient, and then click Save in the Actions column. After you save the configurations, you can configure another subscription with a different notification method and alert recipient.

    Data Quality supports the following notification methods: Email, Email and SMS, DingTalk Chatbot, DingTalk Chatbot @ALL, Lark Group Chatbot, Enterprise WeChat Robot, Custom Webhook, and Telephone.

    Note
    • You can add a DingTalk chatbot, Lark chatbot, or WeChat chatbot and obtain a webhook URL. Then, copy the webhook URL to the Recipient field in the alert subscription dialog box.

    • The Custom Webhook notification method is supported only in DataWorks Enterprise Edition. For information about the message format of an alert notification sent by using a Custom Webhook, see Appendix: Message format of alert notifications sent by using a custom webhook URL.

    • When you select Email, Email and SMS, or Telephone as the notification method, you can specify Recipient as Monitor Owner, Shift Schedule, or Node Owner.

      • Monitor Owner: Alert notifications are sent to the Quality Monitoring Owner set in the Basic Configurations section of the current monitor.

      • Shift Schedule: When the monitoring rule associated with the monitor is triggered and an alert is generated, the system sends alert notifications to the person on duty for the current day in the shift schedule.

      • Node Owner: Alert notifications are sent to the owner of the scheduling node associated with the monitor.

Subscriptions on the Monitor tab

  1. On the Monitor tab, find the quality monitoring job you created, and in the Actions column, choose More > Alert Subscription.

  2. In the Alert Subscription dialog box, add a Notification Method and a Recipient, and then click Save in the Actions column. After you save the configurations, you can configure another subscription with a different notification method and alert recipient.

    Data Quality supports the following notification methods: Email, Email and SMS, DingTalk Chatbot, DingTalk Chatbot @ALL, Lark Group Chatbot, Enterprise WeChat Robot, Custom Webhook, and Telephone.

    Note
    • You can add a DingTalk chatbot, Lark chatbot, or WeChat chatbot and obtain a webhook URL. Then, copy the webhook URL to the Recipient field in the alert subscription dialog box.

    • The Custom Webhook notification method is supported only in DataWorks Enterprise Edition. For information about the message format of an alert notification sent by using a Custom Webhook, see Appendix: Message format of alert notifications sent by using a custom webhook URL.

    • When you select Email, Email and SMS, or Telephone as the notification method, you can specify Recipient as Monitor Owner, Shift Schedule, or Node Owner.

      • Monitor Owner: Alert notifications are sent to the Quality Monitoring Owner set in the Basic Configurations section of the current monitor.

      • Shift Schedule: When the monitoring rule associated with the monitor is triggered and an alert is generated, the system sends alert notifications to the person on duty for the current day in the shift schedule.

      • Node Owner: Alert notifications are sent to the owner of the scheduling node associated with the monitor.

Next steps

After the monitor is run, you can choose Quality O&M in the left-side navigation pane and click Monitor and Running Records to view the quality check status of the specified table and the complete quality rule check records.

Appendix

Appendix 1: Formulas for fluctuation rate and variance

  • Fluctuation rate formula: Fluctuation rate = (Sample value - Baseline value) / Baseline value

    • Sample value: The specific value of the sample collected on the current day. For example, for a 1-day fluctuation check of the table row count in an SQL task, the sample is the row count of the current day's partition.

    • Baseline value: The comparison value from historical samples.

    Note
    • If the rule is a 1-day fluctuation rate of table row count check for an SQL task, the baseline value is the table row count from the previous day's partition.

    • If the rule is a 7-day average fluctuation rate of table row count check for an SQL task, the baseline value is the average of the table row counts over the previous 7 days.

  • Variance fluctuation formula: (Current sample - Average of last N days) / Standard deviation

    Note

    Variance can only be used for numeric types such as BIGINT and DOUBLE.
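The two formulas above can be transcribed directly into code. In this sketch the row counts and history values are invented for illustration:

```python
# Direct transcription of the Appendix 1 formulas; the sample values
# used in the example are made up.

def fluctuation_rate(sample: float, baseline: float) -> float:
    """Fluctuation rate = (Sample value - Baseline value) / Baseline value."""
    return (sample - baseline) / baseline

def variance_score(sample: float, history: list[float]) -> float:
    """(Current sample - Average of last N days) / Standard deviation."""
    n = len(history)
    mean = sum(history) / n
    std = (sum((x - mean) ** 2 for x in history) / n) ** 0.5
    return (sample - mean) / std

# 1-day row-count fluctuation: 1,300 rows today vs. 1,000 yesterday.
print(fluctuation_rate(1300, 1000))                   # 0.3, a 30% rise
# Today's count compared against the last 4 days of history.
print(round(variance_score(12, [8, 10, 12, 10]), 3))  # 1.414
```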

Appendix 2: Built-in partition filter expressions

Scenario:

  • The data timestamp (bizdate) is 20240524.

  • The scheduled time is 10:30:00.

Partition Filter Expression

Check Target Description

Example (Based on the scenario)

ds=$[yyyymmdd]

Checks the partition data of the current data timestamp.

20240524

ds=$[yyyymmdd-1]

Checks the partition data from the day before the data timestamp.

20240523

ds=$[yyyymmdd-7]

Checks the partition data from 7 days before the data timestamp (one week ago).

20240517

ds=$[add_months(yyyymmdd,-1)]

Checks the partition data from the same day of the previous month as the data timestamp.

20240424

ds=$[yyyymmddhh24miss]

Checks the partition for the current data timestamp, accurate to the current scheduled time (second level).

20240524103000

ds=$[yyyymmdd]000000

Checks the second-level partition data at midnight of the current data timestamp.

20240524000000

ds=$[yyyymmddhh24miss-1/24]

Checks the second-level partition data from one hour before the scheduled time on the current data timestamp.

20240524093000

ds=$[hh24miss-1/24]

(For hourly partitions) Checks the partition from one hour before the scheduled time. The format is usually hh0000.

090000

ds=$[hh24miss-30/24/60]

(For minute-level partitions) Checks the partition from 30 minutes before the scheduled time. The format is usually hhmi00.

100000

ds=$[yyyymmdd-1]/hour=$[hh24]

(For subpartitions) Checks all hourly partition data from the day before the data timestamp.

All partitions from ds=20240523/hour=00 to ds=20240523/hour=23
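As a small sketch of how the day-level expressions in this table resolve, the function below handles only the $[yyyymmdd] and $[yyyymmdd-N] forms against the scenario's data timestamp. It is an illustration, not the actual DataWorks expression engine:

```python
from datetime import datetime, timedelta

# Minimal resolver for the day-offset partition filter expressions;
# hour/minute forms and add_months are intentionally not covered.

def resolve(expr: str, bizdate: datetime) -> str:
    """Resolve '$[yyyymmdd]' or '$[yyyymmdd-N]' to a partition value."""
    inner = expr[2:-1]                    # strip the '$[' and ']'
    if "-" in inner:
        fmt, offset = inner.split("-")
        bizdate -= timedelta(days=int(offset))
    else:
        fmt = inner
    assert fmt == "yyyymmdd"              # only the date form is handled
    return bizdate.strftime("%Y%m%d")

bizdate = datetime(2024, 5, 24)           # data timestamp from the scenario
print(resolve("$[yyyymmdd-1]", bizdate))  # 20240523
print(resolve("$[yyyymmdd-7]", bizdate))  # 20240517
```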