Data Quality provides various built-in rule templates. This topic describes the check logic of Data Quality and the built-in rule templates.
Description of calculation formula
You can calculate the fluctuation by using the following formula: Fluctuation = (Sample value - Baseline)/Baseline.
Sample value
The sample value for the current day. For example, if you want to check the fluctuation in the number of table rows on an SQL node within a day, the sample value is the number of table rows in partitions on that day.
Baseline
The comparison value collected from the previous N days. Examples:
If you want to check the fluctuation in the number of table rows on an SQL node based on the statistics seven days ago, the baseline is the number of table rows in partitions seven days before the current day. In other words, the fluctuation is calculated by comparing the sample value collected on the current day with that collected seven days before the current day.
If you want to check the fluctuation in the number of table rows on an SQL node in the last seven days, the baseline is the average number of table rows in the last seven days. In other words, the baseline is calculated by dividing the total number of table rows in the last seven days by seven.
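The formula and the two kinds of baseline described above can be sketched in Python as follows. This is a minimal illustration, not the Data Quality implementation: the function names and the row counts are hypothetical, the list is assumed to hold the seven days before the current day (oldest first), and raising an error for a zero baseline is an assumption because this topic does not describe that case.

```python
from statistics import mean


def fluctuation(sample_value: float, baseline: float) -> float:
    """Fluctuation = (Sample value - Baseline) / Baseline."""
    if baseline == 0:
        # The formula is undefined for a zero baseline. How Data Quality
        # handles this case is not stated in this topic, so the sketch raises.
        raise ValueError("baseline must be non-zero")
    return (sample_value - baseline) / baseline


def average_baseline(previous_values: list[float]) -> float:
    """Baseline as the total over the last N days divided by N."""
    return mean(previous_values)


# Hypothetical row counts for the seven days before today, oldest first.
last_seven_days = [100, 110, 90, 105, 95, 100, 100]
today = 120

# Baseline taken from seven days ago (the oldest entry in the list).
vs_seven_days_ago = fluctuation(today, last_seven_days[0])

# Baseline taken as the seven-day average.
vs_seven_day_average = fluctuation(today, average_baseline(last_seven_days))

print(f"{vs_seven_days_ago:.1%} {vs_seven_day_average:.1%}")
```

With these sample numbers, both baselines equal 100, so both fluctuations are 20%.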
Check logic
Data Quality supports three verification methods: comparison with a fixed value, comparison with thresholds, and comparison with a dynamic threshold.
Verification method | Check logic |
Comparison with a fixed value |
|
Comparison with thresholds | The rise range, drop range, and fluctuation range (absolute value) can be compared with thresholds. This topic uses the fluctuation range (absolute value) as an example.
|
Comparison with a dynamic threshold | You do not need to set thresholds. The system automatically checks the metrics in real time based on algorithm models. If the value of a metric falls outside a reasonable range, an alert is reported. |
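The following Python sketch shows one way to read the comparison with thresholds for the fluctuation range (absolute value). It is an illustration under assumptions: the function name and the result labels "pass", "warning", and "error" are hypothetical, and the default thresholds of 5% and 10% are borrowed from the example in the Table size. 1 day volatility template description in this topic.

```python
def check_fluctuation(fluctuation: float,
                      warning: float = 0.05,
                      error: float = 0.10) -> str:
    """Classify a fluctuation by its absolute value against two thresholds."""
    magnitude = abs(fluctuation)
    if magnitude > error:
        # Greater than the error threshold.
        return "error"
    if magnitude > warning:
        # Greater than the warning threshold, at most the error threshold.
        return "warning"
    return "pass"


print(check_fluctuation(0.07))   # between 5% and 10%
print(check_fluctuation(-0.12))  # a drop of 12% is checked by absolute value
```

A rise-only or drop-only check would compare the signed fluctuation instead of its absolute value.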
Built-in monitoring rule templates
You can use a built-in rule template to quickly create a monitoring rule for a single table or multiple tables. For more information, see Configure a monitoring rule for a single table and Configure a monitoring rule for multiple tables based on a template.
Template category | Template | Description |
Table Count | Number of rows. fixed value | Data Quality compares the number of table rows collected on the current day with a fixed value. |
Table is not empty | Checks whether the number of table rows is greater than 0. | |
Number of rows. 1 day difference | Data Quality compares the number of table rows collected on the current day with that in partitions generated on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. Note: The baseline is the number of table rows in partitions generated on the previous day. | |
Number of table rows. upper cycle difference | Data Quality compares the number of table rows collected on the current day with that in partitions generated in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. | |
Number of rows. 1. 7. 30 days. 1st of this month. volatility | Data Quality compares the number of table rows collected on the current day with that on the previous day, seven days ago, 30 days ago, and that on the first day of the current month to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert. | |
Table row number. 1. 7. 30 day volatility | Data Quality compares the number of table rows collected on the current day with that on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert. | |
Table row number. 1 day volatility | Data Quality compares the number of table rows collected on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert. | |
Table row number. 30-day volatility | Data Quality compares the number of table rows collected on the current day with that 30 days before the current day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert. | |
Number of rows. 7-day volatility | Data Quality compares the number of table rows collected on the current day with that seven days before the current day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert. | |
Table Rows | If you set the Comparison Method parameter to Intelligent Dynamic Threshold, you do not need to manually configure the fluctuation thresholds or the expected value. The system determines the thresholds by using intelligent algorithms. If data exceptions are detected, the system triggers alerts or blocks at the earliest opportunity. | |
Table row number. 30-day average volatility | Data Quality compares the number of table rows collected on the current day with the average number of table rows in the last 30 days to obtain the fluctuation. The baseline is the average number of table rows in the last 30 days. In other words, the baseline is calculated by dividing the total number of table rows in the last 30 days by 30. | |
Table row number. 7-day average volatility | Data Quality compares the number of table rows collected on the current day with the average number of table rows in the last seven days to obtain the fluctuation. The baseline is the average number of table rows in the last seven days. In other words, the baseline is calculated by dividing the total number of table rows in the last seven days by seven. | |
Number of table rows. upper cycle volatility | Data Quality compares the number of table rows collected on the current day with that in partitions generated in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. | |
Table row count with user defined condition | You can specify the comparison method and the comparison threshold range for the number of table rows based on your business requirements. | |
Percent on Condition | Row count matched user defined condition | You can specify the comparison method and the comparison threshold range for the matching rate of filter conditions based on your business requirements. |
Table Size | Table size. fixed value | Data Quality compares the size of a table in bytes on the current day with a fixed value. |
Table size. upper period difference | Data Quality compares the size of a table in bytes on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. | |
Table size. upper period difference | Data Quality compares the size of a table in bytes on the current day with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. | |
Table size. 1 day volatility | Data Quality compares the size of a table on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert. For example, you can set the warning threshold to 5% and the error threshold to 10%. If the fluctuation is greater than 5% and less than or equal to 10%, a warning alert is reported. If the fluctuation is greater than 10%, an error alert is reported. | |
Table size. 30-day volatility (to be determined) | Data Quality compares the size of a table on the current day with that 30 days ago to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert. | |
Table size. 7-day volatility | Data Quality compares the size of a table on the current day with that seven days ago to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. If the fluctuation exceeds a threshold, Data Quality reports an alert. | |
Table Size | If you set the Comparison Method parameter to Intelligent Dynamic Threshold, you do not need to manually configure the fluctuation thresholds or the expected value. The system determines the thresholds by using intelligent algorithms. If data exceptions are detected, the system triggers alerts or blocks at the earliest opportunity. | |
Null Value Count | Number of null values. fixed value | Data Quality compares the number of null values of a field with a fixed value. Note The |
No null value on single field | Checks whether the number of null values of a field is 0. | |
Null Value Count / Table Count | Number of nulls / total number of rows. fixed value | Data Quality compares the ratio of the number of null values of a field to the total number of rows with a fixed value. Note: The fixed value is a decimal. |
Duplicated Value Count | Repeated value. fixed value | Data Quality subtracts the number of values of a field after deduplication from the total number of rows to obtain the number of duplicate values of the field. Then, Data Quality compares the number of duplicate values with a fixed value. |
No duplicated value on single field | Checks whether the number of duplicate values of a field is 0. | |
Distinct Count on Multiple Fields | No duplicated value on multiple fields | Checks whether the number of duplicate values of multiple fields is 0. |
Duplicated Value Count / Table Count | Repeated number of values / total number of rows. fixed value | Data Quality compares the ratio of the number of duplicate values of a field to the total number of rows with a fixed value. |
Distinct Count | Unique value. fixed value | Data Quality compares the number of unique values of a field after deduplication with a fixed value. |
The number of unique values. 1. 7. 30 volatility | Data Quality compares the number of unique values of a field after deduplication on the current day with that on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. | |
Unique value | If you set the Comparison Method parameter to Intelligent Dynamic Threshold, you do not need to manually configure the fluctuation thresholds or the expected value. The system determines the thresholds by using intelligent algorithms. If data exceptions are detected, the system triggers alerts or blocks at the earliest opportunity. | |
Distinct Count / Table Count | Unique value/total number of rows. fixed value | Data Quality compares the ratio of the number of unique values of a field to the total number of rows with a fixed value. |
Min | Minimum. 1. 7. 30-day volatility | Data Quality compares the minimum value of a field on the current day with the minimum values on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert. |
Min Value | If you set the Comparison Method parameter to Intelligent Dynamic Threshold, you do not need to manually configure the fluctuation thresholds or the expected value. The system determines the thresholds by using intelligent algorithms. If data exceptions are detected, the system triggers alerts or blocks at the earliest opportunity. | |
Minimum value. 1 day volatility | Data Quality compares the minimum value of a field on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. | |
Minimum period | Data Quality compares the minimum value of a field on the current day with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. | |
Minimum value with user defined condition | You can specify the comparison method and the comparison threshold range for the minimum value of a field based on your business requirements. | |
Max | Maximum. 1. 7. 30-day volatility | Data Quality compares the maximum value of a field on the current day with the maximum values on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert. |
Maximum | If you set the Comparison Method parameter to Intelligent Dynamic Threshold, you do not need to manually configure the fluctuation thresholds or the expected value. The system determines the thresholds by using intelligent algorithms. If data exceptions are detected, the system triggers alerts or blocks at the earliest opportunity. | |
Maximum. 1 day volatility | Data Quality compares the maximum value of a field on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. | |
Maximum period | Data Quality compares the maximum value of a field on the current day with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. | |
Maximum value with user defined condition | You can specify the comparison method and the comparison threshold range for the maximum value of a field based on your business requirements. | |
Average | Average. 1. 7. 30-day volatility | Data Quality compares the average value of a field calculated on the current day with that on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert. |
Average | If you set the Comparison Method parameter to Intelligent Dynamic Threshold, you do not need to manually configure the fluctuation thresholds or the expected value. The system determines the thresholds by using intelligent algorithms. If data exceptions are detected, the system triggers alerts or blocks at the earliest opportunity. | |
Average. 1 day volatility | Data Quality compares the average value of a field on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. | |
Average value with user defined condition | You can specify the comparison method and the comparison threshold range for the average value of a field based on your business requirements. | |
Sum | Summary value. 1. 7. 30-day volatility | Data Quality compares the value sum of a field on the current day with the value sums on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. If a fluctuation exceeds a threshold, Data Quality reports an alert. |
Summary value | If you set the Comparison Method parameter to Intelligent Dynamic Threshold, you do not need to manually configure the fluctuation thresholds or the expected value. The system determines the thresholds by using intelligent algorithms. If data exceptions are detected, the system triggers alerts or blocks at the earliest opportunity. | |
Summary value. 1 day volatility | Data Quality compares the value sum of a field on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. | |
Summary value. upper period volatility | Data Quality compares the value sum of a field on the current day with that in the last cycle to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. | |
Sum value with user defined condition | You can specify the comparison method and the comparison threshold range for the aggregated value of a field based on your business requirements. | |
Discrete Values | Discrete value (status value). fixed value | Data Quality compares the number of values in each group of a field with a fixed value. |
Discrete value (number of groups). fixed value | Data Quality compares the number of groups of a field with a fixed value. | |
Discrete value (number of groups) | If you set the Comparison Method parameter to Intelligent Dynamic Threshold, you do not need to manually configure the fluctuation thresholds or the expected value. The system determines the thresholds by using intelligent algorithms. If data exceptions are detected, the system triggers alerts or blocks at the earliest opportunity. | |
Discrete value (status value) | If you set the Comparison Method parameter to Intelligent Dynamic Threshold, you do not need to manually configure the fluctuation thresholds or the expected value. The system determines the thresholds by using intelligent algorithms. If data exceptions are detected, the system triggers alerts or blocks at the earliest opportunity. | |
Discrete value (number of groups). 1 day volatility | Data Quality compares the number of groups of a field on the current day with that on the previous day to obtain the fluctuation. Then, Data Quality compares the obtained fluctuation with thresholds. | |
Discrete values (number of groups and status values). 1. 7. 30-day volatility | Data Quality compares the number of groups and the number of values in each group of a field on the current day with those on the previous day, seven days ago, and 30 days ago to obtain the fluctuations. Then, Data Quality compares the obtained fluctuations with thresholds. |
You cannot configure table-size-based monitoring rules for E-MapReduce (EMR) tables.
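Several field-level templates in the preceding table rely on simple counts and ratios. The following Python sketch illustrates how those metrics relate to each other. It is not the Data Quality implementation: the function name, the dictionary keys, and the sample data are hypothetical. It also assumes that null values are excluded from the deduplicated count, as SQL COUNT(DISTINCT ...) does, which this topic does not confirm.

```python
def field_metrics(values: list) -> dict:
    """Compute count and ratio metrics for one field of a table."""
    total_rows = len(values)
    if total_rows == 0:
        # Ratios are undefined for an empty table.
        raise ValueError("the table has no rows")

    non_null = [v for v in values if v is not None]
    null_count = total_rows - len(non_null)

    # Assumption: nulls are not counted as a distinct value.
    distinct_count = len(set(non_null))

    # Duplicate values = total rows - values after deduplication.
    duplicate_count = total_rows - distinct_count

    return {
        "null_count": null_count,
        "null_ratio": null_count / total_rows,
        "distinct_count": distinct_count,
        "distinct_ratio": distinct_count / total_rows,
        "duplicate_count": duplicate_count,
        "duplicate_ratio": duplicate_count / total_rows,
    }


# Hypothetical field values; None stands for a null value.
metrics = field_metrics(["a", "b", "a", None, "c", "a"])
print(metrics)
```

Each resulting number can then be compared with a fixed value, for example checking that null_count or duplicate_count is 0.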
Appendix 1: Description of the last cycle
In some of the preceding built-in templates, the instance of the last cycle is used as the baseline. For a task that is scheduled by day or by hour, the instance of the last cycle is determined as follows: Instances that have the current data timestamp are excluded, and the remaining instances are sorted by data timestamp in reverse chronological order. If multiple instances have the same data timestamp, these instances are further sorted by running time in reverse chronological order. The first instance in the resulting sequence is the instance of the last cycle and is used as the baseline. The following table describes how to determine the baseline.
Scheduling scenario | Data timestamp | Baseline | FAQ |
Daily scheduling | Historical data timestamps:
| When the instance whose data timestamp is June 6, 2024 starts to be checked based on monitoring rules, the instance whose data timestamp is June 5, 2024 is used as the baseline. | Historical data backfilling scenario: Background: The scheduling node is run as expected from June 1, 2024 to June 5, 2024. After the instance whose data timestamp is June 5, 2024 finishes running, a data backfill operation is performed to backfill the data on July 1, 2024 to the scheduling node. In this case, what is the baseline that can be used for comparison when the instance whose data timestamp is June 6, 2024 starts to be checked based on monitoring rules? Conclusion: The instance whose data timestamp is June 6, 2024 uses the instance whose data timestamp is July 1, 2024 as the baseline. The instance whose data timestamp is July 1, 2024 is used as the baseline before the instance whose data timestamp is July 2, 2024 finishes running. |
Hourly scheduling | Historical data timestamps:
A scheduling node is scheduled by hour and run three times a day. | When an instance whose data timestamp is June 4, 2024 starts to be checked based on monitoring rules, the last instance whose data timestamp is June 3, 2024 is used as the baseline. | Hourly scheduling scenario: Background: Three instances are generated for each day from June 1, 2024 to June 3, 2024 and run as expected, and the first instance whose data timestamp is June 4, 2024 is also run as expected. In this case, what is the baseline that can be used for comparison when the second instance whose data timestamp is June 4, 2024 starts to be checked based on monitoring rules? Conclusion: The first instance whose data timestamp is June 4, 2024 is excluded. The last instance whose data timestamp is June 3, 2024 is used as the baseline. |
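The selection logic for the instance of the last cycle can be sketched in Python as follows. This is an illustration only: the Instance class, the function name, and all running times are hypothetical and were chosen to reproduce the two FAQ scenarios in the preceding table.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class Instance:
    data_timestamp: date  # data timestamp of the instance
    run_time: datetime    # time at which the instance actually ran


def last_cycle_instance(instances: list[Instance],
                        current: date) -> Optional[Instance]:
    """Return the instance that serves as the baseline, or None."""
    # Step 1: exclude all instances with the current data timestamp.
    candidates = [i for i in instances if i.data_timestamp != current]
    # Step 2: sort by data timestamp, then by running time, newest first.
    ordered = sorted(candidates,
                     key=lambda i: (i.data_timestamp, i.run_time),
                     reverse=True)
    # Step 3: the first instance in the sequence is the baseline.
    return ordered[0] if ordered else None


# Daily scheduling with a backfill for July 1, 2024.
daily = [Instance(date(2024, 6, d), datetime(2024, 6, d + 1, 1, 0))
         for d in range(1, 6)]
daily.append(Instance(date(2024, 7, 1), datetime(2024, 6, 6, 12, 0)))
backfill_baseline = last_cycle_instance(daily, date(2024, 6, 6))

# Hourly scheduling: three instances per day, plus the first one of June 4.
hourly = [Instance(date(2024, 6, d), datetime(2024, 6, d, h, 0))
          for d in (1, 2, 3) for h in (8, 12, 16)]
hourly.append(Instance(date(2024, 6, 4), datetime(2024, 6, 4, 8, 0)))
hourly_baseline = last_cycle_instance(hourly, date(2024, 6, 4))

print(backfill_baseline.data_timestamp, hourly_baseline.run_time)
```

Because the sort key starts with the data timestamp, the backfilled instance for July 1, 2024 sorts ahead of every June instance, which matches the conclusion in the backfill FAQ.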
Appendix 2: Description of obtaining a sample value from the output data of an hourly-scheduled task on the date that is N days before the current date
When you extract a sample value from the output data of an hourly-scheduled task on the date that is N days before the current date, the instances of the task are sorted by running time (different from scheduling time) on the date that is N days before the current date in reverse chronological order. The output data of the first instance in the obtained sequence is used as the baseline by default and is compared with the output data of an instance generated for the task on the current date to obtain the fluctuation. The following table describes how to obtain the fluctuation.
Scheduling scenario | Data timestamp | Sample value | FAQ |
Hourly scheduling | Historical data timestamps:
A task is scheduled by hour and run three times a day. | If you want to obtain a seven-day fluctuation, a sample value is extracted from the output data of the last instance whose running time is June 1, 2024 when an instance whose running time is June 8, 2024 starts to be checked based on monitoring rules. | Hourly scheduling scenario: Background: Three instances are generated for each day from June 1, 2024 to June 8, 2024. In this case, what is the sample value that can be used for comparison to obtain a seven-day fluctuation when the second instance whose running time is June 8, 2024 starts to be checked based on monitoring rules? Conclusion: When the second instance whose running time is June 8, 2024 starts to be checked based on monitoring rules, the output data of the last instance whose running time is June 1, 2024 is used as the sample value for comparison to obtain a seven-day fluctuation. |
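The following Python sketch illustrates how the instance on the date that is N days before the current date could be selected by running time. The function name and the running times are hypothetical, and the sketch assumes that only the running time of each instance is needed for the selection.

```python
from datetime import date, datetime, timedelta
from typing import Optional


def latest_run_n_days_before(run_times: list[datetime],
                             current_date: date,
                             n: int) -> Optional[datetime]:
    """Return the latest running time on the date N days before current_date."""
    target = current_date - timedelta(days=n)
    # Keep only the instances that ran on the target date.
    on_target = [t for t in run_times if t.date() == target]
    # Sorting in reverse chronological order and taking the first entry
    # is the same as taking the maximum.
    return max(on_target) if on_target else None


# Three instances per day from June 1, 2024 to June 8, 2024.
runs = [datetime(2024, 6, d, h, 0) for d in range(1, 9) for h in (8, 12, 16)]

# Seven-day comparison for an instance that runs on June 8, 2024.
selected = latest_run_n_days_before(runs, date(2024, 6, 8), 7)
print(selected)
```

In this example, every instance that runs on June 8, 2024 is compared with the output data of the last instance that ran on June 1, 2024.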