DataWorks:CreateDataQualityRule

Last Updated: Dec 05, 2024

Creates a data quality monitoring rule.

Debugging

You can call this operation directly in OpenAPI Explorer, which saves you from calculating signatures. After a successful call, OpenAPI Explorer can automatically generate SDK code samples.

Authorization information

No authorization information is currently disclosed for this API operation.

Request parameters

Parameter | Type | Required | Description | Example

Name | string | Yes

The name of the rule.

ProjectId | long | Yes

The DataWorks workspace ID.

Example: 10726

Enabled | boolean | No

Specifies whether to enable the rule.

Example: true

Severity | string | No

The strength of the rule. Valid values:

  • Normal
  • High
Example: Normal

Description | string | No

The description of the rule. The description can be up to 500 characters in length.

Example: this is a odps _sql task

Target | object | No

The monitored object of the rule.

Type | string | No

The type of the monitored object. Valid values:

  • Table
Example: Table

DatabaseType | string | No

The type of the database to which the table belongs. Valid values:

  • maxcompute
  • emr
  • cdh
  • hologres
  • analyticdb_for_postgresql
  • analyticdb_for_mysql
  • starrocks
Example: maxcompute

TableGuid | string | Yes

The ID in Data Map of the table to which the rule applies.

Example: odps.unit_test.tb_unit_test

PartitionSpec | string | No

The partition expression that specifies which partitions of the partitioned table are checked.

Example: ds=$[yyyymmdd-1]
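
To make the example value more concrete, here is a minimal sketch of how an expression such as `ds=$[yyyymmdd-1]` could expand to a concrete partition. It assumes that `$[yyyymmdd-N]` means "the reference date minus N days, formatted as yyyymmdd"; this page does not define the expression syntax, and the service resolves the expression itself. The helper `resolve_partition_spec` is hypothetical and exists only for illustration.

```python
import re
from datetime import date, timedelta

# Assumption: "$[yyyymmdd]", "$[yyyymmdd-N]" and "$[yyyymmdd+N]" are day offsets
# from a reference date. This is an illustrative guess, not the service's parser.
_DATE_EXPR = re.compile(r"\$\[yyyymmdd(?:([+-])(\d+))?\]")


def resolve_partition_spec(spec: str, reference: date) -> str:
    """Expand date expressions in a partition spec (hypothetical helper)."""

    def expand(match: re.Match) -> str:
        sign, days = match.group(1), match.group(2)
        offset = int(days) if days else 0
        if sign == "-":
            offset = -offset
        return (reference + timedelta(days=offset)).strftime("%Y%m%d")

    return _DATE_EXPR.sub(expand, spec)


print(resolve_partition_spec("ds=$[yyyymmdd-1]", date(2024, 12, 5)))  # ds=20241204
```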
TemplateCode | string | No

The ID of the template used by the rule.

Example: system::user_defined

SamplingConfig | object | No

The sampling settings.

Metric | string | No

The metric used for sampling. Valid values:

  • Count: the number of rows in the table.
  • Min: the minimum value of the field.
  • Max: the maximum value of the field.
  • Avg: the average value of the field.
  • DistinctCount: the number of unique values of the field after deduplication.
  • DistinctPercent: the proportion of the number of unique values of the field after deduplication to the number of rows in the table.
  • DuplicatedCount: the number of duplicated values of the field.
  • DuplicatedPercent: the proportion of the number of duplicated values of the field to the number of rows in the table.
  • TableSize: the table size.
  • NullValueCount: the number of rows in which the field value is null.
  • NullValuePercent: the proportion of the number of rows in which the field value is null to the number of rows in the table.
  • GroupCount: the field value and the number of rows for each field value.
  • CountNotIn: the number of rows in which the field values are different from the referenced values that you specified in the rule.
  • CountDistinctNotIn: the number of unique values that are different from the referenced values that you specified in the rule after deduplication.
  • UserDefinedSql: indicates that data is sampled by executing custom SQL statements.
Example: Count
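
The definitions above can be illustrated on a small in-memory column. The sketch below covers only a few of the metrics and makes two assumptions that this page does not confirm: null values are excluded when counting distinct values, and the "Percent" metrics are returned as fractions between 0 and 1. The function `column_metrics` is hypothetical; the service computes these values on the monitored table.

```python
from typing import Optional, Sequence


def column_metrics(values: Sequence[Optional[object]]) -> dict:
    """Compute a few of the listed metrics for one column held in memory."""
    total = len(values)                              # Count: rows in the table
    non_null = [v for v in values if v is not None]
    null_count = total - len(non_null)               # NullValueCount
    distinct = len(set(non_null))                    # DistinctCount (nulls excluded: assumption)
    return {
        "Count": total,
        "NullValueCount": null_count,
        "NullValuePercent": null_count / total if total else 0.0,
        "DistinctCount": distinct,
        "DistinctPercent": distinct / total if total else 0.0,
    }


metrics = column_metrics([1, 2, 2, None])
print(metrics)
```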
MetricParameters | string | No

The parameters required for sampling.

Example: { "Columns": [ "id", "name" ] , "SQL": "select count(1) from table;"}
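
Because MetricParameters is typed as a string while its example is a JSON object, one plausible way to produce the value is to serialize a dictionary. This is a sketch under that assumption; the keys `Columns` and `SQL` are taken from the example, and whether both are needed for a given metric is not stated here.

```python
import json

# Build the parameters as a dict, then serialize them into the string
# that the MetricParameters field appears to expect (assumption).
metric_parameters = json.dumps(
    {
        "Columns": ["id", "name"],
        "SQL": "select count(1) from table;",
    }
)

print(metric_parameters)
```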
SettingConfig | string | No

The statements that are used to configure the parameters required for sampling before you execute the sampling statements. The statements can be up to 1,000 characters in length. Only the MaxCompute database is supported.

Example: SET odps.sql.udf.timeout=600s; SET odps.sql.python.version=cp27;

SamplingFilter | string | No

The statements that are used to filter unnecessary data during sampling. The statements can be up to 16,777,215 characters in length.

Example: id IS NULL

CheckingConfig | object | No

The check settings for sample data.

Type | string | No

The method that is used to calculate a threshold. You can leave this parameter empty if you use a rule template. Valid values:

  • Fixed
  • Fluctation
  • FluctationDiscreate
  • Auto
  • Average
  • Variance
Example: Fixed

ReferencedSamplesFilter | string | No

The filter that is used to query the referenced samples. Some threshold types are calculated against reference values, and this parameter holds the expression that selects those referenced samples.

Example: { "bizdate": [ "-1", "-7", "-1m" ] }

Thresholds | object | No

The threshold settings.

Expected | object | No

The expected threshold setting.

Operator | string | No

The comparison operator. Valid values:

  • >
  • >=
  • <
  • <=
  • !=
  • =
>
Value | string | No

The threshold value.

Example: 100.0

Warned | object | No

The threshold settings for normal alerts.

Operator | string | No

The comparison operator. Valid values:

  • >
  • >=
  • <
  • <=
  • !=
  • =
>
Value | string | No

The threshold value.

Example: 100.0

Critical | object | No

The threshold settings for critical alerts.

Operator | string | No

The comparison operator. Valid values:

  • >
  • >=
  • <
  • <=
  • !=
  • =
>
Value | string | No

The threshold value.

Example: 100.0
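
Each threshold pairs a comparison operator with a value that is passed as a string. The sketch below shows one way to read such a pair: map the operator symbol to a comparison function and convert the value to a number. It illustrates the operator semantics only. How the service combines the Expected, Warned, and Critical thresholds into a check result is not described on this page, and the `matches` helper is hypothetical.

```python
import operator

# Map each documented operator symbol to a Python comparison function.
_OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "!=": operator.ne,
    "=": operator.eq,
}


def matches(sample: float, threshold: dict) -> bool:
    """Return True if the sample satisfies the Operator/Value pair."""
    compare = _OPERATORS[threshold["Operator"]]
    # Value is typed as a string in the request, so convert before comparing.
    return compare(sample, float(threshold["Value"]))


expected = {"Operator": ">", "Value": "100.0"}
print(matches(150.0, expected))  # True
print(matches(100.0, expected))  # False
```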
ErrorHandlers | array<object> | No

The operations that you can perform after the rule-based check fails.

(array element) | object | No

The operation that you can perform after the rule-based check fails.

Type | string | No

The type of the operation. Valid values:

  • SaveErrorData
Example: SaveErrorData

ErrorDataFilter | string | No

The SQL statement that is used to filter failed tasks. If the rule is defined by custom SQL statements, you must specify an SQL statement to filter failed tasks.

Example: SELECT * FROM tb_api_log WHERE id IS NULL
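
Taken together, the request parameters form a nested structure. The sketch below assembles them as a Python dictionary using the example values from this page. The nesting is an assumption inferred from the object-typed parameters (Target, SamplingConfig, CheckingConfig, Thresholds, ErrorHandlers) and the order of the rows. The rule name `example_rule` and both helper functions are hypothetical, and the sketch does not sign or send the request.

```python
import json

REQUIRED_TOP_LEVEL = ("Name", "ProjectId")


def build_rule_request(name: str, project_id: int, table_guid: str) -> dict:
    """Assemble the request parameters as a nested dict (nesting is assumed)."""
    return {
        "Name": name,
        "ProjectId": project_id,
        "Enabled": True,
        "Severity": "Normal",
        "Target": {
            "Type": "Table",
            "DatabaseType": "maxcompute",
            "TableGuid": table_guid,
            "PartitionSpec": "ds=$[yyyymmdd-1]",
        },
        "TemplateCode": "system::user_defined",
        "SamplingConfig": {"Metric": "Count", "SamplingFilter": "id IS NULL"},
        "CheckingConfig": {
            "Type": "Fixed",
            "Thresholds": {"Expected": {"Operator": ">", "Value": "100.0"}},
        },
        "ErrorHandlers": [
            {
                "Type": "SaveErrorData",
                "ErrorDataFilter": "SELECT * FROM tb_api_log WHERE id IS NULL",
            }
        ],
    }


def check_required(request: dict) -> list:
    """Return the names of required parameters that are missing or empty."""
    missing = [key for key in REQUIRED_TOP_LEVEL if not request.get(key)]
    if not request.get("Target", {}).get("TableGuid"):
        missing.append("Target.TableGuid")
    return missing


request = build_rule_request("example_rule", 10726, "odps.unit_test.tb_unit_test")
print(check_required(request))  # []
print(json.dumps(request, indent=2))
```

Checking the three parameters marked as required before sending anything is a cheap way to catch an incomplete request early.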

Response parameters

Parameter | Type | Description | Example
object

The response parameters.

RequestId | string

The request ID.

Example: 691CA452-D37A-4ED0-9441

Examples

Sample success responses

JSON format

{
  "RequestId": "691CA452-D37A-4ED0-9441",
  "Id": 0
}
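
A minimal sketch of reading such a response in Python follows. It parses a response body shaped like the sample and extracts the request ID. The sample also carries an `Id` field that the response parameter table does not describe, so the code reads it without assuming what it means. The `raw_body` string stands in for the body returned by whichever client you use.

```python
import json

# Stand-in for the HTTP response body returned by the service.
raw_body = '{"RequestId": "691CA452-D37A-4ED0-9441", "Id": 0}'

response = json.loads(raw_body)

# strip() guards against stray whitespace around the identifier.
request_id = response["RequestId"].strip()

# "Id" appears in the sample but is not documented in the table; read it defensively.
returned_id = response.get("Id")

print(request_id)
print(returned_id)
```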

Error codes

For a list of error codes, see Service error codes.