Risk identification rules - DataWorks - Alibaba Cloud Documentation Center

Risk identification rules use multi-dimensional association analysis and algorithms to proactively identify risky operations and send alerts. Combined with visualization tools, they let you perform comprehensive auditing. DataWorks includes built-in risk identification rules for many scenarios. You can use these rules out of the box or create custom rules as needed. This topic describes how to create and manage risk identification rules.

Background

Data entered into DataWorks is filtered by Data Security Guard. DataWorks provides the comprehensive risk identification rules feature to detect sensitive data in various scenarios. This feature offers the following benefits:

Ease of use
The feature includes four risk types: Data access risk, Data export risk, Data manipulation risk, and Other risk types. It also supports combining multiple dimensions, such as Access time, Sensitivity type, and Access volume, to detect various types of risks.
High accuracy
The feature uses event aggregation and statistical comparison. By comparing the number of event occurrences within a time window against a threshold, the feature detects risks more accurately and reduces false positives. For example, a risk is detected only if the same event occurs more than three times within 10 minutes.
Fine-grained management
The feature supports configuring High, Medium, and Low risk levels for fine-grained risk management.
Flexible rules
The feature has built-in rules for common scenarios that you can use directly. You can also create custom risk identification rules as needed. For more information, see Built-in risk identification rules and Create a risk identification rule.
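The event aggregation described under High accuracy can be pictured as a sliding-window count. The following Python sketch is only an illustration of that idea, not the DataWorks implementation. The function name `exceeds_threshold` and the half-open window boundary are assumptions.

```python
from collections import deque
from datetime import datetime, timedelta


def exceeds_threshold(timestamps, window=timedelta(minutes=10), threshold=3):
    """Return True if more than `threshold` events fall inside one window."""
    recent = deque()
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop events that fall outside the window ending at the current event.
        # Assumption: an event exactly `window` older than `ts` no longer counts.
        while ts - recent[0] >= window:
            recent.popleft()
        if len(recent) > threshold:
            return True
    return False


start = datetime(2024, 1, 1, 9, 0)
burst = [start + timedelta(minutes=m) for m in (0, 2, 4, 6)]     # 4 events in 6 minutes
spread = [start + timedelta(minutes=m) for m in (0, 5, 10, 15)]  # at most 2 per window
print(exceeds_threshold(burst))
print(exceeds_threshold(spread))
```

With the example from the text (more than three times within 10 minutes), the burst of four events is reported as a risk and the evenly spread events are not.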

Limits

Version limits
- Only DataWorks Professional Edition and later versions support the risk identification rules feature.
- Only DataWorks Enterprise Edition supports built-in risk identification rules.
Alerting methods
Only email and WebHook alerting methods are supported.
Note
DataWorks supports WebHook URLs for DingTalk groups, WeCom, and Lark. Only the Enterprise Edition supports pushing alert information to WeCom or Lark.

Go to the risk identification rules page

Go to Data Security Guard.
1. Go to the DataStudio page.
  Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
2. Click the icon in the upper-left corner. Then, choose All Products > Data Governance > Data Security Guard. On the page that appears, click Try Now to go to the Data Security Guard page.
  Note
  - If your Alibaba Cloud account is granted the required permissions, you can directly access the homepage of Data Security Guard.
  - If your Alibaba Cloud account is not granted the required permissions, you are redirected to the authorization page of Data Security Guard. You can use the features of Data Security Guard only after your Alibaba Cloud account is granted the required permissions.
Go to Risk Identification Rules.
On the Data Security Guard page, choose Rule Configuration > Risk Identification Rules in the navigation pane on the left. You are redirected to the risk identification rules page where you can create and manage risk identification rules.
Risk identification rules have built-in rules for many common scenarios that you can use directly. You can also create custom risk identification rules as needed. For more information, see Built-in risk identification rules and Create a risk identification rule.

Built-in risk identification rules

The risk identification rules feature supports the built-in rules listed in the following table.

Rule name	Rule type	Rule level	Rule configuration
Querying large volumes of sensitive data outside of work hours	Data access risk	Low	This rule is hit when the data volume of a query exceeds 10,000 during the following time periods. Monday to Friday: `19:00–24:00`. Saturday to Sunday: `00:00–24:00`.
Similar SQL queries	Data access risk	Low	This rule is hit when five or more similar SQL queries are run within 10 minutes.
Batch querying large volumes of sensitive data	Data access risk	Medium	This rule is hit when the data volume of a single query exceeds 10,000.
Batch exporting large volumes of sensitive data	Data export risk	High	This rule is hit when the data volume of a single export exceeds 10,000.
Exporting large volumes of sensitive data outside of work hours	Data export risk	High	This rule is hit when the data volume of an export exceeds 10,000 during the following time periods. Monday to Friday: `22:00–24:00`. Saturday to Sunday: `00:00–24:00`.
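To make the time periods in the table concrete, the following Python sketch shows how the condition of the built-in rule "Querying large volumes of sensitive data outside of work hours" could be evaluated. It is a hedged illustration: the function names are hypothetical, and how DataWorks evaluates the rule internally is not documented here.

```python
from datetime import datetime


def is_outside_work_hours(ts, weekday_start_hour=19):
    """Monday to Friday from `weekday_start_hour` to 24:00, or any time on Saturday or Sunday."""
    if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return ts.hour >= weekday_start_hour


def hits_off_hours_rule(ts, data_volume, volume_threshold=10_000, weekday_start_hour=19):
    """The rule is hit only when both the time condition and the volume condition hold."""
    return is_outside_work_hours(ts, weekday_start_hour) and data_volume > volume_threshold


monday_evening = datetime(2024, 1, 1, 20, 0)   # 2024-01-01 is a Monday
monday_noon = datetime(2024, 1, 1, 12, 0)
saturday_morning = datetime(2024, 1, 6, 10, 0)
print(hits_off_hours_rule(monday_evening, 10_001))
print(hits_off_hours_rule(monday_noon, 50_000))
print(hits_off_hours_rule(saturday_morning, 20_000))
```

Passing `weekday_start_hour=22` models the corresponding export rule, whose weekday period starts at 22:00.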

Create a risk identification rule

Plan and prepare to create the rule.

Based on your scenario, you can combine dimensions such as Data location, Data properties, User information, and Operation time to configure fine-grained detection conditions. If you plan to use subcategories of Data properties or User information in detection conditions, complete the following preparations first.

Detection dimension	Subcategory	Description
Data properties	Data classification level	To detect risky data of a specific level, you must define data classification levels in advance. For more information, see Configure sensitive data classification and levels.
	Data category	To detect risky data of a specific category, you must define data categories in advance. For more information, see Configure sensitive data detection rules and run detection tasks.
	Sensitive field type	To detect risky data in specific sensitive fields, you must define sensitive field types in advance. For more information, see Configure sensitive data detection rules and run detection tasks.
User information	User group	To detect risky data for a specific user group under the current logon account, you must configure user groups in advance. For more information, see Configure user groups.
User information	RAM role	To detect risky data for a RAM user under the current logon account, you must add a RAM user to your Alibaba Cloud account in advance. For more information, see Create a RAM user.

In the upper-right corner of the Risk Identification Rules page, click + Risk Identification Rules.

In the Create Risk Identification Rule dialog box, configure the parameters for the rule.

Note

Currently, you can create only statistical association rules. A statistical association rule aggregates and counts single events and compares the count against a threshold. A risk is detected if the number of events exceeds the threshold. For example, a rule can be configured to detect a risk if a low-privilege user accesses more than 10,000 sensitive data entries outside of work hours.

Configure the basic information for the rule.

Parameter	Description
(Required) Rule Name	The name of the new risk identification rule. The name must be 1 to 30 characters in length and cannot contain special characters.
(Required) Rule Type	The type of the risk identification rule. Valid values: Data Access: A potential risk exists when data is accessed. Data Export: A potential risk exists when data is exported. Data Deletion: A potential risk exists when data is deleted. Data Update: A potential risk exists when data is updated. Library Table Operations: A potential risk exists when operations are performed on tables and libraries. Data Authorization: A potential risk exists when data permissions are granted.
(Required) Rule Level	The level of the risk identification rule. Valid values are Low, Medium, and High. You can set the rule level to High for rules that detect important data.
(Optional) Description	The description of the risk identification rule. The description can be 1 to 100 characters in length.
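If you prepare rule definitions outside the console, for example in a review checklist, the constraints in the table can be checked up front. The following Python sketch is a minimal illustration. The function `validate_basic_info` is hypothetical, and the definition of "special characters" used here (anything other than letters, digits, underscores, and spaces) is an assumption that the console may not share.

```python
import re

# Values taken from the parameter table above.
RULE_TYPES = {
    "Data Access", "Data Export", "Data Deletion", "Data Update",
    "Library Table Operations", "Data Authorization",
}
RULE_LEVELS = {"Low", "Medium", "High"}

# Assumption: letters, digits, underscores, and spaces are allowed; everything
# else counts as a special character.
NAME_PATTERN = re.compile(r"[\w ]{1,30}")


def validate_basic_info(name, rule_type, level, description=None):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("Rule Name: 1 to 30 characters, no special characters")
    if rule_type not in RULE_TYPES:
        problems.append("Rule Type: unknown value")
    if level not in RULE_LEVELS:
        problems.append("Rule Level: must be Low, Medium, or High")
    if description is not None and not 1 <= len(description) <= 100:
        problems.append("Description: 1 to 100 characters when provided")
    return problems


print(validate_basic_info("Night export", "Data Export", "High"))
print(validate_basic_info("bad!name", "Data Export", "Urgent"))
```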

Click Next.

Configure detection conditions and thresholds.

Configure detection conditions.

DataWorks lets you detect risky data across dimensions such as Data location, Data properties, User information, and Operation time. This lets you configure more fine-grained detection conditions based on your scenario.

Note

You can add up to 10 conditions. Click + Add Comparison Relationship within a selected dimension to add multiple detection conditions. The logical relationship between multiple conditions is AND.

Data location

Used to specify the location scope for detecting risky data.

Parameter	Description	Required
Filter selected location	Specifies whether to filter risky data in the selected location. Valid values: ≠: Filters the destination location. The rule does not detect risky data in the selected location. =: Detects only in the destination location. The rule detects risky data only in the selected location.	Yes
Compute engine name	Select the engine scope for the rule. Note Currently, only risky data in the MaxCompute engine can be detected. You can select only one engine for each comparison. To specify multiple engines, click + Add Comparison to configure multiple detection conditions.	Yes
Project name	Select the destination project for the rule. The Project name must be a project within the selected engine. You can select a project from the drop-down list or enter a project name to search. Note The drop-down list displays up to 100 project names. The search supports fuzzy matching. Enter a keyword to search for projects whose names contain the keyword. You can select only one project for each comparison. To specify multiple projects, click + Add Comparison to configure multiple detection conditions.	Yes
Table name	Enter the destination tables for the rule. You can enter one or more table names, separated by commas (,). Note the following when entering table names: A single table name can be up to 30 characters long. The total length of all table names cannot exceed 100 characters. The wildcard character (`*`) is supported. For example, `*name` matches all tables with names ending in `name`.	No. If you do not configure this parameter, the rule detects risky data in all tables within the selected project by default.
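The Table name constraints and wildcard matching can be sketched in Python as follows. This is an illustration only. It assumes that the wildcard is `*`, that it matches any run of characters, that matching is case-sensitive, and that the 100-character total counts the names without the commas. The helper names are hypothetical.

```python
import re


def parse_table_names(raw):
    """Split a comma-separated table list and enforce the documented length limits."""
    names = [part.strip() for part in raw.split(",") if part.strip()]
    for name in names:
        if len(name) > 30:
            raise ValueError(f"table name longer than 30 characters: {name}")
    # Assumption: the 100-character total counts the names only, not the commas.
    if sum(len(name) for name in names) > 100:
        raise ValueError("total length of table names exceeds 100 characters")
    return names


def table_in_scope(table, patterns):
    """True if `table` matches any pattern; no patterns means every table is in scope."""
    if not patterns:
        return True
    for pattern in patterns:
        # Treat "*" as "any run of characters" and everything else literally.
        regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
        if re.fullmatch(regex, table):
            return True
    return False


patterns = parse_table_names("orders, *name")
print(table_in_scope("customer_name", patterns))
print(table_in_scope("name_history", patterns))
```

In this sketch, `customer_name` is in scope because it ends in `name`, while `name_history` is not.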

Data properties

Used to specify the property scope for detecting risky data.

Parameter	Description
Property Type	Select the property category for detecting risky data based on your business needs. The following property categories are supported: Data classification level: Used to specify which level of risky data to detect. You must define data classification levels in advance. For more information, see Configure sensitive data classification and levels. Data category: Used to specify which category of risky data to detect. You must define data categories in advance. For more information, see Configure sensitive data detection rules and run detection tasks. Sensitive field type: Used to specify which type of sensitive field to detect risky data in. You must define sensitive field types in advance. For more information, see Configure sensitive data detection rules and run detection tasks.
Filter selected property	Specifies whether to filter risky data with the selected property. Valid values: ≠: Filters the destination property. The rule does not detect risky data with the selected property. =: Detects only the destination property. The rule detects risky data only with the selected property.

User information

Used to specify the user information scope for detecting risky data.

Parameter	Description
Information category	Select the user information category for detecting risky data. Valid values: User group: The name of a user group under the current logon account. You must configure user groups in advance. For more information, see Configure user groups. RAM role: A RAM user under the current logon account. You must add a RAM user to your Alibaba Cloud account in advance. For more information, see Create a RAM user. Username: The current logon user.
Filter selected user information	≠: Filters the destination user information. The rule does not detect risky data for the selected user. =: Detects only the destination user information. The rule detects risky data only for the selected user.

Operation time

Used to specify the operation time scope for detecting risky data.

Parameter	Description
Select time range	Click a day of the week and an hour to select the desired time range. You can select any time from Monday to Sunday, with precision to the hour. You can add multiple time ranges. The added time ranges are mutually exclusive. For example, if you select Monday in Condition 1, you cannot select Monday in Condition 2.
Filter selected time	≠: Filters the destination operation time. The rule does not detect risky data during the selected operation time. =: Detects only the destination operation time. The rule detects risky data only during the selected operation time.
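The way detection conditions combine can be sketched in Python as follows. This is a simplified model, not the DataWorks evaluation logic: each condition is reduced to a single field comparison, the event keys (`project`, `user_group`) are hypothetical, and multiple comparisons within one dimension are not modeled. It only illustrates the `=` and `≠` semantics and the AND relationship between conditions.

```python
from dataclasses import dataclass

MAX_CONDITIONS = 10  # documented upper limit on conditions per rule


@dataclass(frozen=True)
class Condition:
    field: str   # hypothetical event key, for example "project" or "user_group"
    op: str      # "=" (only the selected value) or "≠" (everything except it)
    value: str

    def matches(self, event):
        if self.op not in ("=", "≠"):
            raise ValueError(f"unsupported operator: {self.op}")
        same = event.get(self.field) == self.value
        return same if self.op == "=" else not same


def event_in_scope(event, conditions):
    """An event is in scope only if every condition matches (logical AND)."""
    if len(conditions) > MAX_CONDITIONS:
        raise ValueError("a rule can have at most 10 conditions")
    return all(condition.matches(event) for condition in conditions)


conditions = [
    Condition("project", "=", "sales_prod"),
    Condition("user_group", "≠", "dba"),
]
print(event_in_scope({"project": "sales_prod", "user_group": "analyst"}, conditions))
print(event_in_scope({"project": "sales_prod", "user_group": "dba"}, conditions))
```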

Configure thresholds.

DataWorks supports event aggregation and statistics. You can detect risky data by comparing the number of event occurrences within a time window against a threshold. Click + Add Threshold Comparison to configure multiple threshold conditions.

Parameter	Description
Threshold category	Single data volume: Detects risky data based on the volume of data in an operation. An operation hits the risk if the data volume exceeds the set threshold. The data volume is an integer from 1 to 10,000,000. The unit is entries. The default value is 1. Cumulative occurrences: Detects risky data based on the number of times a single event occurs within a specified time range. A risk is hit if the number of occurrences of a single event exceeds the set threshold within the specified time range. The number of occurrences is an integer from 1 to 10,000. The unit is times. The default value is 10. Cumulative data volume: Detects risky data based on the volume of data operated on within a specified time range. A risk is hit if the cumulative data volume within the specified time range exceeds the set threshold. The data volume is an integer from 1 to 10,000,000. The unit is entries. The default value is 1. Note DataWorks automatically categorizes and detects single events.
Time window	The time range that limits the number of event occurrences. The default value is 10 minutes. Valid values: Minute: The value ranges from 1 to 59. Hour: The value ranges from 1 to 23. Day: The value ranges from 1 to 7. Note This parameter is required only when Threshold category is set to Cumulative occurrences.
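The three threshold categories and the time window ranges can be sketched in Python as follows. This is an illustration under stated assumptions, not the DataWorks implementation: the category identifiers and function names are hypothetical, the window is assumed to end at the newest event, and the sketch applies the same window to Cumulative data volume for simplicity.

```python
from datetime import datetime, timedelta

# Ranges from the Time window parameter above.
WINDOW_LIMITS = {"minute": (1, 59), "hour": (1, 23), "day": (1, 7)}


def make_window(value=10, unit="minute"):
    """Validate a time window and convert it to a timedelta (default: 10 minutes)."""
    low, high = WINDOW_LIMITS[unit]
    if not low <= value <= high:
        raise ValueError(f"{unit} window must be between {low} and {high}")
    return timedelta(**{unit + "s": value})


def threshold_hit(category, events, threshold, window=None):
    """events: (timestamp, entries) tuples in time order; the last one is the newest."""
    newest_ts, newest_entries = events[-1]
    if category == "single_data_volume":
        return newest_entries > threshold
    # Assumption: the window ends at the newest event and is half-open.
    recent = [(ts, n) for ts, n in events if newest_ts - ts < window]
    if category == "cumulative_occurrences":
        return len(recent) > threshold
    if category == "cumulative_data_volume":
        return sum(n for _, n in recent) > threshold
    raise ValueError(f"unknown threshold category: {category}")


t0 = datetime(2024, 1, 1, 9, 0)
events = [(t0, 100), (t0 + timedelta(minutes=3), 200), (t0 + timedelta(minutes=12), 300)]
window = make_window(10, "minute")
print(threshold_hit("single_data_volume", events, 250))
print(threshold_hit("cumulative_occurrences", events, 1, window))
print(threshold_hit("cumulative_data_volume", events, 400, window))
```

In the sample data, only the last two events fall inside the 10-minute window, so the cumulative checks see two occurrences and 500 entries.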

Click Next.
Configure the alerting method.
After a data risk is detected, DataWorks sends alert information through the configured alerting method so that you can handle the risk promptly. The available alerting methods are Email and WebHook.
Note
Before you select an alerting method, make sure that you have configured email and WebHook settings in System Settings.
Click Save. The rule is created.
Custom rules are disabled by default after they are created. On the Risk Identification Rules page, click Re-enable next to the rule to enable it manually.

Manage risk identification rules

On the Risk Identification Rules page, you can view the list of created rules and their details. You can also edit a specific rule.

Area	Description
1	In this area, you can filter the rule list by conditions such as Risk Type, Risk Level, Built-in or Not Built-in, and Risk Rule Name. Note The search by name supports fuzzy matching. Enter a keyword to search for risk identification rules whose names contain the keyword.
2	In this area, you can perform the following operations: View basic rule information: View basic information about created rules, such as risk type, risk level, and effective status, along with the risky data detected by the rule. You can check the Risks Hit, Pending Risks, and Handled Risks to understand the current risks in the tenant and their processing status. View rule details and edit the rule: Click View Details to view the detailed configuration of the rule. You can also modify the rule as needed. Re-enable the rule: Click the icon to re-enable a disabled rule. Note You can perform this operation only on rules that are in a disabled state.
3	In this area, you can perform batch operations on destination rules. Currently, you can perform batch operations such as Batch Enable, Batch Disable, and Batch Delete. Click the icon to switch between batch operation types. Note DataWorks does not support deleting built-in risk identification rules. You can only delete custom rules that are in the Disabled state.
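The deletion restriction in the note can be expressed as a simple filter. The following Python sketch uses a hypothetical `Rule` record to show which rules a Batch Delete could apply to. It is an illustration of the stated restriction, not a DataWorks API.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    built_in: bool = False
    enabled: bool = False  # custom rules are disabled by default after creation


def deletable(rules):
    """Only custom (not built-in) rules in the Disabled state can be deleted."""
    return [rule.name for rule in rules if not rule.built_in and not rule.enabled]


rules = [
    Rule("Similar SQL queries", built_in=True, enabled=True),
    Rule("Night export", enabled=True),
    Rule("Draft rule"),
]
print(deletable(rules))
```

An enabled custom rule, such as `Night export` in this example, must be disabled before it can be deleted.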

Next steps

After a risk identification rule is created and enabled, you can navigate to the Data Risks page to view the details of risks detected by the rule and handle them promptly. For more information, see View data risks.