The risk identification rule management feature provides multidimensional association analysis methods and algorithms. These intelligent data analysis methods are used to identify data risks and send you alert notifications based on risk identification rules. This feature also allows you to audit risk identification rules in a visualized manner. DataWorks provides built-in risk identification rules for you to directly use in multiple scenarios. You can also create custom risk identification rules based on your business requirements. This topic describes how to create and manage a risk identification rule.
Background information
After DataWorks ingests data from data sources, the Data Security Guard service filters the data. The old version of the risk identification rule management feature provided by this service can be used to identify data risks only if sensitive data is involved and cannot be used to identify data risks in operation audit scenarios and scenarios that require aggregation of event statistics. To resolve this issue, DataWorks provides a new version of the risk identification rule management feature. The new version of this feature has the following benefits:
Ease of use
The feature can be used to identify the following types of data risks: data access risks, data export risks, data operation risks, and other data risks. The feature also allows you to create a risk identification rule based on a combination of risk identification dimensions, such as access time, sensitivity type, and number of access requests, to identify different types of data risks.
High precision
The feature supports the aggregation of event statistics. A risk identification rule can compare the number of occurrences of an event in a time window with the threshold value that is specified for event occurrences. For example, a rule can identify a data risk if an event occurs at least three times in 10 minutes. This helps identify data risks precisely and reduces the number of false positives.
Fine-grained management
When you use this feature, you can set the risk level of a data risk to High, Medium, or Low. You can perform fine-grained management on data risks based on their risk levels.
High flexibility
DataWorks provides common risk identification rules for you to directly use in multiple scenarios. You can also create custom risk identification rules based on your business requirements. For more information, see Built-in risk identification rules and Create a risk identification rule.
The positions of parameters for a risk identification rule in the DataWorks console differ between the old and new versions of the risk identification rule management feature. For more information, see Comparison of the positions of parameters for a risk identification rule in the old and new versions of the risk identification rule management feature.
Limits
Version
Only users of DataWorks Professional Edition or a more advanced edition can use the new version of the risk identification rule management feature.
Only DataWorks Enterprise Edition or a more advanced edition provides built-in risk identification rules.
Switch between the old and new versions of the risk identification rule management feature
The old version of the risk identification rule management feature expires on June 30, 2022. The actual expiration time that is displayed on the Custom Identification Rules page takes precedence. After the expiration time elapses, the created risk identification rules and identified data risks are automatically cleared. You can only use the new version of the risk identification rule management feature after June 30, 2022. Export and back up the risk identification rules and identified data risks that you want to use at the earliest opportunity. For more information about the export and backup operations, see Risk identification rule management (old version).
The new version of the risk identification rule management feature can also be used before the old version expires. You can switch from the old version to the new version before the expiration time. After you switch to the new version, the created risk identification rules and identified data risks in the old version are not automatically synchronized to the new version. You must create them again in the new version.
Alert notification method
An alert notification can be sent by email or webhook URL.
Note: DataWorks supports the webhook URL-based alerting method for DingTalk, Enterprise WeChat, and Lark. Only DataWorks Enterprise Edition or a more advanced edition allows users to use Enterprise WeChat or Lark to receive an alert notification that is sent based on a webhook URL.
Go to the Custom Identification Rules page
Go to the Data Security Guard page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Click the icon in the upper-left corner and choose .
Click Try now to go to the Data Security Guard page.
Go to the Custom Identification Rules page.
In the left-side navigation pane of the Data Security Guard page, choose . The page for the old version of the risk identification rule management feature appears. You can click Try New Version in the upper-right corner of the Data Security Guard page to go to the page for the new version of the risk identification rule management feature and create and manage risk identification rules. The new version provides common risk identification rules for you to directly use in multiple scenarios. You can also create custom risk identification rules based on your business requirements. For more information, see Built-in risk identification rules and Create a risk identification rule.
Built-in risk identification rules
The following table describes built-in risk identification rules provided by the new version of the risk identification rule management feature.
Rule name | Data risk type | Risk level | Rule configuration |
Query a large number of sensitive data records in non-business hours | Data Access Risk | Low | Identifies a data risk if the number of sensitive data records queried in the following periods of time exceeds 10,000: |
Use similar SQL statements to query data | Data Access Risk | Low | Identifies a data risk if similar SQL statements are used to query data 10 or more times within 10 minutes: |
Query a large number of sensitive data records at a time | Data Access Risk | Medium | Identifies a data risk if the number of sensitive data records queried in a single request exceeds 10,000: |
Export a large number of sensitive data records at a time | Data Export Risk | High | Identifies a data risk if the number of sensitive data records exported at a time exceeds 10,000: |
Export a large number of sensitive data records in non-business hours | Data Export Risk | High | Identifies a data risk if the number of sensitive data records exported in the following periods of time exceeds 10,000: |
Create a risk identification rule
Plan and prepare for the creation of a risk identification rule.
You can create a risk identification rule to identify data risks by specifying fine-grained risk identification conditions in dimensions such as data location, data property, user information, and operation time based on your business requirements. The following table describes the preparations that you must make in advance if you want to use the fine-grained risk identification conditions in the data property and user information dimensions to identify data risks.
Risk identification dimension | Fine-grained risk identification condition | Description |
Data property | Sensitivity level of sensitive data | To identify a data risk at a specified sensitivity level, you need to define the sensitivity level for sensitive data in advance. For more information, see Specify the category and sensitivity level of sensitive data. |
Data property | Data category | To identify a data risk of a specified data category, you need to define the data category for sensitive data in advance. For more information, see Configure a sensitive data identification rule and run a sensitive data identification task. |
Data property | Sensitive field type | To identify a data risk of a specified sensitive field type, you must define the sensitive field type for sensitive data in advance. For more information, see Configure a sensitive data identification rule and run a sensitive data identification task. |
User information | User group | To identify a data risk for a specified user group that belongs to the current Alibaba Cloud account, you need to configure the user group in advance. For more information, see Configure a user group. |
User information | RAM role | To identify a data risk for a specified RAM user that belongs to the current Alibaba Cloud account, you need to add the RAM user to the current Alibaba Cloud account in advance. For more information, see Create a RAM user. |
In the upper-right corner of the Custom Identification Rules page, click + Risk identification rule.
In the New risk identification rule panel, configure the parameters.
Note: You can create only a risk identification rule of the statistical association type. A risk identification rule of this type can be used to calculate and aggregate the number of occurrences of a single event, and compare this number with the threshold value that is specified for event occurrences. The risk identification rule is triggered if the number of occurrences exceeds the specified threshold value. For example, a risk identification rule specifies to identify a data risk if the number of sensitive data records that are queried by a user with limited permissions in non-business hours exceeds 10,000.
Configure parameters in the Basic information step.
Parameter | Description |
Rule name | The name of the risk identification rule. The name can be 1 to 30 characters in length and cannot contain special characters. |
Rule Type | The type of the risk identification rule. Valid values: Data Access Risk: a data risk that occurs when data is accessed. Data Export Risk: a data risk that occurs when data is exported. Data Operation Risk: a data risk that occurs when data is created, modified, or deleted. Other: a data risk of other types. |
Rule level | The risk level of the risk identification rule. Valid values: Low, Medium, and High. You can set this parameter to High for important data based on your business requirements. |
Description information | The description of the risk identification rule. The description can be 1 to 100 characters in length. |
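The constraints in the preceding table can be checked before you open the console. The following Python sketch is illustrative only: `BasicInfo` and `validate_basic_info` are hypothetical names, not part of any DataWorks API, and the interpretation of "special characters" (anything other than letters, digits, spaces, and underscores) is an assumption that this topic does not confirm.

```python
from dataclasses import dataclass

# Valid values copied from the parameter table above.
RULE_TYPES = ("Data Access Risk", "Data Export Risk", "Data Operation Risk", "Other")
RULE_LEVELS = ("Low", "Medium", "High")


@dataclass(frozen=True)
class BasicInfo:
    """Hypothetical container for the values entered in the Basic information step."""
    rule_name: str
    rule_type: str
    rule_level: str
    description: str


def validate_basic_info(info: BasicInfo) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    errors = []
    if not 1 <= len(info.rule_name) <= 30:
        errors.append("Rule name must be 1 to 30 characters in length")
    # Assumption: letters, digits, spaces, and underscores are not "special".
    elif not all(ch.isalnum() or ch in "_ " for ch in info.rule_name):
        errors.append("Rule name cannot contain special characters")
    if info.rule_type not in RULE_TYPES:
        errors.append("Rule Type must be one of: " + ", ".join(RULE_TYPES))
    if info.rule_level not in RULE_LEVELS:
        errors.append("Rule level must be one of: " + ", ".join(RULE_LEVELS))
    if not 1 <= len(info.description) <= 100:
        errors.append("Description must be 1 to 100 characters in length")
    return errors
```

The function collects every problem instead of stopping at the first one, so that all invalid fields can be corrected in a single pass.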
Click Next.
Configure risk identification conditions and their threshold values.
Configure risk identification conditions.
DataWorks allows you to create a risk identification rule to identify data risks by specifying fine-grained risk identification conditions in dimensions such as data location, data property, user information, and operation time based on your business requirements.
Note: You can add a maximum of 10 risk identification conditions. After you select a risk identification dimension, click + Add comparison relation to add a risk identification condition from the selected risk identification dimension. You can repeatedly perform this operation to add multiple risk identification conditions. The logical relationship between the risk identification conditions is AND.
Data location
Specifies the range of locations for data risks. The location range can be accurate to the field name.
Parameter | Description | Required |
Whether to filter out the selected location | Specifies whether to filter out data risks identified in the selected location. Valid values: ≠: filters out the selected location. The rule does not identify data risks in the selected location. =: restricts the rule to the selected location. The rule identifies data risks only in the selected location. | Yes |
Compute Engine Instance Name | The compute engine instance that is specified in the risk identification rule. Note: You can create a risk identification rule to identify data risks only in a MaxCompute compute engine instance. You can specify only one compute engine instance in each risk identification condition. To identify data risks in multiple compute engine instances, click + Add comparison relation to add one risk identification condition for each compute engine instance. | Yes |
Project Name | The project that is specified in the risk identification rule. The project must belong to the specified compute engine instance. You can select the project from the Project Name drop-down list or enter the name of the project to search for it. Note: The drop-down list displays a maximum of 100 project names. A fuzzy match is supported if you search for a project by name. After you enter a keyword in the search box, projects whose names contain the keyword are displayed. You can specify only one project in each risk identification condition. To identify data risks in multiple projects, click + Add comparison relation to add one risk identification condition for each project. | Yes |
Table Name | The name of the table that is specified in the risk identification rule. You can specify one or more table names. If you specify multiple table names, separate them with commas (,). Each table name can be up to 30 characters in length, and the total length of all table names cannot exceed 100 characters. You can use an asterisk (*) as a wildcard. For example, you can enter *name to identify all tables whose names end with name. | No. If you do not configure this parameter, the risk identification rule identifies data risks in all tables that belong to the specified project by default. |
Field Name | The name of the field that is specified in the risk identification rule. You can specify one or more field names. If you specify multiple field names, separate them with commas (,). Each field name can be up to 30 characters in length, and the total length of all field names cannot exceed 100 characters. You can use an asterisk (*) as a wildcard. For example, you can enter *name to identify all fields whose names end with name. | No. If you do not configure this parameter, the risk identification rule identifies data risks in all fields by default. |
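To preview which tables or fields a Table Name or Field Name value would cover, the wildcard and length rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the matching logic that DataWorks actually uses: the function names are hypothetical, matching is assumed to be case-sensitive, and the 100-character limit is assumed to count the names only, not the separating commas.

```python
import re

MAX_NAME_LEN = 30    # per-name limit from the table above
MAX_TOTAL_LEN = 100  # limit on the combined length of all names


def parse_name_patterns(raw: str) -> list[str]:
    """Split a comma-separated value and enforce the documented length limits."""
    patterns = [part.strip() for part in raw.split(",") if part.strip()]
    for pattern in patterns:
        if len(pattern) > MAX_NAME_LEN:
            raise ValueError(f"name exceeds {MAX_NAME_LEN} characters: {pattern!r}")
    # Assumption: commas do not count toward the total length.
    if sum(len(pattern) for pattern in patterns) > MAX_TOTAL_LEN:
        raise ValueError(f"names exceed {MAX_TOTAL_LEN} characters in total")
    return patterns


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Treat * as 'any sequence of characters' and everything else literally."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def name_matches(name: str, patterns: list[str]) -> bool:
    """Return True if the name matches any pattern.

    An empty pattern list models an unconfigured parameter, which puts all
    tables or fields in scope.
    """
    if not patterns:
        return True
    return any(wildcard_to_regex(p).fullmatch(name) for p in patterns)
```

For example, with the pattern `*name`, the name `user_name` matches because it ends with `name`, whereas `name_list` does not. The sketch escapes every character except the asterisk so that characters such as `.` or `?` in a name are not interpreted as wildcards.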
Data property
Specifies the property that is used to filter and identify data risks.
Parameter | Description |
Property | The category of the property that you specify to identify data risks based on your business requirements. The following property categories are supported: Data grading: specifies the sensitivity level of the data risk that you want to identify. You need to define the sensitivity level for sensitive data in advance. For more information, see Specify the category and sensitivity level of sensitive data. Data category: specifies the data category for the data risk that you want to identify. You need to define the data category for sensitive data in advance. For more information, see Configure a sensitive data identification rule and run a sensitive data identification task. Sensitive field type: specifies the sensitive field type for the data risk that you want to identify. You need to define the sensitive field type for sensitive data in advance. For more information, see Configure a sensitive data identification rule and run a sensitive data identification task. |
Whether to filter out the selected property | Specifies whether to filter out data risks of the selected property. Valid values: ≠: specifies to filter out the selected property. The risk identification rule in which this condition is specified will not identify data risks of the selected property. =: specifies to identify only data risks of the selected property. The risk identification rule in which this condition is specified is used to identify only data risks of the selected property. |
User information
Specifies the category of the user information that is used to filter and identify data risks.
Parameter | Description |
Information category | The category of the user information that you specify to identify data risks. Valid values: User group: specifies the name of the user group that belongs to the current Alibaba Cloud account. You need to configure the user group in advance. For more information, see Configure a user group. RAM role: specifies the RAM user that belongs to the current Alibaba Cloud account. You need to add the RAM user to the current Alibaba Cloud account in advance. For more information, see Create a RAM user. Username: specifies the username of the current Alibaba Cloud account. |
Whether to filter out the selected user information | Specifies whether to filter out data risks of the selected user information. Valid values: ≠: specifies to filter out the selected user information. The risk identification rule in which this condition is specified will not identify data risks of the selected user information. =: specifies to identify only data risks of the selected user information. The risk identification rule in which this condition is specified is used to identify only data risks of the selected user information. |
Operation time
Specifies the time range within which risky operations are performed on data.
Parameter | Description |
Select Time Range | The time range within which risky operations are performed on data. You can select one or more days of a week and select one or more hours on that day or days. The time range is accurate to the hour. |
Whether to filter out the selected time range | Specifies whether to filter out risky operations that are performed in the selected time range. Valid values: ≠: specifies to filter out the selected time range. The risk identification rule in which this condition is specified will not identify risky operations that are performed on data in the selected time range. =: specifies to identify only risky operations in the selected time range. The risk identification rule in which this condition is specified is used to identify only risky operations that are performed on data in the selected time range. |
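The way the conditions combine can be sketched as follows: each condition pairs a selection with the = or ≠ operator, and an event is in scope for the rule only if every condition accepts it (logical AND). The Python model below is a rough sketch under stated assumptions. `AccessEvent`, `Condition`, `rule_applies`, and `in_time_range` are hypothetical names, the event fields are invented for illustration, and the operator ≠ is written as `"!="` in the code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable

MAX_CONDITIONS = 10  # limit stated in the note on risk identification conditions


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical audit event; the field names are illustrative only."""
    project: str
    table: str
    user: str
    occurred_at: datetime


@dataclass(frozen=True)
class Condition:
    """One comparison relation: a selection plus the = or != operator."""
    in_selection: Callable[[AccessEvent], bool]
    operator: str = "="  # "=" keeps only matching events, "!=" filters them out

    def __post_init__(self):
        if self.operator not in ("=", "!="):
            raise ValueError(f"unsupported operator: {self.operator!r}")

    def accepts(self, event: AccessEvent) -> bool:
        hit = self.in_selection(event)
        return hit if self.operator == "=" else not hit


def in_time_range(weekdays: set[int], hours: set[int]) -> Callable[[AccessEvent], bool]:
    """Build a selection for the operation time dimension.

    weekdays uses datetime.weekday() numbering (0 = Monday, 6 = Sunday);
    hours are 0 to 23, which mirrors the hour-level accuracy of the console.
    """
    def selection(event: AccessEvent) -> bool:
        return event.occurred_at.weekday() in weekdays and event.occurred_at.hour in hours
    return selection


def rule_applies(conditions: list[Condition], event: AccessEvent) -> bool:
    """Return True only if all conditions accept the event (logical AND)."""
    if len(conditions) > MAX_CONDITIONS:
        raise ValueError(f"a rule can contain at most {MAX_CONDITIONS} conditions")
    return all(condition.accepts(event) for condition in conditions)
```

A rule that watches weekday nights but excludes a test project would combine `Condition(in_time_range({0, 1, 2, 3, 4}, {20, 21, 22, 23}), "=")` with `Condition(lambda e: e.project == "test", "!=")`. Because the relationship is AND, covering several projects requires a separate rule for each project rather than several = conditions on Project Name in one rule, at least under this model.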
Configure threshold values for the risk identification conditions.
DataWorks allows you to calculate and aggregate the number of occurrences of an event, and compare this number with the specified threshold value of event occurrences. You can also specify a time window for the threshold comparison condition to identify data risks. You can click + Add threshold comparison to add a threshold comparison condition to identify data risks. You can repeatedly perform this operation to add multiple threshold comparison conditions.
Parameter | Description |
Threshold Category | Valid values: Data volume: specifies to identify a data risk based on the number of data records on which you perform operations. If the number of data records on which you perform operations exceeds the threshold value that you specify, the risk identification rule that contains the threshold comparison condition is triggered. The number of data records can range from 1 to 10,000,000. Default value: 1. Number of occurrences: specifies to identify a data risk based on the number of occurrences of a single event in a specified time range. If the number of occurrences of a single event in a specified time range exceeds the specified threshold value of event occurrences, the risk identification rule that contains the threshold comparison condition is triggered. The number of occurrences of an event is an integer that ranges from 1 to 10,000. Default value: 10. Note: DataWorks categorizes and identifies a single event. |
Time window | The time range within which an event occurs. Default value: 10 minutes. Valid values: minutes: 1 to 59. hour: 1 to 23. day: 1 to 7. Note: This parameter is required only if the Threshold Category parameter is set to Number of occurrences. |
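The two threshold categories can be sketched in Python as follows. This is an illustration only, not the DataWorks implementation: the names are hypothetical, the comparison is strict ("exceeds") as described in the table, and the time window is modeled as a sliding window that ends at the latest event, which this topic does not confirm.

```python
from collections import deque
from datetime import datetime, timedelta

# Valid ranges per time window unit, copied from the table above.
WINDOW_LIMITS = {"minutes": (1, 59), "hour": (1, 23), "day": (1, 7)}
UNIT_LENGTH = {
    "minutes": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def window_length(value: int, unit: str) -> timedelta:
    """Validate a time window setting and convert it to a timedelta."""
    low, high = WINDOW_LIMITS[unit]
    if not low <= value <= high:
        raise ValueError(f"{unit} must be in the range {low} to {high}")
    return value * UNIT_LENGTH[unit]


def data_volume_exceeded(record_count: int, threshold: int = 1) -> bool:
    """Data volume category: trigger when one operation exceeds the threshold."""
    if not 1 <= threshold <= 10_000_000:
        raise ValueError("threshold must be in the range 1 to 10,000,000")
    return record_count > threshold


class OccurrenceCounter:
    """Number of occurrences category, modeled as a sliding window (assumption)."""

    def __init__(self, threshold: int = 10, window: timedelta = timedelta(minutes=10)):
        if not 1 <= threshold <= 10_000:
            raise ValueError("threshold must be in the range 1 to 10,000")
        self.threshold = threshold
        self.window = window
        self._timestamps = deque()

    def record(self, occurred_at: datetime) -> bool:
        """Record one event and report whether the rule would be triggered.

        Events must be recorded in non-decreasing time order.
        """
        self._timestamps.append(occurred_at)
        cutoff = occurred_at - self.window
        # Drop events that fall outside the window ending at the latest event.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold
```

With a threshold of 3 and a 10-minute window, events at minutes 0, 2, 4, and 6 trigger the rule on the fourth event, because four occurrences in the window exceed the threshold. An isolated event 30 minutes later does not trigger the rule because the earlier events have left the window.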
Click Next.
Configure an alert notification method.
You can specify an alert notification method to receive an alert notification at the earliest opportunity when a data risk is identified and handle the risk based on the alert notification. You can set an alert notification method to email or webhook URL.
Note: Before you select an alert notification method, make sure that you have configured the email address and webhook URL in Configure system settings.
Click Save. A risk identification rule is created.
A created custom risk identification rule does not automatically take effect. You must go to the Risk identification rule page and click Revalidate to manually make the rule take effect.
Manage a risk identification rule
On the Custom Identification Rules page, you can view the created rules and the details about the rules. You can also modify desired rules or perform operations on multiple rules at a time.
Section | Description |
1 | You can specify conditions to search for your desired rules in this section. The conditions include data risk type, risk level, built-in rule or not, and risk identification name. Note A fuzzy match is supported if you search for rules by name. After you enter a keyword in the search box, rules whose names contain the keyword are displayed. |
2 | You can perform the following operations in this section: |
3 | You can perform an operation on multiple rules at a time in this section, such as Batch effective, Batch invalidation, or Batch delete. You can click the icon to switch between different operations. Note You cannot delete built-in risk identification rules. You can delete only custom risk identification rules that are in the invalid state. |
Comparison of the positions of parameters for a risk identification rule in the old and new versions of the risk identification rule management feature
The following table describes the positions of parameters for a risk identification rule in the old and new versions of the risk identification rule management feature.
For more information about the configurations of a risk identification rule in the new version of the risk identification rule management feature, see Create a risk identification rule. For more information about the configurations of a risk identification rule in the old version of the risk identification rule management feature, see Rule Settings tab.
No. | Configuration item | Position in the old version | Position in the new version |
1 | Rule name | ||
2 | Rule owner | By default, the owner of the rule is the current Alibaba Cloud account. | This configuration item does not exist. DataWorks records the owner of the rule. |
3 | Rule description | ||
4 | Compute engine instance for which the rule takes effect | To specify a compute engine instance in a risk identification condition, perform the following operations: In the Conditions section of the rule definition step, click Select condition and select Data location from the drop-down list. | |
5 | Project for which the rule takes effect | To specify a project in a risk identification condition, perform the following operations: In the Conditions section of the rule definition step, click Select condition and select Data location from the drop-down list. | |
6 | Data category for the data risk that you want to identify | In the Conditions section of the rule definition step, click Select condition and select Data property. Select Data classification as a property category. | |
7 | Sensitivity level of the data risk that you want to identify | In the Conditions section of the rule definition step, click Select condition and select Data property. Select Data grading as a property category. | |
8 | Sensitive field type for the data risk that you want to identify | In the Conditions section of the rule definition step, click Select condition and select Data property. Select Sensitive field type as a property category. | |
9 | Type of the operation that is performed on data | Valid values: | Valid values: |
10 | Table for which the rule takes effect | To specify a table in a risk identification condition, perform the following operations: In the Conditions section of the rule definition step, click Select condition and select Data location. | |
11 | Field for which the rule takes effect | To specify a field in a risk identification condition, perform the following operations: In the Conditions section of the rule definition step, click Select condition and select Data location. | |
12 | Users for which a risk identification rule is triggered when the users access data that is specified in the rule | To specify an information category in a risk identification condition, perform the following operations: In the Conditions section of the rule definition step, click Select condition and select User information. | |
13 | Maximum number of data records that are specified in a risk identification rule | In the Conditions section of the rule definition step, click Select condition and select a condition. In the Threshold comparison section for the selected condition, select Data volume in a threshold comparison condition. | |
14 | Time range that is specified in a risk identification rule | To specify a time range, perform the following operations: In the Conditions section of the rule definition step, click Select condition and select Operation time. | |
15 | Alert notification method for a risk identification rule | Not supported | In the Alert Notification Method section of the Alert Settings step, select an alert notification method. |
What to do next
After the risk identification rule is created and takes effect, you can go to the Data Risks page to view the details of risks that are identified based on the rule and handle the risks at the earliest opportunity.