How to create a data masking rule - DataWorks - Alibaba Cloud Documentation Center

DataWorks supports a variety of data masking scenarios. You can select a scenario and create a data masking rule based on your business requirements. This topic describes how to create a data masking rule and how DataWorks masks query results in your workspace based on data masking rules.

Background information

Data masking scenarios in DataWorks can be categorized into static data masking scenarios and dynamic data masking scenarios.

Dynamic data masking scenarios include masking of displayed data in DataStudio and Data Map, masking of displayed data in DataAnalysis, data masking at the MaxCompute compute engine layer, and data masking at the Hologres compute engine layer.
Static data masking scenarios refer to static data masking in Data Integration.

By default, a data masking rule does not immediately take effect after it is created. You must set the status of the rule to Active. Then, sensitive data can be automatically masked based on the rule in the related data masking scenario.

Note

For information about how to configure the status of a data masking rule, see the Configure the rule status section in this topic.
For information about various data masking scenarios, see the Descriptions of data masking scenarios section in this topic.

Prerequisites

This prerequisite is required in dynamic data masking scenarios and optional in other cases. Sensitive data identification rules must be configured based on your business requirements. This allows you to associate sensitive field types that you specified in the sensitive data identification rules with data masking rules when you create the data masking rules. For more information, see Identify sensitive data.
This prerequisite is required in dynamic data masking scenarios and optional in other cases. Specific users must be added to a whitelist as user groups in advance. You need to perform this operation if you want specific users to have access to sensitive data on which data masking rules take effect within a specified period of time. For more information, see Configure a user group.
This prerequisite is required for data masking at the MaxCompute compute engine layer and optional in other cases. The IP address or endpoint of Data Security Guard must be added to the whitelist of a MaxCompute project on which you want to perform data masking at the compute engine layer. After you add the IP address or endpoint of Data Security Guard to the whitelist of the MaxCompute project, you can call data masking functions to mask sensitive data in query results that you obtain by using methods such as the related DataWorks service, MaxCompute client (odpscmd), and MaxCompute LogView based on data masking rules. For more information, see Sample practice for performing underlying data masking on MaxCompute projects (old version).

Permission management

Manage a data masking rule, such as creating, modifying, and deleting a data masking rule:
- The tenant administrator and tenant security administrator can perform management operations on a data masking rule in all data masking scenarios.
- The workspace administrator and workspace security administrator can perform management operations on a data masking rule only in data masking scenarios on which they have the required permissions.
Manage a whitelist for a data masking rule, such as creating, modifying, and deleting a whitelist:
- The tenant administrator and tenant security administrator can perform management operations on a whitelist in all data masking scenarios.
- The workspace administrator and workspace security administrator can perform management operations on a whitelist only in data masking scenarios on which they have the required permissions.

You must be assigned the required role to perform the preceding operations. For more information about authorization, see Manage permissions on workspace-level services and Manage permissions on global-level services.

Entry for configuring a data masking rule

Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Click the icon in the upper-left corner, choose All Products > Data Governance > Data Security Guard, and then click Try now.
Note
- If your Alibaba Cloud account is granted the required permissions, you can directly access the homepage of Data Security Guard.
- If your Alibaba Cloud account is not granted the required permissions, you are redirected to the authorization page of Data Security Guard. You can use the features of Data Security Guard only after your Alibaba Cloud account is granted the required permissions.
In the left-side navigation pane, choose Rule Change > Data Masking. The Data Masking page appears.
In the Masking Scene section of the Data desensitization management page, select a data masking scenario and click + Desensitization rules in the upper-right corner of the page to create a data masking rule based on the data masking scenario.
- Dynamic data masking: Rule configuration in different scenarios is similar. In this topic, the masking of displayed data in DataStudio and Data Map is used to describe key configurations of a data masking rule. You can select a data masking scenario based on your business requirements. For more information, see the Create a dynamic data masking rule in the scenario of masking of displayed data in DataStudio and Data Map section in this topic.
- Static data masking: For more information, see the Create a static data masking rule in the scenario of static data masking in Data Integration section in this topic.

Create a dynamic data masking rule in the scenario of masking of displayed data in DataStudio and Data Map

Select a data masking scenario.
In the Masking Scene section of the Data desensitization management page, click Default scene below Data development / Data map display desensitization and click + Desensitization rules in the upper-right corner of the page.

Create a data masking rule.

In the Create new desensitization rule dialog box, configure the parameters.

Select a sensitive field type and specify the rule name.

Parameter	Description
Sensitive field type	The sensitive field type based on which the data masking rule masks sensitive data. You can select system built-in sensitive field types or the sensitive field types that you added on the Data Recognition Rules tab of the Sensitive data identification page. For more information about sensitive field types that you can manually add, see Configure a sensitive data identification rule and run a sensitive data identification task. If you have created data masking rules in the same data masking scenario, DataWorks filters out the sensitive field types that those rules already use. This prevents different data masking rules from taking effect for the same sensitive field types in the same data masking scenario.
Desensitization rule name	The name of the data masking rule. The name of the sensitive field type is used as the name of the data masking rule by default. You can also specify a name based on your business requirements. The rule name must be unique.

Configure data masking scenarios.
Select the data masking scenarios to which the data masking rule applies. By default, the Desensitization scene parameter is set to the data masking scenario that you selected in Step 1. You can change this value or add more scenarios.

Configure the data masking method.

DataWorks supports the following data masking methods: Pseudonym (original format-based encryption), Masking out, HASH (hash-based encryption), Characters to replace (character replacement), Range transform (range change), integer (rounding), and empty (leave empty).

Pseudonym

This method replaces the characters of a data record with an artificial pseudonym of the same characteristics. The data format of the pseudonym is the same as that of the original data record. The following table describes the parameters that you need to configure if you select this data masking method.

Parameter	Description
Data watermark	Watermarks allow you to trace the source of data. If your data is leaked, you can trace the potential source from which the data leak occurred based on the data watermark. You can turn on or off Data watermark based on your business requirements. Note Only DataWorks Enterprise Edition or a more advanced edition supports the data watermark feature.
Desensitization characteristic value	Data masking rules vary based on characteristic values. Different data masking results are generated when different characteristic values are used for the same data that you want to mask. If the characteristic value remains unchanged, the same data masking result is returned for a data record at all times. For example, a data record is a123: If the Desensitization characteristic value parameter is set to 0, the data masking result is b124. If the Desensitization characteristic value parameter is set to 1, the data masking result is c234. By default, the Desensitization characteristic value parameter is set to 5. You can select a digit from 0 to 9 as the characteristic value.
Optional. Substitution character set	If you do not set the Sensitive field type parameter to a built-in sensitive field type, you must configure the Substitution character set parameter for your data records. If a character in your data records is included in the character set, the character is replaced with another character of the same type. For example, if a data record contains only digits from 0 to 3 and letters from a to d, the data masking result contains only digits from 0 to 3 and letters from a to d. Note If the character that you want to mask is not included in a character set, it is not replaced.
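The behavior described in the preceding table can be illustrated with a minimal Python sketch. This is not the algorithm that DataWorks uses. The function name `pseudonymize`, the character classes, and the way the shift is derived from the characteristic value are all assumptions made for illustration. The sketch only demonstrates the documented properties: the output keeps the format of the input, the same input and characteristic value always produce the same output, and characters outside the character set are not replaced.

```python
import hashlib

# Character classes used for same-type replacement (an assumption of this sketch).
DEFAULT_CLASSES = (
    "0123456789",
    "abcdefghijklmnopqrstuvwxyz",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
)


def pseudonymize(value: str, characteristic: int = 5, classes=DEFAULT_CLASSES) -> str:
    """Replace each character with another character of the same class.

    Hypothetical illustration only; the real DataWorks algorithm is not public here.
    """
    if not 0 <= characteristic <= 9:
        raise ValueError("characteristic value must be a digit from 0 to 9")
    out = []
    for pos, ch in enumerate(value):
        # Find the class that contains this character, if any.
        alphabet = next((c for c in classes if ch in c), None)
        if alphabet is None:
            out.append(ch)  # not in any character set: keep as-is
            continue
        # Derive a deterministic, non-zero shift from the characteristic value
        # and the character position.
        digest = hashlib.sha256(f"{characteristic}:{pos}".encode("utf-8")).digest()
        shift = 1 + digest[0] % (len(alphabet) - 1)
        out.append(alphabet[(alphabet.index(ch) + shift) % len(alphabet)])
    return "".join(out)


masked = pseudonymize("a123", characteristic=0)
print(masked)  # one lowercase letter followed by three digits
```

To mimic a custom substitution character set, pass your own classes, for example `pseudonymize("a0b1", 1, classes=("0123", "abcd"))`. The result then contains only digits from 0 to 3 and letters from a to d.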

Masking out

This method replaces each of the characters at specific positions of a data record with an asterisk (*). If you use this data masking method, you must configure the Recommended method or Custom parameter.

Parameter	Description
Recommended method	The recommended methods. You can configure the parameter based on the field that you want to mask. Valid values: Only show first and last character, Show first three and last two characters, and Show first three and last four characters. You can select a method from the drop-down list based on your business requirements.
Customize	You can specify, segment by segment from left to right, whether to mask a specified number of characters of a data record. You can add up to 10 segments, and one of the segments must be set to The remaining digits.

For example, you can mask the first three characters and leave the remaining characters intact.
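The recommended masking-out methods can be sketched in Python as follows. The function name `mask_keep_ends` is hypothetical, and the handling of values that are too short to keep the requested characters (they are fully masked here) is an assumption, because this topic does not state what DataWorks does in that case.

```python
def mask_keep_ends(value: str, keep_first: int, keep_last: int, mask_char: str = "*") -> str:
    """Show the first `keep_first` and last `keep_last` characters, mask the rest."""
    if keep_first < 0 or keep_last < 0:
        raise ValueError("keep_first and keep_last must be non-negative")
    hidden = len(value) - keep_first - keep_last
    if hidden <= 0:
        # Assumption: a value too short to keep both ends is masked entirely.
        return mask_char * len(value)
    return value[:keep_first] + mask_char * hidden + value[len(value) - keep_last:]


# "Only show first and last character"
print(mask_keep_ends("alice", 1, 1))        # a***e
# "Show first three and last four characters"
print(mask_keep_ends("13812345678", 3, 4))  # 138****5678
```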

HASH

The following table describes the parameters that you need to configure if you select this data masking method.

Parameter	Description
Data watermark	Watermarks allow you to trace the source of data. If your data is leaked, you can trace the potential source from which the data leak occurred based on the data watermark. You can turn on or off Data watermark based on your business requirements. Note Only DataWorks Enterprise Edition or a more advanced edition supports the data watermark feature.
Encryption Algorithm	The encryption algorithm. Valid values: MD5, SHA256, SHA512, and SM3.
Add salt value	The salt value for each encryption algorithm. By default, 5 is selected. You can select a digit from 0 to 9 as the salt value. Note In cryptography, salting means inserting a specific string, called the salt, at a fixed position in the input before it is hashed. The resulting hash value differs from the hash value of the unsalted input.
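The effect of the salt value can be illustrated with Python's standard `hashlib` module. This is a sketch under assumptions, not the DataWorks implementation: the function name `hash_mask` is hypothetical, and the salt is prepended to the value because this topic does not state the position at which DataWorks inserts the salt. SM3 is omitted because its availability in `hashlib` depends on the local OpenSSL build.

```python
import hashlib

SUPPORTED = {"md5", "sha256", "sha512"}  # SM3 is left out of this sketch


def hash_mask(value: str, algorithm: str = "sha256", salt: int = 5) -> str:
    """Return the hex digest of the salted value."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    if not 0 <= salt <= 9:
        raise ValueError("salt must be a digit from 0 to 9")
    hasher = hashlib.new(algorithm)
    # Assumption: the salt digit is prepended to the value before hashing.
    hasher.update(f"{salt}{value}".encode("utf-8"))
    return hasher.hexdigest()


# The same record produces different hash values under different salts.
print(hash_mask("a123", salt=0) != hash_mask("a123", salt=1))  # True
# The digest length is fixed by the algorithm, regardless of the input length.
print(len(hash_mask("a123", "sha256")))  # 64
```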

Characters to replace

This method replaces the characters at the specified positions based on the replacement method you select. The following table describes the parameters that you need to configure if you select this data masking method.

Parameter	Description
Replacement position	You can select Replace all, Replace the first three digits, or Four digits after replacement from the drop-down list. You can also select Custom from the drop-down list to configure a custom replacement position. If you select Custom, you can customize segments and configure the replacement method for each segment. You can add up to 10 segments, and one of the segments must be set to The remaining digits.
Replace the way	The replacement method. Valid values: Random replacement, Sample substitution, and Fixed value substitution. Random replacement: This method randomly replaces the characters at the specified positions. The number of characters remains unchanged before and after the replacement. Sample substitution: You must specify a sample library first. After you select the sample library, this method replaces the characters at the specified positions with the data in the specified sample library. Fixed value substitution: You must enter a replacement value. The value must be 1 to 100 characters in length and cannot be a string that contains only spaces. After you set the value, this method replaces the characters at the specified positions with the replacement value.
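The following Python sketch illustrates two of the replacement methods, Random replacement and Fixed value substitution, for a single replacement position. The function names, the `start` and `end` index parameters, and the choice to replace digits with digits and letters with letters are assumptions for illustration. This topic only states that random replacement keeps the number of characters unchanged.

```python
import random
import string


def random_replace(value: str, start: int, end: int, rng=None) -> str:
    """Randomly replace the characters in value[start:end]; the length is unchanged."""
    if not 0 <= start <= end <= len(value):
        raise ValueError("expected 0 <= start <= end <= len(value)")
    rng = rng or random.Random()
    chars = list(value)
    for i in range(start, end):
        if chars[i].isdigit():
            chars[i] = rng.choice(string.digits)
        elif chars[i].isalpha():
            chars[i] = rng.choice(string.ascii_letters)
        # Assumption: other characters (punctuation, spaces) are left untouched.
    return "".join(chars)


def fixed_replace(value: str, start: int, end: int, replacement: str) -> str:
    """Replace value[start:end] with a fixed replacement value."""
    if not 0 <= start <= end <= len(value):
        raise ValueError("expected 0 <= start <= end <= len(value)")
    # Documented constraints: 1 to 100 characters, not only spaces.
    if not 1 <= len(replacement) <= 100 or not replacement.strip(" "):
        raise ValueError("replacement must be 1 to 100 characters and not only spaces")
    return value[:start] + replacement + value[end:]


print(fixed_replace("13812345678", 7, 11, "XXXX"))  # 1381234XXXX
print(len(random_replace("13812345678", 0, 3)))     # 11
```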

Range transform

This method is applicable to only the masking of numeric data. This method masks data within a specified value range to a fixed value. You can add 1 to 10 value ranges.

Parameter	Description
Original value range [m,n)	The value range of the original data record. The valid value is a numeric value that is greater than or equal to 0. A maximum of two decimal places is supported.
Value after desensitization	The value that is used to replace the data record that you want to mask. The valid value is a numeric value that is greater than or equal to 0. A maximum of two decimal places is supported.
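As an illustration of the half-open range [m,n), the following Python sketch maps a numeric value to the fixed value of the first range that contains it. The function name `range_transform` is hypothetical. Returning the value unchanged when no range matches is an assumption, because this topic does not describe that case.

```python
from decimal import Decimal


def range_transform(value: Decimal, ranges) -> Decimal:
    """Map `value` to the masked value of the first range [low, high) that contains it.

    `ranges` is a list of (low, high, masked_value) tuples.
    """
    if not 1 <= len(ranges) <= 10:
        raise ValueError("1 to 10 value ranges are supported")
    for low, high, masked in ranges:
        if low <= value < high:  # lower bound inclusive, upper bound exclusive
            return masked
    # Assumption: values outside every configured range are returned unchanged.
    return value


ranges = [
    (Decimal("0"), Decimal("100"), Decimal("50")),
    (Decimal("100"), Decimal("1000"), Decimal("500")),
]
print(range_transform(Decimal("99.99"), ranges))  # 50
print(range_transform(Decimal("100"), ranges))    # 500
```

`Decimal` is used instead of `float` so that boundary values with up to two decimal places compare exactly.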

integer

Parameter	Description
Original data type	Only numeric data is supported.
Keep decimal places	The number of decimal places to keep. You can select an integer from 0 to 5. The remaining decimal places are rounded off. For example, if the original value is 3.1415 and two decimal places are kept, the data masking result is 3.14.
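The rounding behavior can be sketched with Python's `decimal` module. The function name `round_mask` is hypothetical, and the rounding mode (half up) is an assumption. This topic gives only the example of 3.1415 becoming 3.14, which does not distinguish between rounding modes.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_mask(value: str, places: int) -> Decimal:
    """Round a numeric value to `places` decimal places (0 to 5)."""
    if not 0 <= places <= 5:
        raise ValueError("places must be an integer from 0 to 5")
    quantum = Decimal(10) ** -places  # for example, 0.01 for two decimal places
    # Assumption: half-up rounding; the mode that DataWorks uses is not stated.
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


print(round_mask("3.1415", 2))  # 3.14
```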

empty

This method replaces the original data record with an empty string.

Verify the data masking result.
You can enter sample data in the Sample data field and click Verify. The sample data must be 0 to 100 characters in length. The data masking result is displayed in the Desensitization effect field.
Click Save or Save and take effect. The rule configuration is complete.

You can perform the following operations after you configure the data masking rule:

If you want to perform dynamic data masking, you can specify a whitelist for the rule. If you add specific users to the whitelist, the users can have access to the sensitive data on which the data masking rule takes effect within a subsequent specified period of time. For information about how to add users to a whitelist, see the Configure a whitelist for the data masking rule (supported only for dynamic data masking scenarios) section in this topic.
By default, the data masking rule is inactive after you create it. After you set its status to Active, the rule can be used in data masking scenarios. For information about how to set the rule status, see Configure the rule status.

Create a static data masking rule in the scenario of static data masking in Data Integration

In the Masking Scene section of the Data desensitization management page, click Default scene below Static desensitization of data integration and click + Desensitization rules in the upper-right corner of the page.

Create a data masking rule.

In the Masking Rule dialog box, configure the parameters.

Select a sensitive field type and specify the rule name.

Parameter	Description
Sensitive data type	The sensitive field type. Valid values: There are and The new type. There are: Select an existing sensitive field type from the drop-down list on the right based on your business requirements. The existing sensitive field types include the built-in sensitive field types and custom sensitive field types. The new type: Enter a name for a new sensitive field type. The name must be unique. Note The built-in sensitive field types are Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, Company, Nation, Constellation, Gender, and Nationality.
Name of the desensitization rule	The name of the data masking rule. The name of the sensitive field type is used as the name of the data masking rule by default. You can also specify a name based on your business requirements. The rule name must be unique.

Configure the data masking method.
You can set the Method parameter to Pseudonym, The hash, or Masking Out based on your business requirements.
Pseudonym
This method replaces the characters of a data record with an artificial pseudonym of the same characteristics. The data format of the pseudonym is the same as that of the original data record.
- If you set the Sensitive data type parameter to a built-in sensitive field type, such as Mobile Phone Number, Id Card, Bank Card, Email, IP, Car No, Post Code, Seat Number, Mac Address, Address, Name, or Company, you must configure the Domain parameter for your data records.
  Domain: You can select a digit from 0 to 9 as the security domain. Data masking rules vary depending on security domains. Different data masking results are generated when the same data record that you want to mask resides in different security domains. For example, if the data record is a123 and the security domain is set to 0, the data masking result is b124. If the security domain is set to 1, the data masking result is c234. In a security domain, the same data masking result is returned for a data record at all times.
- If you do not set the Sensitive data type parameter to a built-in sensitive field type, you must configure the Replacement character set parameter for your data records.
  Replacement character set: You can separate multiple characters in a character set with commas (,). Each character can be a letter or a digit. If a character in your data records is included in this character set, the character is replaced with another character of the same type. For example, if a data record contains only digits from 0 to 3 and letters from a to d, the data masking result contains only digits from 0 to 3 and letters from a to d. If the character is not included in this character set, it is not replaced.
The hash
This method encrypts a data record to generate a hash value of a fixed length. If you select this method, you must configure the Domain parameter.
Domain: You can select a digit from 0 to 9 as the security domain. Data masking rules vary based on security domains. Different data masking results are generated when the same data record that you want to mask resides in different security domains. In a security domain, the same data masking result is returned for a data record at all times.
For example, a data record is a123:
- If the security domain is set to 0, the data masking result is b124.
- If the security domain is set to 1, the data masking result is c234.
Masking Out
This method replaces each of the characters at specific positions of a data record with an asterisk (*).
- Recommended: You can select Only show first and last character, Show first three and last two characters, and Show first three and last four characters from the Recommended drop-down list.
- Custom: You can specify, segment by segment from left to right, whether to mask a specified number of characters of a data record. You can add up to 10 segments, and one of the segments must be set to The remaining digits.
  - Example 1: Mask the first three characters and leave the remaining characters intact.
  - Example 2: Mask the last three characters and leave the remaining characters intact.
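The custom segments and the two examples above can be sketched in Python as follows. The function name `mask_segments`, the `REMAINING` marker, and the `(length, masked)` tuple layout are hypothetical. They model the documented constraints of up to 10 segments with exactly one segment set to The remaining digits.

```python
REMAINING = None  # marker for the segment that covers "The remaining digits"


def mask_segments(value: str, segments, mask_char: str = "*") -> str:
    """Apply (length, masked) segments from left to right.

    Exactly one segment must use REMAINING as its length.
    """
    if not 1 <= len(segments) <= 10:
        raise ValueError("1 to 10 segments are supported")
    if sum(1 for length, _ in segments if length is REMAINING) != 1:
        raise ValueError("exactly one segment must cover the remaining digits")
    fixed = sum(length for length, _ in segments if length is not REMAINING)
    remaining = max(len(value) - fixed, 0)
    out, pos = [], 0
    for length, masked in segments:
        width = remaining if length is REMAINING else length
        chunk = value[pos:pos + width]
        out.append(mask_char * len(chunk) if masked else chunk)
        pos += width
    return "".join(out)


# Example 1: mask the first three characters and leave the rest intact.
print(mask_segments("13812345678", [(3, True), (REMAINING, False)]))  # ***12345678
# Example 2: mask the last three characters and leave the rest intact.
print(mask_segments("13812345678", [(REMAINING, False), (3, True)]))  # 13812345***
```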

Verify the data masking result.
You can enter sample data in the Sample data field and click Test. The sample data must be 0 to 100 characters in length. The data masking result is displayed in the Effect of desensitization field.
Click OK. The rule configuration is complete.

You can perform the following operations after you configure the data masking rule:

By default, the data masking rule is inactive after you create it. After you set its status to Active, the rule can be used in data masking scenarios. For information about how to set the rule status, see Configure the rule status.
After you create a data masking rule for the DataWorks Data Integration Config scenario, you can use the rule when you create a task to synchronize data from a single table in real time. For more information, see Configure data masking.

Configure a whitelist for the data masking rule (supported only for dynamic data masking scenarios)

For a rule that you configure in dynamic data masking scenarios, you can specify a whitelist for the rule. If you add specific users to the whitelist, the users can have access to the sensitive data on which the data masking rule takes effect within a subsequent specified period of time.

Note

Before you create a whitelist, you must add the users that you want to whitelist to a user group. For information about how to configure a user group, see Configure a user group.

You can perform the following steps to create a whitelist for a rule:

On the Data desensitization management page, click the Whitelist configuration tab.
Click + Whitelist in the upper-right corner.

In the New whitelist panel, configure the parameters.

Note

The Whitelist configuration tab is not available in the scenarios of data masking at the Hologres compute engine layer and static data masking in Data Integration.
If a user queries data within the time range that is specified by the Effective time parameter in the whitelist, the query results are not masked.

The following table describes the parameters that you need to configure.

Parameter	Description
Sensitive field type	You can select only sensitive field types in the selected data masking scenario.
User group scope	You can select a user group that you configured. You can select up to 50 user groups. After you add the selected user groups to the whitelist, you can use the Alibaba Cloud accounts or RAM users that belong to the selected user groups to view the original data that is not masked. For information about how to configure a user group, see Configure a user group.
Effective time	The effective time range of the whitelist. If a user queries data beyond the time range that is specified in the whitelist, the query results are masked. Note If you set this parameter to Short, the effective time range is from the current time to the specified time. If a user queries data within this time range, the query results are not masked.

Click Save to complete the whitelist configurations.
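The way that the whitelist parameters interact can be summarized in a short Python sketch. It is a conceptual model only, not DataWorks code: the `WhitelistEntry` class, its field names, and the `should_mask` function are hypothetical. A query result is left unmasked only if the sensitive field type matches, the user belongs to a whitelisted user group, and the query time falls within the effective time range.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class WhitelistEntry:
    field_type: str         # Sensitive field type
    user_groups: frozenset  # User group scope (up to 50 groups)
    start: datetime         # start of the Effective time range
    end: datetime           # end of the Effective time range


def should_mask(entry: WhitelistEntry, field_type: str, groups: set, now: datetime) -> bool:
    """Return True if the query result must still be masked."""
    exempt = (
        entry.field_type == field_type
        and bool(entry.user_groups & groups)  # user is in a whitelisted group
        and entry.start <= now <= entry.end   # query is within the effective time
    )
    return not exempt


entry = WhitelistEntry(
    field_type="Mobile Phone Number",
    user_groups=frozenset({"auditors"}),
    start=datetime(2024, 1, 1),
    end=datetime(2024, 1, 31),
)
print(should_mask(entry, "Mobile Phone Number", {"auditors"}, datetime(2024, 1, 15)))  # False
print(should_mask(entry, "Mobile Phone Number", {"auditors"}, datetime(2024, 2, 1)))   # True
```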

Configure the rule status

On the Data masking rules tab, find the desired rule and toggle the switch in the Status column. You can set the rule status to Effective or Invalid. The Effective state is the state that other sections of this topic refer to as Active.

After the rule is configured, you can perform operations on the rule, such as modifying, deleting, and querying the details of the rule.

Note

You cannot delete or modify a rule in the Effective state. To delete or modify a rule, you must set the rule status to Invalid and then check whether the rule is configured for a task. You must contact the security administrator for further confirmation.
After the rule status is set to Invalid, you can modify the data masking method for the rule, but you cannot modify the sensitive field type or name for the rule.
After you modify the parameters, set the status of the rule to Active. Then, the data of the task for which the rule is configured can be masked based on the rule.

Example of using the data masking rule

Sample practice for performing underlying data masking on MaxCompute projects (old version)
After you create a data masking rule for the DataWorks Data Integration Config scenario, you can use the rule when you create a task to synchronize data from a single table in real time. For more information, see Configure data masking.
