If you want to hide the key information of sensitive data in a MaxCompute project from specific users, you can enable the dynamic data masking feature provided by MaxCompute. This feature hides or replaces sensitive data in real time when unauthorized users access or view the data. This prevents leaks of sensitive data. This topic describes how to enable the dynamic data masking feature of MaxCompute and provides examples of using the feature.
Feature description
MaxCompute provides the dynamic data masking feature that uses masking techniques in data computing and display to protect sensitive data such as personally identifiable information (PII). You can use this feature in scenarios such as business development and testing, data sharing, and O&M. Data masking policies allow you to mask data by using methods such as masking, hashing, character replacement, numeric value rounding, and date rounding. You can use data masking policies together with the data classification and grading feature of Data Security Guard to mask user data, such as identity information, bank card numbers, addresses, and phone numbers.
Dynamic data masking is implemented at the beginning of the process in which data is read from a storage service. This ensures that data is masked during query, download, association, and user-defined function (UDF) computing and prevents the risk of sensitive data leaks.
Limits
Supported regions
This feature is in public preview and is available in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Malaysia (Kuala Lumpur), Indonesia (Jakarta), and Germany (Frankfurt).
Supported driver versions
MaxCompute connection method
Driver version
Support for data masking
SDK for Java
0.48.0-public and later
Yes
odpscmd
0.47.1 and later
Yes
JDBC
3.4.3 and later
Yes
MaxFrame
All versions
Yes
PyODPS
All versions
Yes
Go SDK
All versions
Yes
Internal tables
MaxCompute supports data masking policies for internal tables, but not for external tables.
The dynamic data masking feature and row-level permission feature are mutually exclusive. If a data masking policy is configured for a table, you cannot configure a row-level permission policy for the table. Similarly, if you configure a row-level permission policy for a table, you cannot configure a data masking policy for the table.
If you use a data masking policy to mask Chinese characters, you must make sure that the encoding format of the Chinese characters is UTF-8.
Views
Traditional views support data masking policies. The data masking policies of traditional views are aligned with the data masking policies of the original tables. If data masking policies are bound to or unbound from the original tables, the policies are also bound to or unbound from the traditional views at the same time.
When you create a materialized view, the data masking policy of the original table applies to the materialized view. If you bind a new masking policy to or unbind a masking policy from the original table after you create the materialized view, the masking policy of the materialized view remains unchanged.
Data masking policies
When users access sensitive data, if multiple data masking policies are in effect at the same time, the system preferentially masks sensitive data based on the data masking policy with the highest priority. For more information, see Priorities of predefined masking policies.
Data masking process
During the data response process, the system checks the configurations of data masking policies. If a data masking policy is configured, the system returns the masked data based on the policy. If no data masking policy is configured, the system returns the data in plaintext.
Data masking commands
Enable or disable the data masking feature for a project
odps.data.masking.policy.enable
is a project-level property and specifies whether to enable the data masking feature. Only the project owner or the account that is assigned the Super_Administrator or Admin role can enable or disable the data masking feature. For more information about how to assign a role to a user account, see Assign a built-in administrator role to a user.
Enable the data masking feature for a project.
setproject odps.data.masking.policy.enable=true;
Disable the data masking feature for a project.
setproject odps.data.masking.policy.enable=false;
Create and drop a data masking policy
Command syntax
Create a data masking policy.
CREATE DATA MASKING POLICY [IF NOT EXISTS] <policy_name> TO { USER <user_list> | ROLE <role_list> | default } USING <Predefined Masking Policy>;
Drop a data masking policy.
DROP DATA MASKING POLICY <policy_name>;
Parameters
Parameter
Required
Description
policy_name
Yes
The name of the masking policy. The policy name is not case-sensitive and cannot contain special characters. It can contain letters, digits, and underscores (_). We recommend that you start the name with a letter. The name cannot exceed 128 bytes in length.
USER | ROLE | default
Yes
Configure one of the three parameters.
USER: a data masking policy at the user level. In <user_list>, specify the names of the users for which you want to perform data masking. You can run the
list users;
command in MaxCompute to obtain user information.ROLE: a data masking at the role level. In <role_list>, specify the names of the roles for which you want to perform data masking. You can run the
list roles;
command in MaxCompute to obtain role information.default: a default data masking policy. When a user or role accesses a sensitive data column, if no data masking policies are matched, the default policy is used to perform data masking on the sensitive data column.
Predefined Masking Policy
Yes
The predefined data masking policy. For more information, see Predefined data masking policies.
Examples
Example 1: Create an MD5 hash masking policy for the users userA, userB, and userC.
CREATE data masking policy IF NOT EXISTS masking_test_001 TO USER (userA, userB, userC) USING MASKED_MD5(0);
Example 2: Create an MD5 hash masking policy for the roles role_project_deploy and role_project_dev.
CREATE data masking policy IF NOT EXISTS masking_test_001 TO ROLE (role_project_deploy, role_project_dev) USING MASKED_MD5(0);
Bind or unbind masking policies to or from sensitive data columns
Command syntax
-- Bind a masking policy to a sensitive data column of a sensitive table. APPLY DATA MASKING POLICY <policy_name> BIND TO TABLE <table_name> COLUMN <column_name>; -- Unbind a specified masking policy from a sensitive data column of a sensitive table. APPLY DATA MASKING POLICY <policy_name> UNBIND FROM TABLE <table_name> COLUMN <column_name>; -- Unbind all masking policies from a sensitive data column of a sensitive table. APPLY DATA MASKING POLICY UNBIND ALL FROM TABLE <table_name> COLUMN <column_name>; -- Unbind all masking policies from all sensitive data columns of a sensitive table. APPLY DATA MASKING POLICY UNBIND ALL FROM TABLE <table_name>;
Parameters
Parameter
Required
Description
policy_name
Yes
The name of the masking policy.
table_name
Yes
The name of the table that contains sensitive data.
column_name
Yes
The name of the column that contains sensitive data.
View masking policies
Command syntax
-- View the information about a data masking policy. DESC DATA MASKING POLICY <policy_name>; -- View the extended information about a table, including the information about sensitive data columns and applied masking policies. DESC EXTENDED <table_name>; -- Display the names of all masking policies in the current project. LIST DATA MASKING POLICY; -- Display the names of the masking policies that are bound to a specified user. LIST DATA MASKING POLICY TO USER <user_name>; -- Display the names of the masking policies that are bound to a specified role. LIST DATA MASKING POLICY TO ROLE <role_name>; -- Display the names of all masking policies that are bound to a specified table. LIST DATA MASKING POLICY ON <table_name>; -- Display the names of all masking policies that are bound to a specified column of a specified table. LIST DATA MASKING POLICY ON <table_name> TO COLUMN <column_name>;
Parameters
Parameter
Required
Description
policy_name
Yes
The name of the masking policy.
table_name
Yes
The name of the table that contains sensitive data.
column_name
Yes
The name of the column that contains sensitive data.
user_name
Yes
The username.
role_name
Yes
The name of the role.
Predefined data masking policies
Predefined data masking policies include masking, hashing, character replacement, and rounding. To protect data, you can select a data masking policy based on the data types of sensitive data.
Policy type | Policy name | Command syntax | Description |
General-purpose policy | Unmasking | UNMASKED | Unmasks sensitive data. Supported data types: all types. |
NULL returned | MASKED_NULLIFY | Returns NULL to mask data.
| |
Default value assignment | MASKED_DV | Uses default values to mask data. For more information about the default value of each data type, see Default values of data types for the MASKED_DV policy.
| |
Year-part retaining | MASKED_DATE_YEAR | Retains only the year part of a time value. For other parts, 01-01 00:00:00 (UTC) is used.
| |
Rounding | MASKED_POINT_RESERVE(<num>) | Performs rounding on sensitive data and returns a value that contains zero to five decimal places.
| |
Masking | Masking of the start and end parts | MASKED_STRING_MASKED_BA(<before>, <after>) | Masks the start and end parts of a string with asterisks (
|
Masking of middle parts of a string | MASKED_STRING_UNMASKED_BA(<before>, <after>) | Displays the start and end parts of a string in plaintext and replaces the characters in the middle part with asterisks (
| |
Hashing | SHA-256 hashing | MASKED_SHA256(<salt>) | Uses the SHA-256 hash algorithm to mask data.
|
SHA-512 hashing | MASKED_SHA512(<salt>) | Uses the SHA-512 hash algorithm to mask data.
| |
MD5 hashing | MASKED_MD5(<salt>) | Uses the MD5 hash algorithm to mask data.
| |
SM3 hashing | MASKED_SM3(<salt>) | Uses the SM3 hash algorithm to mask data.
| |
Character replacement | Random character replacement | MASKED_REPLACE_RANDOM(<position>) | Replaces data with random characters. The length of the string remains unchanged after replacement. Random characters include digits and letters.
|
Random character replacement of the first and last N characters | MASKED_REPLACE_RANDOM_BA(<before>, <after>) | Replaces the first N characters and last M characters of data with random characters. The length of the string remains unchanged after replacement. Random characters include digits and letters.
| |
Fixed character replacement | MASKED_REPLACE_FIXED(<position>, <fixed_string>) |
|
Examples
Mask sensitive personal information
In this example, masking policies are configured to mask sensitive personal information.
Prepare data.
Create a sensitive personal information table and insert sensitive data into the table.
-- Create a sensitive personal information table. CREATE TABLE if NOT EXISTS personal_info ( id bigint COMMENT 'Unique user ID', name string COMMENT 'Username', age int COMMENT 'User age', gender string COMMENT 'User gender', height float COMMENT 'User height', birthday date COMMENT 'User birthday', phone_number string COMMENT 'User phone number', email string COMMENT 'User email address', address string COMMENT 'User address', salary decimal(18, 2) COMMENT 'User salary', create_time timestamp COMMENT 'User information creation time', update_time timestamp COMMENT 'User information update time', is_deleted boolean COMMENT 'Flag bit to specify whether user information is deleted' ); -- Insert sensitive data into the table. INSERT INTO personal_info VALUES (1, 'Zhang San', 18, 'Male', 178.56, '1990-01-01', '13800000000', 'zhangsan@example.com', 'Haidian District of Beijing', 5000.00, '2023-04-19 11:32:00', '2023-04-19 11:32:00', false), (2, 'Li Si', 20, 'Female', 162.70, '1992-02-02', '13900000000', 'lisi@example.com', 'Pudong New Area of Shanghai', 6000.00, '2023-04-19 11:32:00', '2023-04-19 11:32:00',false), (3, 'Wang Wu', 22, 'Male', 185.21, '1994-03-03', '14000000000', 'wangwu@example.com', 'Nanshan District of Shenzhen', 7000.00, '2023-04-19 11:32:00', '2023-04-19 11:32:00', false);
Configure masking policies.
Retain the first character of the username and replace other characters with asterisks (
*
).CREATE data masking policy IF NOT EXISTS masking_name TO USER (RAM$xxx@test.aliyunid.com:xxx) USING MASKED_STRING_UNMASKED_BA(1, 0); apply data masking policy masking_name bind TO TABLE personal_info COLUMN name;
Round up or down user heights.
CREATE data masking policy IF NOT EXISTS masking_height TO USER (RAM$xxx@test.aliyunid.com:xxx) USING MASKED_POINT_RESERVE(0); apply data masking policy masking_height bind TO TABLE personal_info COLUMN height;
Retain only the year part of user birthdays.
CREATE data masking policy IF NOT EXISTS masking_birthday TO USER (RAM$xxx@test.aliyunid.com:xxx) USING MASKED_DATE_YEAR; apply data masking policy masking_birthday bind TO TABLE personal_info COLUMN birthday;
Use the SM3 algorithm to hash phone numbers of users by default.
CREATE DATA MASKING POLICY default_sm3 TO DEFAULT USING MASKED_SM3(0); apply data masking policy default_sm3 bind TO TABLE personal_info COLUMN phone_number;
Use an account to which the masking policies are bound to query the masked data.
SELECT id, name, height, birthday, phone_number FROM personal_info; -- Data before masking +------------+------+--------+----------+--------------+ | id | name | height | birthday | phone_number | +------------+------+--------+----------+--------------+ | 1 | Zhang San | 178.56 | 1990-01-01 | 13800000000 | | 2 | Li Si | 162.7 | 1992-02-02 | 13900000000 | | 3 | Wang Wu | 185.21 | 1994-03-03 | 14000000000 | +------------+------+--------+----------+--------------+ -- Data after masking +------------+------------+------------+------------+--------------+ | id | name | height | birthday | phone_number | +------------+------------+------------+------------+--------------+ | 1 | Zhang* | 179.0 | 1990-01-01 | lvYJaH4ElL2ilpQx/8tfMUw7xP22yblIgmfWp0/msUQ= | | 2 | Li* | 163.0 | 1992-01-01 | 9fFWacNSwCRZLAjMHqunlfwkqhTbP2ubuDOeOSh4N1c= | | 3 | Wang* | 185.0 | 1994-01-01 | k/0JoQCSarJg9ATJ5tyVnhQf1jIBxHXRbB+cvUm4OmE= | +------------+------------+------------+------------+--------------+
Mask data for all users and roles by default
When multiple masking policies are hit by users or roles at the same time, the masking policy with a higher priority takes effect.
Use the MASKED_SHA256(5) masking policy for users by default.
CREATE DATA MASKING POLICY default_hash_policy
TO DEFAULT
USING MASKED_SHA256(5);
Use the UNMASKED policy for Users A and B.
CREATE DATA MASKING POLICY ab_unmask_policy
TO USER (A, B)
USING UNMASKED;
Result: Users A and B can access plaintext data. Other users can access only the data that is masked by using the SHA-256 hash algorithm.
Users A and B hit both the MASKED_SHA256(5) and UNMASKED policies. The UNMASKED policy takes effect for the users because the policy has a higher priority. For more information, see Priorities of predefined masking policies. Other users hit the MASKED_SHA256(5) policy.
Appendixes
Priorities of predefined masking policies
When users access sensitive data, if multiple data masking policies are hit at the same time, the system masks sensitive data based on the data masking policies with higher priorities.
For example, when User A accesses the col_string column, two masking policies, MASKED_REPLACE_RANDOM(3) and MASKED_SM3, are matched. The priority level of MASKED_REPLACE_RANDOM(3) is 3, and the priority level of MASKED_SM3 is 4. The system performs data masking based on the higher-priority MASKED_REPLACE_RANDOM(3) policy. User A can then view the masking result after random character replacement.
Priority level | Predefined masking policy |
0 (highest) | UNMASKED |
1 | MASKED_POINT_RESERVE(num) |
2 | MASKED_DATE_YEAR |
3 | MASKED_STRING_MASKED_BA(before, after) |
MASKED_STRING_UNMASKED_BA(before, after) | |
MASKED_REPLACE_RANDOM(position) | |
MASKED_REPLACE_RANDOM_BA(before, after) | |
MASKED_REPLACE_FIXED(position) | |
4 | MASKED_SHA256 |
MASKED_SHA512 | |
MASKED_MD5 | |
MASKED_SM3 | |
5 | MASKED_DV |
6 (lowest) | MASKED_NULLIFY |
Default values of data types for the MASKED_DV policy
Data type | Default value |
bigint | 0 |
double | 0.0 |
decimal | 0 |
string | "" |
datetime | DATETIME'1970-01-01 00:00:00' (UTC) |
boolean | false |
tinyint | 0 |
smallint | 0 |
int | 0 |
binary | '' |
float | 0.0 |
double | 0.0 |
decimal | 0 |
varchar(n) | "" |
char(n) | " " (fill n spaces) |
date | DATE'1970-01-01' |
timestamp | TIMESTAMP'1970-01-01 00:00:00' (UTC) |
timestamp_ntz | TIMESTAMP'1970-01-01 00:00:00' (UTC) |
array | {Default value of a subtype} |
map | {key:value} (key and value separately correspond to the default values of a subtype.) |
json | "" |
struct | (Default value of a subtype) |