All Products
Search
Document Center

MaxCompute:Dynamic data masking

最終更新日:Oct 29, 2024

If you want to hide the key information of sensitive data in a MaxCompute project from specific users, you can enable the dynamic data masking feature provided by MaxCompute. This feature hides or replaces sensitive data in real time when unauthorized users access or view the data. This prevents leaks of sensitive data. This topic describes how to enable the dynamic data masking feature of MaxCompute and provides examples of using the feature.

Feature description

MaxCompute provides the dynamic data masking feature that uses masking techniques in data computing and display to protect sensitive data such as personally identifiable information (PII). You can use this feature in scenarios such as business development and testing, data sharing, and O&M. Data masking policies allow you to mask data by using methods such as masking, hashing, character replacement, numeric value rounding, and date rounding. You can use data masking policies together with the data classification and grading feature of Data Security Guard to mask user data, such as identity information, bank card numbers, addresses, and phone numbers.

Dynamic data masking is implemented at the beginning of the process in which data is read from a storage service. This ensures that data is masked during query, download, association, and user-defined function (UDF) computing and prevents the risk of sensitive data leaks.image

Limits

  • Supported regions

    This feature is in public preview and is available in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Malaysia (Kuala Lumpur), Indonesia (Jakarta), and Germany (Frankfurt).

  • Supported driver versions

    MaxCompute connection method

    Driver version

    Support for data masking

    SDK for Java

    0.48.0-public and later

    Yes

    odpscmd

    0.47.1 and later

    Yes

    JDBC

    3.4.3 and later

    Yes

    MaxFrame

    All versions

    Yes

    PyODPS

    All versions

    Yes

    Go SDK

    All versions

    Yes

  • Internal tables

    • MaxCompute supports data masking policies for internal tables, but not for external tables.

    • The dynamic data masking feature and row-level permission feature are mutually exclusive. If a data masking policy is configured for a table, you cannot configure a row-level permission policy for the table. Similarly, if you configure a row-level permission policy for a table, you cannot configure a data masking policy for the table.

    • If you use a data masking policy to mask Chinese characters, you must make sure that the encoding format of the Chinese characters is UTF-8.

  • Views

    • Traditional views support data masking policies. The data masking policies of traditional views are aligned with the data masking policies of the original tables. If data masking policies are bound to or unbound from the original tables, the policies are also bound to or unbound from the traditional views at the same time.

    • When you create a materialized view, the data masking policy of the original table applies to the materialized view. If you bind a new masking policy to or unbind a masking policy from the original table after you create the materialized view, the masking policy of the materialized view remains unchanged.

  • Data masking policies

    When users access sensitive data, if multiple data masking policies are in effect at the same time, the system preferentially masks sensitive data based on the data masking policy with the highest priority. For more information, see Priorities of predefined masking policies.

Data masking process

During the data response process, the system checks the configurations of data masking policies. If a data masking policy is configured, the system returns the masked data based on the policy. If no data masking policy is configured, the system returns the data in plaintext.image

Data masking commands

Enable or disable the data masking feature for a project

odps.data.masking.policy.enable is a project-level property and specifies whether to enable the data masking feature. Only the project owner or the account that is assigned the Super_Administrator or Admin role can enable or disable the data masking feature. For more information about how to assign a role to a user account, see Assign a built-in administrator role to a user.

  • Enable the data masking feature for a project.

    setproject odps.data.masking.policy.enable=true;
  • Disable the data masking feature for a project.

    setproject odps.data.masking.policy.enable=false;

Create and drop a data masking policy

  • Command syntax

    • Create a data masking policy.

      CREATE DATA MASKING POLICY [IF NOT EXISTS] <policy_name> 
      TO { USER <user_list> | ROLE <role_list> | default } 
      USING <Predefined Masking Policy>;
    • Drop a data masking policy.

      DROP DATA MASKING POLICY <policy_name>;
  • Parameters

    Parameter

    Required

    Description

    policy_name

    Yes

    The name of the masking policy. The policy name is not case-sensitive and cannot contain special characters. It can contain letters, digits, and underscores (_). We recommend that you start the name with a letter. The name cannot exceed 128 bytes in length.

    USER | ROLE | default

    Yes

    Configure one of the three parameters.

    • USER: a data masking policy at the user level. In <user_list>, specify the names of the users for which you want to perform data masking. You can run the list users; command in MaxCompute to obtain user information.

    • ROLE: a data masking at the role level. In <role_list>, specify the names of the roles for which you want to perform data masking. You can run the list roles; command in MaxCompute to obtain role information.

    • default: a default data masking policy. When a user or role accesses a sensitive data column, if no data masking policies are matched, the default policy is used to perform data masking on the sensitive data column.

    Predefined Masking Policy

    Yes

    The predefined data masking policy. For more information, see Predefined data masking policies.

  • Examples

    • Example 1: Create an MD5 hash masking policy for the users userA, userB, and userC.

      CREATE data masking policy IF NOT EXISTS masking_test_001
      TO USER (userA, userB, userC)
      USING MASKED_MD5(0);
    • Example 2: Create an MD5 hash masking policy for the roles role_project_deploy and role_project_dev.

      CREATE data masking policy IF NOT EXISTS masking_test_001
      TO ROLE (role_project_deploy, role_project_dev)
      USING MASKED_MD5(0);

Bind or unbind masking policies to or from sensitive data columns

  • Command syntax

    -- Bind a masking policy to a sensitive data column of a sensitive table.
    APPLY DATA MASKING POLICY <policy_name> BIND TO TABLE <table_name> COLUMN <column_name>;
    
    -- Unbind a specified masking policy from a sensitive data column of a sensitive table.
    APPLY DATA MASKING POLICY <policy_name> UNBIND FROM TABLE <table_name> COLUMN <column_name>;
    
    -- Unbind all masking policies from a sensitive data column of a sensitive table.
    APPLY DATA MASKING POLICY UNBIND ALL FROM TABLE <table_name> COLUMN <column_name>;
    
    -- Unbind all masking policies from all sensitive data columns of a sensitive table.
    APPLY DATA MASKING POLICY UNBIND ALL FROM TABLE <table_name>;
  • Parameters

    Parameter

    Required

    Description

    policy_name

    Yes

    The name of the masking policy.

    table_name

    Yes

    The name of the table that contains sensitive data.

    column_name

    Yes

    The name of the column that contains sensitive data.

View masking policies

  • Command syntax

    -- View the information about a data masking policy.
    DESC DATA MASKING POLICY <policy_name>;
    
    -- View the extended information about a table, including the information about sensitive data columns and applied masking policies.
    DESC EXTENDED <table_name>;
    
    -- Display the names of all masking policies in the current project.
    LIST DATA MASKING POLICY;
    
    -- Display the names of the masking policies that are bound to a specified user.
    LIST DATA MASKING POLICY TO USER <user_name>;
    
    -- Display the names of the masking policies that are bound to a specified role.
    LIST DATA MASKING POLICY TO ROLE <role_name>;
    
    -- Display the names of all masking policies that are bound to a specified table.
    LIST DATA MASKING POLICY ON <table_name>;
    
    -- Display the names of all masking policies that are bound to a specified column of a specified table.
    LIST DATA MASKING POLICY ON <table_name> TO COLUMN <column_name>;
  • Parameters

    Parameter

    Required

    Description

    policy_name

    Yes

    The name of the masking policy.

    table_name

    Yes

    The name of the table that contains sensitive data.

    column_name

    Yes

    The name of the column that contains sensitive data.

    user_name

    Yes

    The username.

    role_name

    Yes

    The name of the role.

Predefined data masking policies

Predefined data masking policies include masking, hashing, character replacement, and rounding. To protect data, you can select a data masking policy based on the data types of sensitive data.

Policy type

Policy name

Command syntax

Description

General-purpose policy

Unmasking

UNMASKED

Unmasks sensitive data.

Supported data types: all types.

NULL returned

MASKED_NULLIFY

Returns NULL to mask data.

  • Supported data types: all types.

  • Examples

    -- Data of the STRING type before masking
    +------------+
    | col_string |
    +------------+
    | Michael Johnson |
    +------------+
    
    -- Masking policy
    MASKED_NULLIFY
    
    -- Data after masking
    +------------+
    | col_string | 
    +------------+
    | NULL       | 
    +------------+

Default value assignment

MASKED_DV

Uses default values to mask data. For more information about the default value of each data type, see Default values of data types for the MASKED_DV policy.

  • Supported data types: all types.

  • Examples

    -- Data of the TIMESTAMP type before masking
    +---------------+
    | col_timestamp |
    +---------------+
    | 2024-05-01 11:12:13 |
    +---------------+
    -- Masking policy
    MASKED_DV
    -- Data after masking. The default time zone of the project is UTC+8.
    +---------------+
    | col_timestamp | 
    +---------------+
    | 1970-01-01 08:00:00 | 
    +---------------+

Year-part retaining

MASKED_DATE_YEAR

Retains only the year part of a time value. For other parts, 01-01 00:00:00 (UTC) is used.

  • Supported data types: DATE, DATETIME, TIMESTAMP_NTZ, and TIMESTAMP.

  • Examples

    -- Data of the TIMESTAMP type before masking
    +---------------+
    | col_timestamp |
    +---------------+
    | 2024-05-01 11:12:13 |
    +---------------+
    -- Masking policy
    MASKED_DATE_YEAR
    -- Data after masking. The default time zone of the project is UTC+8.
    +---------------+
    | col_timestamp | 
    +---------------+
    | 2024-01-01 08:00:00 | 
    +---------------+

Rounding

MASKED_POINT_RESERVE(<num>)

Performs rounding on sensitive data and returns a value that contains zero to five decimal places.

  • num: the number of decimal places after the decimal point. The value of num is an integer in the range of [0, 5].

  • Supported data types: DECIMAL, FLOAT, and DOUBLE.

  • Examples

    -- Data of the FLOAT type before masking
    +-----------+
    | col_float |
    +-----------+
    | 1.12345   |
    +-----------+
    -- Masking policy for returning a value that contains two decimal places.
    MASKED_POINT_RESERVE(2)
    -- Data after masking
    +------------+
    | col_float  | 
    +------------+
    | 1.12       | 
    +------------+

Masking

Masking of the start and end parts

MASKED_STRING_MASKED_BA(<before>, <after>)

Masks the start and end parts of a string with asterisks (* ), and retains the middle part of the string in plaintext.

  • before: The first N characters of a string are replaced by asterisks (*). N is specified by the before parameter. The value of before is an integer greater than or equal to 0. If the value of before is 0, the start part of the string is not masked.

  • after: The last M characters are replaced by asterisks (*). M is specified by the after parameter. The value of after is an integer greater than or equal to 0.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Examples

    -- Data of the STRING type before masking
    +------------+
    | col_string |
    +------------+
    | Michael Johnson |
    +------------+
    -- Masking policy for replacing the first three characters and the last three characters with asterisks (*) and displaying other characters in plaintext.
    MASKED_STRING_MASKED_BA(3, 3)
    -- Data after masking
    +------------+
    | col_string | 
    +------------+
    | ***hael John*** | 
    +------------+

Masking of middle parts of a string

MASKED_STRING_UNMASKED_BA(<before>, <after>)

Displays the start and end parts of a string in plaintext and replaces the characters in the middle part with asterisks (*).

  • before: The first N characters of a string are in plaintext. N is specified by the before parameter. The value of before is an integer greater than or equal to 0.

  • after: The last M characters of a string are in plaintext. M is specified by the after parameter. The value of after is an integer greater than or equal to 0.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Examples

    -- Data of the STRING type before masking
    +------------+
    | col_string |
    +------------+
    | Michael Johnson |
    +------------+
    -- Masking policy for displaying the first character and last character in plaintext and replacing other characters with asterisks (*).
    MASKED_STRING_UNMASKED_BA(1, 1)
    -- Data after masking
    +------------+
    | col_string | 
    +------------+
    | M*************n | 
    +------------+

Hashing

SHA-256 hashing

MASKED_SHA256(<salt>)

Uses the SHA-256 hash algorithm to mask data.

  • salt: a salt value that is an integer in the range from 0 to 9.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Examples

    -- Data of the STRING type before masking
    +------------+
    | col_string | 
    +------------+
    | 4562-1234-5678-9123 | 
    +------------+
    -- Masking policy
    MASKED_SHA256(0)
    -- Data after masking
    +------------+
    | col_string | 
    +------------+
    | zwGMB1aCF1t705EfcwdDorql4MZb46XBqQJw/2RVx8U= | 
    +------------+

SHA-512 hashing

MASKED_SHA512(<salt>)

Uses the SHA-512 hash algorithm to mask data.

  • salt: a salt value that is an integer in the range from 0 to 9.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Examples

    -- Data of the STRING type before masking
    +------------+
    | col_string | 
    +------------+
    | 4562-1234-5678-9123 | 
    +------------+
    -- Masking policy
    MASKED_SHA512(0)
    -- Data after masking
    +------------+
    | col_string | 
    +------------+
    | 3PPywfEIp08WuTUI8FZCCfdVuRu68wZTVwWWVAf4pboACUnH6w9kFMLpl2AARaGW/mvWvg26p0EIqmE0fAEiuA== | 
    +------------+

MD5 hashing

MASKED_MD5(<salt>)

Uses the MD5 hash algorithm to mask data.

  • salt: a salt value that is an integer in the range from 0 to 9.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Examples

    -- Data of the STRING type before masking
    +------------+
    | col_string | 
    +------------+
    | 4562-1234-5678-9123 | 
    +------------+
    -- Masking policy
    MASKED_MD5(0)
    -- Data after masking
    +------------+
    | col_string | 
    +------------+
    | mK/o08tew5g7S3XV/BkFfw== | 
    +------------+

SM3 hashing

MASKED_SM3(<salt>)

Uses the SM3 hash algorithm to mask data.

  • salt: a salt value that is an integer in the range from 0 to 9.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Examples

    -- Data of the STRING type before masking
    +------------+
    | col_string | 
    +------------+
    | 4562-1234-5678-9123 | 
    +------------+
    -- Masking policy
    MASKED_SM3(0)
    -- Data after masking
    +------------+
    | col_string | 
    +------------+
    | Q2TfwUh4B8QQH8jPL6DfdoGysx/CXBxn2T14dDwQtQw= | 
    +------------+

Character replacement

Random character replacement

MASKED_REPLACE_RANDOM(<position>)

Replaces data with random characters. The length of the string remains unchanged after replacement. Random characters include digits and letters.

  • position: the position for character replacement. The position value is an integer.

    • position = 0: All data is replaced by random characters.

    • position > 0: The first N characters of data are replaced by random characters. N is specified by position and is a positive value.

    • position < 0: The last N characters of data are replaced by random characters. N is specified by position and is a negative value.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Examples

    -- Data of the STRING type before masking
    +------------+
    | col_string |
    +------------+
    | Michael Johnson |
    +------------+
    -- Masking policy for replacing the first seven characters of a string with random characters.
    MASKED_REPLACE_RANDOM(7)
    -- Data after masking
    +------------+
    | col_string | 
    +------------+
    | 4DlJQxi Johnson | 
    +------------+

Random character replacement of the first and last N characters

MASKED_REPLACE_RANDOM_BA(<before>, <after>)

Replaces the first N characters and last M characters of data with random characters. The length of the string remains unchanged after replacement. Random characters include digits and letters.

  • before: The first N characters are replaced by random characters. N is specified by the before parameter. The value of before is an integer greater than or equal to 0. If the value of before is 0, the start part of the string is not replaced.

  • after: The last M characters are replaced by random characters. M is specified by the after parameter. The value of after is an integer greater than or equal to 0.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Examples

    -- Data of the STRING type before masking
    +------------+
    | col_string |
    +------------+
    | Michael Johnson |
    +------------+
    -- Masking policy for replacing the first four characters and the last four characters of a string with random characters.
    MASKED_REPLACE_RANDOM_BA(4,4)
    -- Data after masking
    +------------+
    | col_string | 
    +------------+
    | r0xEael JohnWNr | 
    +------------+

Fixed character replacement

MASKED_REPLACE_FIXED(<position>, <fixed_string>)

  • Replaces data with a fixed string.

  • position: the position for character replacement. The position value is an integer.

    • position = 0: All data is replaced by a string that is specified by fixed_string.

    • position > 0: The first N characters of data are replaced by a string that is specified by fixed_string. N is specified by position and is a positive value.

    • position < 0: The last N characters of data are replaced by a string that is specified by fixed_string. N is specified by position and is a negative value.

  • fixed_string: a fixed string. The length of the fixed string is less than or equal to 100. The fixed string cannot contain spaces.

  • Supported data types: STRING, VARCHAR, CHAR, and BINARY.

  • Examples

    -- Data of the STRING type before masking
    +------------+
    | col_string |
    +------------+
    | Michael Johnson |
    +------------+
    -- Masking policy for replacing the first seven characters of a string with the fixed string Oli.
    MASKED_REPLACE_FIXED(7, "Oli")
    -- Data after masking
    +------------+
    | col_string | 
    +------------+
    | Oli Johnson | 
    +------------+

Examples

Mask sensitive personal information

In this example, masking policies are configured to mask sensitive personal information.

  1. Prepare data.

    Create a sensitive personal information table and insert sensitive data into the table.

    -- Create a sensitive personal information table.
    CREATE TABLE if NOT EXISTS personal_info (
      id bigint COMMENT 'Unique user ID',
      name string COMMENT 'Username',
      age int COMMENT 'User age',
      gender string COMMENT 'User gender',
      height float COMMENT 'User height',
      birthday date COMMENT 'User birthday',
      phone_number string COMMENT 'User phone number',
      email string COMMENT 'User email address',
      address string COMMENT 'User address',
      salary decimal(18, 2) COMMENT 'User salary',
      create_time timestamp COMMENT 'User information creation time',
      update_time timestamp COMMENT 'User information update time',
      is_deleted boolean COMMENT 'Flag bit to specify whether user information is deleted'
    );
    -- Insert sensitive data into the table.
    INSERT INTO personal_info VALUES
        (1, 'Zhang San', 18, 'Male', 178.56, '1990-01-01', '13800000000', 'zhangsan@example.com', 'Haidian District of Beijing', 5000.00, '2023-04-19 11:32:00', '2023-04-19 11:32:00', false),
        (2, 'Li Si', 20, 'Female', 162.70, '1992-02-02', '13900000000', 'lisi@example.com', 'Pudong New Area of Shanghai', 6000.00, '2023-04-19 11:32:00', '2023-04-19 11:32:00',false),
        (3, 'Wang Wu', 22, 'Male', 185.21, '1994-03-03', '14000000000', 'wangwu@example.com', 'Nanshan District of Shenzhen', 7000.00, '2023-04-19 11:32:00', '2023-04-19 11:32:00', false);
  2. Configure masking policies.

    • Retain the first character of the username and replace other characters with asterisks (*).

      CREATE data masking policy IF NOT EXISTS masking_name
      TO USER (RAM$xxx@test.aliyunid.com:xxx)
      USING MASKED_STRING_UNMASKED_BA(1, 0);
      
      apply data masking policy masking_name bind TO
      TABLE personal_info COLUMN name;
    • Round up or down user heights.

      CREATE data masking policy IF NOT EXISTS masking_height
      TO USER (RAM$xxx@test.aliyunid.com:xxx)
      USING MASKED_POINT_RESERVE(0);
      
      apply data masking policy masking_height bind TO
      TABLE personal_info COLUMN height;
    • Retain only the year part of user birthdays.

      CREATE data masking policy IF NOT EXISTS masking_birthday
      TO USER (RAM$xxx@test.aliyunid.com:xxx)
      USING MASKED_DATE_YEAR;
      
      apply data masking policy masking_birthday bind TO
      TABLE personal_info COLUMN birthday;
    • Use the SM3 algorithm to hash phone numbers of users by default.

      CREATE DATA MASKING POLICY default_sm3
      TO DEFAULT
      USING MASKED_SM3(0);
      
      apply data masking policy default_sm3 bind TO
      TABLE personal_info COLUMN phone_number;
  3. Use an account to which the masking policies are bound to query the masked data.

    SELECT id, name, height, birthday, phone_number FROM personal_info;
    
    -- Data before masking
    +------------+------+--------+----------+--------------+
    | id         | name | height | birthday | phone_number |
    +------------+------+--------+----------+--------------+
    | 1          | Zhang San | 178.56 | 1990-01-01 | 13800000000  |
    | 2          | Li Si | 162.7  | 1992-02-02 | 13900000000  |
    | 3          | Wang Wu | 185.21 | 1994-03-03 | 14000000000  |
    +------------+------+--------+----------+--------------+
    
    -- Data after masking
    +------------+------------+------------+------------+--------------+
    | id         | name       | height     | birthday   | phone_number | 
    +------------+------------+------------+------------+--------------+
    | 1          | Zhang*        | 179.0      | 1990-01-01 | lvYJaH4ElL2ilpQx/8tfMUw7xP22yblIgmfWp0/msUQ= | 
    | 2          | Li*        | 163.0      | 1992-01-01 | 9fFWacNSwCRZLAjMHqunlfwkqhTbP2ubuDOeOSh4N1c= | 
    | 3          | Wang*        | 185.0      | 1994-01-01 | k/0JoQCSarJg9ATJ5tyVnhQf1jIBxHXRbB+cvUm4OmE= | 
    +------------+------------+------------+------------+--------------+

Mask data for all users and roles by default

When multiple masking policies are hit by users or roles at the same time, the masking policy with a higher priority takes effect.

Use the MASKED_SHA256(5) masking policy for users by default.

CREATE DATA MASKING POLICY  default_hash_policy
TO DEFAULT
USING MASKED_SHA256(5);

Use the UNMASKED policy for Users A and B.

CREATE DATA MASKING POLICY  ab_unmask_policy
TO USER (A, B)
USING UNMASKED;

Result: Users A and B can access plaintext data. Other users can access only the data that is masked by using the SHA-256 hash algorithm.

Note

Users A and B hit both the MASKED_SHA256(5) and UNMASKED policies. The UNMASKED policy takes effect for the users because the policy has a higher priority. For more information, see Priorities of predefined masking policies. Other users hit the MASKED_SHA256(5) policy.

Appendixes

Priorities of predefined masking policies

When users access sensitive data, if multiple data masking policies are hit at the same time, the system masks sensitive data based on the data masking policies with higher priorities.

For example, when User A accesses the col_string column, two masking policies, MASKED_REPLACE_RANDOM(3) and MASKED_SM3, are matched. The priority level of MASKED_REPLACE_RANDOM(3) is 3, and the priority level of MASKED_SM3 is 4. The system performs data masking based on the higher-priority MASKED_REPLACE_RANDOM(3) policy. User A can then view the masking result after random character replacement.

Priority level

Predefined masking policy

0 (highest)

UNMASKED

1

MASKED_POINT_RESERVE(num)

2

MASKED_DATE_YEAR

3

MASKED_STRING_MASKED_BA(before, after)

MASKED_STRING_UNMASKED_BA(before, after)

MASKED_REPLACE_RANDOM(position)

MASKED_REPLACE_RANDOM_BA(before, after)

MASKED_REPLACE_FIXED(position)

4

MASKED_SHA256

MASKED_SHA512

MASKED_MD5

MASKED_SM3

5

MASKED_DV

6 (lowest)

MASKED_NULLIFY

Default values of data types for the MASKED_DV policy

Data type

Default value

bigint

0

double

0.0

decimal

0

string

""

datetime

DATETIME'1970-01-01 00:00:00' (UTC)

boolean

false

tinyint

0

smallint

0

int

0

binary

''

float

0.0

double

0.0

decimal

0

varchar(n)

""

char(n)

" " (fill n spaces)

date

DATE'1970-01-01'

timestamp

TIMESTAMP'1970-01-01 00:00:00' (UTC)

timestamp_ntz

TIMESTAMP'1970-01-01 00:00:00' (UTC)

array

{Default value of a subtype}

map

{key:value} (key and value separately correspond to the default values of a subtype.)

json

""

struct

(Default value of a subtype)