Data Security Center: Configure and implement data masking

Last Updated: Dec 27, 2024

Data Security Center (DSC) supports the static data masking and dynamic data masking methods. If you use the static data masking method, you must create a data masking task that specifies the data assets to be masked, masking rules for matching sensitive fields, and masking algorithms, such as redaction, encryption, or substitution, to process the specific fields. You must also specify a storage destination for the masked data. If you use the dynamic data masking method, you can call the ExecDatamask operation to mask specific fields in the JSON format based on specific masking rules.

Data masking methods

Data masking method

Data source

Scenario

Operation

Static data masking

  • ApsaraDB RDS tables, PolarDB-X tables, MaxCompute tables, PolarDB tables, ApsaraDB for OceanBase tables, AnalyticDB for MySQL tables, and self-managed database tables on an Elastic Compute Service (ECS) instance.

  • Structured TXT, CSV, XLSX, and XLS files in Object Storage Service (OSS) buckets.

  • Structured TXT, CSV, XLSX, and XLS files saved on your local computer.

You want to share specific data sources with other users but you do not want to disclose sensitive fields.

If you use the static data masking method to mask a specific data table or file, you can store the masked data in another data table or file for data sharing. This way, the raw data is not affected.

In the DSC console, create a masking task and configure the data to be masked, masking rules, destination in which the masked data is stored, and task execution cycle.

Dynamic data masking

You can construct data sources in the following JSON format. dataHeaderList specifies the column names of the data to be masked. dataList specifies the data to be masked. The values specified in dataList must correspond to the column names specified in dataHeaderList. ruleList specifies the data masking rules. For more information, see ExecDatamask.

{
    "dataHeaderList": ["name", "age"],
    "dataList": [
        ["lily", 18],
        ["lucy", 17]
    ],
    "ruleList": [1002, null]
}

The dynamic data masking method is more flexible and allows you to construct the data sources to be masked.

You can use API online debugging, Alibaba Cloud SDKs, or custom encapsulated APIs to call the ExecDatamask operation to dynamically mask data. For more information, see ExecDatamask.
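As a minimal sketch of how the JSON data source might be assembled before a call, the following Python helper builds the structure shown above and checks that each row and the rule list line up with the column names. The helper name build_datamask_payload is hypothetical and is not part of any Alibaba Cloud SDK. The check that ruleList carries one entry per column is an assumption inferred from the example. Only the dataHeaderList, dataList, and ruleList field names come from this topic.

```python
import json


def build_datamask_payload(headers, rows, rule_ids):
    """Serialize columns, rows, and per-column rule IDs into the JSON format above.

    Hypothetical helper for illustration; it performs local validation only.
    """
    # Assumption inferred from the example: one ruleList entry (or None) per column.
    if len(rule_ids) != len(headers):
        raise ValueError("ruleList needs one entry (or None) per column")
    for row in rows:
        if len(row) != len(headers):
            raise ValueError(f"row {row!r} does not match the {len(headers)} columns")
    return json.dumps(
        {
            "dataHeaderList": list(headers),
            "dataList": [list(row) for row in rows],
            "ruleList": list(rule_ids),
        }
    )


payload = build_datamask_payload(
    ["name", "age"],
    [["lily", 18], ["lucy", 17]],
    [1002, None],  # None is serialized as JSON null: no rule for the "age" column
)
print(payload)
```

The resulting string is what you would pass as the data to be masked. Sending the request itself is left to the SDK or API debugging tool that you use.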

Examples of data masking results

DSC supports the following masking algorithms: hashing, redaction, substitution, rounding, encryption, and shuffling. The following tables provide examples of data masking results based on different masking algorithms.

Hashing

The data masking process is irreversible.

You can use common hash algorithms and specify a salt value.

This type of algorithm is applicable to password protection or scenarios in which you must check whether data is sensitive by comparison.

Applicable sensitive data and scenario:

  • Sensitive data: keys

  • Scenarios: data storage

Algorithm description | Algorithm configuration | Raw data | Masked data
MD5 | Set the salt value to test. | 123456 | d6f82c64df3dc34921d79e5f22e5d43a
SHA-1 | Set the salt value to test. | 123456 | 59056c7c6faa5eeb7151d30a01c17b25f35b021c
SHA-256 | Set the salt value to test. | 123456 | 84ca63076a5966e9b726490c8b6a5c9c6d6bdc018bb0a05df754c0c2770aca72
HMAC | Set the salt value to test. | 123456 | ed029027322fedb0ac40b7759ac1521f0121cb018cf0f6f078e61764d810e00f
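The idea of salted hashing can be sketched with the Python standard library. This topic does not state how DSC combines the salt with the raw value, so the sketch assumes that the salt is appended to the value and that the salt serves as the HMAC key. Under these assumptions, the digests are not expected to reproduce the values in the table above. The function names are illustrative.

```python
import hashlib
import hmac


def salted_hash(value: str, salt: str, algorithm: str = "sha256") -> str:
    """Hash value + salt with a hashlib algorithm such as "md5", "sha1", or "sha256".

    Assumption: the salt is appended to the value; DSC may combine them differently.
    """
    return hashlib.new(algorithm, (value + salt).encode("utf-8")).hexdigest()


def hmac_sha256(value: str, key: str) -> str:
    """Keyed hash (HMAC-SHA-256) that uses the salt as the key (an assumption)."""
    return hmac.new(key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).hexdigest()


digest = salted_hash("123456", "test", "md5")
# The same input and salt always produce the same digest, so two values can be
# compared without storing or revealing the raw data.
print(digest == salted_hash("123456", "test", "md5"))
print(len(digest))  # an MD5 digest has 32 hexadecimal characters
```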

Redaction

The data masking process is irreversible.

This type of algorithm redacts specified text in sensitive data with asterisks (*) or number signs (#).

This type of algorithm is applicable to scenarios in which sensitive data is to be shown on a user interface or shared with others.

Applicable sensitive data and scenario:

  • Sensitive data: sensitive personal information

  • Scenarios:

    • Data utilization

    • Data sharing

Algorithm description | Algorithm configuration | Raw data | Masked data
Keeps the first n characters and the last m characters. | Use asterisks (*) for masking and set n to 1 and m to 1. | 123456 | 1****6
Keeps characters from the xth position to the yth position. | Use asterisks (*) for masking and set x to 3 and y to 4. | 123456 | **34**
Redacts the first n characters and the last m characters. | Use asterisks (*) for masking and set n to 2 and m to 2. | 123456 | **34**
Redacts characters from the xth position to the yth position. | Use asterisks (*) for masking and set x to 2 and y to 5. | 123456 | 1****6
Redacts the characters that precede the first occurrence of a special character, such as an at sign (@), a period (.), or an ampersand (&). | & | 1@34&6 | ****&6
Redacts the characters that follow the first occurrence of a special character, such as an at sign (@), a period (.), or an ampersand (&). | @ | 1@34&6 | 1@****
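The six redaction variants can be sketched as small Python functions. The function names are illustrative and are not DSC identifiers. The sketch assumes that positions are 1-based and inclusive, which is inferred from the examples in the table.

```python
def keep_first_last(text, n, m, mask="*"):
    """Keep the first n and last m characters; redact everything in between."""
    if n + m >= len(text):
        return text
    return text[:n] + mask * (len(text) - n - m) + text[len(text) - m:]


def keep_range(text, x, y, mask="*"):
    """Keep the characters at 1-based positions x..y; redact the rest."""
    return mask * (x - 1) + text[x - 1:y] + mask * max(len(text) - y, 0)


def redact_first_last(text, n, m, mask="*"):
    """Redact the first n and last m characters; keep the middle."""
    if n + m >= len(text):
        return mask * len(text)
    return mask * n + text[n:len(text) - m] + mask * m


def redact_range(text, x, y, mask="*"):
    """Redact the characters at 1-based positions x..y; keep the rest."""
    y = min(y, len(text))
    return text[:x - 1] + mask * max(y - x + 1, 0) + text[y:]


def redact_before(text, special, mask="*"):
    """Redact everything before the first occurrence of the special character."""
    index = text.find(special)
    return text if index < 0 else mask * index + text[index:]


def redact_after(text, special, mask="*"):
    """Redact everything after the first occurrence of the special character."""
    index = text.find(special)
    if index < 0:
        return text
    end = index + len(special)
    return text[:end] + mask * (len(text) - end)


print(keep_first_last("123456", 1, 1))  # 1****6
print(keep_range("123456", 3, 4))       # **34**
print(redact_before("1@34&6", "&"))     # ****&6
print(redact_after("1@34&6", "@"))      # 1@****
```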

Substitution

Data is masked based on an algorithm that depends on the configuration of a mapping table. Each masking result is different. For example, if the mobile phone number "13900001234" is replaced with a random value, the masked result may be "13271561461", "18355370496", or "18856540773".

Applicable sensitive data and scenario

Algorithm description

Algorithm configuration

The data masking process is reversible when a specific algorithm is used.

This type of algorithm substitutes the entire value or part of the value of a field with a mapped value by using a mapping table. In this case, raw data can be retrieved after it is masked. This type of algorithm also substitutes the entire value or part of the value of a field randomly based on a random interval. In this case, raw data cannot be retrieved after it is masked.

DSC provides multiple built-in mapping tables and allows you to create custom substitution algorithms.

This type of algorithm can be used to mask fields in fixed formats, such as ID card numbers.

  • Sensitive data:

    • Sensitive personal information

    • Sensitive enterprise information

    • Sensitive device information

  • Scenarios:

    • Data storage

    • Data sharing

Substitutes specific content in ID card numbers with mapped values.

Mapping table for substituting the IDs of administrative regions

Randomly substitutes specific content in ID card numbers.

Mapping table for substituting the IDs of administrative regions

Randomly substitutes specific content in the IDs of military officer cards.

Mapping table for substituting the IDs of administrative regions

Randomly substitutes specific content in passport numbers.

Code table for randomly substituting purpose fields

Randomly substitutes specific content in permit numbers of Exit-Entry Permits for Travelling to and from Hong Kong and Macao.

Code table for randomly substituting purpose fields

Randomly substitutes specific content in bank card numbers.

Code table for randomly substituting Bank Identification Numbers (BINs)

Randomly substitutes specific content in landline telephone numbers.

Mapping table for substituting the IDs of administrative regions

Randomly substitutes specific content in mobile numbers.

Code table for randomly substituting mobile network codes

Randomly substitutes specific content in unified social credit codes.

Code table for randomly substituting the IDs of registration authorities, code table for randomly substituting type codes, and code table for randomly substituting the IDs of administrative regions

Substitutes specific content in the universal reserved format based on a mapping table.

Mapping table for substituting uppercase letters, mapping table for substituting lowercase letters, mapping table for substituting digits, and mapping table for substituting special characters

Randomly substitutes specific content in the universal reserved format.

Code table for randomly substituting uppercase letters, code table for randomly substituting lowercase letters, code table for randomly substituting digits, and code table for randomly substituting special characters
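The difference between mapped and random substitution can be sketched as follows. The digit mapping table in this sketch is made up for illustration, because the built-in DSC mapping tables and code tables are not listed in this topic. The function names are also hypothetical.

```python
import random

# Hypothetical mapping table for digits (a permutation, so it can be inverted).
DIGIT_MAP = dict(zip("0123456789", "3957104862"))
REVERSE_MAP = {masked: raw for raw, masked in DIGIT_MAP.items()}


def substitute_mapped(value: str, table: dict) -> str:
    """Replace each character that appears in the table; keep all others."""
    return "".join(table.get(ch, ch) for ch in value)


def substitute_random(value: str, keep_prefix: int = 0) -> str:
    """Replace digits after the first keep_prefix characters with random digits."""
    head, tail = value[:keep_prefix], value[keep_prefix:]
    return head + "".join(
        random.choice("0123456789") if ch.isdigit() else ch for ch in tail
    )


masked = substitute_mapped("13900001234", DIGIT_MAP)
restored = substitute_mapped(masked, REVERSE_MAP)  # reversible with the inverse table
randomized = substitute_random("13900001234", keep_prefix=1)  # differs on each run
print(restored == "13900001234")
```

Mapped substitution keeps the format and can be undone by whoever holds the table. Random substitution keeps only the format, so the raw value cannot be recovered from the result.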

Rounding

The data masking process is reversible when a specific algorithm is used.

DSC provides two types of rounding algorithms. One algorithm rounds numbers and dates, and raw data cannot be retrieved after it is masked. The other algorithm shifts the characters of text, and raw data can be retrieved after it is masked.

This type of algorithm can be used to analyze and collect statistics on sensitive datasets.

Applicable sensitive data and scenario:

  • Sensitive data: general sensitive information

  • Scenarios:

    • Data storage

    • Data utilization

Algorithm description | Algorithm configuration | Raw data | Masked data
Number rounding: Numbers are rounded to the Nth digit before the decimal point. Valid values of N: 1 to 19. | N=4 | 12345.6789 | 12000
Date rounding: Dates are rounded to the year, month, day, hour, or minute. | Hours | 2023-04-15 14:30:45 | 2023-04-15 14:00:00
Character offset: Characters are cyclically shifted to the left or right by a specified number of positions. | Three positions to the left | test | ttes
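The three rounding variants can be sketched in Python as follows. The sketch assumes that the lower digits and smaller time units are truncated rather than rounded half up. The examples in this topic are consistent with either behavior. The function names are illustrative.

```python
import math
from datetime import datetime


def round_number(value: float, n: int) -> int:
    """Keep the digits from the nth position before the decimal point upward.

    Assumption: the lower digits are truncated, not rounded half up.
    """
    factor = 10 ** (n - 1)
    return math.trunc(value / factor) * factor


def round_date_to_hour(ts: datetime) -> datetime:
    """Drop minutes, seconds, and microseconds."""
    return ts.replace(minute=0, second=0, microsecond=0)


def shift_left(text: str, places: int) -> str:
    """Cyclically rotate the characters to the left by the given number of places."""
    if not text:
        return text
    places %= len(text)
    return text[places:] + text[:places]


print(round_number(12345.6789, 4))  # 12000
print(round_date_to_hour(datetime(2023, 4, 15, 14, 30, 45)))  # 2023-04-15 14:00:00
print(shift_left("test", 3))  # ttes
```

Rotating the shifted text by the remaining number of places, for example shift_left("ttes", 1), restores the raw value. This is why only the character offset variant is reversible.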

Encryption

The data masking process is reversible.

Common symmetric encryption algorithms are supported.

This type of algorithm can be used to encrypt sensitive fields that need to be retrieved after encryption.

Applicable sensitive data and scenario:

  • Sensitive data:

    • Sensitive personal information

    • Sensitive enterprise information

  • Scenarios: data storage

Algorithm description | Algorithm configuration | Raw data | Masked data
Data Encryption Standard (DES) algorithm | Encryption key: 121212 | 123456 | c2TwheTI+rw=
Triple Data Encryption Standard (3DES) algorithm | Encryption keys: 123, 1232131, and 123123 | 123456 | XUwzslGadsk=
Advanced Encryption Standard (AES) algorithm | Encryption key: 123131 | 123456 | YueDcm92UuqvKpVbeS+0Ng==

Shuffling

Applicable sensitive data and scenario

Algorithm description

Algorithm configuration

The data masking process is irreversible.

This type of algorithm extracts values of a field in a specified range from the source table and rearranges the values in a specific column. Alternatively, this type of algorithm randomly selects values from a specific column within the value range and rearranges the selected values. This way, the values are mixed up and masked.

This type of algorithm can be used to mask structured data columns.

  • Sensitive data:

    • Sensitive device information

    • Sensitive location information

  • Scenarios: data storage

Randomly shuffles data.

  • Rearrangement

  • Random selection

For example, the city information about a group of devices is randomly rearranged.

Device ID (raw data) | City (raw data) | Device ID (masked data) | City (masked data)
D001 | China (Shanghai) | D001 | China (Xi'an)
D002 | China (Hangzhou) | D002 | China (Shanghai)
D003 | China (Xi'an) | D003 | China (Chengdu)
D004 | China (Chengdu) | D004 | China (Hangzhou)
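The rearrangement variant can be sketched with the Python standard library. The sketch shuffles the values of one column while leaving the other columns in place. The function name and the optional seed parameter are illustrative and do not correspond to DSC settings.

```python
import random

rows = [
    ("D001", "China (Shanghai)"),
    ("D002", "China (Hangzhou)"),
    ("D003", "China (Xi'an)"),
    ("D004", "China (Chengdu)"),
]


def shuffle_column(rows, column, seed=None):
    """Return new rows in which the values of one column are randomly rearranged."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [
        row[:column] + (value,) + row[column + 1:]
        for row, value in zip(rows, values)
    ]


masked_rows = shuffle_column(rows, column=1)
for device_id, city in masked_rows:
    print(device_id, city)
```

The set of cities is unchanged, so statistics over the column stay valid, but the link between a device and its city is broken. A plain random shuffle can leave some values in their original rows, especially in small tables.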

Billing overview

Only DSC Enterprise Edition supports the data masking feature. After you purchase DSC Enterprise Edition, you can use the data masking feature. DSC uses the subscription billing method. For more information, see Billing overview. If you use the static data masking method, you may be charged additional fees.

Data masking method

Data source

DSC-side billing

Additional fee

Static data masking

  • ApsaraDB RDS tables, PolarDB-X tables, MaxCompute tables, PolarDB tables, ApsaraDB for OceanBase tables, AnalyticDB for MySQL tables, and self-managed database tables on an ECS instance.

  • Structured TXT, CSV, XLSX, and XLS files in OSS buckets.

The data assets to be masked must grant access permissions to DSC. The number of purchased database protection instances and the storage protection capacity are deducted based on the authorized data assets.

If the cloud service whose data to be masked uses the pay-as-you-go billing method, you are charged based on the amount of the data that you read or write on the cloud service side.

Structured TXT, CSV, XLSX, and XLS files saved on your local computer.

Instance resources are not deducted.

You are not charged additional fees.

Dynamic data masking

Self-constructed data

Instance resources are not deducted.

You are not charged additional fees.

Activate DSC

  • If you have not activated DSC or have activated only the free edition of DSC, you can purchase DSC Enterprise Edition. For more information, see Purchase DSC.

  • If you activate the free edition of DSC and want to use the data masking feature, you must upgrade DSC to Enterprise Edition. For more information, see Specification change of subscription DSC.

Important

If you use the static data masking method, you must authorize DSC to access your data assets. Make sure that you have purchased sufficient database protection instances and OSS protection capacity.

Static data masking

Feature description

When you create a static data masking task, you can select a configured masking template. You can also configure masking algorithms for sensitive fields to be masked. For more information about how to configure a data masking template, see Configure data masking templates and algorithms.

Prerequisites

If you use the static data masking method to mask a database or an OSS object, you must grant DSC access permissions on the data assets to be masked. For more information, see the following topics:

Important

DSC is authorized to access the required data asset and connected to the data asset by using an account that has read and write permissions on the data asset. This prerequisite applies when you want to store masked data in the tables of ApsaraDB RDS, PolarDB for Xscale (PolarDB-X), MaxCompute, PolarDB, ApsaraDB for OceanBase, AnalyticDB for MySQL, or self-managed databases or want to store masked data in Object Storage Service (OSS) buckets. If you want to store masked data in a database of ApsaraDB RDS, PolarDB for Xscale (PolarDB-X), PolarDB, ApsaraDB for OceanBase, or AnalyticDB for MySQL, you must use the account-based connection mode.

Create a data masking task

Warning

If you enable data masking in the production environment, your database performance may be compromised.

You can create a data masking task to specify the scope and rules for data masking.

  1. Log on to the DSC console.

  2. In the left-side navigation pane, choose Risk Governance > Data Desensitization.

  3. On the Static Desensitization tab, click the Task Configurations tab. Then, click the Add Desensitization Task tab.

  4. Follow the on-screen instructions to configure the parameters for a data masking task.

    1. Configure the parameters in the Basic Task Information step of the Add Desensitization Task wizard and click Next.

      Note

      You can specify a custom task name.

    2. Configure the parameters in the Desensitization Source Configuration step of the Add Desensitization Task wizard and click Next.

      ApsaraDB RDS tables, PolarDB-X tables, MaxCompute tables, PolarDB tables, ApsaraDB for OceanBase tables, AnalyticDB for MySQL tables, and self-managed database tables

      Parameter

      Required

      Description

      Types of data storage

      Yes

      The storage type of the source file for data masking. Set the parameter to ApsaraDB RDS Tables/PolarDB-X Tables/MaxCompute Tables/PolarDB Tables/ApsaraDB for OceanBase Tables/AnalyticDB for MySQL Tables/Self-managed Database Tables.

      Source Service

      Yes

      The service that provides the source table for data masking. Valid values: RDS, PolarDB-X, OceanBase, MaxCompute, ADB-MYSQL, PolarDB, and ECS self-built database.

      Source Database/Project

      Yes

      The database or project that stores the source table.

      SOURCE table name

      Yes

      The name of the source table.

      Source Partition

      No

      If you set the Source Service parameter to MaxCompute, you can configure the Source Partition parameter.

      The name of the partition that stores the data to mask in the source table. If you leave this parameter empty, DSC masks the sensitive data in all partitions of the source table.

      You can specify partitions when you create a MaxCompute table. Partitions define different logical divisions of a table. When you query data, you can specify partitions to improve query efficiency. For more information, see Partition.

      Sample SQL

      No

      If you set the Source Service parameter to RDS, PolarDB-X, OceanBase, or ECS self-built database, you can configure the Sample SQL parameter.

      The SQL statement that specifies the scope of the sensitive data to mask. If you leave this parameter empty, DSC masks all data in the source table.

      OSS objects

      Important

      Only TXT, CSV, XLSX, and XLS objects can be masked.

      Parameter

      Required

      Description

      Types of data storage

      Yes

      The storage type of the source file for data masking. Set the parameter to OSS files.

      File source

      Yes

      The source of the OSS object for data masking. Valid values: Uploaded Local File and OSS Bucket.

      Upload files

      Yes

      If you set the File source parameter to Uploaded Local File, click Select a local file to upload the OSS object.

      OSS Bucket where the source file is located

      Yes

      If you set the File source parameter to OSS Bucket, select the OSS bucket to which the OSS object belongs from the drop-down list. You can enter a keyword to search for and select the OSS bucket from the drop-down list.

      Source file names

      Yes

      If you set the File source parameter to OSS Bucket, specify the name of the source OSS object. The name must contain a file extension.

      • If you want to mask data in an OSS object, specify the name of the OSS object. Example: test.csv.

      • If you want to mask data in multiple OSS objects, turn on Open the pass. The system uses the same masking rule to mask multiple OSS objects. The objects must have the same format and identical column structure.

        After you turn on Open the pass, you can use asterisks (*) as wildcards to specify multiple OSS objects and mask data in the objects at a time. You can use asterisks (*) only in the prefix of an object name. Example: test*.xls, which matches XLS files whose name starts with test.

      Source file description

      No

      If you set the File source parameter to Uploaded Local File, enter a description for the source OSS object.

      Separator selection

      No

      The column delimiter. Select a delimiter based on the delimiter of the source OSS object. This parameter is required for CSV and TXT objects. Valid values:

      • Semicolon ";" (macOS/Linux default)

      • Comma "," (Windows default)

      • Vertical bar "|"

      Table contains header rows

      No

      Specifies whether the source OSS object contains header rows.

    3. Configure the parameters in the Desensitization algorithm step and click Next.

      • Select a data masking template from the drop-down list. In the source field list, the switch in the Desensitization column is automatically turned on and masking algorithms are automatically configured for fields based on the data masking template.

        The settings in the rule list of the data masking template must match the source fields that you want to mask. Otherwise, the data masking template does not take effect. For more information about how to configure a data masking template, see Configure data masking templates and algorithms.

      • Find the source field whose data you want to mask in the list, turn on the switch in the Desensitization column, and then select a masking algorithm based on your business requirements.

      You can click View and Modify Parameters in the Select Algorithm column to view and modify the rule of the selected algorithm. For more information about partition formats in algorithm rules, see Partition formats.

      Note

      If you turn on Forcefully Enable Template, you cannot change the algorithm on the page. To change the algorithm, you must modify the relevant template rule.

      Partition formats

      Partition type

      Partition format

      Example

      N weeks later

      Custom partition key column=$[yyyymmdd+7*N]

      time=$[20190710+7*1]. This partition indicates that DSC masks the data that is generated within one week after July 10, 2019.

      N weeks before

      Custom partition key column=$[yyyymmdd-7*N]

      time=$[20190710-7*3]. This partition indicates that DSC masks the data that is generated within three weeks before July 10, 2019.

      N days later

      Custom partition key column=$[yyyymmdd+N]

      time=$[20190710+2]. This partition indicates that DSC masks the data that is generated within two days after July 10, 2019.

      N days before

      Custom partition key column=$[yyyymmdd-N]

      time=$[20190710-5]. This partition indicates that DSC masks the data that is generated within five days before July 10, 2019.

      N hours later

      Custom partition key column=$[hh24mi:ss+N/24]

      time=$[0924mi:ss+2/24]. This partition indicates that DSC masks the data that is generated within 2 hours after 09:00:00 based on a 24-hour clock.

      N hours before

      Custom partition key column=$[hh24mi:ss-N/24]

      time=$[0924mi:ss-1/24]. This partition indicates that DSC masks the data that is generated within 1 hour before 09:00:00 based on a 24-hour clock.

      N minutes later

      Custom partition key column=$[hh24mi:ss+N/24/60]

      time=$[0924mi:ss+2/24/60]. This partition indicates that DSC masks the data that is generated within 2 minutes after 09:00:00 based on a 24-hour clock.

      N minutes before

      Custom partition key column=$[hh24mi:ss-N/24/60]

      time=$[0924mi:ss-2/24/60]. This partition indicates that DSC masks the data that is generated within 2 minutes before 09:00:00 based on a 24-hour clock.

  5. Configure the destination where you want to store masked data and click Test below Write Permission Test. After the test is passed, click Next.

    Important

    The account that is used to connect to the data asset must have the write permissions on the data asset.

  6. Configure the processing logic.

    Parameter

    Required

    Description

    How the task is triggered

    Yes

    The method that is used to run the data masking task. Valid values:

    • Manual Only: If you select this option, you must manually run the data masking task.

    • Scheduled Only: If you select this option, you must configure automatic running of the data masking task. After the configuration is complete, the data masking task is automatically run at a specific point in time on an hourly, daily, weekly, or monthly basis.

    • Manual + Scheduled: If you select this option, you can click Start in the Actions column on the Task Configurations tab to manually run the data masking task. You can also configure automatic running of the data masking task. After the configuration is complete, the data masking task is automatically run at a specific point in time on an hourly, daily, weekly, or monthly basis.

    Turn on incremental desensitization

    No

    Specifies whether to enable incremental masking. If you turn on this switch, DSC masks only the data that is added after the previous data masking task is complete. You must specify a field whose value is increased over time as the incremental identifier. For example, you can specify the creation time field or the auto-increment ID field as the incremental identifier.

    Important

    DSC supports incremental data masking only for ApsaraDB RDS databases.

    Shard field

    No

    The shard field based on which DSC divides the source data into multiple shards. DSC concurrently masks the source data in the shards to improve the efficiency of data masking. You can specify one or more shard fields based on your business requirements.

    • DSC supports incremental data masking only for ApsaraDB RDS databases. We recommend that you use a primary key or a field on which a unique index is created as the shard field.

    • If you leave this parameter empty, a primary key is used as the shard field. DSC divides the source data based on the primary key and masks the source data.

      Important

      If the source data does not have a primary key, you must specify a shard field. Otherwise, the data masking task fails.

    • If you specify excessive shard fields, query performance and data accuracy may deteriorate. Proceed with caution.

    Table name conflict resolution

    Yes

    The method that is used to handle a table name conflict. Valid values:

    • Delete the target table and create a new table with the same name.

    • Attach data to the target table. We recommend that you select this option.

    Row Conflict Resolution

    Yes

    The method that is used to handle row conflicts. Valid values:

    • Keep conflicting rows in the target table and discard the new data. We recommend that you select this option.

    • Delete conflicting rows in the target table and insert the new data.

  7. Click Submit.
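To preview which objects a wildcard in the Source file names parameter would select, you can approximate the matching locally. The following sketch uses the fnmatch module of the Python standard library. This is an assumption about the matching behavior, not the DSC implementation. fnmatch also treats ? and [] as wildcards, which this topic does not describe, so use only asterisks (*) in the pattern.

```python
from fnmatch import fnmatchcase


def match_objects(object_names, pattern):
    """Return the object names that match the pattern (case-sensitive)."""
    return [name for name in object_names if fnmatchcase(name, pattern)]


names = ["test.xls", "test_2024.xls", "prod_test.xls", "test.csv"]
print(match_objects(names, "test*.xls"))  # ['test.xls', 'test_2024.xls']
```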

Run and view a data masking task

If you set the How the task is triggered parameter to Manual Only, you must manually run the data masking task. If you set the How the task is triggered parameter to Scheduled Only, you must configure automatic running of the data masking task. After the configuration is complete, the data masking task is automatically run at a specific point in time. If you set the How the task is triggered parameter to Manual + Scheduled, you can manually run the data masking task or configure settings to automatically run the data masking task.

  1. On the Static Desensitization tab, click the Task Configurations tab. Then, find the created data masking task and click Start in the Actions column to run the data masking task.

  2. On the Static Desensitization tab, click the Status tab to view the progress and status of the data masking task.

Troubleshoot the errors that occur in a data masking task

If a data masking task fails, you can refer to the following table to troubleshoot the errors.

Error | Cause
The data masking task does not exist. The task may be deleted or closed. | The switch in the Actions column of the data masking task is turned off.
The scheduling settings of the scheduled data masking task are invalid. | The value of Triggered Daily is invalid.
The source instance does not exist. | The instance to which the source table belongs does not exist.
The destination instance does not exist. | The destination instance may be deleted or the destination instance-related permissions may be revoked.
The source table cannot be found. | The source table may be deleted or the source instance-related permissions may be revoked.
The parameters that are configured for the masking algorithm are invalid. | The parameters that are configured for the masking algorithm are invalid.
The partition key column in the source table is empty. | The partition key column in the source table is empty.
The operation that writes data to the destination table failed. | DSC failed to write data to the destination table due to invalid destination settings.
The operation that queries data from the source table failed. | DSC failed to query data from the source table.
The operation that creates the destination table failed. | The destination table does not exist in the destination.
No primary key can be found. | No primary key exists in the ApsaraDB RDS source table.
The MaxCompute table-related partition that is configured for the data masking task is invalid. | The Source Partition that is configured in the Desensitization Source Configuration step or the Target Partition that is configured in the Destination Location Configuration step of the Add Desensitization Task wizard is invalid.

Modify or delete a data masking task

You cannot modify or delete a data masking task that is pending execution or is running.

  • Modify a data masking task

    If you want to modify the settings of a data masking task, find the task and click Modify in the Actions column.

  • Delete a data masking task

    Important

    After a data masking task is deleted, it cannot be restored. Proceed with caution.

    If you no longer use a data masking task, you can delete the task. Find the task and click Delete in the Actions column. In the message that appears, click OK.

Dynamic data masking

Feature description

Dynamic data masking tasks rely on configured masking templates to mask the specified data. You can call the ExecDatamask operation to specify the data (Data) that you want to mask and the ID of the masking template (TemplateId). Then, the data specified in the dataList field of the Data parameter is masked based on the matching mode (Field or Sensitive type) defined in the masking template.

You can obtain the template ID on the Masking Configurations tab of the Risk Governance > Data Desensitization page in the DSC console. You can also create a custom data masking template. For more information, see Configure data masking templates and algorithms.

The following table describes the matching modes of the masking template that is used when you call the ExecDatamask operation to mask data.

Matching mode

Description

Field

Match the field names specified in the dataHeaderList field with the data field names and masking algorithms configured in the Rule list section in the masking template to mask the data in the dataList field.

Sensitive type

Match the rule IDs specified in the ruleList field with the rule IDs in the Rule list section in the masking template to mask the data in the dataList field based on the data field names and masking algorithms in the rules.

If you set the Matching mode parameter to Sensitive type, you can select the fields in the Data Feature column on the Identification Features tab in the Rule list section. You can select built-in or custom fields. Feature names are used as rule names in the Rule list section. You can call the DescribeRules operation and configure the CustomType and Name parameters to query the ID (Id) of a data feature. CustomType specifies the data feature source: built-in or custom. Name specifies the data feature name.
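The two matching modes can be simulated locally to show how columns are paired with masking algorithms. This is only an illustration. The real matching and masking happen inside DSC based on the template, and the mapping of rule ID 1002 to a redaction function, the function names, and the dictionaries below are all made up.

```python
def redact_keep_first(value):
    """Hypothetical masking algorithm: keep the first character, redact the rest."""
    text = str(value)
    return text[:1] + "*" * (len(text) - 1)


def mask_locally(data, rules_by_field=None, rules_by_id=None):
    """Simulate the matching modes on a payload in the ExecDatamask JSON format.

    rules_by_field: {field name: function}  -> Field mode
    rules_by_id:    {rule ID: function}     -> Sensitive type mode
    """
    headers = data["dataHeaderList"]
    rule_ids = data.get("ruleList") or [None] * len(headers)
    algorithms = []
    for header, rule_id in zip(headers, rule_ids):
        if rules_by_field is not None:
            algorithms.append(rules_by_field.get(header))
        else:
            algorithms.append((rules_by_id or {}).get(rule_id))
    # Columns without a matching rule are returned unchanged.
    return [
        [algo(cell) if algo else cell for cell, algo in zip(row, algorithms)]
        for row in data["dataList"]
    ]


data = {
    "dataHeaderList": ["name", "age"],
    "dataList": [["lily", 18], ["lucy", 17]],
    "ruleList": [1002, None],
}
by_field = mask_locally(data, rules_by_field={"name": redact_keep_first})
by_type = mask_locally(data, rules_by_id={1002: redact_keep_first})
print(by_field)  # [['l***', 18], ['l***', 17]]
```

In Field mode, the column name decides which rule applies and ruleList is not consulted. In Sensitive type mode, the rule ID at the same position as the column decides.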

Limits

Before you can call the ExecDatamask operation to dynamically mask sensitive data, make sure that the size of the sensitive data that you want to mask is less than 2 MB.
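A client-side check can catch oversized requests before they are sent. The following sketch assumes that the limit applies to the UTF-8 encoded JSON data and that 2 MB means 2 × 1024 × 1024 bytes. This topic does not specify either detail, and the helper names are hypothetical.

```python
import json

MAX_BYTES = 2 * 1024 * 1024  # assumption: 2 MB counted as bytes of UTF-8 encoded JSON


def payload_size_ok(data: dict) -> bool:
    """Return True if the serialized payload is below the assumed size limit."""
    size = len(json.dumps(data, ensure_ascii=False).encode("utf-8"))
    return size < MAX_BYTES


def split_rows(data: dict, rows_per_batch: int):
    """Yield smaller payloads that share the same headers and rules."""
    rows = data["dataList"]
    for start in range(0, len(rows), rows_per_batch):
        yield {**data, "dataList": rows[start:start + rows_per_batch]}


small = {"dataHeaderList": ["name"], "dataList": [["lily"], ["lucy"]], "ruleList": [1002]}
print(payload_size_ok(small))
print(len(list(split_rows(small, 1))))
```

If a payload is too large, splitting dataList into batches and masking each batch in a separate call keeps every request under the limit.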

View the call records of the ExecDatamask operation

  1. Log on to the DSC console.

  2. In the left-side navigation pane, choose Risk Governance > Data Desensitization.

  3. On the Data Desensitization page, click the Dynamic desensitization tab.

  4. On the Dynamic desensitization tab, view the call records of the ExecDatamask operation.

    Note

    If you use the same account and IP address to call the ExecDatamask operation multiple times, only one record is retained. The cumulative number of calls is recorded.

Example on how to share masked data based on the static data masking method

DSC provides a static data masking method that can be used to mask sensitive data in structured CSV files in an OSS bucket. You can store the masked data in a specific OSS bucket within the same account for secure data sharing. For more information, see Mask sensitive data in OSS table files.