
Data Security Center: Configure and implement data masking

Last Updated: Nov 05, 2024

Data Security Center (DSC) supports static data masking and dynamic data masking, which help you mask sensitive data in databases. This topic describes how to implement static data masking and dynamic data masking.

Data masking methods

Static data masking

You can create a data masking task to redact, encrypt, or replace sensitive data by using masking algorithms and save the result data to a destination that you specify.
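To illustrate what masking algorithms do, the following sketch applies two common transformations: redacting the middle digits of a phone number and replacing an email address with an irreversible hash. These are generic examples written for this topic, not DSC's built-in algorithms.

```python
import hashlib

def redact_phone(phone: str) -> str:
    """Keep the first 3 and last 4 digits; redact the rest with asterisks."""
    return phone[:3] + "*" * (len(phone) - 7) + phone[-4:]

def hash_email(email: str) -> str:
    """Replace an email address with an irreversible SHA-256 digest."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()

print(redact_phone("13812345678"))     # 138****5678
print(hash_email("user@example.com"))  # 64-character hex digest
```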


Dynamic data masking

Compared with static data masking, dynamic data masking is more flexible. You can call the ExecDatamask operation to mask specific data. The size of data that you can mask in a call must be less than 2 MB.


Billing description

If the cloud service whose data you want to mask uses the pay-as-you-go billing method, you are charged based on the amount of data that is read or written on the cloud service side.

Prerequisites

Static data masking

Create a data masking task

Warning

If you enable data masking in the production environment, your database performance may be compromised.

You can create a data masking task to specify the scope and rules for data masking.

  1. Log on to the DSC console.

  2. In the left-side navigation pane, choose Data Governance > Data Desensitization.

  3. On the Static Desensitization tab, click the Task Configurations tab. Then, click Add Desensitization Task.

  4. Follow the on-screen instructions to configure the parameters for a data masking task.

    1. Configure the parameters in the Basic Task Information step of the Add Desensitization Task wizard and click Next.

      Note

      You can specify a custom task name.

    2. Configure the parameters in the Desensitization Source Configuration step of the Add Desensitization Task wizard and click Next.

      ApsaraDB RDS tables, PolarDB-X tables, MaxCompute tables, PolarDB tables, ApsaraDB for OceanBase tables, AnalyticDB for MySQL tables, and self-managed database tables

      • Types of data storage (required): The storage type of the source for data masking. Set the parameter to ApsaraDB RDS Tables/PolarDB-X Tables/MaxCompute Tables/PolarDB Tables/ApsaraDB for OceanBase Tables/AnalyticDB for MySQL Tables/Self-managed Database Tables.

      • Source Service (required): The service that provides the source table for data masking. Valid values: RDS, PolarDB-X, OceanBase, MaxCompute, ADB-MYSQL, PolarDB, and ECS self-built database.

      • Source Database/Project (required): The database or project that stores the source table.

      • SOURCE table name (required): The name of the source table.

      • Source Partition (optional): Available only if you set the Source Service parameter to MaxCompute. The name of the partition that stores the data to mask in the source table. If you leave this parameter empty, DSC masks the sensitive data in all partitions of the source table. You can specify partitions when you create a MaxCompute table. Partitions define different logical divisions of a table. When you query data, you can specify partitions to improve query efficiency. For more information, see Partition.

      • Sample SQL (optional): Available only if you set the Source Service parameter to RDS, PolarDB-X, OceanBase, or ECS self-built database. The SQL statement that specifies the scope of the sensitive data to mask. If you leave this parameter empty, DSC masks all data in the source table.

      OSS objects

      Important

      Only TXT, CSV, XLSX, and XLS objects can be masked.

      • Types of data storage (required): The storage type of the source file for data masking. Set the parameter to OSS files.

      • File source (required): The source of the OSS object for data masking. Valid values: Uploaded Local File and OSS Bucket.

      • Upload files (required): If you set the File source parameter to Uploaded Local File, click Select a local file to upload the OSS object.

      • OSS Bucket where the source file is located (required): If you set the File source parameter to OSS Bucket, select the OSS bucket to which the OSS object belongs from the drop-down list. You can enter a keyword to search for the bucket.

      • Source file names (required): If you set the File source parameter to OSS Bucket, specify the name of the source OSS object. The name must contain a file extension.

        • If you want to mask data in a single OSS object, specify the name of the object. Example: test.csv.

        • If you want to mask data in multiple OSS objects, turn on Open the pass. The system uses the same masking rules for all the objects, which must have the same format and identical column structure. After you turn on Open the pass, you can use asterisks (*) as wildcards to match multiple OSS objects and mask them in a single task. You can use asterisks (*) only in the prefix of an object name. Example: test*.xls, which matches XLS objects whose names start with test. For an illustration of this matching behavior, see the sketch after this list.

      • Source file description (optional): If you set the File source parameter to Uploaded Local File, enter a description for the source OSS object.

      • Separator selection (optional): The column delimiter. Select the delimiter that is used in the source OSS object. This parameter is required for CSV and TXT objects. Valid values: semicolon (;), which is the default on macOS and Linux; comma (,), which is the default on Windows; and vertical bar (|).

      • Table contains header rows (optional): Specifies whether the source OSS object contains header rows.
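      The prefix-wildcard rule can be reasoned about with ordinary glob matching. The following minimal sketch uses Python's fnmatch module and hypothetical object names to illustrate which names a pattern such as test*.xls would cover; it is only an illustration and is not part of DSC.

```python
from fnmatch import fnmatch

# Hypothetical OSS object names used only for illustration.
object_names = [
    "test01.xls",       # masked: name starts with "test" and ends with .xls
    "test_2024.xls",    # masked
    "test01.xlsx",      # skipped: different file extension
    "prod_test01.xls",  # skipped: "test" is not the prefix of the name
]

pattern = "test*.xls"
for name in object_names:
    print(f"{name}: {'masked' if fnmatch(name, pattern) else 'skipped'}")
```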

    3. Configure the parameters in the Desensitization algorithm step and click Next.


      • Select a data masking template from the drop-down list. In the source field list, the switch in the Desensitization column is automatically turned on and masking algorithms are automatically configured for fields based on the data masking template.

        The settings in the rule list of the data masking template must match the source fields that you want to mask. Otherwise, the data masking template does not take effect. For more information about how to configure a data masking template, see Configure data masking templates and algorithms.

      • Find the source field whose data you want to mask in the list, turn on the switch in the Desensitization column, and then select a masking algorithm based on your business requirements.

      You can click View and Modify Parameters in the Select Algorithm column to view and modify the rule of the selected algorithm. For more information about partition formats in algorithm rules, see Partition formats.

      Note

      If you turn on Forcefully Enable Template, you cannot change the algorithm on the page. To change the algorithm, you must modify the relevant template rule.

      Partition formats

      • N weeks later: Custom partition key column=$[yyyymmdd+7*N]. Example: time=$[20190710+7*1], which indicates that DSC masks the data that is generated within one week after July 10, 2019.

      • N weeks before: Custom partition key column=$[yyyymmdd-7*N]. Example: time=$[20190710-7*3], which indicates that DSC masks the data that is generated within three weeks before July 10, 2019.

      • N days later: Custom partition key column=$[yyyymmdd+N]. Example: time=$[20190710+2], which indicates that DSC masks the data that is generated within two days after July 10, 2019.

      • N days before: Custom partition key column=$[yyyymmdd-N]. Example: time=$[20190710-5], which indicates that DSC masks the data that is generated within five days before July 10, 2019.

      • N hours later: Custom partition key column=$[hh24mi:ss+N/24]. Example: time=$[0924mi:ss+2/24], which indicates that DSC masks the data that is generated within 2 hours after 09:00:00 based on a 24-hour clock.

      • N hours before: Custom partition key column=$[hh24mi:ss-N/24]. Example: time=$[0924mi:ss-1/24], which indicates that DSC masks the data that is generated within 1 hour before 09:00:00 based on a 24-hour clock.

      • N minutes later: Custom partition key column=$[hh24mi:ss+N/24/60]. Example: time=$[0924mi:ss+2/24/60], which indicates that DSC masks the data that is generated within 2 minutes after 09:00:00 based on a 24-hour clock.

      • N minutes before: Custom partition key column=$[hh24mi:ss-N/24/60]. Example: time=$[0924mi:ss-2/24/60], which indicates that DSC masks the data that is generated within 2 minutes before 09:00:00 based on a 24-hour clock.
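      The date-based formats are offset arithmetic on the partition value. The following minimal Python sketch only illustrates how a day offset such as the one in $[20190710+7*1] can be resolved to a concrete yyyymmdd value; it is not DSC code, and the resolve_yyyymmdd helper is hypothetical.

```python
from datetime import date, timedelta

def resolve_yyyymmdd(base: str, days_offset: int) -> str:
    """Illustrative helper: apply a day offset to a yyyymmdd partition value."""
    base_date = date(int(base[:4]), int(base[4:6]), int(base[6:8]))
    return (base_date + timedelta(days=days_offset)).strftime("%Y%m%d")

# $[20190710+7*1] -> the one-week window after July 10, 2019 ends on this date.
print(resolve_yyyymmdd("20190710", 7 * 1))  # 20190717
# $[20190710-5] -> the five-day window before July 10, 2019 starts on this date.
print(resolve_yyyymmdd("20190710", -5))     # 20190705
```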

  5. If you set the Source Service parameter to RDS, you can turn on Enable data watermarking. If you turn on the switch, configure the following parameters: Please select the field where the watermark is embedded, Please select a watermark algorithm, and Please enter watermark information. Then, click Next.

    If a data leak occurs after a watermark is embedded into your data, DSC can extract the watermark to identify the user who is responsible for the data leak. For more information, see the Extract a watermark from data section of this topic.

    For example, Employee A needs a copy of order data, and the administrator enters "Export XX order data to Employee A on a specified day in a specified month of a specified year" for the Please enter watermark information parameter in the Data watermark step of the Add Desensitization Task wizard. The watermark is embedded into the order data when the order data is exported. If a data leak occurs, DSC can extract the watermark from the leaked data and identify the employee who is responsible for the data leak. In this example, Employee A is responsible.

    For more information about the limits on watermarks, see Limits on watermarks.


  6. Configure the destination where you want to store masked data and click Test below Write Permission Test. After the test is passed, click Next.

    Important

    The account that is used to connect to the data asset must have the write permissions on the data asset.


  7. Configure the processing logic.

    • How the task is triggered (required): The method that is used to run the data masking task. Valid values:

      • Manual Only: You must manually run the data masking task.

      • Scheduled Only: You must configure automatic running of the data masking task. After the configuration is complete, the data masking task is automatically run at a specific point in time on an hourly, daily, weekly, or monthly basis.

      • Manual + Scheduled: You can click Start in the Actions column on the Task Configurations tab to manually run the data masking task. You can also configure automatic running of the data masking task. After the configuration is complete, the data masking task is automatically run at a specific point in time on an hourly, daily, weekly, or monthly basis.

    • Turn on incremental desensitization (optional): Specifies whether to enable incremental masking. If you turn on this switch, DSC masks only the data that is added after the previous data masking task is complete. You must specify a field whose value increases over time as the incremental identifier. For example, you can specify the creation time field or the auto-increment ID field as the incremental identifier.

      Important

      DSC supports incremental data masking only for ApsaraDB RDS databases.

    • Shard field (optional): The shard field based on which DSC divides the source data into multiple shards. DSC concurrently masks the source data in the shards to improve the efficiency of data masking. You can specify one or more shard fields based on your business requirements. For an illustration of how the incremental identifier and shard field work together, see the sketch after this list.

      • DSC supports incremental data masking only for ApsaraDB RDS databases. We recommend that you use a primary key or a field on which a unique index is created as the shard field.

      • If you leave this parameter empty, the primary key is used as the shard field. DSC divides the source data based on the primary key and masks the source data.

        Important

        If the source data does not have a primary key, you must specify a shard field. Otherwise, the data masking task fails.

      • If you specify too many shard fields, query performance and data accuracy may deteriorate. Proceed with caution.

    • Table name conflict resolution (required): The method that is used to handle a table name conflict. Valid values:

      • Delete the target table and create a new table with the same name.

      • Attach data to the target table. We recommend that you select this option.

    • Row Conflict Resolution (required): The method that is used to handle row conflicts. Valid values:

      • Keep conflicting rows in the target table and discard the new data. We recommend that you select this option.

      • Delete conflicting rows in the target table and insert the new data.
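    To make the incremental identifier and shard field concrete, the following sketch shows how a client-side job could select only newly added rows and split them into ranges that are masked concurrently. The orders table and its columns are hypothetical, and the sketch only illustrates the concept; it does not show how DSC implements incremental masking or sharding internally.

```python
# Hypothetical example of incremental selection plus sharding for concurrent masking.
# The "orders" table and its columns are made-up names used only for illustration.

last_masked_id = 10_000   # incremental identifier value reached by the previous run
max_id = 50_000           # current maximum of the auto-increment id column
shard_count = 4           # number of shards that are masked concurrently

shard_size = (max_id - last_masked_id) // shard_count
queries = []
for i in range(shard_count):
    low = last_masked_id + i * shard_size
    high = max_id if i == shard_count - 1 else low + shard_size
    # Each shard reads only rows that were added after the previous run.
    queries.append(
        f"SELECT id, phone, email FROM orders WHERE id > {low} AND id <= {high}"
    )

for query in queries:
    print(query)
```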

  8. Click Submit.

Run and view a data masking task

If you set the How the task is triggered parameter to Manual Only, you must manually run the data masking task. If you set the How the task is triggered parameter to Scheduled Only, you must configure automatic running of the data masking task. After the configuration is complete, the data masking task is automatically run at a specific point in time. If you set the How the task is triggered parameter to Manual + Scheduled, you can manually run the data masking task or configure settings to automatically run the data masking task.

  1. On the Static Desensitization tab, click the Task Configurations tab. Then, find the created data masking task and click Start in the Actions column to run the data masking task.


  2. On the Static Desensitization tab, click the Status tab to view the progress and status of the data masking task.


Troubleshoot the errors that occur in a data masking task

If a data masking task fails, you can refer to the following table to troubleshoot the errors.

• Error: The data masking task does not exist. The task may be deleted or closed.
  Cause: The switch in the Actions column of the data masking task is turned off.

• Error: The scheduling settings of the scheduled data masking task are invalid.
  Cause: The value of Triggered Daily is invalid.

• Error: The source instance does not exist.
  Cause: The instance to which the source table belongs does not exist.

• Error: The destination instance does not exist.
  Cause: The destination instance may be deleted or the destination instance-related permissions may be revoked.

• Error: The source table cannot be found.
  Cause: The source table may be deleted or the source instance-related permissions may be revoked.

• Error: The parameters that are configured for the masking algorithm are invalid.
  Cause: The parameters that are configured for the masking algorithm are invalid.

• Error: The partition key column in the source table is empty.
  Cause: The partition key column in the source table is empty.

• Error: The operation that writes data to the destination table failed.
  Cause: DSC failed to write data to the destination table due to invalid destination settings.

• Error: The operation that queries data from the source table failed.
  Cause: DSC failed to query data from the source table.

• Error: The operation that creates the destination table failed.
  Cause: The destination table does not exist in the destination.

• Error: No primary key can be found.
  Cause: No primary key exists in the ApsaraDB RDS source table.

• Error: The MaxCompute table-related partition that is configured for the data masking task is invalid.
  Cause: The Source Partition parameter that is configured in the Desensitization Source Configuration step or the Target Partition parameter that is configured in the Destination Location Configuration step of the Add Desensitization Task wizard is invalid.

Modify or delete a data masking task

You cannot modify or delete a data masking task that is pending execution or is running.

  • Modify a data masking task

    If you want to modify the settings of a data masking task, find the task and click Modify in the Actions column.

  • Delete a data masking task

    Important

    After a data masking task is deleted, it cannot be restored. Proceed with caution.

    If you no longer use a data masking task, you can delete the task. Find the task and click Delete in the Actions column. In the message that appears, click OK.

Extract a watermark from data

If a data leak occurs after a watermark is embedded into your data during the configuration of static data masking, DSC can extract the watermark from the leaked data. DSC reads the watermark to trace the data flow and identifies the organization or user that is responsible for the data leak. The watermark that is embedded into the distributed data does not affect the use of the data.

Important

You can extract watermarks only from data in ApsaraDB RDS databases.

The watermarking feature has the following characteristics:

  • Security: The watermark that is embedded in the data is not lost even if the data is modified. This ensures that the watermark can be accurately identified.

  • Transparency: The watermark that is embedded in the data is imperceptible to users and does not affect the use of the data.

  • Detectability: DSC can extract the watermark from data fragments and identify the user who is responsible for data-related issues with a high success rate.

  • Robustness: DSC can extract the watermark from the data even if the data is subject to malicious attacks.

  • Low error rate: DSC provides a well-designed rule for extracting watermarks. This minimizes the probability of errors in data tracing.

You can perform the following steps to extract a watermark:

  1. On the Static Desensitization tab, click the Watermark Extraction tab.

  2. On the Watermark Extraction tab, specify the information about the data source.

    • Source Service: The source service that contains the data from which you want to extract the watermark. Set the value to RDS.

    • Source Database/Project: Required. The source database or project that stores the source table that contains the watermark.

    • SOURCE table name: Required. The source table that contains the watermark.

  3. Click Extract watermark.

    You can view the extracted watermark in the field below Extract watermark.

  4. Click Copy results to copy the extracted watermark.

Dynamic data masking

You can call the ExecDatamask operation to implement dynamic data masking. When you call this operation, you must provide the data that you want to mask (Data) and the ID of the masking template (TemplateId). Then, the data is masked based on the match mode in the masking template.

You can obtain the template ID on the Masking Configurations tab of the Data Desensitization page in the DSC console. You can also create a custom data masking template. For more information, see Configure data masking templates and algorithms.


If you set the Matching mode parameter to Sensitive type, you can select the fields in the Data Feature column on the Identification Features tab in the Rule list section. You can select built-in or custom fields.


If you set the Matching mode parameter to Sensitive type, you can call the ExecDatamask operation to mask data based on the ID of the data feature. You can call the DescribeRules operation and configure the CustomType and Name parameters to query the ID (Id) of a data feature. CustomType specifies the data feature source. Name specifies the data feature name.
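The following minimal sketch shows what these calls could look like with the Alibaba Cloud Python SDK core library (aliyun-python-sdk-core). The endpoint, API version, placeholder credentials, and parameter values are assumptions for illustration only; check the DSC API reference for the values that apply to your account, and note that the structure of the Data payload is defined by the ExecDatamask API, not by this sketch.

```python
import json

from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

# Placeholder credentials and region; replace them with your own values.
client = AcsClient("<access-key-id>", "<access-key-secret>", "cn-zhangjiakou")

def call_dsc(action, params):
    """Send a request to a DSC API operation by using a generic CommonRequest."""
    request = CommonRequest()
    request.set_domain("sddp.cn-zhangjiakou.aliyuncs.com")  # assumed DSC endpoint
    request.set_version("2019-01-03")                       # assumed DSC API version
    request.set_action_name(action)
    for key, value in params.items():
        request.add_query_param(key, value)
    return json.loads(client.do_action_with_exception(request))

# Query the ID (Id) of a data feature by source (CustomType) and name (Name).
rules = call_dsc("DescribeRules", {"CustomType": "<custom-type>", "Name": "<feature-name>"})

# Mask data with a masking template. The data passed in a single call must be
# smaller than 2 MB; see the ExecDatamask API reference for the Data format.
masked = call_dsc("ExecDatamask", {
    "TemplateId": "<template-id>",
    "Data": "<json-string-that-describes-the-data-to-mask>",
})
print(masked)
```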

Limits

When you call the ExecDatamask operation to dynamically mask sensitive data, make sure that the size of the sensitive data is less than 2 MB.

View the call records of the ExecDatamask operation

  1. Log on to the DSC console.

  2. In the left-side navigation pane, choose Data Governance > Data Desensitization.

  3. On the Data Desensitization page, click the Dynamic desensitization tab.

  4. On the Dynamic desensitization tab, view the call records of the ExecDatamask operation.

    Note

    If you use the same account and IP address to call the ExecDatamask operation multiple times, only one record is retained. The cumulative number of calls is recorded.