Configure and implement data masking

Last Updated: Aug 12, 2024

Data Security Center (DSC) supports static data masking and dynamic data masking, which help you mask sensitive data in databases. This topic describes how to implement static data masking and dynamic data masking.

Data masking methods

  • Static data masking: You can create a data masking task to mask, encrypt, or replace sensitive data by using masking algorithms and save the result data to a destination that you specify.

  • Dynamic data masking: Compared with static data masking, dynamic data masking is more flexible. You can call the ExecDatamask operation to mask specific data. The size of data that you can mask in a call must be less than 2 MB.
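As a rough illustration of what masking, one-way hashing, and replacing a value can look like, the following Python sketch applies three generic transformations to a made-up phone number. The function names, the salt, and the sample value are hypothetical, and these transformations are generic examples, not the algorithms that DSC provides.

```python
import hashlib


def mask_middle(value: str, keep_head: int = 3, keep_tail: int = 4) -> str:
    """Cover the middle of a string with asterisks, keeping both ends visible."""
    if len(value) <= keep_head + keep_tail:
        return "*" * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + "*" * hidden + value[len(value) - keep_tail:]


def hash_value(value: str, salt: str = "demo-salt") -> str:
    """Replace a value with a salted SHA-256 digest (one-way, not reversible)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def replace_value(value: str, substitute: str = "REDACTED") -> str:
    """Replace the whole value with a fixed substitute."""
    return substitute


phone = "13812345678"  # made-up sample value
print(mask_middle(phone))      # 138****5678
print(replace_value(phone))    # REDACTED
print(len(hash_value(phone)))  # 64 (length of the hex digest)
```

Masking keeps part of the value readable, hashing keeps values comparable without revealing them, and replacement removes the value entirely. Which trade-off fits depends on how the masked data is used downstream.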

Billing description

If the cloud service whose data you want to mask uses the pay-as-you-go billing method, you are charged based on the amount of the data that you read or write on the cloud service side.

Prerequisites

DSC is authorized to access the data assets that you want to mask. For more information, see Authorize DSC to access databases.

Static data masking

Create a data masking task

Warning

If you enable data masking in the production environment, your database performance may be compromised.

You can create a data masking task to specify the scope and rules for data masking.

  1. Log on to the DSC console.

  2. In the left-side navigation pane, click Data Desensitization.

  3. On the Static Desensitization tab, click the Task Configurations tab. Then, click the Add Desensitization Task tab.

  4. Follow the on-screen instructions to configure the parameters for a data masking task.

    1. Configure the parameters in the Basic Task Information step of the Add Desensitization Task wizard and click Next.

      Note

      You can specify a custom task name.

    2. Configure the parameters in the Desensitization Source Configuration step of the Add Desensitization Task wizard and click Next.

      • Set Types of data storage to ApsaraDB RDS Tables/PolarDB-X Tables/MaxCompute Tables/PolarDB Tables/ApsaraDB for OceanBase Tables/AnalyticDB for MySQL Tables/Self-managed Database Tables.

        Parameter

        Required

        Description

        Types of data storage

        Yes

        The storage type of the source file for data masking. Set the parameter to ApsaraDB RDS Tables/PolarDB-X Tables/MaxCompute Tables/PolarDB Tables/ApsaraDB for OceanBase Tables/AnalyticDB for MySQL Tables/Self-managed Database Tables.

        Source Service

        Yes

        The service that provides the source table for data masking. Valid values: RDS, PolarDB-X, OceanBase, MaxCompute, ADB-MYSQL, PolarDB, and ECS self-built database.

        Source Database/Project

        Yes

        The database or project that stores the source table.

        SOURCE table name

        Yes

        The name of the source table.

        Source Partition

        No

        The name of the partition that stores the data to mask in the source table.

        You can specify partitions when you create a MaxCompute table. Partitions define different logical divisions of a table. When you query data, you can specify partitions to improve query efficiency. For more information, see Partition.

        If you set Source Service to RDS or PolarDB, you do not need to configure Source Partition.

        Source Partition is optional. If you leave this parameter empty, DSC masks the sensitive data in all partitions of the source table.

        Sample SQL

        No

        The SQL statement that specifies the scope of the sensitive data to mask. If you leave this parameter empty, DSC masks all data in the source table.

        Note

        If you set Source Service to MaxCompute or PolarDB, you do not need to configure Sample SQL.

      • Set Types of data storage to OSS files.

        Parameter

        Required

        Description

        Types of data storage

        Yes

        The storage type of the source file for data masking. Set the parameter to OSS files.

        File source

        Yes

        The source of the Object Storage Service (OSS) object for data masking. Valid values: Uploaded Local File and OSS Bucket.

        Upload files

        Yes

        If you set File source to Uploaded Local File, click Select a local file to upload the OSS object.

        Note

        Only TXT, CSV, XLSX, and XLS objects are supported.

        OSS Bucket where the source file is located

        Yes

        If you set File source to OSS Bucket, select the OSS bucket to which the OSS object belongs from the drop-down list. You can enter a keyword to search for and select the OSS bucket from the drop-down list.

        Source file names

        Yes

        If you set File source to OSS Bucket, specify the name of the source OSS object. The name must contain a suffix. Only TXT, CSV, XLSX, and XLS objects are supported.

        If you want to mask data in multiple OSS objects of the same type at a time, you can turn on Open the pass.

        Note

        After you turn on Open the pass, you can use an asterisk (*) as a wildcard to specify multiple OSS objects and mask their data in one task. Only prefix matching is supported: the asterisk must follow the fixed prefix of the object name. Example: test*.xls. DSC applies the same masking rule to all matched objects, so make sure that the objects have the same column structure.

        Source file description

        No

        If you set File source to Uploaded Local File, enter a description for the source OSS object.

        Separator selection

        No

        The column delimiter. Select a delimiter based on the format of the source OSS object. This parameter is required for CSV and TXT objects. Valid values:

        • Semicolon ";" (macOS/Linux default)

        • Comma "," (Windows default)

        • Vertical bar "|"

        Table contains header rows

        No

        Specifies whether the source OSS object contains header rows.

    3. Configure the parameters in the Desensitization algorithm step of the Add Desensitization Task wizard and click Next.

      You can use one of the following methods to configure masking algorithms:

      • Turn off Forcefully Enable Template: Select a data masking template from the drop-down list. Alternatively, find a source field whose data you want to mask in the list, turn on the switch in the Desensitization column, and then select a masking algorithm based on your business requirements.

        • You can click View and Modify Parameters in the Select Algorithm column to view and modify the rule of the selected algorithm. For more information about partition formats in algorithm rules, see Partition formats.

          Partition formats

          | Partition type | Partition format | Example |
          | --- | --- | --- |
          | N weeks later | Custom partition key column=$[yyyymmdd+7*N] | time=$[20190710+7*1]. This partition indicates that DSC masks the data that is generated within a week after July 10, 2019. |
          | N weeks before | Custom partition key column=$[yyyymmdd-7*N] | time=$[20190710-7*3]. This partition indicates that DSC masks the data that is generated within three weeks before July 10, 2019. |
          | N days later | Custom partition key column=$[yyyymmdd+N] | time=$[20190710+2]. This partition indicates that DSC masks the data that is generated within two days after July 10, 2019. |
          | N days before | Custom partition key column=$[yyyymmdd-N] | time=$[20190710-5]. This partition indicates that DSC masks the data that is generated within five days before July 10, 2019. |
          | N hours later | Custom partition key column=$[hh24mi:ss+N/24] | time=$[0924mi:ss+2/24]. This partition indicates that DSC masks the data that is generated within 2 hours after 09:00:00 in the 24-hour clock. |
          | N hours before | Custom partition key column=$[hh24mi:ss-N/24] | time=$[0924mi:ss-1/24]. This partition indicates that DSC masks the data that is generated within 1 hour before 09:00:00 in the 24-hour clock. |
          | N minutes later | Custom partition key column=$[hh24mi:ss+N/24/60] | time=$[0924mi:ss+2/24/60]. This partition indicates that DSC masks the data that is generated within 2 minutes after 09:00:00 in the 24-hour clock. |
          | N minutes before | Custom partition key column=$[hh24mi:ss-N/24/60] | time=$[0924mi:ss-2/24/60]. This partition indicates that DSC masks the data that is generated within 2 minutes before 09:00:00 in the 24-hour clock. |

        • If you turn off the switch in the Desensitization column, the selected masking algorithm does not take effect.

      • Turn on Forcefully Enable Template: Select a data masking template from the drop-down list. Then, DSC masks data based on the algorithm specified in the template.

      The settings in the rule list of the data masking template must match the source fields that you want to mask. Otherwise, the data masking template does not take effect. For more information about how to configure a data masking template, see Configure data masking templates and algorithms.

    4. If you set Source Service to RDS, you can turn on Enable data watermarking. After you turn on the switch, configure the following parameters: Please select the field where the watermark is embedded, Please select a watermark algorithm, and Please enter watermark information. Then, click Next.

      If a data leak occurs after a watermark is embedded in your data, DSC can extract the watermark to identify the user who is responsible for the data leak. For more information, see the Extract watermarks section in this topic.

      For example, Employee A needs a copy of order data. In the Data watermark step of the Add Desensitization Task wizard, the administrator enters "Export XX order data to Employee A on a specified day in a specified month of a specified year" for Please enter watermark information. The watermark is embedded in the order data when the order data is exported. If a data leak occurs, DSC can extract the watermark from the leaked data and identify the employee who is responsible for the data leak. In this example, that employee is Employee A.

      For more information about the limits on watermarks, see Limits on watermarks.

    5. Configure the destination where you want to store masked data and click Test below Write Permission Test. After the test is passed, click Next.

    6. Configure the processing logic.

      Parameter

      Required

      Description

      How the task is triggered

      Yes

      The method that is used to run the data masking task. Valid values:

      • Manual Only: If you select this option, you must manually run the data masking task on the Static Desensitization tab of the Data Desensitization page.

      • Scheduled Only: If you select this option, you must configure automatic running of the data masking task. After the configuration is complete, the data masking task is automatically run at a specific point in time on an hourly, daily, weekly, or monthly basis.

      • Manual + Scheduled: If you select this option, you can click Start in the Actions column on the Task Configurations tab to manually run the data masking task. You can also configure automatic running of the data masking task. After the configuration is complete, the data masking task is automatically run at a specific point in time on an hourly, daily, weekly, or monthly basis.

      Turn on incremental desensitization

      No

      Specifies whether to enable incremental masking. If you turn on this switch, DSC masks only the data that is added after the previous data masking task is complete. You must specify a field whose value increases over time as the incremental identifier. For example, you can specify the creation time field or the auto-increment ID field as the incremental identifier.

      Note

      DSC supports incremental data masking only for ApsaraDB RDS databases.

      Shard field

      No

      The shard field based on which DSC divides the source data into multiple shards. DSC concurrently masks the source data in the shards to improve the efficiency of data masking. You can specify one or more shard fields based on your business requirements.

      Note

      • DSC supports incremental data masking only for ApsaraDB RDS databases. We recommend that you use a primary key or a field on which a unique index is created as the shard field.

      • If you leave this parameter empty, a primary key is used as the shard field. DSC divides the source data based on the primary key and masks the source data. If the source data does not have a primary key, you must specify a shard field. Otherwise, the data masking task fails.

      • If you specify too many shard fields, query performance and data accuracy may deteriorate. Proceed with caution.

      Table name conflict resolution

      Yes

      The method that is used to handle a table name conflict. Valid values:

      • Delete the target table and create a new table with the same name.

      • Attach data to the target table. We recommend that you select this option.

      Row Conflict Resolution

      Yes

      The method that is used to handle a row conflict. Valid values:

      • Keep conflicting rows in the target table and discard the new data. We recommend that you select this option.

      • Delete conflicting rows in the target table and insert the new data.

    7. Click Submit.
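The day-based and week-based partition formats listed in the Desensitization algorithm step amount to simple date arithmetic. The following Python sketch computes the date that such an offset lands on, using the example values from the partition formats table. The function name and the regular expression are hypothetical, the function takes only the `$[...]` part without the partition key column, and how DSC itself evaluates these expressions or turns them into a partition range is not specified in this topic.

```python
import re
from datetime import datetime, timedelta

# Matches date-based expressions such as $[20190710+7*1] or $[20190710-5].
_EXPR = re.compile(r"^\$\[(\d{8})([+-])(?:(7)\*)?(\d+)\]$")


def resolve_date_partition(expr: str) -> str:
    """Return the yyyymmdd date that a day- or week-based offset points to."""
    match = _EXPR.match(expr)
    if match is None:
        raise ValueError(f"unsupported partition expression: {expr!r}")
    base, sign, week_factor, count = match.groups()
    # A leading "7*" means the count is in weeks; otherwise it is in days.
    days = int(count) * (7 if week_factor else 1)
    if sign == "-":
        days = -days
    target = datetime.strptime(base, "%Y%m%d") + timedelta(days=days)
    return target.strftime("%Y%m%d")


print(resolve_date_partition("$[20190710+7*1]"))  # 20190717
print(resolve_date_partition("$[20190710-7*3]"))  # 20190619
print(resolve_date_partition("$[20190710+2]"))    # 20190712
print(resolve_date_partition("$[20190710-5]"))    # 20190705
```

The hour-based and minute-based formats are left out of this sketch because their base value is a time of day rather than a calendar date.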

Run and view a data masking task

If you set How the task is triggered to Manual Only, you must manually run the data masking task. If you set How the task is triggered to Scheduled Only, you must configure automatic running of the data masking task. After the configuration is complete, the data masking task is automatically run at a specific point in time. If you set How the task is triggered to Manual + Scheduled, you can manually run the data masking task or configure settings to automatically run the data masking task.

  1. On the Static Desensitization tab, click the Task Configurations tab. Then, find the created data masking task and click Start in the Actions column to run the data masking task.

  2. On the Static Desensitization tab, click the Status tab to view the progress and status of the data masking task.

Troubleshoot the errors that occur in a data masking task

If a data masking task fails, you can refer to the following table to troubleshoot the errors.

| Error | Cause |
| --- | --- |
| The data masking task does not exist. The task may be deleted or closed. | The switch in the Actions column of the data masking task is turned off. |
| The scheduling settings of the scheduled data masking task are invalid. | The value of Triggered Daily is invalid. |
| The source instance does not exist. | The instance to which the source table belongs does not exist. |
| The destination instance does not exist. | The destination instance may be deleted or the destination instance-related permissions may be revoked. |
| The source table cannot be found. | The source table may be deleted or the source instance-related permissions may be revoked. |
| The parameters that are configured for the masking algorithm are invalid. | The parameters that are configured for the masking algorithm are invalid. |
| The partition key column in the source table is empty. | The partition key column in the source table is empty. |
| The operation that writes data to the destination table failed. | DSC failed to write data to the destination table due to invalid destination settings. |
| The operation that queries data from the source table failed. | DSC failed to query data from the source table. |
| The operation that creates the destination table failed. | The destination table does not exist in the destination. |
| No primary key can be found. | No primary key exists in the ApsaraDB RDS source table. |
| The MaxCompute table-related partition that is configured for the data masking task is invalid. | Source Partition that is configured in the Desensitization Source Configuration step or the Target Partition that is configured in the Destination Location Configuration step of the Add Desensitization Task wizard is invalid. |

Modify or delete a data masking task

You cannot modify or delete a data masking task that is pending execution or is running.

  • Modify a data masking task

    If you want to modify the settings of a data masking task, find the task and click Modify in the Actions column.

  • Delete a data masking task

    Important

    After a data masking task is deleted, it cannot be restored. Proceed with caution.

    If you no longer use a data masking task, you can delete the task. Find the task and click Delete in the Actions column. In the message that appears, click OK.

Extract a watermark from data

If a data leak occurs after a watermark is embedded in your data, DSC can extract the watermark from the leaked data. DSC reads the watermark to trace how the data was distributed and to identify the organization or user that is responsible for the data leak. The embedded watermark does not affect the use of the distributed data.

Important

You can extract watermarks only from data in ApsaraDB RDS databases.

The watermarking feature has the following characteristics:

  • Security: The watermark that is embedded in the data is not lost even if the data is modified. This ensures that the watermark can be accurately identified.

  • Transparency: The watermark that is embedded in the data is imperceptible to users and does not affect the use of the data.

  • Detectability: DSC can extract the watermark from data fragments and identify the user who is responsible for data-related issues with a high success rate.

  • Robustness: DSC can extract the watermark from the data even if the data is subject to malicious attacks.

  • Low error rate: DSC provides a well-designed rule for extracting watermarks. This minimizes the probability of errors in data tracing.

You can perform the following steps to extract a watermark:

  1. On the Static Desensitization tab, click the Watermark Extraction tab.

  2. On the Watermark Extraction tab, specify the information about the data source.

    | Parameter | Description |
    | --- | --- |
    | Source Service | The source service that contains the source file for data masking. Set the value to RDS. |
    | Source Database/Project | Required. The source database or project that stores the source table of the required watermark. |
    | SOURCE table name | Required. The source table that contains the required watermark. |

  3. Click Extract watermark.

    You can view the extracted watermark in the field below Extract watermark.

  4. Click Copy results to copy the extracted watermark.

Dynamic data masking

You can call the ExecDatamask operation to implement dynamic data masking. When you call this operation, you must specify the ID of a data masking template. You can obtain the ID on the Masking Configurations tab of the Data Desensitization page in the DSC console. You can also create a data masking template and use the ID. For more information, see Configure a data masking template.

Limits

When you call the ExecDatamask operation to dynamically mask sensitive data, make sure that the size of the sensitive data is less than 2 MB.
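One way to respect this limit is to measure the data on the client before a call is made. The following Python sketch is a minimal example under several assumptions: the data is serialized as UTF-8 JSON, "2 MB" is read as 2 × 1024 × 1024 bytes, and `send` is a hypothetical callable that stands in for whatever code actually invokes ExecDatamask. None of the names below come from the DSC SDK.

```python
import json
from typing import Callable

MAX_BYTES = 2 * 1024 * 1024  # 2 MB limit from this topic, assumed to mean 2 MiB


def payload_size(data: dict) -> int:
    """Size in bytes of the data once serialized as UTF-8 JSON."""
    return len(json.dumps(data, ensure_ascii=False).encode("utf-8"))


def mask_if_small_enough(data: dict, send: Callable[[str], str]) -> str:
    """Call `send` only when the serialized data is below the size limit."""
    size = payload_size(data)
    if size >= MAX_BYTES:
        raise ValueError(f"payload is {size} bytes; must be less than {MAX_BYTES}")
    return send(json.dumps(data, ensure_ascii=False))


# `fake_send` stands in for a real ExecDatamask call and just echoes the input.
def fake_send(body: str) -> str:
    return body


small = {"rows": [["alice", "13812345678"]]}  # made-up sample data
print(payload_size(small) < MAX_BYTES)  # True
```

If the data exceeds the limit, splitting it into smaller batches and calling the operation once per batch is a common approach, provided that each batch can be masked independently.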

View the call records of the ExecDatamask operation

  1. Log on to the DSC console.

  2. In the left-side navigation pane, click Data Desensitization.

  3. On the Data Desensitization page, click the Dynamic desensitization tab.

  4. On the Dynamic desensitization tab, view the call records of the ExecDatamask operation.

    Note

    If you use the same account and IP address to call the ExecDatamask operation multiple times, only one record is retained. The cumulative number of calls is recorded.
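The record-keeping behavior described in the note can be pictured as a counter keyed by account and IP address. The following Python sketch illustrates that idea only; the function name, the tuple layout, and the sample accounts and addresses are hypothetical, and the sketch does not reflect how DSC stores call records.

```python
from collections import Counter
from typing import Iterable, Tuple

Call = Tuple[str, str]  # (account, ip_address)


def aggregate_calls(calls: Iterable[Call]) -> Counter:
    """Collapse calls into one record per (account, IP) with a running count."""
    return Counter(calls)


calls = [
    ("account-a", "192.0.2.10"),
    ("account-a", "192.0.2.10"),
    ("account-a", "192.0.2.11"),
    ("account-b", "192.0.2.10"),
]
for (account, ip), count in aggregate_calls(calls).items():
    print(account, ip, count)
```

In this example, the two calls from account-a at 192.0.2.10 collapse into a single record with a count of 2, while the other two calls each keep their own record.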