All Products
Search
Document Center

Hologres:Mask data

Last Updated:Oct 31, 2024

Hologres provides the data masking feature to allow you to mask data by column. After you enable this feature, when you query data, sensitive information is masked in query results. This helps protect your sensitive and private data. This topic describes how to enable the data masking feature, and query and delete data masking configurations.

Background information

In the era of big data, new technologies such as big data, cloud computing, and artificial intelligence (AI) are widely applied. These technologies provide powerful support for in-depth data mining and analytics and help enterprises explore the potential value of big data. However, sensitive and private data is difficult to protect in the face of these technologies.

Hologres provides the data masking feature to prevent your sensitive and private data from leaking during data sharing, mining, and analytics.

Limits

  • For security considerations, Hologres does not allow you to import data from tables or columns that have data masking rules configured to tables or columns that do not have data masking rules configured. Otherwise, the following error message is reported:

    ERROR: The insert table has not set SECURITY LABEL
  • You cannot perform UNION or DISTINCT operations on tables that have data masking rules configured. Otherwise, the following error message is reported:

    ERROR: UNION is not support on security item
  • Data masking may affect query performance. The severity of the effect varies based on the masking method and data amount. The query performance may degrade by 10% to 20%, or even by several times in extreme circumstances.

Use the data masking feature

Hologres allows you to configure a data masking rule for a specific column or user. Before you use this feature, you must log on to the database and enable the data masking feature. Then, you can configure a data masking rule for a specific column or user. Procedure:

  1. Enable the data masking feature for a database.

    By default, the data masking feature is disabled. To enable this feature for a database, you must log on to the database as a superuser and execute statements.

    • Syntax

      CREATE EXTENSION IF NOT EXISTS hg_anon; -- Create the hg_anon extension function. 
      ALTER DATABASE <db_name> SET hg_anon_enable = on; -- Enable the data masking feature for the specified database. By default, the value of the hg_anon_enable parameter is off.

    • Example

      CREATE EXTENSION IF NOT EXISTS hg_anon;-- Create the hg_anon extension function. 
      ALTER DATABASE test SET hg_anon_enable = on;-- Enable the data masking feature for the database named test.

    • Parameters

      Parameter

      Description

      hg_anon

      The built-in extension function provided by Hologres. This function is required to enable the data masking feature.

      <db_name>

      The name of the database for which you want to enable the data masking feature. Replace <db_name> with the name of your database.

      hg_anon_enable

      Specifies whether to enable or disable the data masking feature. Valid values:

      • on: enables the data masking feature.

      • off: disables the data masking feature.

      Default value: off.

  2. Configure a data masking rule for a column.

    You can mask data in a specific column as a superuser or the table owner.

    • Syntax

      Note

      To mask data in multiple columns, you must execute the statement multiple times.

      SECURITY LABEL FOR hg_anon ON COLUMN <tablename>.<col_name> IS <label_name>;
    • Example: Mask data in the id column in the holotest table by name.

      SECURITY LABEL FOR hg_anon ON COLUMN holotest.id IS 'name';
    • Parameters

      Parameter

      Description

      hg_anon

      The built-in extension function provided by Hologres. This function is required to enable the data masking feature.

      <tablename>

      The name of the table that contains the column whose data you want to mask. Replace <tablename> with the name of the table.

      <col_name>

      The name of the column whose data you want to mask. Replace <col_name> with the name of the column.

      label_name

      The data masking rule predefined by the system for masking data. You can execute the show hg_anon_labels; statement to query the values of this parameter available for the database.

      The following table describes the default values of the label_name parameter in Hologres V1.1 and later.

      lable_name

      Description

      Example

      name

      Masks the name.

      • Before masking: Li Hua. After masking: *Hua.

      • Before masking: Wang Xiaoqiang. After masking: **qiang.

      email

      Masks the email address.

      Before masking: lihuang@alibaba.com. After masking: lih***@alibaba.com.

      ip

      Masks the IP address.

      Before masking: 192.0.2.1. After masking: 192.*.*.*.

      id

      Masks the ID card number.

      Before masking: 110345188812011234. After masking: 1****************4.

      phone

      Masks the mobile phone number.

      Before masking: 13900001234. After masking: *********34.

      bank_id

      Masks the bank or credit card number.

      Before masking: 2349867902834701928. After masking: ***************1928.

      hash

      Uses the MD5 algorithm for masking.

      Before masking: Wenyi West Road, Hangzhou, Zhejiang. After masking: dbf894b409d4a2ef17dfd9c7fdcafcd8.

      first_mask

      Displays only the first character.

      Before masking: 123456789. After masking: 1********.

    • Sample result结果示例

    • Other related statements

      • Delete the data masking configuration for a column.

        SECURITY LABEL FOR hg_anon ON COLUMN test_hg_anon_demo.name is NULL;
      • Query the columns in which data is masked.

        SELECT c.relname, a.attname, provider, label
        FROM pg_seclabel s 
            INNER JOIN pg_catalog.pg_class c ON s.classoid = c.tableoid AND s.objoid = c.oid
            INNER JOIN pg_catalog.pg_attribute a ON s.objoid = a.attrelid
        WHERE a.attnum = objsubid;

Advanced settings

  • Configure data masking rules for different users.

    By default, the data masking feature takes effect for all users. You can configure data masking rules to unmask data of specific users.

    • Syntax

       SECURITY LABEL FOR hg_anon ON ROLE user_name IS '[label_name|all]:[masked|unmasked]';

      Parameter

      Description

      user_name

      The account ID. You can obtain the account ID on the Basic Information page in the Account Management console.

      label_name

      The data masking rule predefined by the system for masking data. You can execute the show hg_anon_labels; statement to query the values of this parameter available for the database.

      masked

      Masks data.

      unmasked

      Unmasks data.

    • Examples

      • Execute the following statement to unmask the data of the user whose account ID is 1365xxxxxxxxxxxx:

        SECURITY LABEL FOR hg_anon ON ROLE "1365xxxxxxxxxxxx" IS 'all:unmasked';
      • Execute the following statement to unmask data in the phone column for the user whose account ID is 1365xxxxxxxxxxxx:

        SECURITY LABEL FOR hg_anon ON ROLE "1365xxxxxxxxxxxx" IS 'phone:unmasked';
    • Other related statements

      • Query the data masking configuration for a user.

        SELECT usename, label FROM pg_shseclabel s INNER JOIN pg_catalog.pg_user u ON s.objoid = u.usesysid;
      • Delete the data masking configuration for a user.

        SECURITY LABEL FOR hg_anon ON ROLE "1365xxxxxxxxxxxx" IS NULL;
  • Create custom data masking rules.

    If the default label_name values do not meet your requirements, you can modify the Global User Configuration (GUC) parameter hg_anon_labels to create custom data masking rules.

    • Syntax

      ALTER DATABASE <db_name> SET hg_anon_labels = '[
        {"label": <label_name1>, "method", <method1>},
        {"label": <label_name2>, "method", <method2>},
        ... 
      ]'; -- The label_name parameter specifies the name of the data masking rule that you want to create. The method parameter specifies a Hologres built-in method.

      Note

      After you execute the ALTER DATABASE statement, the new settings apply to subsequent sessions, but not the current session. You must reconnect to the database to check the new settings. You can execute the following statement to check whether the new settings take effect:

      SHOW hg_anon_enable;
    • Example

      ALTER DATABASE test_db SET hg_anon_labels = '[
      {"label":"ip", "method":{"desensType":"mask", "type":"ip"}},
      {"label":"email", "method":{"desensType":"mask", "type":"email"}},
      {"label":"name", "method":{"desensType":"mask", "type":"name"}},
      {"label":"first_mask", "method":{"desensType":"mask", "type":"user_define", "before":1, "after":0}},
      {"label":"hash", "method":{"desensType":"hash", "type":"md5", "salt":""}}]';
    • Parameters

      Data masking rule

      Description

      Example

      {"desensType":"mask", "type":"ip"}

      Masks the IP address.

      192.*.*.*

      {"desensType":"mask", "type":"email"}

      Masks the email address.

      abc***@example.net

      {"desensType":"mask", "type":"name"}

      Masks the name.

      *Wu

      {"desensType":"hash", "type":"md5", "salt":""}

      Uses the hash algorithm for masking.

      e086aa137fa19f67d27b39d0eca186103228f322c9c98a125554a24f875f0f7e

      {"label":"first_mask", "method":{"desensType":"mask", "type":"user_define", "before":1, "after":0}}{"label":"last_mask", "method":{"desensType":"mask", "type":"user_define", "before":0, "after":1}}

      Performs custom data masking.

      None

Example

The following procedure provides an example on how to use the data masking feature in Hologres.

  1. Create a database.

    Execute the following statement to create a database:

    CREATE DATABASE hg_anon_demo;
  2. Create a data table.

    Execute the following statements to create a data table named personal_basic_information in the hg_anon_demo database:

    DROP TABLE IF EXISTS personal_basic_information;
    
    CREATE TABLE personal_basic_information
    (
        name TEXT
        ,email TEXT
        ,ip TEXT
        ,id TEXT
        ,phone TEXT
        ,bank_id TEXT
    )
    ;
  3. Insert data into the table.

    Execute the following statement to insert data into the personal_basic_information table:

    INSERT INTO personal_basic_information(name,email,ip,id,phone,bank_id) VALUES
    ('Zhang San','zhangsan@alibaba.com','127.0.0.1','142732199102290022','13900001234','4514610803067088'),
    ('Li Si','lisi@alibaba.com','127.0.0.1','510622198402308000','13900001111','6252470010027800'),
    ('Li Xiaoyao','lixiaoyao@alibaba.com','172.21.4.234','511025188812271696','13900002222','6252470010027800');
  4. Query the data.

    Execute the following statement to query the data before masking:

    SELECT * FROM personal_basic_information;
  5. Configure data masking rules.

    Execute the following statements to configure data masking rules:

    -- Create the hg_anon extension function. 
    CREATE EXTENSION IF NOT EXISTS hg_anon;
    
    -- Enable data masking for the hg_anon_demo database. 
    ALTER DATABASE hg_anon_demo SET hg_anon_enable = on;
    
    -- Configure a data masking rule for each column.
    SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.name IS 'name';
    SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.id IS 'id';
    SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.phone IS 'phone';
    SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.email IS 'email';
    SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.bank_id IS 'bank_id';
    SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.ip IS 'ip';
  6. Query the data.

    Disconnect from the database, reconnect to the database, and then execute the following statement to query the data after masking:

    SELECT * FROM personal_basic_information;

    The following figure shows the data after masking.脱敏后数据

    • To unmask the data of the user whose account ID is 1365xxxxxxxxxxxx, execute the following statement:

      SECURITY LABEL FOR hg_anon ON ROLE "1365xxxxxxxxxxxx" IS 'all:unmasked';

      The following figure shows the query result.设置不脱敏

    • To unmask data in the phone column for the user whose account ID is 1365xxxxxxxxxxxx, execute the following statement:

      SECURITY LABEL FOR hg_anon ON ROLE "1365xxxxxxxxxxxx" IS 'phone:unmasked';

      The following figure shows the query result.phone标签不脱敏

Use Data Security Guard to mask data

Apart from manually configuring data masking rules, you can also use Data Security Guard to mask data.

  • Limits

    • Only Hologres V1.1 and later support Data Security Guard. If the version of your Hologres instance is earlier than V1.1, manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Instance upgrades. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.

    • To identify sensitive data, Data Security Guard disables the data masking feature for Alibaba Cloud accounts.

    • Data Security Guard samples data at 09:00:00 every day. Then, Data Security Guard identifies sensitive data and configures data masking rules for columns that contain the identified sensitive data.

    • You can use Data Security Guard to mask data in Hologres in the following regions: China (Beijing), China (Zhangjiakou), China (Shanghai), China (Hangzhou), China (Shenzhen), China (Hong Kong), Singapore, Germany (Frankfurt), Malaysia (Kuala Lumpur), Indonesia (Jakarta), and US (Silicon Valley).

  • Procedure

    • Enable the data masking feature for a database.

      By default, data masking is disabled. To enable this feature for a database, you must log on to the database as a superuser and execute the following statements:

      -- Create the hg_anon extension function.
      CREATE EXTENSION IF NOT EXISTS hg_anon;
      -- Enable data masking for the specified database. By default, the value of the hg_anon_enable parameter is off. 
      ALTER DATABASE <db_name> SET hg_anon_enable = on;  

      The db_name parameter specifies the name of the database for which you want to enable data masking.

      Important
      • The hg_anon_enable parameter is a GUC parameter. After you execute the ALTER DATABASE statement, the new settings do not take effect for the current connection.

      • You can execute the following statement to check whether the settings take effect:

        SHOW hg_anon_enable;
    • Configure data masking rules for the database.

      1. Log on to the DataWorks console and go to the Data Security Guard page. For more information, see the "Go to the Data Security Guard page" section in Overview.

      2. In the left-side navigation pane, choose Rule Change > Sensitive data identification to go to the Sensitive data identification page.

      3. On the Sensitive data identification page, create a sensitive field type and configure a sensitive data identification rule. For more information, see Identify sensitive data.

      4. In the left-side navigation pane, choose Rule Change > Data Masking to go to the Data Masking page.

      5. Select Hologres Config(hologres_display_desense_code) from the Desensitization scenario drop-down list and click Select desensitization database.

      6. In the Authorize The Desensitization Of Account dialog box, select the database for which you want to configure data masking rules in the Not desensitized database list and click the rightwards arrow to add the database to the Desensitized database list. Then, select I agree to authorize the data protection umbrella to desensitize the database and click OK.授权账号脱敏

      7. On the Data Masking page, click Create Masking Rule in the upper-right corner and configure related parameters to create data masking rules. For more information, see Create a data masking rule. Then, the system starts to mask data for the specified database.

FAQ

What do I do if data is not masked after the procedure described in the example is performed?

  • Problem description: Data is not masked after the procedure described in the example is performed.

  • Possible causes:

    • Data masking is disabled for specific users.

    • The label_name parameter is not configured.

  • Solution:

    • Execute the following statement to check whether the data masking feature is disabled for specific users:

      SELECT
        usename,
        label
      FROM pg_shseclabel s
      INNER JOIN pg_catalog.pg_user u on s.objoid = u.usesysid;

      By default, the query result is empty, which indicates that the data masking rule is enabled for all users. If the query result is not empty, the data masking rule is disabled for some users. In this case, you must enable data masking for the users. For more information, see Advanced settings in this topic.

    • Execute the following statement to check the label_name values:

      SHOW hg_anon_labels;

      If the query result does not contain values such as ip, execute the following statement to configure the label_name values:

      ALTER DATABASE compress_test SET hg_anon_labels = '[
      {"label":"ip", "method":{"desensType":"mask", "type":"ip"}},
      {"label":"email", "method":{"desensType":"mask", "type":"email"}},
      {"label":"name", "method":{"desensType":"mask", "type":"name"}},
      {"label":"id", "method":{"desensType":"mask", "type":"id"}},
      {"label":"phone", "method":{"desensType":"mask", "type":"phone"}},
      {"label":"bank_id", "method":{"desensType":"mask", "type":"bank_id"}},
      {"label":"hash", "method":{"desensType":"hash", "type":"md5", "salt":""}},
      {"label":"first_mask", "method":{"desensType":"mask", "type":"user_define", "before":1, "after":0}}
      ]';

      For more information about the rules that correspond to the label_name values, see the table in this topic.