Hologres provides the data masking feature to allow you to mask data by column. After you enable this feature, when you query data, sensitive information is masked in query results. This helps protect your sensitive and private data. This topic describes how to enable the data masking feature, and query and delete data masking configurations.
Background information
In the era of big data, new technologies such as big data, cloud computing, and artificial intelligence (AI) are widely applied. These technologies provide powerful support for in-depth data mining and analytics and help enterprises explore the potential value of big data. However, sensitive and private data is difficult to protect in the face of these technologies.
Hologres provides the data masking feature to prevent your sensitive and private data from leaking during data sharing, mining, and analytics.
Limits
For security considerations, Hologres does not allow you to import data from tables or columns that have data masking rules configured to tables or columns that do not have data masking rules configured. Otherwise, the following error message is reported:
ERROR: The insert table has not set SECURITY LABEL
You cannot perform UNION or DISTINCT operations on tables that have data masking rules configured. Otherwise, the following error message is reported:
ERROR: UNION is not support on security item
Data masking may affect query performance. The severity of the effect varies based on the masking method and data amount. The query performance may degrade by 10% to 20%, or even by several times in extreme circumstances.
Use the data masking feature
Hologres allows you to configure a data masking rule for a specific column or user. Before you use this feature, you must log on to the database and enable the data masking feature. Then, you can configure a data masking rule for a specific column or user. Procedure:
Enable the data masking feature for a database.
By default, the data masking feature is disabled. To enable this feature for a database, you must log on to the database as a superuser and execute statements.
Syntax
CREATE EXTENSION IF NOT EXISTS hg_anon; -- Create the hg_anon extension function. ALTER DATABASE <db_name> SET hg_anon_enable = on; -- Enable the data masking feature for the specified database. By default, the value of the hg_anon_enable parameter is off.
Example
CREATE EXTENSION IF NOT EXISTS hg_anon;-- Create the hg_anon extension function. ALTER DATABASE test SET hg_anon_enable = on;-- Enable the data masking feature for the database named test.
Parameters
Parameter
Description
hg_anon
The built-in extension function provided by Hologres. This function is required to enable the data masking feature.
<db_name>
The name of the database for which you want to enable the data masking feature. Replace <db_name> with the name of your database.
hg_anon_enable
Specifies whether to enable or disable the data masking feature. Valid values:
on: enables the data masking feature.
off: disables the data masking feature.
Default value: off.
Configure a data masking rule for a column.
You can mask data in a specific column as a superuser or the table owner.
Syntax
NoteTo mask data in multiple columns, you must execute the statement multiple times.
SECURITY LABEL FOR hg_anon ON COLUMN <tablename>.<col_name> IS <label_name>;
Example: Mask data in the id column in the holotest table by name.
SECURITY LABEL FOR hg_anon ON COLUMN holotest.id IS 'name';
Parameters
Parameter
Description
hg_anon
The built-in extension function provided by Hologres. This function is required to enable the data masking feature.
<tablename>
The name of the table that contains the column whose data you want to mask. Replace <tablename> with the name of the table.
<col_name>
The name of the column whose data you want to mask. Replace <col_name> with the name of the column.
label_name
The data masking rule predefined by the system for masking data. You can execute the
show hg_anon_labels;
statement to query the values of this parameter available for the database.The following table describes the default values of the label_name parameter in Hologres V1.1 and later.
lable_name
Description
Example
name
Masks the name.
Before masking: Li Hua. After masking: *Hua.
Before masking: Wang Xiaoqiang. After masking: **qiang.
email
Masks the email address.
Before masking: lihuang@alibaba.com. After masking: lih***@alibaba.com.
ip
Masks the IP address.
Before masking: 192.0.2.1. After masking: 192.*.*.*.
id
Masks the ID card number.
Before masking: 110345188812011234. After masking: 1****************4.
phone
Masks the mobile phone number.
Before masking: 13900001234. After masking: *********34.
bank_id
Masks the bank or credit card number.
Before masking: 2349867902834701928. After masking: ***************1928.
hash
Uses the MD5 algorithm for masking.
Before masking: Wenyi West Road, Hangzhou, Zhejiang. After masking: dbf894b409d4a2ef17dfd9c7fdcafcd8.
first_mask
Displays only the first character.
Before masking: 123456789. After masking: 1********.
Sample result
Other related statements
Delete the data masking configuration for a column.
SECURITY LABEL FOR hg_anon ON COLUMN test_hg_anon_demo.name is NULL;
Query the columns in which data is masked.
SELECT c.relname, a.attname, provider, label FROM pg_seclabel s INNER JOIN pg_catalog.pg_class c ON s.classoid = c.tableoid AND s.objoid = c.oid INNER JOIN pg_catalog.pg_attribute a ON s.objoid = a.attrelid WHERE a.attnum = objsubid;
Advanced settings
Configure data masking rules for different users.
By default, the data masking feature takes effect for all users. You can configure data masking rules to unmask data of specific users.
Syntax
SECURITY LABEL FOR hg_anon ON ROLE user_name IS '[label_name|all]:[masked|unmasked]';
Parameter
Description
user_name
The account ID. You can obtain the account ID on the Basic Information page in the Account Management console.
label_name
The data masking rule predefined by the system for masking data. You can execute the
show hg_anon_labels;
statement to query the values of this parameter available for the database.masked
Masks data.
unmasked
Unmasks data.
Examples
Execute the following statement to unmask the data of the user whose account ID is 1365xxxxxxxxxxxx:
SECURITY LABEL FOR hg_anon ON ROLE "1365xxxxxxxxxxxx" IS 'all:unmasked';
Execute the following statement to unmask data in the
phone
column for the user whose account ID is 1365xxxxxxxxxxxx:SECURITY LABEL FOR hg_anon ON ROLE "1365xxxxxxxxxxxx" IS 'phone:unmasked';
Other related statements
Query the data masking configuration for a user.
SELECT usename, label FROM pg_shseclabel s INNER JOIN pg_catalog.pg_user u ON s.objoid = u.usesysid;
Delete the data masking configuration for a user.
SECURITY LABEL FOR hg_anon ON ROLE "1365xxxxxxxxxxxx" IS NULL;
Create custom data masking rules.
If the default label_name values do not meet your requirements, you can modify the Global User Configuration (GUC) parameter hg_anon_labels to create custom data masking rules.
Syntax
ALTER DATABASE <db_name> SET hg_anon_labels = '[ {"label": <label_name1>, "method", <method1>}, {"label": <label_name2>, "method", <method2>}, ... ]'; -- The label_name parameter specifies the name of the data masking rule that you want to create. The method parameter specifies a Hologres built-in method.
NoteAfter you execute the ALTER DATABASE statement, the new settings apply to subsequent sessions, but not the current session. You must reconnect to the database to check the new settings. You can execute the following statement to check whether the new settings take effect:
SHOW hg_anon_enable;
Example
ALTER DATABASE test_db SET hg_anon_labels = '[ {"label":"ip", "method":{"desensType":"mask", "type":"ip"}}, {"label":"email", "method":{"desensType":"mask", "type":"email"}}, {"label":"name", "method":{"desensType":"mask", "type":"name"}}, {"label":"first_mask", "method":{"desensType":"mask", "type":"user_define", "before":1, "after":0}}, {"label":"hash", "method":{"desensType":"hash", "type":"md5", "salt":""}}]';
Parameters
Data masking rule
Description
Example
{"desensType":"mask", "type":"ip"}
Masks the IP address.
192.*.*.*
{"desensType":"mask", "type":"email"}
Masks the email address.
abc***@example.net
{"desensType":"mask", "type":"name"}
Masks the name.
*Wu
{"desensType":"hash", "type":"md5", "salt":""}
Uses the hash algorithm for masking.
e086aa137fa19f67d27b39d0eca186103228f322c9c98a125554a24f875f0f7e
{"label":"first_mask", "method":{"desensType":"mask", "type":"user_define", "before":1, "after":0}}{"label":"last_mask", "method":{"desensType":"mask", "type":"user_define", "before":0, "after":1}}
Performs custom data masking.
None
Example
The following procedure provides an example on how to use the data masking feature in Hologres.
Create a database.
Execute the following statement to create a database:
CREATE DATABASE hg_anon_demo;
Create a data table.
Execute the following statements to create a data table named personal_basic_information in the hg_anon_demo database:
DROP TABLE IF EXISTS personal_basic_information; CREATE TABLE personal_basic_information ( name TEXT ,email TEXT ,ip TEXT ,id TEXT ,phone TEXT ,bank_id TEXT ) ;
Insert data into the table.
Execute the following statement to insert data into the personal_basic_information table:
INSERT INTO personal_basic_information(name,email,ip,id,phone,bank_id) VALUES ('Zhang San','zhangsan@alibaba.com','127.0.0.1','142732199102290022','13900001234','4514610803067088'), ('Li Si','lisi@alibaba.com','127.0.0.1','510622198402308000','13900001111','6252470010027800'), ('Li Xiaoyao','lixiaoyao@alibaba.com','172.21.4.234','511025188812271696','13900002222','6252470010027800');
Query the data.
Execute the following statement to query the data before masking:
SELECT * FROM personal_basic_information;
Configure data masking rules.
Execute the following statements to configure data masking rules:
-- Create the hg_anon extension function. CREATE EXTENSION IF NOT EXISTS hg_anon; -- Enable data masking for the hg_anon_demo database. ALTER DATABASE hg_anon_demo SET hg_anon_enable = on; -- Configure a data masking rule for each column. SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.name IS 'name'; SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.id IS 'id'; SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.phone IS 'phone'; SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.email IS 'email'; SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.bank_id IS 'bank_id'; SECURITY LABEL FOR hg_anon ON COLUMN personal_basic_information.ip IS 'ip';
Query the data.
Disconnect from the database, reconnect to the database, and then execute the following statement to query the data after masking:
SELECT * FROM personal_basic_information;
The following figure shows the data after masking.
To unmask the data of the user whose account ID is 1365xxxxxxxxxxxx, execute the following statement:
SECURITY LABEL FOR hg_anon ON ROLE "1365xxxxxxxxxxxx" IS 'all:unmasked';
The following figure shows the query result.
To unmask data in the
phone
column for the user whose account ID is 1365xxxxxxxxxxxx, execute the following statement:SECURITY LABEL FOR hg_anon ON ROLE "1365xxxxxxxxxxxx" IS 'phone:unmasked';
The following figure shows the query result.
Use Data Security Guard to mask data
Apart from manually configuring data masking rules, you can also use Data Security Guard to mask data.
Limits
Only Hologres V1.1 and later support Data Security Guard. If the version of your Hologres instance is earlier than V1.1, manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Instance upgrades. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.
To identify sensitive data, Data Security Guard disables the data masking feature for Alibaba Cloud accounts.
Data Security Guard samples data at 09:00:00 every day. Then, Data Security Guard identifies sensitive data and configures data masking rules for columns that contain the identified sensitive data.
You can use Data Security Guard to mask data in Hologres in the following regions: China (Beijing), China (Zhangjiakou), China (Shanghai), China (Hangzhou), China (Shenzhen), China (Hong Kong), Singapore, Germany (Frankfurt), Malaysia (Kuala Lumpur), Indonesia (Jakarta), and US (Silicon Valley).
Procedure
Enable the data masking feature for a database.
By default, data masking is disabled. To enable this feature for a database, you must log on to the database as a superuser and execute the following statements:
-- Create the hg_anon extension function. CREATE EXTENSION IF NOT EXISTS hg_anon; -- Enable data masking for the specified database. By default, the value of the hg_anon_enable parameter is off. ALTER DATABASE <db_name> SET hg_anon_enable = on;
The db_name parameter specifies the name of the database for which you want to enable data masking.
ImportantThe
hg_anon_enable
parameter is a GUC parameter. After you execute the ALTER DATABASE statement, the new settings do not take effect for the current connection.You can execute the following statement to check whether the settings take effect:
SHOW hg_anon_enable;
Configure data masking rules for the database.
Log on to the DataWorks console and go to the Data Security Guard page. For more information, see the "Go to the Data Security Guard page" section in Overview.
In the left-side navigation pane, choose
to go to the Sensitive data identification page.On the Sensitive data identification page, create a sensitive field type and configure a sensitive data identification rule. For more information, see Identify sensitive data.
In the left-side navigation pane, choose Rule Change > Data Masking to go to the Data Masking page.
Select Hologres Config(hologres_display_desense_code) from the Desensitization scenario drop-down list and click Select desensitization database.
In the Authorize The Desensitization Of Account dialog box, select the database for which you want to configure data masking rules in the Not desensitized database list and click the rightwards arrow to add the database to the Desensitized database list. Then, select I agree to authorize the data protection umbrella to desensitize the database and click OK.
On the Data Masking page, click Create Masking Rule in the upper-right corner and configure related parameters to create data masking rules. For more information, see Create a data masking rule. Then, the system starts to mask data for the specified database.
FAQ
What do I do if data is not masked after the procedure described in the example is performed?
Problem description: Data is not masked after the procedure described in the example is performed.
Possible causes:
Data masking is disabled for specific users.
The label_name parameter is not configured.
Solution:
Execute the following statement to check whether the data masking feature is disabled for specific users:
SELECT usename, label FROM pg_shseclabel s INNER JOIN pg_catalog.pg_user u on s.objoid = u.usesysid;
By default, the query result is empty, which indicates that the data masking rule is enabled for all users. If the query result is not empty, the data masking rule is disabled for some users. In this case, you must enable data masking for the users. For more information, see Advanced settings in this topic.
Execute the following statement to check the label_name values:
SHOW hg_anon_labels;
If the query result does not contain values such as ip, execute the following statement to configure the label_name values:
ALTER DATABASE compress_test SET hg_anon_labels = '[ {"label":"ip", "method":{"desensType":"mask", "type":"ip"}}, {"label":"email", "method":{"desensType":"mask", "type":"email"}}, {"label":"name", "method":{"desensType":"mask", "type":"name"}}, {"label":"id", "method":{"desensType":"mask", "type":"id"}}, {"label":"phone", "method":{"desensType":"mask", "type":"phone"}}, {"label":"bank_id", "method":{"desensType":"mask", "type":"bank_id"}}, {"label":"hash", "method":{"desensType":"hash", "type":"md5", "salt":""}}, {"label":"first_mask", "method":{"desensType":"mask", "type":"user_define", "before":1, "after":0}} ]';
For more information about the rules that correspond to the label_name values, see the table in this topic.