Data Security Center (DSC) allows you to collect and analyze data asset information, and identify, classify, and add tags to sensitive data in the cloud. This topic describes how to quickly identify, classify, and add tags to sensitive data in the DSC console.
Prerequisites
A database that you want to connect to DSC is available. For more information about the database types and regions supported by DSC, see Supported database types and Supported regions.
In this example, the tables of an ApsaraDB RDS for SQL Server instance in the China (Zhangjiakou) region are used. For more information about how to create an ApsaraDB RDS for SQL Server instance, a database management account, and a database, see Create an ApsaraDB RDS for SQL Server instance and Create accounts and databases.
If you use a Resource Access Management (RAM) user to purchase and use DSC, make sure that the RAM user is granted the permissions to access DSC. For more information, see Authorize a RAM user to access DSC.
Step 1: Purchase DSC and complete authorization
DSC provides a free edition, which offers resources of fixed specifications on a monthly basis. For more information, see DSC Free Edition. In this example, the free edition of DSC is used.
If the current account does not qualify for the activation of the free edition of DSC, you can purchase a paid edition of DSC. For more information, see Purchase DSC.
Log on to the DSC console. Click Activate Free Edition.
Authorize DSC to access other cloud resources as prompted. For more information, see Authorize DSC to access Alibaba Cloud resources.
Step 2: Connect a database to DSC
You can identify, classify, and add tags to sensitive data only after you connect your data assets to DSC. In this example, an ApsaraDB RDS database is used.
On the Authorization Management page, click Asset Authorization Management.
In the Asset Authorization Management panel, click RDS in the Structured Data section and then click Asset synchronization.
If the ApsaraDB RDS database that you want to connect to DSC is in the asset list, skip this step.
On the Not authorized tab, find the ApsaraDB RDS database that you want to manage and click Authorization in the Actions column.
Go back to the Authorization Management page, find the ApsaraDB RDS database, and then click Connect in the Actions column.
Note: If you click Connect for a database on the Authorization Management page, DSC creates a read-only account for the database and uses the account to connect to the database and run data identification tasks. In this case, DSC has only read-only permissions on the database.
In the Connect dialog box, select Scan assets and identify sensitive data now. and click OK. DSC creates and immediately runs the default data identification task.
Important: We recommend that you run immediate scans during off-peak hours and monitor your workloads so that data identification tasks do not affect them.
Go back to the Authorization Management page, click the icon, wait until data is updated, and then check whether the connection status and feature status of the database are normal. The following figure shows the normal connection status and feature status.
Step 3: View the status of a data identification task
If you click Connect on the Authorization Management page and select Scan assets and identify sensitive data now., DSC creates and immediately runs the default data identification task. DSC automatically uses the main identification template and the common identification template to scan the connected database. By default, the main identification template is the classification template for the Internet industry. You can view the identification results only after the data identification task is complete.
On the Identification Tasks tab of the Tasks page, click Default Tasks.
On the Identify task monitoring page, view the scan status of the default data identification task that is created for the connected database.
The time required to complete a data identification task depends on the amount of data to scan. Larger amounts of data take longer to scan.
You can view the identification results only when Scan Status is Complete.
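The wait described in this step can be automated as a simple polling loop. The following Python code is a minimal sketch, not a DSC API: get_scan_status is a hypothetical function that you supply yourself to return the current Scan Status text, and "Scanning" is only a placeholder for any status other than Complete. The polling interval and timeout are arbitrary defaults.

```python
import time
from typing import Callable


def wait_for_scan(
    get_scan_status: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the scan status is "Complete" or the timeout elapses.

    get_scan_status is a hypothetical, user-supplied function; it is not
    part of any DSC SDK. Returns True on completion, False on timeout.
    """
    waited = 0.0
    while True:
        if get_scan_status() == "Complete":
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Simulated status sequence standing in for a real status lookup.
    # "Scanning" is a placeholder for any non-complete status.
    statuses = iter(["Scanning", "Scanning", "Complete"])
    done = wait_for_scan(lambda: next(statuses), poll_seconds=1.0, sleep=lambda s: None)
    print("results ready" if done else "timed out")
```

Passing the sleep function as a parameter keeps the loop testable without real delays. Because large scans can take a long time, choose a timeout that matches the amount of data to scan.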
Step 4: View identification results
On the Asset Type tab of the Asset Insight page, find the database instance and the database that is scanned. DSC returns the identification results shown in the following figure. The results include the sensitivity level and data tag.
A darker color indicates a higher sensitivity level. N/A indicates that no sensitive data is identified. You can add only the Personal information and Personal sensitive information tags.
Find the database and click Table details in the Actions column. In the panel that appears, view the statistics and sensitive columns of identified tables.
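If you copy the sensitive-column information from the Table details panel for further processing, a short script can summarize it for each table. The following Python code is a sketch under stated assumptions: the record shape (table, column, level, tag), the numeric level scale, and the sample table and column names are all hypothetical and do not represent a DSC export format. A level of None stands for N/A.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnResult:
    # Hypothetical record shape; not a DSC export format.
    table: str
    column: str
    level: Optional[int]  # assumed scale: higher = more sensitive; None = N/A
    tag: Optional[str]    # for example "Personal information"; None if untagged


def summarize(results: list[ColumnResult]) -> dict[str, dict]:
    """Group sensitive columns by table and record each table's highest level."""
    summary: dict[str, dict] = defaultdict(lambda: {"max_level": None, "columns": []})
    for r in results:
        entry = summary[r.table]  # tables without sensitive columns still appear
        if r.level is None:
            continue  # N/A: no sensitive data identified in this column
        entry["columns"].append(r.column)
        if entry["max_level"] is None or r.level > entry["max_level"]:
            entry["max_level"] = r.level
    return dict(summary)


if __name__ == "__main__":
    rows = [
        ColumnResult("customers", "phone", 2, "Personal information"),
        ColumnResult("customers", "id_card", 3, "Personal sensitive information"),
        ColumnResult("customers", "notes", None, None),
    ]
    for table, info in summarize(rows).items():
        print(table, info["max_level"], info["columns"])
```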
Summary
You can identify, classify, and add tags to the sensitive data of authorized data assets based on identification results. The tags are Personal information, Personal sensitive information, and General information.
Access data assets
DSC supports the following data assets: ApsaraDB RDS, PolarDB, PolarDB for Xscale (PolarDB-X), PolarDB-X 2.0, ApsaraDB for Redis, ApsaraDB for MongoDB, ApsaraDB for OceanBase, Tablestore, AnalyticDB for MySQL, AnalyticDB for PostgreSQL, Object Storage Service (OSS), MaxCompute, and self-managed databases hosted on Elastic Compute Service (ECS) instances. For more information, see Asset authorization.
Flexible selection of data identification templates
After you authorize DSC to access data assets, DSC automatically uses the main identification template and the common identification template to scan the connected data assets. By default, the main template is the classification template for the Internet industry.
The classification template for the Internet industry is a built-in template that DSC uses to identify sensitive data. You can change the main template to another built-in identification template or a custom identification template on the Identification Configuration page. You can configure custom identification models and custom features for a custom identification template. For more information, see Configure identification templates.
Custom data identification tasks
DSC allows you to enable the main identification template and two other identification templates to identify, classify, and add tags to sensitive data. The default data identification task uses the main identification template. You can create a data identification task on the Tasks page and select an enabled non-main template to scan specific data assets. For more information, see Identification tasks.
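The template rule described above (one main template, two other enabled templates, and custom tasks that select an enabled non-main template) can be modeled as a small validation class. The following Python code is illustrative only: the class, its methods, and the template names are hypothetical and do not correspond to a DSC API. It also assumes that two is an upper limit on the number of additional templates.

```python
from dataclasses import dataclass, field

MAX_EXTRA_TEMPLATES = 2  # assumption: the main template plus at most two others


@dataclass
class TemplateConfig:
    main: str
    extras: list[str] = field(default_factory=list)

    def enable(self, name: str) -> None:
        """Enable an additional (non-main) template."""
        if name == self.main or name in self.extras:
            return  # already enabled
        if len(self.extras) >= MAX_EXTRA_TEMPLATES:
            raise ValueError("only two templates besides the main one can be enabled")
        self.extras.append(name)

    def template_for_custom_task(self, name: str) -> str:
        """Return name if it is an enabled non-main template; otherwise raise."""
        if name == self.main:
            raise ValueError("custom tasks select a non-main template")
        if name not in self.extras:
            raise ValueError(f"template {name!r} is not enabled")
        return name


if __name__ == "__main__":
    # Template names are placeholders.
    cfg = TemplateConfig(main="Internet industry")
    cfg.enable("Custom template A")
    print(cfg.template_for_custom_task("Custom template A"))
```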