
Data Security Center: Identification tasks

Last Updated: Aug 12, 2024

You can start an identification task to scan your assets for sensitive data and classify the data. Data Security Center (DSC) supports two identification task modes: default and custom. After you authorize DSC to access your assets, DSC automatically creates a default identification task for each database or bucket. You can also create custom identification tasks. This topic describes how to view default identification tasks, create custom identification tasks, and revise and export sensitive data identification results.

Overview

Scan templates

A default identification task uses the main identification template, and a custom identification task uses the templates specified in the task. By default, both types of tasks also use the common identification template.

The common identification template is designed to protect personal information security and privacy rights in accordance with GB/T 35273-2020, the personal information security specification issued by the National Standards Committee of China. It helps organizations manage personal information and control the related risks effectively.

Scan speeds

The following scan speeds for structured and unstructured data are for reference only:

  • Structured data stored in ApsaraDB RDS for MySQL, ApsaraDB RDS for PostgreSQL, or PolarDB, or data stored in big data systems such as Tablestore or MaxCompute: Large databases that contain more than 1,000 tables are scanned at a rate of 1,000 columns per minute.

  • Unstructured data stored in Object Storage Service (OSS): scanning 1 TB of data takes about 6 hours on average.
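To plan a scan window, you can turn the reference speeds above into rough estimates. The following Python sketch is illustrative only: the function names are hypothetical, it assumes that the structured-data rate applies linearly to the total column count of a large database, and actual scan times vary.

```python
# Rough scan-duration estimates based on the reference speeds in this topic.
# Planning aid only; actual DSC scan times vary.

STRUCTURED_COLUMNS_PER_MINUTE = 1_000  # large databases (> 1,000 tables)
OSS_HOURS_PER_TB = 6.0                 # average for unstructured data in OSS


def estimate_structured_minutes(total_columns: int) -> float:
    """Estimate minutes to scan a large database with the given column count."""
    if total_columns < 0:
        raise ValueError("total_columns must be non-negative")
    return total_columns / STRUCTURED_COLUMNS_PER_MINUTE


def estimate_oss_hours(data_tb: float) -> float:
    """Estimate hours to scan the given amount of OSS data, in TB."""
    if data_tb < 0:
        raise ValueError("data_tb must be non-negative")
    return data_tb * OSS_HOURS_PER_TB


if __name__ == "__main__":
    print(estimate_structured_minutes(45_000))  # 45.0 (minutes)
    print(estimate_oss_hours(2.5))              # 15.0 (hours)
```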

Scan limits on files and tables

To prevent excessively large files or tables from slowing down the overall scan, DSC imposes the following limits on the data that is scanned:

  • Structured data and data stored in big data systems: The first 200 rows of each table are sampled. For each field in a sampled row, only the first 10 KB of data is scanned.

  • Unstructured data stored in OSS:

    • Files larger than 200 MB are not scanned. Files of 200 MB or smaller are scanned in full.

    • For compressed or archived files, only the first 1,000 subfiles are scanned.
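The following Python sketch models these limits so that you can predict which data is scanned. It is not DSC code: the function names are hypothetical, and it assumes binary units (1 KB = 1,024 bytes, 1 MB = 1,024 KB), which this topic does not specify.

```python
# Model of the documented scan limits. Assumption: KB and MB are binary units.
from typing import List

MAX_SAMPLED_ROWS = 200
MAX_FIELD_BYTES = 10 * 1024            # first 10 KB of each field value
MAX_OSS_FILE_BYTES = 200 * 1024 * 1024  # larger OSS files are skipped
MAX_ARCHIVE_SUBFILES = 1_000


def sample_table(rows: List[List[bytes]]) -> List[List[bytes]]:
    """Keep the first 200 rows and the first 10 KB of each field value."""
    return [[value[:MAX_FIELD_BYTES] for value in row] for row in rows[:MAX_SAMPLED_ROWS]]


def is_oss_file_scanned(size_bytes: int) -> bool:
    """Files larger than 200 MB are skipped; others are scanned in full."""
    return size_bytes <= MAX_OSS_FILE_BYTES


def archive_members_to_scan(member_names: List[str]) -> List[str]:
    """Only the first 1,000 subfiles of a compressed or archived file are scanned."""
    return member_names[:MAX_ARCHIVE_SUBFILES]


if __name__ == "__main__":
    table = [[b"x" * 20_000, b"alice@example.com"] for _ in range(500)]
    sampled = sample_table(table)
    print(len(sampled), len(sampled[0][0]))  # 200 10240
```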

Prerequisites

You have completed the authorization and granted DSC the permissions to identify data in your assets. For more information, see Authorize DSC to access databases.

Default identification tasks

Description

After the authorization is complete, DSC uses the main and common identification templates to create an identification task for each asset instance. This task is called a default identification task. The following table describes a default identification task.

Item

Description

Identification template

A default identification task uses both the main identification template and the common identification template.

You can configure the main identification template. You can set a built-in industry template, such as the classification template for the Internet industry or for the Internet of Vehicles (IoV) industry, or a custom identification template as the main identification template.

Scan cycle (default)

  • After you connect to a database or an OSS bucket, the system automatically creates a default identification task.

    • If you click Connect on the Authorization Management page and select Immediately scan database assets and identify data, DSC immediately executes the default identification task.

    • If you click Connect on the Authorization Management page and do not select Immediately scan database assets and identify data, you must execute the default identification task manually. To do so, choose Data Insights > Tasks. On the Identification Tasks tab, click Default Tasks, find the task, and then click Rescan.

      Note

      Only DSC Enterprise Edition supports the rescan operation. DSC Basic Edition does not support the rescan operation.

  • After you connect to a database by using the database account and password, the system automatically creates a default identification task and runs the scan in the early morning of every day, starting from the next day.

The interval between two scans is at least 24 hours.

Scan scope

After the authorization is complete, the first scan covers all data in the databases. Subsequent scans cover only incremental data.

If you change the main identification template, the system does not scan data immediately. The new template takes effect the next time the default identification task is executed.

Scan results

You can use one of the following methods to view scan results:

Supported operations

The following operations are displayed in the Actions column of a default identification task:

  • Rescan: If you upgrade the identification model, change the main identification template, or notice that the database data has changed, perform the rescan operation to obtain up-to-date scan results as soon as possible.

  • Suspend: If an exception occurs in your databases, click Suspend in the Actions column to pause a default identification task that is in progress.

  • Terminate: Stops all subsequent executions of the default identification task. An execution that is already in progress is not affected.

  • Enable: Re-enables a terminated default identification task.

Note

Default identification tasks cannot be deleted.
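The following Python sketch models how a default identification task behaves over time, based only on the rules in the preceding table: the first run is a full scan, later runs are incremental, a template change takes effect on the next run, and Terminate and Enable control whether subsequent runs execute. The DefaultTask class and the template names are hypothetical and are not part of any DSC API; Suspend and Rescan are not modeled.

```python
# Hypothetical model of a default identification task; not a DSC API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DefaultTask:
    main_template: str
    common_template: str = "common"
    terminated: bool = False
    has_run: bool = False
    history: List[str] = field(default_factory=list)

    def set_main_template(self, name: str) -> None:
        # No scan is triggered; the next run picks up the new template.
        self.main_template = name

    def terminate(self) -> None:
        # Only future runs are blocked, mirroring the rule that an
        # in-progress run is not affected.
        self.terminated = True

    def enable(self) -> None:
        self.terminated = False

    def run_scheduled(self) -> Optional[str]:
        """Perform one scheduled run, or return None if the task is terminated."""
        if self.terminated:
            return None
        scope = "incremental" if self.has_run else "full"
        self.has_run = True
        entry = f"{scope} scan with {self.main_template} + {self.common_template}"
        self.history.append(entry)
        return entry


if __name__ == "__main__":
    task = DefaultTask(main_template="internet-industry")
    print(task.run_scheduled())  # full scan with internet-industry + common
    task.set_main_template("iov-industry")
    print(task.run_scheduled())  # incremental scan with iov-industry + common
    task.terminate()
    print(task.run_scheduled())  # None
```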

View default identification tasks

  1. Log on to the DSC console.

  2. In the left-side navigation pane, choose Data Insights > Tasks.

  3. On the Identification Tasks tab of the Tasks page, click Default Tasks.

  4. On the Identify task monitoring page, view the default identification task list.

Change the scan settings of a default identification task

You can configure periodic scans for a default identification task. We recommend that you set the scan cycle to match how often the data in the database is updated, so that sensitive information in changed data is detected promptly. The minimum scan cycle is one day.

  1. Log on to the DSC console.

  2. In the left-side navigation pane, choose Data Insights > Tasks.

  3. On the Identification Tasks tab of the Tasks page, click Default Tasks.

  4. On the Identify task monitoring page, find the identification task for which you want to specify the scan cycle and then click Scan settings.

  5. In the Scan Settings dialog box, specify the scan cycle and scan start time and then click OK.

    Important
    • To minimize the impact of scans on your databases, we recommend that you set the scan start time to an off-peak period for your data assets.

    • While an identification task is running, we recommend that you monitor the database and business status, such as abnormal spikes in CPU utilization or memory usage. If an exception related to the task occurs, suspend or terminate the task: go to the Tasks page, find the task, and then click Suspend or Terminate in the Actions column.
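As a planning aid, the following Python sketch computes the next scan time from the last scan date, a start time, and a scan cycle in days, and suggests a cycle that roughly matches the data update frequency. The helpers are hypothetical: DSC performs the actual scheduling, the rounding rule in suggest_cycle_days is an assumption, and 03:00 is only an assumed off-peak time. Only the one-day minimum comes from this topic.

```python
# Hypothetical scheduling helpers; DSC schedules scans itself.
from datetime import date, datetime, time, timedelta

MIN_CYCLE_DAYS = 1  # the minimum scan cycle is one day


def next_scan(last_run: date, start: time, cycle_days: int) -> datetime:
    """Return the next scan time: cycle_days after last_run, at the start time."""
    if cycle_days < MIN_CYCLE_DAYS:
        raise ValueError("the minimum scan cycle is one day")
    return datetime.combine(last_run + timedelta(days=cycle_days), start)


def suggest_cycle_days(update_interval_hours: float) -> int:
    """Suggest a cycle close to how often the data changes (assumed rounding)."""
    return max(MIN_CYCLE_DAYS, round(update_interval_hours / 24))


if __name__ == "__main__":
    # Data updated weekly; scans start at an assumed off-peak time of 03:00.
    cycle = suggest_cycle_days(168)
    print(cycle)                                            # 7
    print(next_scan(date(2024, 8, 12), time(3, 0), cycle))  # 2024-08-19 03:00:00
```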

Custom identification tasks

Create a custom identification task

A custom identification task scans the assets that you specify by using the enabled identification templates that you select, instead of the main identification template. Only enabled identification templates can be used in a custom identification task. To use a disabled identification template, enable it first. For more information, see Configure an identification template.

  1. Log on to the DSC console.

  2. In the left-side navigation pane, choose Data Insights > Tasks.

  3. On the Identification Tasks tab of the Tasks page, click Create.

  4. In the Create panel, configure the parameters described in the following table and click Next. After you complete the configuration, click OK.

    Category

    Parameter

    Description

    Basic Information

    Task Name

    Enter a task name.

    Scan Type

    Select when the scan starts. Valid values:

    • Immediate Scan: immediately scans data after you create the identification task.

    • Periodic Scan: scans data periodically after you create the identification task. Select the scan frequency and scan time from the Scan Frequency and Scan Time drop-down lists. To scan data immediately, select Scan Once Now.

      Note

      Scan Time applies only to structured data.

    Scope

    Select the scan scope of the identification task. Valid values:

    • Global Scan: scans all authorized and connectable assets within the current Alibaba Cloud account.

    • Data Domain: scans assets in a specified data domain.

    • Asset Type: scans the assets of one or more asset types.

    Identification Template

    Select the identification templates to use for the scan. Only enabled identification templates can be selected, and you can select up to two. For more information, see Configure an identification template.

    Identification Configuration of Structured Data

    Identification Scope of Structured Data

    Select the scan scope of structured data, such as data stored in ApsaraDB RDS or PolarDB. Valid values:

    • Global Scan: scans all structured data specified in Scope.

    • Specify Scan Scope: scans only the instances and databases that you select. To add more instances, click Add Identification Scope.

    Identification Scope of Unstructured Data

    Scan Scope

    Select the scan scope of data stored in OSS. Valid values:

    • Global Scan: scans all unstructured data assets specified in Scope.

    • Specify Scan Scope: scans only the OSS buckets that you select. You can select multiple buckets, but only from the assets specified in Scope.

      After you select the buckets, you can configure filter conditions to narrow the scan scope. Specify values to include or exclude based on Prefix, Directory, or Suffix.

    Scan Depth

    Select the scan depth of data stored in OSS. Valid values:

    • Global Scan: scans all bucket paths.

    • Specify Scan Depth: scans only bucket paths up to the specified depth. Path levels are separated by forward slashes (/). Valid values: integers from 1 to 10. For example, if the scan depth is set to 5, OSS bucket paths within five levels are scanned.

    Other Settings

    Tagging Result Overwriting

    Specify how new identification results handle results that you previously revised manually. Valid values:

    • Skip Manual Tagging Result: retains your manually revised results. We recommend that you select this option.

    • Overwrite Manual Tagging Result: overwrites your manually revised results with the new identification results.

    Task notes

    Enter the task notes.
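To check which OSS objects a filtered scan would cover, the following Python sketch applies include and exclude rules of the Prefix, Directory, and Suffix types together with a scan depth limit. It is a hypothetical model, not DSC's implementation. It assumes that depth counts the slash-separated segments of an object key, including the file name; that an object must match at least one include rule (if any are set) and no exclude rule; and that a Directory rule matches objects at any level under that directory.

```python
# Hypothetical model of an OSS scan scope filter; not DSC's implementation.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    kind: str   # "prefix", "directory", or "suffix"
    value: str

    def matches(self, key: str) -> bool:
        if self.kind == "prefix":
            return key.startswith(self.value)
        if self.kind == "directory":
            # Assumed: matches objects anywhere under the directory.
            return key.startswith(self.value.rstrip("/") + "/")
        if self.kind == "suffix":
            return key.endswith(self.value)
        raise ValueError(f"unknown rule kind: {self.kind}")


def path_depth(key: str) -> int:
    # Assumed: depth is the number of "/"-separated segments, file name included.
    return len([part for part in key.split("/") if part])


def in_scan_scope(
    key: str,
    include: Sequence[Rule] = (),
    exclude: Sequence[Rule] = (),
    max_depth: Optional[int] = None,
) -> bool:
    """Return True if the object key would be scanned under these settings."""
    if max_depth is not None:
        if not 1 <= max_depth <= 10:
            raise ValueError("scan depth must be an integer from 1 to 10")
        if path_depth(key) > max_depth:
            return False
    if include and not any(rule.matches(key) for rule in include):
        return False
    return not any(rule.matches(key) for rule in exclude)


if __name__ == "__main__":
    keys = ["hr/2024/salaries.csv", "hr/2024/tmp/cache.bin", "logs/app.log", "a/b/c/d/e/f.txt"]
    include = [Rule("directory", "hr"), Rule("suffix", ".txt")]
    exclude = [Rule("suffix", ".bin")]
    print([k for k in keys if in_scan_scope(k, include, exclude, max_depth=5)])
    # ['hr/2024/salaries.csv']
```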

Rescan a custom identification task

If the identification model is upgraded or database data changes, you can perform the rescan operation to obtain up-to-date scan results as soon as possible. A rescan executes a full scan of the specified assets, and the scan starts immediately. We recommend that you perform rescans during off-peak hours for your data assets.

The rescan operation is available only when all identification templates used by the custom identification task are enabled. Before you rescan, make sure that these templates are enabled.

To perform the rescan operation, click Rescan in the Actions column corresponding to the task. You can view the scan progress in the Scan Status column corresponding to the task.
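Before you create or rescan a custom identification task, you can check the template selection against the rules in this topic: at most two templates, all of them enabled. The check_templates function below is a hypothetical helper that works on a dictionary that maps template names to their enabled state, which you would need to obtain yourself; it does not call DSC.

```python
# Hypothetical pre-check for a custom identification task's templates.
from typing import Dict, List

MAX_TEMPLATES_PER_TASK = 2


def check_templates(selected: List[str], enabled: Dict[str, bool]) -> List[str]:
    """Return the problems found; an empty list means the selection is valid."""
    problems: List[str] = []
    if not selected:
        problems.append("select at least one identification template")
    if len(selected) > MAX_TEMPLATES_PER_TASK:
        problems.append(f"select at most {MAX_TEMPLATES_PER_TASK} templates")
    for name in selected:
        # Templates missing from the mapping are treated as disabled.
        if not enabled.get(name, False):
            problems.append(f"template {name!r} is not enabled")
    return problems


if __name__ == "__main__":
    states = {"finance": True, "custom-pii": False}
    print(check_templates(["finance", "custom-pii"], states))
    # ["template 'custom-pii' is not enabled"]
```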

Revision tasks

You can create a revision task to revise sensitive data that is incorrectly tagged or not tagged. This helps enterprises manage and protect data more accurately. In DSC, you can both revise and restore sensitive data identification models. To create a revision task, perform the following steps:

  1. Log on to the DSC console.

  2. In the left-side navigation pane, choose Data Insights > Tasks.

  3. On the Tasks page, click the Revision Tasks tab.

  4. In the left-side navigation pane, click the asset type that you want to revise.

  5. Find the sensitive data that you want to manage and click Revise or Restore in the Actions column. Then, follow the on-screen prompts and click OK.

    After you perform the restore operation, the identification model that was in effect before the revision is restored.
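The interaction between manual revisions, the restore operation, and the Tagging Result Overwriting setting of a custom identification task can be sketched as follows. This Python model is hypothetical: it assumes that a revision stores a manual label that takes precedence over the label from the latest scan, and that restoring discards the manual label.

```python
# Hypothetical model of manual revisions and the Tagging Result Overwriting option.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OverwritePolicy(Enum):
    SKIP_MANUAL = "Skip Manual Tagging Result"
    OVERWRITE_MANUAL = "Overwrite Manual Tagging Result"


@dataclass
class ColumnResult:
    auto_label: Optional[str]           # label from the latest scan
    manual_label: Optional[str] = None  # label set by a revision task

    @property
    def effective_label(self) -> Optional[str]:
        # A manual revision takes precedence over the scan result.
        return self.manual_label if self.manual_label is not None else self.auto_label

    def revise(self, label: str) -> None:
        self.manual_label = label

    def restore(self) -> None:
        # Discard the revision so the scan result applies again.
        self.manual_label = None

    def apply_scan(self, new_label: Optional[str], policy: OverwritePolicy) -> None:
        self.auto_label = new_label
        if policy is OverwritePolicy.OVERWRITE_MANUAL:
            self.manual_label = None


if __name__ == "__main__":
    result = ColumnResult(auto_label="phone number")
    result.revise("ID card number")
    result.apply_scan("phone number", OverwritePolicy.SKIP_MANUAL)
    print(result.effective_label)  # ID card number
    result.apply_scan("phone number", OverwritePolicy.OVERWRITE_MANUAL)
    print(result.effective_label)  # phone number
```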

Export sensitive data identification results

On the Asset Insight and Data Directory pages, DSC displays the latest sensitive data detected by the main and common identification templates.

You can create an export task to export sensitive data detected by the main identification template or an enabled identification template. After you create the task, DSC collects the detection results of the template specified in the task so that you can download them. Before you create the task, make sure that a scan has been completed with the template whose detection results you want to export.

To create an export task and download the results, perform the following steps:

  1. Log on to the DSC console.

  2. In the left-side navigation pane, choose Data Insights > Tasks.

  3. On the Tasks page, click the Export Tasks tab.

  4. On the Export Tasks tab, click Create.

  5. Configure an export task and then click OK.

    1. In the Basic Information section of the Create page, enter a task name and select an identification template.

      You can select only an enabled identification template.

    2. In the Export Dimension section of the Create page, select Asset Type or Asset Instance.

      • Asset Type: Select the asset type that you want to export.

      • Asset Instance: Select the instances that contain data to be exported.

    After you create the export task, you can view its status in the export task list. The more data you export, the longer the export takes.

  6. After the task is in the Completed state, click Download in the Actions column corresponding to the task.

    Important

    Download the exported data within three days after the export is complete. The task expires after three days, and the exported sensitive data can no longer be downloaded.
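Because the download window is three days, you may want to track the deadline for each export task. The following Python sketch uses hypothetical helper names and assumes that the window is exactly 72 hours measured from the time the task completes; the exact cutoff used by DSC is not specified in this topic.

```python
# Hypothetical helper for tracking the three-day download window.
from datetime import datetime, timedelta

DOWNLOAD_WINDOW = timedelta(days=3)  # assumed to be exactly 72 hours


def download_deadline(completed_at: datetime) -> datetime:
    """Return the time after which the export can no longer be downloaded."""
    return completed_at + DOWNLOAD_WINDOW


def can_download(completed_at: datetime, now: datetime) -> bool:
    return now < download_deadline(completed_at)


if __name__ == "__main__":
    done = datetime(2024, 8, 12, 10, 0)
    print(download_deadline(done))                                # 2024-08-15 10:00:00
    print(can_download(done, now=datetime(2024, 8, 14, 18, 30)))  # True
```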
