View sensitive data identification tasks - Data Security Center

Data Security Center (DSC) provides data discovery and classification capabilities. Identification tasks scan your connected data assets to detect sensitive information, and then classify and grade the data by type and sensitivity level. This visibility allows you to implement precise access controls and improve your overall data security posture. This topic describes how to configure and manage identification tasks to identify sensitive data.

Prerequisites

You must authorize DSC to access the data assets that you want to scan. For more information, see Asset authorization.

Identification task overview

An identification task scans data in connected assets by using the identification rules defined in identification templates. It generates scan results and automatically tags the sensitive data that it identifies. For details on templates, see View and configure identification templates.

Task types

DSC supports two types of identification tasks: default tasks and custom identification tasks.

Default tasks

After you authorize an asset, DSC automatically creates a scan task for that asset instance that uses the main identification template (and, where applicable, the common identification template). These tasks are referred to as default tasks.

The following table describes the logic and behavior of default tasks.

Setting	Description
Identification template	Default tasks always use the main identification template configured for DSC. This cannot be changed for individual tasks. Main identification template: The global main template configured in DSC. It can be either a built‑in industry template (for example, Internet Industry Classification, Vehicle Internet Classification) or a custom template. Common identification template: When the main identification template is a built‑in industry template, DSC also applies the common identification template, which is based on the Personal Information Security Specification (GB/T 35273‑2020) standard.
Scan trigger and cycle (default)	One-click connection (databases, OSS, SLS): If you selected Scan assets and identify sensitive data now during connection, the default task runs immediately. Otherwise, you must trigger the scan manually: go to Classification and Grading > Tasks > Identification Tasks, click Default Tasks, and then click Rescan. Account/password connection (databases): When you connect a database by using an account and password, the system creates a default task upon connection. Periodic scans then run automatically every day, starting from the next day (usually in the early morning). Note: The minimum interval between two scans is 24 hours.
Scan scope	For all authorized assets, the scan scope is applied as follows: Databases and OSS: Performs a full scan of all data initially, and scans only incremental data in subsequent runs. SLS: Each scan covers data stored between 00:00 and 24:00 on the previous day relative to the scan execution time. To scan specific historical data in SLS, you must create a custom identification task. Changes to the main identification template do not trigger an immediate scan. The new rules apply only to the next scheduled run.
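The SLS scan window and the 24-hour minimum interval described in the table above come down to simple date arithmetic. The following is a minimal sketch, not DSC code: the function names are hypothetical, and it assumes timestamps are naive datetimes in the time zone that DSC uses for scheduling, which this topic does not specify.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Minimum gap between two scans of the same default task (from this topic).
MIN_SCAN_INTERVAL = timedelta(hours=24)


def sls_default_scan_window(scan_time: datetime) -> Tuple[datetime, datetime]:
    """Return the [start, end) window that a default SLS scan covers:
    00:00 to 24:00 of the day before scan_time."""
    today_midnight = scan_time.replace(hour=0, minute=0, second=0, microsecond=0)
    previous_midnight = today_midnight - timedelta(days=1)
    return previous_midnight, today_midnight


def can_scan_again(last_scan: datetime, now: datetime) -> bool:
    """True if at least 24 hours have passed since the previous scan."""
    return now - last_scan >= MIN_SCAN_INTERVAL


# A scan that runs at 03:30 on May 10 covers all of May 9.
start, end = sls_default_scan_window(datetime(2024, 5, 10, 3, 30))
print(start, end)  # 2024-05-09 00:00:00 2024-05-10 00:00:00
```

To cover an earlier day than the previous one, the table points you to a custom identification task rather than the default task.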

Custom identification tasks

Create a custom identification task when you need to scan specific data assets by using one or more enabled identification templates. If a template is currently disabled, you must enable it before it can be selected for a custom task. For more information, see Enable an identification template.

Scan mechanism and limits

Limits

To balance detection coverage with system performance, DSC applies specific sampling rules and limits.

Structured data/Big data (RDS, PolarDB, Tablestore, MaxCompute): By default, the first 200 rows of a table are sampled. You can manually increase this limit to a maximum of 1,000 rows. Within the sampled rows, only the first 10 KB of data per field is scanned.
Unstructured data (OSS or Simple Log Service):
- By default, files larger than 200 MB are skipped and files of 200 MB or smaller are scanned.
- Data stored in OSS:
  - You can manually set the maximum file size for scanning, up to 1,000 MB per file.
  - For compressed or archived files, only the first 1,000 child files are scanned.
  - When scanning a single OSS bucket, a maximum of 4 objects are scanned concurrently.
  - QPS limit: A single scan task is limited to 100 API requests per second against the OSS bucket.
  - Bandwidth limit: A single scan task is limited to 200 MB/s of internal outbound bandwidth from the OSS bucket.
- DSC supports scanning for over 800 OSS file types, including text, office documents, images, design files, code, binaries, archives, applications, audio, video, and chemical structure files. For more information, see OSS file types that can be identified.
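The sampling rules above can be modeled as a few checks, which is useful for reasoning about which data a scan will actually cover. This is a hypothetical sketch: the constant and function names are illustrative, the lower bound of 1 sampled row is an assumption, and it assumes 1 KB = 1,024 bytes and 1 MB = 1,024 × 1,024 bytes, which this topic does not state. It follows the Limits list in treating an oversized file as skipped entirely.

```python
from typing import List, Optional

# Values from the Limits section; unit conversions are assumptions.
DEFAULT_SAMPLE_ROWS = 200
MAX_SAMPLE_ROWS = 1_000
FIELD_SCAN_BYTES = 10 * 1024        # first 10 KB of each field in sampled rows
DEFAULT_FILE_LIMIT_MB = 200
MAX_FILE_LIMIT_MB = 1_000           # configurable for OSS only
MAX_ARCHIVE_CHILDREN = 1_000        # child files scanned per compressed/archived file
MB = 1024 * 1024


def effective_sample_rows(requested: Optional[int] = None) -> int:
    """Rows sampled per table: 200 by default, at most 1,000."""
    if requested is None:
        return DEFAULT_SAMPLE_ROWS
    if not 1 <= requested <= MAX_SAMPLE_ROWS:
        raise ValueError(f"sample rows must be between 1 and {MAX_SAMPLE_ROWS}")
    return requested


def scanned_part_of_field(value: bytes) -> bytes:
    """Only the leading 10 KB of a field value is inspected."""
    return value[:FIELD_SCAN_BYTES]


def file_is_scanned(size_bytes: int, limit_mb: int = DEFAULT_FILE_LIMIT_MB) -> bool:
    """Files above the per-file limit are skipped (OSS limit configurable up to 1,000 MB)."""
    if not 1 <= limit_mb <= MAX_FILE_LIMIT_MB:
        raise ValueError(f"file limit must be between 1 and {MAX_FILE_LIMIT_MB} MB")
    return size_bytes <= limit_mb * MB


def archive_children_scanned(children: List[str]) -> List[str]:
    """Only the first 1,000 child files of an archive are scanned."""
    return children[:MAX_ARCHIVE_CHILDREN]
```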

For comprehensive details on identification task limits, see Limits.

Scanned objects

Database assets: <Instance>/<Database>/<Table>. Each table represents a single data object.
Big data: <Instance>/<Table>. Each table represents a single data object.
OSS assets: <Bucket>/<File>. Each file represents a single data object.
Simple Log Service assets: <Project>/<Logstore>/<Time Segment>. Data is processed in 5-minute segments. Each 5-minute segment represents a single data object.
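For SLS, the 5-minute segmentation determines how many data objects a time range produces. The sketch below enumerates the segments in a range. The identifier text format for a time segment is an assumption made for illustration, not the format that DSC displays, and the function name is hypothetical. It rejects ranges that are not a multiple of 5 minutes, mirroring the 5-minute step size of the Custom time range for custom tasks.

```python
from datetime import datetime, timedelta
from typing import List

SEGMENT = timedelta(minutes=5)


def sls_data_objects(project: str, logstore: str,
                     start: datetime, end: datetime) -> List[str]:
    """List one identifier per 5-minute segment in [start, end).

    The "<Project>/<Logstore>/<start>-<end>" text format is illustrative only.
    """
    if end <= start or (end - start) % SEGMENT:
        raise ValueError("range must be positive and a multiple of 5 minutes")
    objects = []
    t = start
    while t < end:
        seg_end = t + SEGMENT
        objects.append(f"{project}/{logstore}/{t:%Y-%m-%dT%H:%M}-{seg_end:%H:%M}")
        t = seg_end
    return objects
```

A full day therefore corresponds to 288 data objects per Logstore (24 hours × 12 segments per hour).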

Scan speed

The following estimates are provided for reference only. Actual scan speeds may vary based on system load and data complexity.

Structured data (RDS, PolarDB) and big data (Tablestore, MaxCompute): For large databases containing over 1,000 tables, the scan rate is approximately 1,000 columns per minute (based on a sample size of 200 rows per column).
Unstructured data (OSS, SLS): Scanning 1 TB of data typically takes between 6 and 48 hours, averaging 24 hours. The actual duration depends on the distribution of file types within the dataset.
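Because the reference figures above are rates, a rough duration estimate is a matter of multiplication. The sketch below assumes that scan time scales linearly with the number of columns or terabytes, which the reference figures do not guarantee, so treat the result as an order-of-magnitude estimate only. The function names are hypothetical.

```python
from typing import Tuple

# Reference rates from this topic; actual speed varies with load and data.
COLUMNS_PER_MINUTE = 1_000          # structured/big data, 200-row sample
HOURS_PER_TB = (6.0, 24.0, 48.0)    # unstructured data: fastest, average, slowest


def structured_scan_minutes(total_columns: int) -> float:
    """Estimated minutes to scan a large database, assuming linear scaling."""
    return total_columns / COLUMNS_PER_MINUTE


def unstructured_scan_hours(data_tb: float) -> Tuple[float, float, float]:
    """Estimated (fastest, average, slowest) hours, assuming linear scaling."""
    fastest, average, slowest = HOURS_PER_TB
    return data_tb * fastest, data_tb * average, data_tb * slowest


print(structured_scan_minutes(2_000 * 25))   # 50.0
print(unstructured_scan_hours(0.5))          # (3.0, 12.0, 24.0)
```

For example, a database with 2,000 tables of 25 columns each (50,000 columns) gives an estimate of about 50 minutes, and 0.5 TB of OSS data gives an estimated range of 3 to 24 hours, with an average of 12.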

Scan logic

Task type	First scan	Subsequent scans
Default task	Performs a full scan of all authorized data in the asset.	Scans new or modified data objects. You can trigger scans manually or configure an automatic scan cycle.
Custom identification task	Scans data according to the specified identification scope.	Scans only new or modified data objects within the specified scope. Data objects that have not changed since the last scan are skipped.
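The first-scan and subsequent-scan behavior in the table is a standard incremental pattern: remember what was scanned, then rescan only objects that are new or whose content marker has changed. This topic does not say how DSC detects changes, so the `version` field below (for example, an ETag or last-modified time) is an assumption, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataObject:
    path: str     # e.g. "<Bucket>/<File>" or "<Instance>/<Database>/<Table>"
    version: str  # assumed change marker, such as an ETag or last-modified time


def objects_to_scan(current: List[DataObject],
                    last_scanned: Dict[str, str]) -> List[DataObject]:
    """First run (empty history): every object. Later runs: new or changed objects only."""
    return [obj for obj in current if last_scanned.get(obj.path) != obj.version]


def record_scan(scanned: List[DataObject], last_scanned: Dict[str, str]) -> None:
    """Remember the version of each object that was just scanned."""
    for obj in scanned:
        last_scanned[obj.path] = obj.version
```

On the first run the history is empty, so every object is selected, which corresponds to the full scan in the table.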

Scan results

Sensitivity levels are determined by the identification rules matched in the task's template. If a data object matches multiple rules, the highest sensitivity level takes precedence. DSC classifies sensitive data on a scale from S1 to S10, where a higher number indicates a higher level of sensitivity. A result of N/A indicates that no sensitive data was detected.

The valid range of sensitivity levels depends on the configuration of the associated identification template. For more information, see Set the sensitivity level for an identification template.
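The precedence rule can be stated as "take the highest level among the matched rules, or N/A if no rule matched." The sketch below compares levels numerically, because a plain string comparison would rank S9 above S10. It accepts the full S1 to S10 scale, although a given template may use a narrower range. The function names are hypothetical.

```python
import re
from typing import Iterable

_LEVEL_PATTERN = re.compile(r"S(10|[1-9])")


def level_rank(level: str) -> int:
    """Convert "S1".."S10" to 1..10."""
    match = _LEVEL_PATTERN.fullmatch(level)
    if match is None:
        raise ValueError(f"unknown sensitivity level: {level!r}")
    return int(match.group(1))


def object_sensitivity(matched_levels: Iterable[str]) -> str:
    """Highest level among the matched rules, or "N/A" if no rule matched."""
    levels = list(matched_levels)
    if not levels:
        return "N/A"
    return max(levels, key=level_rank)


print(object_sensitivity(["S2", "S10", "S9"]))  # S10
print(object_sensitivity([]))                    # N/A
```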

Recommendations

Recommendation	Description
Confirm scan scope and priority	Prioritize high-risk assets. If you cannot immediately scan all data due to volume, evaluate your assets first. Prioritize scanning data that has a higher risk profile, such as assets that are frequently accessed, modified, or subject to unknown operations.
Specify scope of the first scan	Perform a targeted pilot scan. Limit your initial scan to a specific database or OSS bucket. This allows you to validate and tune your identification rules before a full-scale rollout. Optimize identification rules. Do not enable all identification features indiscriminately. Generic rules (e.g., Date, Time, URL) can generate excessive false positives in large datasets. Enable only the specific rules relevant to your business context. Ensure sufficient sampling. For structured data, ensure the sample size is large enough to capture representative data; otherwise, sensitive information may be missed.
Specify a task start time	Align schedules with data updates. Configure tasks to run automatically (daily, weekly, or monthly) based on your data update frequency. Regular scanning ensures timely detection of new sensitive data and helps identify trends or anomalies in your security posture. We recommend scheduling scans during off-peak hours to minimize performance impact.

Manage default identification tasks

View default tasks

Log on to the Data Security Center console.
In the navigation pane on the left, select Classification and Grading > Tasks.
On the Tasks page, click the Identification Tasks tab, and then click Default Tasks.
On the Discovery Task Monitoring page, view the list of default tasks.
You can perform the following operations on a default identification task:
- Rescan: Triggers an immediate scan to update results. Use this if you have updated the main identification template, upgraded the identification model, or if significant data changes occurred.
- Pause: Temporarily halts a running default task (e.g., if you detect database performance issues).
- Terminate: Stops the execution of the current task and prevents the default task from running in future cycles.
- Enable: Reactivates a terminated task.
Note
Default tasks cannot be deleted.

Configure scan settings

You can customize the schedule for default tasks. We recommend aligning the scan cycle with your data update frequency. The shortest supported cycle is daily.

On the Discovery Task Monitoring page, select the check box of the task for which you want to configure the scan cycle, and then click Scan Settings above the task list.
In the Scan Settings dialog box, configure the scan cycle and automatic scan start time, and then click OK.
Important
- Set the start time to off-peak hours to minimize database impact.
- Monitor CPU and memory usage during scans. If abnormalities occur, pause or terminate the task immediately.

Create custom identification tasks

Create a custom task when you need to scan specific assets using a specific (non-main) identification template, or to scan historical SLS data.

Important

The system supports a maximum of 5 active identification tasks (tasks with a periodic schedule). Once you reach this limit, you cannot create additional periodic tasks.

Create a custom identification task

In the navigation pane on the left, select Classification and Grading > Tasks.
On the Identification Tasks tab, select the Asset Type for which you want to create an identification task, and then click Create.

In the Create panel, configure the parameters and click OK.

Category	Parameter	Description
Basic Information	Asset Type	Displays the asset type selected in the previous step. It cannot be modified.
	Task Name	Enter a task name.
	Task notes	Enter the task notes.
	Task and Plan	Select a task start time. Valid values: Immediate Scan: Runs the task immediately upon creation. Periodic Scan: Runs the task on a scheduled frequency. You must configure the Scan Frequency and Scan Time (Structured Data Only). To trigger an immediate run alongside the schedule, select Scan Once Now. Note The Scan Time setting applies only to structured data assets. Unstructured data scans run according to system resource availability.
	Identification Template	Select up to two enabled identification templates to apply to this scan. For details on enabling templates, see Use identification templates.
Identification Scope	Identification Scope of Structured Data	Select the scope for structured assets (e.g., RDS, PolarDB): Global Scan: Scans all authorized structured assets. Specify Scan Scope: Allows granular selection of specific Instances and Databases. Instance Name and Database name: To add multiple instances, click Add Identification Scope. Scan Limit: Defines the number of rows sampled per table. Defaults to the first 200 rows. Maximum value is 1,000 rows.
	Unstructured Data OSS Identification Scope	Select Object, Sampling Method, Scan Depth, and Scan Limit for unstructured data (OSS). Valid values of Object: Global Scan: Scans all authorized OSS buckets. Specify Scan Scope: Select specific buckets. After you specify the objects to scan, you can set Prefix, Directory, and Suffix filters to include or exclude specific files for fine-grained scanning. Sampling Method: DSC lists the data in OSS by using the `ListObjects` API and scans it based on this configuration. Valid values: Global Scan: Scans all data. Custom Depth: Scans data based on the Sampling Rate that you set. Note For example, if you set Sampling Rate to 1/10, the system scans the 1st file, skips the next 9, and scans the 11th. Valid values of Scan Depth: Global Scan: Scans the full directory path. Specify Scan Scope: Limits the directory depth (levels 1 to 10). For example, 5 scans only the top 5 directory levels. Scan Limit: Limits the scanned size per file. Default: 200 MB. Maximum: 1,000 MB. Data beyond the limit is skipped. Synchronize All Identification Results to SLS: Select this option to send full logs to Simple Log Service.
	Unstructured Data SLS Identification Scope	Set Asset Scope and Time Range for SLS. Valid values of Asset Scope: Global Scan: Scans all authorized SLS projects. Specify Scan Scope: Select specific projects and Logstores. Valid values of Time Range: Last 15 Minutes, Last 1 hour, Yesterday, Last 1 Day, Last 7 Days, Last 30 Days, or Custom. Custom: Specify the time range in minutes, in increments of 5 minutes.
Other Settings	Tagging Result Overwriting	Determines how to handle conflicts with previously manually corrected data: Skip Manual Tagging Result: Preserves your manual corrections. The system will not overwrite them. Overwrite Manual Tagging Result: Replaces manual corrections with the new system scan results.
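To see how the OSS scope options interact, the following sketch selects object keys from a listing. It is an illustration under several assumptions not stated in this topic: the order in which DSC applies filters, depth, and sampling; that a root-level object counts as level 1; and that only prefix-include and suffix-exclude filters are modeled (the Directory filter is omitted because its matching rules are not described here). With a 1/10 sampling rate it keeps the 1st, 11th, 21st, and so on, matching the example in the table. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def object_level(key: str) -> int:
    """Directory level of an object; assumes a root-level object is level 1."""
    return key.count("/") + 1


def select_objects(keys: Sequence[str],
                   sampling_denominator: int = 1,
                   max_depth: Optional[int] = None,
                   include_prefixes: Sequence[str] = (),
                   exclude_suffixes: Sequence[str] = ()) -> List[str]:
    """Filter a ListObjects-style key listing, then keep every Nth key.

    sampling_denominator=10 models a 1/10 Sampling Rate; max_depth models a
    Scan Depth of 1 to 10. The order of these steps is an assumption.
    """
    if sampling_denominator < 1:
        raise ValueError("sampling denominator must be >= 1")
    if max_depth is not None and not 1 <= max_depth <= 10:
        raise ValueError("scan depth must be between 1 and 10")
    candidates = []
    for key in keys:
        if include_prefixes and not key.startswith(tuple(include_prefixes)):
            continue
        if exclude_suffixes and key.endswith(tuple(exclude_suffixes)):
            continue
        if max_depth is not None and object_level(key) > max_depth:
            continue
        candidates.append(key)
    # Keep the 1st, (N+1)th, (2N+1)th, ... remaining key.
    return candidates[::sampling_denominator]


keys = [f"logs/{i:02d}.txt" for i in range(25)]
print(select_objects(keys, sampling_denominator=10))
# ['logs/00.txt', 'logs/10.txt', 'logs/20.txt']
```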

Modify or delete a custom identification task

Edit: Reconfigures the custom identification task. You can modify all parameters.
Delete: Deletes a custom identification task that you no longer need.

Manage task operations

Rescan a task

If the identification model is upgraded or your database data has changed, you can perform a rescan to update the results immediately. A Rescan triggers an immediate full scan of the specified asset. Perform this operation during off-peak hours to minimize the impact on system performance.

Before performing a Rescan, ensure that the relevant identification templates are enabled.

Note

The Rescan operation is not supported for custom tasks with the Scan Type set to Immediate Scan.

On the Identification Tasks tab, perform the rescan operation:
- Rescan a custom identification task: In the task list, click Rescan in the Actions column of the custom identification task.
- Rescan a default task: Click Default Tasks, find the target asset, and then click Rescan in the Actions column.
You can view the scan progress in the Scan Status column of the identification task.

Pause or Terminate a task

Pause: Temporarily halts a running task, for example when you observe service anomalies. Click Pause in the Actions column of the custom identification task.
Terminate: Stops the current run of a task and all subsequent runs. This operation applies to both custom identification tasks and default tasks.

Correct identification results

If DSC incorrectly identifies data (a false positive) or misses sensitive data (a false negative), you can manually correct the results to improve identification accuracy.

On the Tasks page, click the Revision Tasks tab.
In the navigation pane on the left, click the asset type that you want to manage.
Click Revision in the Actions column of the target sensitive data, modify Revised Model as prompted on the page, and then click OK.
To undo a correction, click Resume in the Actions column. The previous identification model is then restored.
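A manual correction and a later scan result can disagree. For custom tasks, the Tagging Result Overwriting setting decides which one is kept. The following sketch models that rule; the enum and function names are hypothetical, and it does not cover default tasks, whose handling of manual corrections this topic does not describe.

```python
from enum import Enum
from typing import Optional


class OverwritePolicy(Enum):
    SKIP_MANUAL = "Skip Manual Tagging Result"
    OVERWRITE_MANUAL = "Overwrite Manual Tagging Result"


def resolve_tag(scan_result: str, manual_tag: Optional[str],
                policy: OverwritePolicy) -> str:
    """Pick the tag that is kept after a custom task rescans a corrected object."""
    if manual_tag is None:
        return scan_result          # nothing was corrected manually
    if policy is OverwritePolicy.SKIP_MANUAL:
        return manual_tag           # manual correction is preserved
    return scan_result              # new scan result replaces the correction
```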

View and export results

The latest scan results generated by the main identification template are visible on the Data Classification > Asset Insight page. For more information, see View sensitive data identification results.

To download sensitive data identification results, create an export task. Specify the target identification template and data assets to generate and download a report.

Important

You can only export results for assets and templates that have completed a successful identification task.

Create an export task

Follow this procedure to create an export task and download the results:

On the Tasks page, click the Export Tasks tab.
On the Export Tasks tab, click Create.
Configure the export task and click OK.
1. In the Basic Information section, enter a task name and select an identification template that is used by the identification task.
  You can select only an enabled identification template.
2. In the Export Dimension section, select Asset Type or Asset Instance.
  - Asset Type: Select the asset types that you want to export.
  - Asset Instance: Select the asset instances that you want to export.
After you create the export task, its status is displayed in the export task list. Exports of larger amounts of data take longer to complete.

Download exported results

After the Export Status becomes Finished, click Download in the Actions column of the target export task.

Important

After the export is complete, download the exported data within three days. After three days, the task expires and you can no longer download the exported sensitive data.
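The three-day download window is simple date arithmetic. This sketch assumes that the window starts when the export reaches Finished and that the deadline itself is exclusive; this topic does not specify the exact boundary, so download well before it. The function names are hypothetical.

```python
from datetime import datetime, timedelta

DOWNLOAD_WINDOW = timedelta(days=3)


def download_deadline(finished_at: datetime) -> datetime:
    """Assumed expiry: three days after the export reaches Finished."""
    return finished_at + DOWNLOAD_WINDOW


def can_download(finished_at: datetime, now: datetime) -> bool:
    """True while the export has not yet expired (deadline treated as exclusive)."""
    return now < download_deadline(finished_at)


print(download_deadline(datetime(2024, 5, 1, 9, 0)))  # 2024-05-04 09:00:00
```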

References

View and configure identification templates - Describes identification templates used in identification tasks and the types of sensitive data that can be identified.
Supported data asset types - Lists the types of data assets from which DSC can identify sensitive data.
Data scanning and identification - Provides common issues and troubleshooting guidance for identification tasks.
