FAQ about sensitive data scan and identification - Data Security Center

This topic provides answers to some frequently asked questions about sensitive data scan and identification.

Does data scan affect the performance of my database?

Data Security Center (DSC) supports full scans, incremental scans, and scheduled scans. A full scan has minimal impact on the performance of your database and does not affect your database workloads. An incremental scan covers only modified data files and has a negligible impact on database performance.

DSC runs a full scan only when you complete asset authorization and manually initiate a full scan, or when the scheduled time for a full scan arrives. DSC runs an incremental scan only when files or tables in your database are modified. To reduce the impact of data scans on database performance, consider the following suggestions when you specify the scan cycle:

Extend the full scan cycle in DSC.
Set the scan time to an off-peak hour.

What types of data assets can DSC scan?

DSC can scan structured data and unstructured data in data assets of the following types:

Structured data: ApsaraDB RDS, PolarDB, PolarDB for Xscale (PolarDB-X), PolarDB-X 2.0, ApsaraDB for Redis, ApsaraDB for MongoDB, ApsaraDB for OceanBase, and self-managed databases.
Unstructured data: Object Storage Service (OSS) and Simple Log Service.
Big data: Tablestore, MaxCompute, AnalyticDB for MySQL, and AnalyticDB for PostgreSQL.

For more information, see Supported data asset types.

How long does it take to complete a scan after I authorize DSC to access a data asset?

DSC starts to scan data within 2 hours after you authorize DSC to access a data asset. The time required to complete the scan varies based on the total size of the data. If a data asset contains a large number of tables, such as more than 10,000 tables, or the total size of OSS objects is extremely large, such as more than 1 PB, the scan takes longer. While DSC scans data, the scan results are updated on the Workbench page in the DSC console. For more information, see View information on the Workbench page.

How does DSC scan unstructured data in OSS and Simple Log Service?

DSC scans unstructured data and identifies sensitive data based on the scan results.

Asset type	Scan scope	Scanned data object
OSS	First scan: After you authorize DSC to scan an OSS bucket, DSC scans all objects in the bucket. Incremental scan: If you add objects to or modify existing objects in an OSS bucket, DSC scans the added or modified objects.	<OSS bucket>/<Object name>. Each object is used as a data object.
Simple Log Service	During each scan, DSC scans all data in the authorized data assets that was stored between 00:00 and 24:00 on the day before the scan is performed. If you want to scan more data, you can create a custom identification task and specify the scan scope. For more information, see Create a custom identification task.	<Simple Log Service project>/<Logstore>/<Time interval>. Each 5-minute period is considered a time interval. The data stored in each time interval is used as a data object.
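The 5-minute time interval used for Simple Log Service data objects can be illustrated with a short Python sketch. This is only an illustration of the arithmetic: the function names and the text format of the time interval label are assumptions made for this example and are not taken from DSC.

```python
from datetime import datetime, timedelta

# Per the description above, each 5-minute period is one time interval.
INTERVAL = timedelta(minutes=5)


def interval_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 5-minute interval."""
    return ts - timedelta(
        minutes=ts.minute % 5,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def data_object_name(project: str, logstore: str, ts: datetime) -> str:
    """Build a <project>/<Logstore>/<Time interval> style label.

    The label format for the time interval is hypothetical.
    """
    start = interval_start(ts)
    end = start + INTERVAL
    return f"{project}/{logstore}/{start:%Y-%m-%d %H:%M}-{end:%H:%M}"


def intervals_per_day() -> int:
    """Number of 5-minute intervals in the 00:00 to 24:00 scan window."""
    return timedelta(days=1) // INTERVAL
```

Under these assumptions, one Logstore yields at most 288 data objects for a full day of data, and a log entry written at 10:07:30 falls into the interval that starts at 10:05.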

What are the billing rules for DSC to scan unstructured data in OSS and Simple Log Service?

DSC uses the subscription billing method. Data scan and identification consume the resources that you purchased. The deduction rules vary based on the purchased edition.

Enterprise Edition: You are charged for basic features based on the amount of data that you want to protect. The storage protection capacity is deducted based on the sizes of authorized OSS buckets and 50% of the sizes of authorized Simple Log Service projects.
Value-added Plan: The data scan and identification features are not supported.

For more information, see Billing overview.
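The Enterprise Edition deduction rule can be worked through as simple arithmetic. The following Python sketch assumes that sizes are expressed in GB and that the rule is exactly "OSS bucket sizes plus 50% of Simple Log Service project sizes" as stated above. The function name and the sample sizes are hypothetical, and the authoritative metering details are in Billing overview.

```python
def protection_capacity_used(oss_bucket_sizes_gb, sls_project_sizes_gb):
    """Estimate the storage protection capacity that is deducted, in GB.

    Assumption: authorized OSS buckets count at 100% of their size and
    authorized Simple Log Service projects count at 50% of their size.
    """
    return sum(oss_bucket_sizes_gb) + 0.5 * sum(sls_project_sizes_gb)


# Hypothetical example: two OSS buckets and one Simple Log Service project.
used_gb = protection_capacity_used([500, 300], [400])
print(used_gb)  # 500 + 300 + 0.5 * 400 = 1000.0
```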

Can DSC re-scan a recently scanned OSS object?

Yes, but DSC automatically re-scans an OSS object only if the object is modified. If the object is not modified, DSC does not re-scan the object. If the object is modified, DSC re-scans the object within 24 hours after the modification.

You can manually re-scan OSS objects based on your business requirements. For more information, see Identification tasks.

How does DSC scan structured data in a data asset, such as a MaxCompute project?

DSC scans the names and values of fields in databases or projects and identifies sensitive data based on the scan results. For example, DSC scans the name and values of the age field. If DSC cannot determine whether a field is sensitive based on the values of the field, DSC also checks the name of the field.

First scan: After you authorize DSC to access a database or project, DSC scans all tables in the database or project.
Incremental scan: If you add tables to the database or project, DSC scans the added tables. If you modify the schema of an existing table, DSC re-scans the table.

Does DSC log on to a data asset to obtain data?

If DSC is authorized to access a data asset, DSC logs on to the data asset and performs data sampling to identify sensitive data. DSC does not save data from the data asset.

What scenarios require a re-scan task?

The following table describes the scenarios in which DSC automatically re-scans data in an authorized data asset.

Scenario	Scan logic	Impact on billing
The first time you authorize DSC to access a data asset.	DSC scans all data in the data asset.	You are charged for the full scan in the data asset.
You modify the data in a data asset after you authorize DSC to access and scan the data asset.	If you add columns to or remove columns from a MaxCompute or database table, DSC automatically re-scans the table. If you add rows to or remove rows from a table, DSC does not automatically re-scan the table.	You are charged for the full scan in the data asset.
	If you add objects to or modify existing objects in an OSS bucket, DSC automatically re-scans the added or modified objects. Note: If you remove objects from an OSS bucket, DSC does not automatically re-scan the bucket.	You are charged for scanning the added or modified objects.
You modify sensitive data identification rules. For example, you create, delete, enable, or disable a rule.	DSC automatically re-scans all data in all authorized data assets.	You are charged for the full scan in all authorized data assets.
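The re-scan scenarios in the preceding table can be summarized as a lookup from the type of change to the scope that DSC re-scans. The following Python sketch only restates the table. The enum members and the function name are hypothetical and do not correspond to any DSC API.

```python
from enum import Enum
from typing import Optional


class Change(Enum):
    """Hypothetical labels for the changes described in the table."""
    FIRST_AUTHORIZATION = "first authorization"
    TABLE_COLUMNS_CHANGED = "columns added or removed"
    TABLE_ROWS_CHANGED = "rows added or removed"
    OSS_OBJECT_ADDED_OR_MODIFIED = "OSS object added or modified"
    OSS_OBJECT_REMOVED = "OSS object removed"
    RULES_MODIFIED = "identification rules modified"


# None means that the change does not trigger an automatic re-scan.
RESCAN_SCOPE = {
    Change.FIRST_AUTHORIZATION: "all data in the data asset",
    Change.TABLE_COLUMNS_CHANGED: "the modified table",
    Change.TABLE_ROWS_CHANGED: None,
    Change.OSS_OBJECT_ADDED_OR_MODIFIED: "the added or modified objects",
    Change.OSS_OBJECT_REMOVED: None,
    Change.RULES_MODIFIED: "all data in all authorized data assets",
}


def rescan_scope(change: Change) -> Optional[str]:
    """Return what is re-scanned for a change, or None if nothing is."""
    return RESCAN_SCOPE[change]
```

For example, rescan_scope(Change.TABLE_ROWS_CHANGED) returns None because adding or removing rows does not trigger an automatic re-scan, whereas a rule change maps to a re-scan of all authorized data assets.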

Can DSC identify authorized data assets that are encrypted?

If Transparent Data Encryption (TDE) is enabled for a data asset, DSC can still identify sensitive data in the data asset.

Does a full table scan in ApsaraDB for MongoDB significantly affect I/O operations and online services?

A full table scan has minimal impact on the performance of your database and does not affect your database workloads.

To reduce the impact of asset scans on database performance, you can extend the full scan cycle or set the scan time to an off-peak hour.

Can DSC identify sensitive data in compressed packages and text files in OSS?

Yes, DSC can identify sensitive data in compressed packages and text files in OSS. You can view the OSS file types that can be identified by DSC on the File Type tab of the Identification Configuration page.

Does DSC support exporting sensitive data identification results?

Yes, DSC allows you to export sensitive data identification results. For more information, see Identify sensitive data by using identification tasks.

Can the scan results of ApsaraDB for MongoDB be accurate to specific fields?

No, the scan results cannot be accurate to specific fields. ApsaraDB for MongoDB is a database that is based on distributed file storage. The minimum storage unit is a document.

Can I call API operations to query sensitive data such as instance names, database names, table names, column names, and risk levels?

Yes, you can call API operations to query sensitive data.

Operation	Description
DescribeOssObjects: queries OSS objects.	You can obtain the ID of the instance to which the object belongs (InstanceId), the bucket name (BucketName), the object ID (FileId), and the risk level ID (RiskLevelId).
DescribeInstances: queries data asset instances.	You can obtain the ID (Id) and name (Name) of the data asset.
DescribeTables: queries tables.	You can obtain information (Items) about tables, including the table name (Name) and risk level (RiskLevelId).
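As a rough illustration of how the returned fields might be consumed, the following Python sketch groups table names by risk level. It does not call the DSC API. It assumes that a DescribeTables response has already been obtained and parsed into a dictionary whose Items list contains entries with Name and RiskLevelId, as described in the preceding table. The exact response envelope, the sample table names, and the risk level values are assumptions for this example.

```python
from collections import defaultdict


def tables_by_risk_level(response: dict) -> dict:
    """Group table names (Name) by risk level (RiskLevelId).

    Assumption: `response` is a parsed DescribeTables result with an
    "Items" list. Check the API reference for the actual structure.
    """
    grouped = defaultdict(list)
    for item in response.get("Items", []):
        grouped[item["RiskLevelId"]].append(item["Name"])
    return dict(grouped)


# Hypothetical sample data, shaped after the fields named above.
sample_response = {
    "Items": [
        {"Name": "customers", "RiskLevelId": 4},
        {"Name": "orders", "RiskLevelId": 2},
        {"Name": "payments", "RiskLevelId": 4},
    ]
}

print(tables_by_risk_level(sample_response))
```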

Does ApsaraDB for Redis support sensitive data identification?

No, ApsaraDB for Redis does not support sensitive data identification. DSC provides only the baseline check feature for ApsaraDB for Redis. For more information, see Security baseline check.