The data traceability feature provided by DataWorks allows you to extract the watermark information of data in a leaked data file. This helps you trace users who caused data leaks. This topic describes how to create a data tracing task and use the task to trace users who caused data leaks.
Prerequisites
Sensitive data identification rules are created. For more information, see Configure sensitive data identification rules.
The data watermark feature is enabled for the desired sensitive data identification rule. For more information, see Create a data masking rule.
Background information
After you enable the data watermark feature for a sensitive data identification rule on the Data Masking Management page in Data Security Guard of DataWorks, DataWorks automatically generates watermark information for data that matches the rule whenever an operation, such as a query or download, is performed on the data. Watermarks uniquely record user activities. If the data is leaked, you can use the data traceability feature to extract the watermark from the leaked data and then trace the users who may have caused the leak.
Limits
DataWorks supports data tracing only for CSV files that are less than 200 MB in size.
You can trace only the operations that are performed after the data watermark feature is enabled.
Note: For example, if you query Table A before the data watermark feature is enabled, you cannot trace that query operation by using the data traceability feature.
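Before you upload a file, you can check it locally against the file limit above. The following is a minimal sketch and is not part of DataWorks. The function name check_upload_candidate is hypothetical, and the sketch assumes that 1 MB equals 1024 × 1024 bytes, which this topic does not specify.

```python
import os

# Assumption: 200 MB is interpreted as 200 * 1024 * 1024 bytes.
MAX_BYTES = 200 * 1024 * 1024


def check_upload_candidate(path: str) -> list:
    """Return a list of problems found; an empty list means the local checks passed."""
    problems = []
    # The tracing task accepts only CSV files.
    if not path.lower().endswith(".csv"):
        problems.append("file does not have a .csv extension")
    # The file must be less than 200 MB in size.
    size = os.path.getsize(path)
    if size >= MAX_BYTES:
        problems.append(f"file is {size} bytes; it must be less than {MAX_BYTES} bytes")
    return problems
```

An empty result indicates only that the file meets the format and size limit. It does not indicate whether the file contains watermark information.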
Create and run a data tracing task
In the left-side navigation pane, click Data Traceability. The Data Traceability page appears.
Create a data tracing task.
Click Create Data Traceability Task.
In the Data Traceability Task dialog box, click Upload File to upload the file for which you want to trace leak sources.
Note: DataWorks supports data tracing only for CSV files that are less than 200 MB in size.
You can export or download a data file from DataWorks to your on-premises machine and upload the file to a data tracing task for tracing leak sources. You can also save data from an external system to a CSV file and upload the CSV file to a data tracing task for tracing leak sources.
After the file is uploaded, you can replace or download the file.
Click Start Traceability to start the data tracing task.
Note: Wait until the tracing task is complete.
View possible leak sources
On the Data Traceability page, you can view the information about all tracing tasks, including the time when the tasks were run and the leaked data files. You can also click the View Details icon in the Actions column of a tracing task to view the possible leak sources.
You can sort all tracing tasks in chronological or reverse chronological order based on the Trace Date column.
You can search for a tracing task based on the name of a leaked data file. Fuzzy match is supported. After you enter a keyword in the search field and press the Enter key, DataWorks displays the tracing tasks whose leaked data file names contain the keyword.
To view the details of a tracing task, find the tracing task and click the View Details icon in the Actions column. You can identify the user who most likely leaked the data based on the analysis result of DataWorks, such as the probability, the operation time, and the command.
FAQ
No possible leak source is found after the tracing task is run. What do I do?
Cause 1: The leaked data file does not contain enough data. As a result, the watermark information cannot be restored.
Solution: Make sure that the leaked data file contains sufficient data so that the watermark generated by the data watermark feature can be reliably restored when the tracing task is run. The users who may have caused the data leak can then be traced. We recommend that you upload a leaked data file that contains more than 500 distinct data records.
Cause 2: The leaked data does not belong to the current tenant.
Solution: Check the data source and make sure that the leaked data belongs to the current tenant.
Cause 3: The leaked data file does not contain watermark information.
Solution:
Check whether the data watermark feature is enabled for the data in the file for which you want to trace leak sources. You can trace only the operations that are performed after the data watermark feature is enabled. For information about how to view and enable the data watermark feature, see Create a data masking rule.
If the data watermark feature is enabled, the file may not contain data that was leaked from DataWorks. The data may have been leaked through operations in an external system.
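For Cause 1, you can count the distinct records in a file before you upload it. The following is a minimal sketch and is not part of DataWorks. The function names count_distinct_records and has_enough_records are hypothetical. The sketch assumes that the file is UTF-8 encoded and that its first row is a header, and it treats two records as duplicates only if all of their fields are identical.

```python
import csv

# Recommendation from this topic: more than 500 distinct data records.
RECOMMENDED_MIN_DISTINCT = 500


def count_distinct_records(path, has_header=True, encoding="utf-8"):
    """Count the distinct data rows in a CSV file, ignoring blank lines."""
    seen = set()
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row (assumed to be present)
        for row in reader:
            if row:  # csv.reader yields an empty list for a blank line
                seen.add(tuple(row))
    return len(seen)


def has_enough_records(path, **kwargs):
    """Return True if the file contains more than the recommended number of distinct records."""
    return count_distinct_records(path, **kwargs) > RECOMMENDED_MIN_DISTINCT
```

If has_enough_records returns False, the file is below the recommended number of records. In that case, the watermark may not be restorable even if the data was watermarked.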