By Liu Tianyuan and Jia Xi, Product Managers of DataWorks
This article is a part of the One-stop Big Data Development and Governance DataWorks Use Collection.
The history of data usage can be divided into three phases: 1.0, 2.0, and 3.0. In the 1.0 phase, data has a single type of user, so data security protection is equally simple: it relies on institutional rules and after-the-fact audits. In the 2.0 phase, data work becomes more structured and splits into multiple roles, such as data development, BI, and mining and modeling. Data security protection relies on access through virtual machines. In the 3.0 phase, data volumes grow and roles become more complex. Extracting value from the data requires more participants, including operations, product, and research and development personnel. At this stage, data security management is generally based on data classification and grading, on top of which permission control, desensitization, encryption, auditing, and other access controls are applied.
Enterprises are facing three main problems:
The data security governance system is built on three aspects: system (policies and rules), product, and operation. The three must work together, and none is dispensable.
The system stipulates what can and cannot be done. Products are then used to put these rules into practice. Finally, the operation department rewards or penalizes behavior and optimizes the process according to these rules. This forms a closed loop of data security governance in which system, product, and operation are coordinated.
Therefore, Alibaba Cloud DataWorks combines various engines to provide enterprises with overall out-of-the-box security capabilities. These capabilities include several important data security processes described in the Data Security Capability Maturity Model GB/T37988-2019 (DSMM): transmission, storage, processing, exchange, and general use.
From another perspective, the product security capabilities also cover the full cycle of pre-event, in-event, and post-event work. They provide comprehensive data risk control for enterprises, including pre-event and in-event standardized development and production, data that is available but not visible, control of risky data behavior, and post-event sensitive data control.
DataWorks' isolation of production and development, RBAC role permission system, and visual data permission management, combined with engine security features such as fine-grained authorization, encrypted data storage, and data backup, address the preceding pre-event security and border security problems. This lets enterprises solve the first problem of security governance.
At the same time, you can use quick diagnostics to find configuration items that may be risky in daily development work, such as publishing without testing, developing and publishing by the same person, and not controlling download permissions.
Most of the requirements in DSMM are met using simple configurations.
DataWorks Data Security Guard provides automated and intelligent sensitive data protection in common scenarios. As shown in the figure below, an automated classification and grading management system is first built from the data itself and its metadata, and it automatically identifies and classifies users' sensitive data. In this process, security personnel may need to configure the rules. The results of automated classification and grading are stored in a classification and grading database, which can be corrected by business personnel and managed by security officers. Based on this database, data security controls can be applied at the top security control layer or throughout the whole process of data usage. For example, data display scenarios need desensitization, data usage scenarios need permission control, data output scenarios need review, and all scenarios need auditing.
DataWorks data security capabilities mainly solve four questions:
The core advantage of DataWorks Data Security Guard is that it can provide a wide range of identification rules and configuration methods.
First, DataWorks Data Security Guard has 50 built-in identification models for personal sensitive information, such as mobile phone numbers, ID numbers, and bank card numbers. Second, it supports custom recognition: you can define regular expressions or train your own recognition models. In addition, you can define metadata-based identification. Some sensitive data types have no clear content characteristics. Salary, for example, is just a number. For such data, you can use special naming conventions when creating tables, or you can specify that a particular column of a particular table in a given project is sensitive data of this type.
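The combination of content rules (regular expressions) and metadata rules (column naming conventions) described above can be sketched as follows. This is a minimal illustration, not the product's built-in models: the patterns, the rule names, the `classify_column` function, and the 0.8 match threshold are all assumptions made for the example.

```python
import re

# Hypothetical content rules: illustrative patterns, not the built-in DataWorks models.
CONTENT_RULES = {
    "mobile_phone": re.compile(r"^1[3-9]\d{9}$"),  # 11-digit mainland China mobile number
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

# Hypothetical metadata rules: types such as salary have no distinctive content,
# so they are recognized from a column naming convention instead.
NAME_RULES = {
    "salary": re.compile(r"(^|_)salary($|_)", re.IGNORECASE),
}


def classify_column(column_name, sample_values, threshold=0.8):
    """Return a sensitive-type label for a column, or None if no rule matches."""
    # Metadata rules are checked first because they do not depend on the data.
    for label, pattern in NAME_RULES.items():
        if pattern.search(column_name):
            return label

    values = [v for v in sample_values if v]
    if not values:
        return None

    # A content rule hits when enough sampled values match its pattern.
    for label, pattern in CONTENT_RULES.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return label
    return None
```

Sampling a column and requiring a match ratio, rather than a match on every value, is one way to tolerate dirty data. The right threshold depends on the data set.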
Then, the defined rules can be propagated along data lineage. A new table may not match any of the defined rules itself, but if its source table hits one of the sensitive data types, that type can be propagated to the new table.
Finally, classification, desensitization, and watermarking can be performed on the basis of these identification results. The statistical results can also be displayed as charts on the DataWorks Data Security Guard page.
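Lineage-based propagation can be pictured as a graph traversal: labels assigned to source columns flow to every column derived from them. The sketch below assumes lineage is available as a plain mapping from a column to its downstream columns. The `propagate_labels` function and the column names are hypothetical and do not reflect how DataWorks stores lineage.

```python
from collections import deque


def propagate_labels(lineage, seed_labels):
    """Spread sensitive-type labels from source columns to downstream columns.

    lineage: dict mapping a column to the list of columns derived from it.
    seed_labels: dict mapping a column to the set of labels found by the rules.
    """
    labels = {col: set(tags) for col, tags in seed_labels.items()}
    queue = deque(seed_labels)
    while queue:
        src = queue.popleft()
        for dst in lineage.get(src, []):
            before = len(labels.setdefault(dst, set()))
            labels[dst] |= labels[src]
            # Revisit a column only when it gained a label, so cycles terminate.
            if len(labels[dst]) > before:
                queue.append(dst)
    return labels


lineage = {
    "ods.user.phone": ["dwd.user.contact"],
    "dwd.user.contact": ["ads.report.contact"],
}
result = propagate_labels(lineage, {"ods.user.phone": {"mobile_phone"}})
```

Here `ads.report.contact` ends up labeled `mobile_phone` even though no rule matched it directly.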
Data access in MaxCompute, E-MapReduce, and other engines converges on the big data platform, DataWorks. For data query, migration, and download scenarios, you can flexibly configure on the DataWorks Data Security Guard page which desensitization method applies to which type of sensitive data and in which scenarios. DataWorks Data Security Guard offers three desensitization methods: covering, hashing, and pseudonymization. Covering is mainly used in BI scenarios, where BI staff need to analyze the data. For example, the value is still recognizable as a mobile phone number, but its middle four digits are replaced by four asterisks. In ETL scenarios, you may need to publish production tasks and perform Join operations without needing to know the characteristics of the data. Hashing suits this scenario: for example, the original mobile phone number is desensitized into a hash value. Algorithm models, by contrast, do need to know the characteristics of the data. Pseudonymization suits this scenario: the original mobile phone number is replaced with a fake one that still looks like a mobile phone number.
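The three desensitization methods can be illustrated with a short sketch. The function names are hypothetical, and the use of a keyed HMAC-SHA256 for hashing and pseudonymization is an assumption made for this example. The article does not state which algorithms DataWorks Data Security Guard uses.

```python
import hashlib
import hmac


def mask_cover(phone):
    """Covering: keep the first three and last four digits, hide the middle four."""
    return phone[:3] + "****" + phone[-4:]


def mask_hash(value, key):
    """Hashing: equal inputs give equal outputs, so Join keys still line up."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_pseudonym(phone, key):
    """Pseudonymization: produce a fake value that keeps the phone-number shape."""
    digest = hmac.new(key, phone.encode("utf-8"), hashlib.sha256).digest()
    # Derive eight replacement digits from the keyed digest; keep the 3-digit prefix.
    suffix = int.from_bytes(digest[:8], "big") % 10**8
    return phone[:3] + f"{suffix:08d}"


key = b"example-secret-key"  # hypothetical key, for illustration only
covered = mask_cover("13812345678")
hashed = mask_hash("13812345678", key)
fake = mask_pseudonym("13812345678", key)
```

Covering keeps the value readable for analysts, hashing preserves equality for joins, and pseudonymization preserves the format that a model may rely on.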
Ordinary audit records show who performed which operation and when, and every record can be viewed, but they do not tell you which operations are risky. DataWorks Data Security Guard provides behavior detection, and you can also customize risk rules. Built-in expert models distinguish normal operations from potentially problematic ones based on the characteristics, environment, history, and account of each user operation.
For big data engines, most data operations are completed in DataWorks, the overall big data development and governance platform. Whether data is downloaded, exported in some other way, or queried and then photographed, any of these situations can lead to a data breach. Whichever way the data is accessed, DataWorks Data Security Guard embeds a data watermark in each query result and records the operation in an operation database. When data is leaked, the user submits the leaked data on the DataWorks Data Security Guard page to query the operation database. DataWorks Data Security Guard can then help trace who may have leaked the data, with which SQL statement, and at what time. In this way, the source can be traced after a data breach.
The preceding are the main functions of DataWorks Data Security Guard and how it combines with the system and operation of an enterprise to form an enterprise's data security best practice.
Some enterprises prefer to manage a single set of local accounts rather than maintain a separate set of sub-accounts in the cloud.
DataWorks supports this requirement: you can log on to the Alibaba Cloud console with a local account by assuming a RAM role and then use DataWorks. A RAM role can be added to a DataWorks workspace as a member, and it can be assumed by one person or by several people. This way, enterprises can unify authentication management and keep the system autonomous and controllable.
DataWorks allows you to define fine-grained data permission control processes and control processes for publishing data service APIs and exporting data synchronization tasks.
How to formulate data security policies is the top priority of security planning for enterprise security managers. Many questions need to be considered carefully: What are the high-risk behaviors? Who is likely to engage in them? How can they be avoided? Who can supervise the high-risk behaviors of the personnel involved?
DataWorks predefines control solutions for security managers for several typical high-risk scenarios. In addition to natively supporting fine-grained permission control (column level; Download/Update/Drop/Alter/Select/Desc), it allows managers to define different approval processes for data at different security levels. You can also set different management and control processes for certain high-risk behaviors, such as data download, data export, and data service API publishing. Together, these methods enhance data security.
Case 1: Enterprise data is divided into sensitivity levels C1, C2, and C3, from low risk to high risk. When developers apply for access to C1 data, approval by the table owner alone can be required. For C2 data, approval by the table owner and the department head can be required. For C3 data, approval by the table owner, the department head, and the CIO can be required.
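The approval chains in Case 1 amount to a lookup from sensitivity level to required approvers. The sketch below shows one way to model that. The role identifiers and function names are hypothetical and are not DataWorks configuration keys.

```python
# Hypothetical mapping from sensitivity level to the approvers named in Case 1.
APPROVAL_CHAINS = {
    "C1": ["table_owner"],
    "C2": ["table_owner", "department_head"],
    "C3": ["table_owner", "department_head", "cio"],
}


def required_approvers(level):
    """Return the ordered list of approvers for a sensitivity level."""
    if level not in APPROVAL_CHAINS:
        raise ValueError(f"unknown sensitivity level: {level}")
    return list(APPROVAL_CHAINS[level])


def is_approved(level, approvals):
    """A request passes only when every required approver has signed off."""
    return all(role in approvals for role in required_approvers(level))
```

For example, a C3 request signed only by the table owner and the department head is still pending, while the same signatures are sufficient for C2.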
Case 2: If an enterprise requires strict approval before data is synchronized out of a data warehouse, the administrator can customize a security policy based on the source and target of Data Integration tasks. For example, whenever a task matches the rule "MaxCompute data source to MySQL data source", it must pass a predefined approval process.
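A source-to-target policy like the one in Case 2 can be thought of as a list of rules checked against each synchronization task. The following sketch assumes a simple first-match policy. `SyncRule`, `match_rule`, and the approval process name are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SyncRule:
    source_type: str       # data source type the task reads from
    target_type: str       # data source type the task writes to
    approval_process: str  # hypothetical name of the approval process to trigger


def match_rule(rules: List[SyncRule], source_type: str, target_type: str) -> Optional[str]:
    """Return the approval process of the first rule the task hits, or None."""
    for rule in rules:
        if (rule.source_type.lower() == source_type.lower()
                and rule.target_type.lower() == target_type.lower()):
            return rule.approval_process
    return None


rules = [SyncRule("MaxCompute", "MySQL", "warehouse_export_approval")]
needed = match_rule(rules, "maxcompute", "mysql")
```

A task that hits no rule returns `None` and proceeds without the extra approval step.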
DataWorks turns the best practice of isolating production and development environments into a product capability.
The figure below shows the development process in standard mode. First, one DataWorks workspace corresponds to two engine environments, one for development and one for production.
In the data modeling process, the administrator defines the data standards that may be used in modeling. The modeler then designs and submits the model, which is published to the production environment after it is verified by the supervisor, O&M, or deployment personnel.
In the data development and production process, developers write code, configure dependencies, and debug in the development environment, and submit a publishing application after smoke testing is done. At this point, an O&M, deployment, or administrator role should perform a diff review of the code. After confirming that the code is correct, this role publishes it to the production environment, so that standardized and safe code runs regularly in production to produce data.