DataWorks provides workspaces in basic mode and standard mode so that you can develop data under different security control requirements. This topic describes the differences between the two modes, including their physical architectures and their impacts on node development.
Background information
This topic consists of the sections that are described in the following table.
Section | Description |
Workspaces in basic and standard modes | Describes the physical architectures of workspaces in basic and standard modes. |
Impacts of different workspace modes on development and O&M of nodes in the production environment | Describes mechanisms for node development and O&M based on the physical attributes of the workspace to which nodes belong. |
Advantages and disadvantages of different workspace modes | Describes the advantages and disadvantages of different workspace modes. |
Sample scenario: Impacts of workspaces in standard mode on usage processes | Describes the process control that is implemented based on collaboration among users that are assigned different roles in a workspace in standard mode. |
Appendix: Data sources that are used when you perform operations on different DataWorks service modules in workspaces in basic and standard modes | Describes the data sources that are used when you perform operations on different DataWorks service modules in workspaces in basic and standard modes. Workspaces in basic mode provide only the production environment. Workspaces in standard mode provide both the development and production environments. |
Appendix: How to isolate data between development and production environments for a workspace in basic mode | Describes how to isolate data between the development and production environments if you use a workspace in basic mode. |
Precautions
Workspaces in different modes have different requirements for the addition of a data source. For example, for a workspace in standard mode, you must separately add data sources to the workspace in the development and production environments. This way, data can be physically isolated between the environments. For information about how to add data sources to a DataWorks workspace, see Add and manage data sources.
The characteristics of the data source that you add to a DataWorks workspace determine whether resources can be accessed across projects or databases. If you add different data sources to a workspace in the development and production environments, the characteristics of the data sources determine whether you can access objects, such as tables, resources, and functions in the production environment from the development environment.
By default, for a workspace in standard mode, nodes in the development environment are not periodically scheduled, and nodes are periodically scheduled only after they are deployed to the production environment.
Workspaces in basic and standard modes
You can create a workspace in basic or standard mode based on your business requirements. We recommend that you create a workspace in standard mode because it isolates code, computing resources, permission management, and node deployment process control between the development and production environments.
If you use a workspace in basic mode and you want to retain the code in the current workspace, you can upgrade the mode of your workspace. For more information, see Scenario: Upgrade a workspace from the basic mode to the standard mode.
The following table describes the differences in the physical architecture between workspaces in basic and standard modes from various aspects.
Aspect | Basic mode | Standard mode (recommended) |
Number of added data sources | One DataWorks workspace corresponds to one data source. | One DataWorks workspace corresponds to two data sources. This way, the data sources are isolated between the development and production environments. Note You must separately add data sources to the workspace in the development and production environments to physically isolate data between the environments. |
DataWorks environment | One data source serves as the DataWorks production environment. | One of the data sources serves as the DataWorks development environment, and the other data source serves as the DataWorks production environment. Note You can add different data sources to a workspace in the development and production environments. |
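The relationship between workspace mode and data sources in the preceding table can be sketched as a small data model. This is an illustration only, not a DataWorks API: the `Workspace` class, the `validate` function, and the workspace and project names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    BASIC = "basic"
    STANDARD = "standard"


@dataclass
class Workspace:
    name: str
    mode: Mode
    # Maps an environment name to the data source added for that environment.
    data_sources: dict


def validate(ws: Workspace) -> list:
    """Return the environments of the workspace, or raise if the
    added data sources do not match what the mode expects."""
    if ws.mode is Mode.BASIC:
        expected = {"production"}
    else:
        expected = {"development", "production"}
    if set(ws.data_sources) != expected:
        raise ValueError(
            f"{ws.name}: {ws.mode.value} mode expects data sources for "
            f"{sorted(expected)}, got {sorted(ws.data_sources)}"
        )
    return sorted(expected)


# Hypothetical names, used only for illustration.
basic = Workspace("sales_basic", Mode.BASIC, {"production": "sales_project"})
standard = Workspace(
    "sales_standard",
    Mode.STANDARD,
    {"development": "sales_project_dev", "production": "sales_project"},
)

print(validate(basic))
print(validate(standard))
```

In this sketch, a workspace in standard mode that has only one data source fails validation, which mirrors the requirement to add data sources separately for the development and production environments.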
Impacts of different workspace modes on development and O&M of nodes in the production environment
Item | Basic mode | Standard mode (recommended) |
Differences in development process control for nodes in the production environment | After you commit a node, the node enters the scheduling system. Then, the node is periodically scheduled to generate data. (Commit to the production environment) | You must first commit a node to the development environment and then deploy the node to the production environment for automatic scheduling. (Commit to the development environment and deploy to the production environment) Note For a workspace in standard mode, only nodes in the production environment can be automatically scheduled. |
Differences in O&M permission management for nodes in the production environment | Developers can directly modify code of nodes in the production environment. | Developers can only modify and commit node code on the DataStudio page, but cannot directly deploy node code to the production environment. Only users who are assigned the Project Owner, Workspace Manager, or O&M role can deploy node code. |
Differences in permission management for data in the production environment | Developers can directly perform tests by using production data. This cannot ensure the security of production data. | Developers can perform tests by using test data in the development environment. Developers can also verify features by using production table data in the development environment after the developers obtain the required permissions or after their request to perform the operations is approved in Security Center. |
Differences in data access identities | A unified identity is used to directly perform operations in the production environment. Access identities for data sources such as MaxCompute, Hologres, E-MapReduce (EMR), and Cloudera's Distribution including Apache Hadoop (CDH): Alibaba Cloud account, RAM user, RAM role (supported only by MaxCompute), and node owner. Note If you add a data source other than the preceding types of data sources, such as an AnalyticDB for MySQL or AnalyticDB for PostgreSQL data source, to a workspace in this mode, only the database account that you specified when you configured the data source can perform operations in a specific environment. The permissions of this account in the DataWorks workspace are the same as those in the AnalyticDB for MySQL or AnalyticDB for PostgreSQL database. | Note If you add a data source other than a MaxCompute, Hologres, EMR, or CDH data source, such as an AnalyticDB for MySQL or AnalyticDB for PostgreSQL data source, to a workspace in this mode, only the database account that you specified when you configured the data source can perform operations in a specific environment. The permissions of this account in the DataWorks workspace are the same as those in the AnalyticDB for MySQL or AnalyticDB for PostgreSQL database. |
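The commit and deploy behavior described in the preceding table can be modeled in a few lines. The following sketch is a simplified simulation under stated assumptions, not DataWorks code: the classes and methods are hypothetical, and only the role names come from the table.

```python
# Roles that the table lists as allowed to deploy node code.
DEPLOY_ROLES = {"Project Owner", "Workspace Manager", "O&M"}


class BasicWorkspace:
    """Basic mode: a commit goes straight to the production environment."""

    def __init__(self):
        self.production = {}  # node name -> code

    def commit(self, node, code):
        self.production[node] = code

    def scheduled_nodes(self):
        return sorted(self.production)


class StandardWorkspace:
    """Standard mode: commit to development, then deploy to production."""

    def __init__(self):
        self.development = {}  # node name -> code
        self.production = {}

    def commit(self, node, code):
        # Committed nodes stay in the development environment.
        self.development[node] = code

    def deploy(self, node, role):
        if role not in DEPLOY_ROLES:
            raise PermissionError(f"role {role!r} cannot deploy nodes")
        self.production[node] = self.development[node]

    def scheduled_nodes(self):
        # Only nodes in the production environment are scheduled.
        return sorted(self.production)


basic = BasicWorkspace()
basic.commit("daily_report", "SELECT 1;")

standard = StandardWorkspace()
standard.commit("daily_report", "SELECT 1;")
before_deploy = standard.scheduled_nodes()
standard.deploy("daily_report", role="O&M")
after_deploy = standard.scheduled_nodes()

print(basic.scheduled_nodes(), before_deploy, after_deploy)
```

In the sketch, a commit in basic mode makes the node schedulable immediately, whereas in standard mode the node becomes schedulable only after a user with a deployment role deploys it.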
Advantages and disadvantages of different workspace modes
Item | Basic mode | Standard mode |
Advantages | Workspaces in this mode are simple and easy to use. You only need to assign the Development role to development engineers so that they can complete all data warehouse development operations. | Workspaces in this mode are secure and standardized. |
Disadvantages | The production environment may become unstable, and the security of production data cannot be ensured. | The data development and production process is more complex. In most cases, the process involves more than one developer. |
Sample scenario: Impacts of workspaces in standard mode on usage processes
The development and production isolation feature of a workspace in standard mode affects processes such as data modeling design, data processing, and code deployment.
Appendix: Data sources that are used when you perform operations on different DataWorks service modules in workspaces in basic and standard modes
You can view the information about data sources that are associated with the DataStudio service of a workspace on the Data Source page in DataStudio. The following table describes the data sources that are used when you perform operations on different DataWorks service modules in workspaces in basic and standard modes.
Service module | Standard mode | Basic mode |
DataStudio | The data source such as an instance, project, or database in the development environment is used. | The data source such as an instance, project, or database in the production environment is used. |
Operation Center |
|
Appendix: How to isolate data between development and production environments for a workspace in basic mode
Requirement: You use a workspace in basic mode and want to isolate data between the development and production environments.
Solution: Prepare two workspaces in basic mode. Workspace 1 serves as the development environment and Workspace 2 serves as the production environment. Use the cross-workspace deployment method to deploy nodes in Workspace 1 to Workspace 2. This way, data can be isolated between the environments.
Disadvantages: Production code can still be modified directly in DataStudio in the workspace that serves as the production environment. As a result, code updates can enter the production environment through more than one entry point, which affects the entire development process.
Suggestion: We recommend that you upgrade your workspace from basic mode to standard mode for better control of the development process. For more information, see Scenario: Upgrade a workspace from the basic mode to the standard mode.
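The two-workspace approach and its drawback can be illustrated with a short simulation. This is a sketch only: `BasicWorkspace`, `cross_workspace_deploy`, and `drifted_nodes` are hypothetical names and do not correspond to DataWorks APIs.

```python
class BasicWorkspace:
    """A workspace in basic mode: committed code is live immediately."""

    def __init__(self, name):
        self.name = name
        self.nodes = {}  # node name -> code

    def commit(self, node, code):
        self.nodes[node] = code


def cross_workspace_deploy(source, target, node):
    """Copy the code of a node from the source workspace to the target."""
    target.nodes[node] = source.nodes[node]


def drifted_nodes(dev, prod):
    """Return production nodes whose code differs from the development copy."""
    return sorted(n for n in prod.nodes if dev.nodes.get(n) != prod.nodes[n])


workspace_1 = BasicWorkspace("workspace_1")  # serves as development
workspace_2 = BasicWorkspace("workspace_2")  # serves as production

workspace_1.commit("etl_orders", "INSERT OVERWRITE TABLE orders_clean ...")
cross_workspace_deploy(workspace_1, workspace_2, "etl_orders")
drift_after_deploy = drifted_nodes(workspace_1, workspace_2)

# Nothing prevents a direct change in the production workspace.
workspace_2.commit("etl_orders", "INSERT OVERWRITE TABLE orders_clean_v2 ...")
drift_after_hotfix = drifted_nodes(workspace_1, workspace_2)

print(drift_after_deploy, drift_after_hotfix)
```

The second commit bypasses the deployment path, so the production copy no longer matches the development copy. A workspace in standard mode avoids this because production code changes only through deployment.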