If your compute engine contains a large number of physical tables and you want to manage them centrally in DataWorks Data Modeling, you can use the reverse modeling feature to create tables in Data Modeling based on those physical tables. Reverse modeling simplifies data modeling operations, creates tables efficiently, and reduces time costs. This topic describes how to perform reverse modeling.
Prerequisites
A large number of physical tables are created within your compute engine. A data source is created and is associated with the desired workspace. For more information about how to create a data source and associate the data source with a workspace, see Create and manage a data source.
The required data layers are created. The data layers are used to store tables. For more information, see Create a data layer.
A common layer is used to process and integrate common data that comes from a data import layer. You can create a unified metric dimension and construct reusable fact data and aggregate data that is used for analysis and statistical collection at a common layer. An application layer is used to reconstruct the data that is processed and integrated at a common layer based on your business requirements. You can store statistical data of specific application scenarios or specified products at an application layer.
Tables can belong to a common layer or an application layer. To create tables at a common layer or an application layer, the following preparations must be made:
Common layer:
A data domain is created. The data domain is used to determine the scope of business data that is collected in tables. For information about how to create a data domain, see Data domain.
A business process is created. The business process is used to determine the business activity whose data is collected and analyzed in tables. For information about how to create a business process, see Business process.
Application layer:
A data mart is created. The data mart is used to determine the category of business data that is collected in tables from specific application scenarios or specific products. For information about how to create a data mart, see Data mart.
A subject area is created. The subject area is used to determine the subject of business data that is collected in tables. For information about how to create a subject area, see Subject area.
Limits
You can perform reverse modeling only on tables within a MaxCompute or E-MapReduce (EMR) Hive compute engine in the production environment.
Reverse modeling process
You can perform reverse modeling on physical tables that exist in a compute engine. The system creates tables in DataWorks Dimensional Modeling based on the physical tables. The following figure shows the reverse modeling process.
Configure a reverse modeling policy.
Modeling scope: Confirm information about the physical tables on which you want to perform reverse modeling based on your business requirements.
The information includes the workspace and compute engine to which the physical tables belong and the rule based on which you want to match physical tables. The matching rules include fuzzy match and exact match.
Modeling rule: Specify the method used to standardize the names of tables that are created after reverse modeling and specify the data layer to which the tables belong.
You can use a checker or configure a custom naming convention to standardize the names of tables that are created after reverse modeling. This way, you can make sure that tables at the same data layer are named based on the same naming convention, and you can know the information about a table, such as the business category and data granularity, based on the table name. For information about how to configure and use a checker, see Configure and use a checker at a data layer.
Execution method: Specify whether to create all required tables or create only the tables that do not exist on the Dimensional Modeling page.
Note: The reverse modeling operation is irreversible, and a reverse modeling policy cannot be modified after it is created and used. We recommend that you plan a reverse modeling policy in advance based on your business requirements.
For more information, see Configure a reverse modeling policy.
Parse physical tables and create tables on the Dimensional Modeling page based on the matched physical tables.
The system parses physical tables and creates tables on the Dimensional Modeling page based on the physical tables and the reverse modeling policy that you configure.
Confirm information about the matched physical tables.
The created tables may not meet your business requirements. You can modify the information about the matched physical tables to ensure that tables created based on the matched physical tables meet your business requirements. For example, you can change the data domain or business process for the matched physical tables. For more information, see Confirm information about the matched physical tables.
View reverse modeling results.
After reverse modeling is complete, you can view the types and numbers of the created tables. If specific tables fail to be created, you can check the error messages that are reported and troubleshoot issues at the earliest opportunity.
Note: The created tables are automatically materialized to the related compute engine. You do not need to manually publish the tables.
You can manage the created tables on the Dimensional Modeling page. For more information, see Publish and manage a table.
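The two matching rules in the modeling scope can be pictured with a short Python sketch. It is an illustration only, not the DataWorks implementation: the function name `match_tables`, the sample table names, and the case-sensitive comparison are assumptions. The semicolon-separated format for exact match and the failure when nothing matches follow the description of the Table Name Matching Rule parameter in this topic.

```python
def match_tables(physical_tables, rule, pattern):
    """Return the physical table names selected by a matching rule.

    rule "fuzzy": keep every table whose name contains the keyword.
    rule "exact": keep only tables whose complete name appears in a
                  semicolon-separated list (no spaces after semicolons).
    Case-sensitive comparison is an assumption of this sketch.
    """
    if rule == "fuzzy":
        matched = [name for name in physical_tables if pattern in name]
    elif rule == "exact":
        wanted = set(pattern.split(";"))
        matched = [name for name in physical_tables if name in wanted]
    else:
        raise ValueError(f"unknown matching rule: {rule!r}")
    if not matched:
        # Per this topic, reverse modeling fails if no physical tables match.
        raise LookupError("no physical tables matched; no tables are created")
    return matched


# Hypothetical table names, used only for illustration.
tables = ["dwd_trade_order_di", "dws_trade_order_1d", "ads_sales_report"]
fuzzy_result = match_tables(tables, "fuzzy", "trade")
exact_result = match_tables(tables, "exact", "dwd_trade_order_di;ads_sales_report")
```

In this sketch, a space after a semicolon becomes part of the next table name, so that name no longer matches. This mirrors why the separator must not be followed by spaces.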
Procedure
Go to the Reverse Modeling page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, choose in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Modeling.
In the top navigation bar of the Data Modeling page, click Dimensional Modeling to go to the Dimensional Modeling page.
In the left-side navigation pane of the Dimensional Modeling page, click Reverse Modeling to go to the Reverse Modeling page.
Start reverse modeling.
If this is the first time you use the reverse modeling feature, click Start Now on the Reverse Modeling page.
For subsequent use, click Create in the upper-right corner of the Modeling Tasks page.
Configure a reverse modeling policy.
Note: The reverse modeling operation is irreversible, and a reverse modeling policy cannot be modified after it is created and used. We recommend that you plan a reverse modeling policy in advance based on your business requirements.
In the Create Reverse Policy step, configure the parameters.
Parameter
Description
Workspace
The DataWorks workspace to which the physical tables on which you want to perform reverse modeling belong.
Note: You can select only a workspace to which the current logon account is added as a member. For more information, see Manage permissions on workspace-level services.
Compute Engine Type
The type of the compute engine that contains the physical tables on which you want to perform reverse modeling. Only compute engines in the production environment are supported.
Compute Engine Instance
The name of the compute engine that contains the physical tables on which you want to perform reverse modeling.
Table Name Matching Rule
The rule based on which the system matches the names of physical tables in the selected compute engine. The system creates tables on the Dimensional Modeling page based on the matched physical tables. Valid values:
Fuzzy Match: If you select this matching rule and enter a keyword in the field, the system matches all physical tables whose names contain the keyword.
Exact Match: If you select this matching rule, you must enter the complete names of all physical tables on which you want to perform reverse modeling.
Note: If you select Exact Match for the Table Name Matching Rule parameter and you want to enter multiple table names, you must separate the table names with semicolons (;) that are not followed by spaces.
If no physical tables are matched, reverse modeling fails, and no tables are created.
Data Layer of Model After Reverse Modeling
Common Layer: If you want to create fact tables, dimension tables, or aggregate tables, set the parameter to this value.
Application Layer: If you want to create application tables or dimension tables, set the parameter to this value.
Table Naming Rule
The naming convention based on which the system parses the matched physical tables and standardizes the names of tables to be created during reverse modeling. The system stores the created tables at appropriate data layers based on the parsing results.
Parsing rules
The system parses the names of the physical tables that are matched based on the Table Name Matching Rule parameter. Each table name is parsed based on the number of underscores (_) that it contains.
A table name can contain a maximum of nine underscores (_) and can contain items such as the business process name, data domain name, and custom content between two underscores (_).
If the system identifies that the name of a physical table contains the name of a data warehousing level, the system creates a table with the same name and stores the table at the data warehousing level.
Note: If the name of a physical table does not contain the name of a data warehousing level, such as the name of a data domain or business process, the data warehousing level of the physical table displayed in the Confirm Model Information step is empty. In this case, you can select a data warehousing level for the physical table in the Confirm Model Information step.
Parsing methods
Use a table name checker: You can select a created checker. This way, the system parses the names of physical tables based on the naming convention defined in the checker. For information about how to create and use a checker, see Configure and use a checker at a data layer.
Configure a custom table naming convention: If you set the Table Naming Rule parameter to Custom Rule, you can combine options such as Business Process, Data Domain, Business Category, and Custom into a table naming convention based on your business requirements. Then, the system parses the names of physical tables based on the naming convention.
Execution Method
The method used to create tables. Valid values:
Full Update: The system creates tables on the Dimensional Modeling page based on all matched physical tables. Select this method if you want the system to create a table for every matched physical table.
Note: If you select Full Update for the Execution Method parameter and a table that meets the matching rule already exists on the Dimensional Modeling page, the system deletes the existing table and re-creates a table.
Incremental Update: If you set the parameter to this value, the system performs the following operations on the matched physical tables:
Identifies and retains the existing tables that meet the matching rule on the Dimensional Modeling page.
Creates only the tables that do not exist on the Dimensional Modeling page.
If some tables have been created on the Dimensional Modeling page and the tables remain unchanged, you can select this method.
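The difference between the two execution methods can be summarized in a minimal Python sketch. The function `plan_execution`, the method labels `"full"` and `"incremental"`, and the sample table names are hypothetical and serve only to illustrate the behavior described for the Execution Method parameter. They are not part of DataWorks.

```python
def plan_execution(matched_tables, existing_models, method):
    """Classify matched physical tables by what happens to them.

    matched_tables:  names of physical tables that meet the matching rule.
    existing_models: names of tables already on the Dimensional Modeling page.
    method:          "full" (Full Update) or "incremental" (Incremental Update).
    """
    existing = set(existing_models)
    already_modeled = [t for t in matched_tables if t in existing]
    not_modeled = [t for t in matched_tables if t not in existing]

    if method == "full":
        # Existing tables that meet the rule are deleted and re-created.
        return {"create": not_modeled, "recreate": already_modeled, "retain": []}
    if method == "incremental":
        # Existing tables are retained; only missing tables are created.
        return {"create": not_modeled, "recreate": [], "retain": already_modeled}
    raise ValueError(f"unknown execution method: {method!r}")


# Hypothetical names, used only for illustration.
matched = ["dim_user", "dwd_order", "dws_order_1d"]
existing = ["dwd_order"]
full_plan = plan_execution(matched, existing, "full")
incremental_plan = plan_execution(matched, existing, "incremental")
```

In this example, only Full Update touches `dwd_order`, which already exists on the Dimensional Modeling page. Incremental Update leaves it unchanged.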
Click Create Model. The system parses physical tables based on the reverse modeling policy.
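To make the parsing rules for table names more concrete, the following Python sketch splits a table name on underscores and maps the segments to a custom naming convention. It is a simplified illustration under several assumptions: the function `parse_table_name`, the field labels `data_domain`, `business_process`, and `custom`, and the sample names are hypothetical. Rejecting names with more than nine underscores is also a choice of this sketch, because this topic does not state how the system handles such names.

```python
MAX_UNDERSCORES = 9  # Limit stated in the parsing rules.


def parse_table_name(name, convention, known_names):
    """Map underscore-separated segments of a table name to convention fields.

    convention:  ordered field labels, for example
                 ["data_domain", "business_process", "custom"].
    known_names: field label -> set of names that exist in Data Modeling.
    A field stays None if its segment is missing or is not a known name,
    which corresponds to an empty value in the Confirm Model Information step.
    """
    if name.count("_") > MAX_UNDERSCORES:
        raise ValueError(f"{name!r} contains more than {MAX_UNDERSCORES} underscores")

    # Split into at most len(convention) segments so that the last field
    # keeps any remaining underscores.
    segments = name.split("_", len(convention) - 1)
    parsed = {field: None for field in convention}
    for field, segment in zip(convention, segments):
        if field == "custom":
            parsed[field] = segment  # Custom content is accepted as-is.
        elif segment in known_names.get(field, set()):
            parsed[field] = segment
    return parsed


# Hypothetical convention and names, used only for illustration.
convention = ["data_domain", "business_process", "custom"]
known_names = {"data_domain": {"trade"}, "business_process": {"order"}}
recognized = parse_table_name("trade_order_detail_di", convention, known_names)
unrecognized = parse_table_name("tmp_export", convention, known_names)
```

Here `recognized` resolves both the data domain and the business process, whereas every field of `unrecognized` stays empty and would need to be selected manually in the Confirm Model Information step.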
Confirm information about the matched physical tables.
The system creates tables based on the matched physical tables and the reverse modeling policy. You can modify information such as the table type, data layer, and data domain for the matched physical tables based on your business requirements. If you no longer need to perform reverse modeling on a physical table, you can remove the table from the Parsed Tables section in the Confirm Model Information step.
Click Generate Model to finalize the tables.
View reverse modeling results.
After reverse modeling is complete, you can view the numbers of different types of tables that are created and view the details of the tables that fail to be created in the Completed step. You can click Error Logs in the Actions column of a table that fails to be created to view error information and troubleshoot issues at the earliest opportunity.
Note: The created tables are automatically materialized to the related compute engine. You do not need to manually publish the tables.
You can manage the created tables on the Dimensional Modeling page. For more information, see Publish and manage a table.
View reverse modeling tasks
You can view the details and operation logs of reverse modeling tasks on the Modeling Tasks page.
No. | Description |
1 | In the area marked with 1, you can specify the following filter conditions to search for the desired reverse modeling tasks: Task ID, Operator, and Operation Date. |
2 | In the area marked with 2, you can view the settings of the Table Name Matching Rule parameter and the reverse modeling results. |
What to do next
After reverse modeling is complete, you can perform the following operations:
Go to the Dimensional Modeling page and manage the tables that are displayed in the left-side navigation tree. For information about how to publish and manage a table, see Publish and materialize a table.
Go to the DataStudio page and perform data development operations. For information about the features on the DataStudio page, see Features on the DataStudio page.