You can create and manage MaxCompute tables by executing SQL statements in the MaxCompute client or by performing operations in the DataWorks console. It is more convenient to create and manage MaxCompute tables in the DataWorks console. This topic describes how to create and manage MaxCompute tables in the DataWorks console.
Prerequisites
A MaxCompute data source is added to DataWorks and is associated with DataStudio. For more information, see Add a MaxCompute data source and Preparations before data development: Associate a data source or a cluster with DataStudio.
Background information
MaxCompute table operation principles
When you create MaxCompute tables or perform operations on MaxCompute tables in DataWorks, you must comply with the basic requirements for operations performed on MaxCompute tables. For example, you cannot delete fields in a MaxCompute table after you create the table. For more information, see Limits on MaxCompute table operations.
Changing of physical properties of MaxCompute tables
You can change the physical properties of MaxCompute tables by executing MaxCompute SQL statements. For more information, see Table operations. Permission management is implemented when you access MaxCompute tables from DataWorks. If you access MaxCompute tables from DataWorks as a RAM user, an error may be reported in some scenarios. For more information, see Manage permissions on data in a MaxCompute compute engine instance.
Metadata update latency
Latency may exist when you perform operations on metadata. After you create or update tables in DataWorks, you may not find the tables on the UI. After you remove tables from DataWorks, you may still find the tables on the UI. In these cases, you can use a manual synchronization tool in Data Map to manually synchronize the operation results. For more information, see Manual synchronization tool.
Go to the DataStudio page
Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, choose in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to DataStudio.
Create a MaxCompute table
You can create MaxCompute tables in the Workspace Tables pane of the DataStudio page in the DataWorks console. You can also create MaxCompute tables in a specific workflow in DataStudio. DataWorks allows you to separately create tables in the development environment and in the production environment by executing DDL statements or by using the codeless UI.
Entry points of creating tables
Create a MaxCompute table in the Workspace Tables pane
In the left-side navigation pane of the DataStudio page, click Workspace Tables. In the Workspace Tables pane, click the icon to create a MaxCompute table. For more information about how to manage tables, see Manage tables. If the Workspace Tables module is not displayed in the left-side navigation pane, you can add the module. For more information about how to add a module to the DataStudio page, see Scenario: adjust the displayed DataStudio modules.
Create a MaxCompute table in a specific workflow in DataStudio
DataWorks organizes and manages code by workflow. You can create tables related to the business in a specific workflow. If no workflow is available, you can create one. For information about how to create a workflow, see Create a workflow. If workflows are available, right-click the name of the desired workflow and choose . In the dialog box that appears, configure the required information and click Create.
Configure the basic information about the MaxCompute table
After you perform the preceding operations, you can configure the information about the table on the table configuration tab. On the tab, you can configure the information about the table by executing a DDL statement or by using the codeless UI in DataWorks.
Configure the table by executing a DDL statement
You can click the icon on the table configuration tab and execute a standard table creation statement of MaxCompute to generate the table schema. For information about the standard table creation statements of MaxCompute, see Table operations. After the table schema is generated, DataWorks automatically fills out the configurations for the table on the table configuration tab. You can use this method if you are accustomed to writing code to develop tables.
NoteThe name of the table is configured before you access the table configuration tab. You cannot change the table name in the DDL statement. If you change the table name in the DDL statement, an error is reported.
You can define only the physical properties of the table by executing the DDL statement. After the table is created and the configurations of the table are automatically filled out on the table configuration tab, you can go to the table configuration tab again to configure the business-related properties of the table.
Configure the table by using the codeless UI in DataWorks
You can configure related properties of the table on the table configuration tab based on the configuration guide. You can use this method if you want to use the codeless UI. The following content shows how to configure the related properties of the table by using the codeless UI.
Configure the parameters in the General section. The following table describes the parameters.
Parameter
Description
Display Name
The display name of the table.
Theme
The folders that are used to store and manage the table. You can specify the level-1 and level-2 folders to store the table. The Level-1 Theme and Level-2 Theme parameters can be used to categorize tables based on business categories. You can store tables of the same business category in the same folder.
NoteThe level-1 and level-2 themes in the Workspace Tables pane of the DataStudio page help you better manage tables in folders. You can quickly find the current table in the Workspace Tables pane by theme. If no theme is available, you can create one. For information about how to create a theme, see the Create or manage folders for tables section of the "Manage settings for tables" topic.
Configure the parameters in the Physical Model section. The following table describes the parameters.
Parameter
Description
Level
The physical data layer to which the table belongs. By default, a data warehouse in DataWorks is divided into the following data layers: operational data store (ODS), dimension (DIM), data warehouse detail (DWD), data warehouse summary (DWS), and application data service (ADS). You can store the table at a specific data layer based on the business category.
NoteYou can also create a data layer based on your business requirements. For more information, see Manage settings for tables.
Category
The business category to which the table belongs. Categories help you classify tables in a finer-grained manner based on your business requirements. For example, you can use a basic business category, an advanced business category, or a category for another purpose to classify your tables in most cases.
NoteYou can also create custom categories based on your business requirements. For more information, see Manage settings for tables.
TTL
The time-to-live (TTL) period of the table.
For more information about the TTL period of a MaxCompute table, see Lifecycle.
For more information about the TTL-related operations on a MaxCompute table, see Lifecycle management operations.
Partition Type
Specifies whether the table is a partitioned or non-partitioned table. For more information about partitioned and non-partitioned MaxCompute tables, see Partition.
Table Type
Specifies whether the table is an internal or external table. If you query data in an internal table, the query speed is fast because real data is imported to the internal table. If you query data in a table by using an external table, you can query data without the need to import data to the external table. Less memory is used when you query data by using an external table. For more information about internal and external MaxCompute tables, see Table.
Configure the parameters in the Schema section. The following table describes the parameters.
Parameter
Description
Data Type
The data type of a field. You can select only data types that are supported by MaxCompute from the Data Type drop-down list. For more information about the data types supported by MaxCompute, see Data type editions.
Field Security Level
The security level of a field. This parameter is required only if you enable the label-based access control in a MaxCompute project. For more information about the security levels of fields in a MaxCompute table, see Label-based access control.
Primary Key Field
The primary key of the table. MaxCompute tables do not support primary keys. In this case, the primary key you specify in this parameter is used to manage the table only from the business perspective.
Commit and deploy the MaxCompute table
After you define the schema of the table, you must commit the table to the development and production environments.
Items that you need to take note of when you commit and deploy MaxCompute tables in workspaces in basic mode or workspaces in standard mode:
For workspaces in basic mode: You need to only commit MaxCompute tables to the production environment.
For workspaces in standard mode: Only users who are assigned the Workspace Administrator or O&M role can deploy MaxCompute tables to the production environment. If you want to deploy MaxCompute tables to the production environment, you must be granted the required permissions. For information about authorization, see Add a RAM user to a workspace as a member and assign roles to the member.
For more information about workspace modes, see Differences between workspaces in basic mode and workspaces in standard mode.
Operation | Description |
Load from Development Environment | Load the table information from the development environment and present the table information on the current page. Note You can perform this operation only after the table is committed to the development environment. After you perform this operation, the table information in the development environment overwrites the table information on the current page. |
Commit to Development Environment | Commit the table to the development environment of the workspace. This indicates that you must create the current table in the MaxCompute project that is added to DataWorks as a MaxCompute data source in the development environment. After the table is committed, you can perform the following operations:
|
Load from Production Environment | Load the table information from the production environment and present the table information on the current page. Note You can perform this operation only after the table is committed to the production environment. After you perform this operation, the table information in the production environment overwrites the table information on the current page. |
Commit to Production Environment | Commit the table to the production environment of the workspace. This indicates that you must create the current table in the MaxCompute project that is added to DataWorks as a MaxCompute data source in the production environment. After the table is committed, you can perform the following operations:
|
Write data to and export data from the MaxCompute table
This section describes how to write data to and export data from the MaxCompute table.
Write data to the MaxCompute table
You can use a data synchronization task and a task on a MaxCompute node to write data to the MaxCompute table. DataWorks also allows you to use the data upload feature to import data from your on-premises machine to the MaxCompute table.
Export data from the MaxCompute table
After you query data in the MaxCompute table in DataStudio, you can use the download feature on the query result page to download the data to your on-premises machine. By default, you can download a maximum of 10,000 data records at a time. DataWorks allows tenant administrators and tenant security administrators to configure the maximum number of data records that can be downloaded at a time on the Data query and analysis control page in Security Center.
If the number of data records that you want to export exceeds 10,000, you can run Tunnel commands on the MaxCompute client to export the data.
Query data in the MaxCompute table
You can use an ODPS SQL node or the ad hoc query feature to query data in the MaxCompute table by executing SQL statements.
Preset data access permissions
If you use a workspace in basic mode, fine-grained permission management and isolation of data between development and production environments are not supported. In this example, a workspace in standard mode is used.
The following table describes the preset data access permissions for RAM users that are added to a workspace as members.
Permission type | Description |
Permissions on a MaxCompute project in the development environment | After you assign a RAM user a built-in workspace-level role in your workspace and associate a MaxCompute project with the workspace in the development environment, the RAM user is automatically granted the permissions of the mapped role of the MaxCompute project. By default, the RAM user has the permissions of the MaxCompute project in the development environment. However, the RAM user does not have the permissions of the MaxCompute project that is associated with the workspace in the production environment. |
Permissions on a MaxCompute project in the production environment | The RAM user that is used as a scheduling access identity has high permissions on a MaxCompute project in the production environment. Other RAM users do not have permissions on the MaxCompute project in the production environment. To perform operations on MaxCompute tables in the production environment, you must go to Security Center to request the required permissions. DataWorks provides a default request processing procedure. DataWorks also allows users that are granted administrative permissions to customize request processing procedures. |
For more information about permission management for MaxCompute, see Manage permissions on data in a MaxCompute compute engine instance.
Data access behaviors
MaxCompute allows you to query tables across projects. You can query data in a MaxCompute project that is associated with a workspace in the production environment by specifying the project name on the DataStudio page. The following table describes the methods to query tables across projects and the accounts that can be used to access the tables in different environments.
On the Data Source page in DataStudio, you can view the MaxCompute projects that are added to DataWorks as MaxCompute data sources in the development and production environments and the accounts that are used to configure environments for the MaxCompute projects. For more information, see Preparations before data development: Associate a data source or a cluster with DataStudio.
For a workspace in standard mode, the personal identity of a task executor is used to run MaxCompute tasks in the development environment by default, and an Alibaba Cloud account is used as the scheduling access identity to run MaxCompute tasks in the production environment. For more information, see Add a MaxCompute data source.
Sample code | Execution account in the development environment (DataStudio and Operation Center in the development environment) | Execution account in the production environment (Operation Center in the production environment) |
Access tables in the MaxCompute project in the development environment:
| The personal Alibaba Cloud account of a task executor is used to access tables in the MaxCompute project in the development environment.
| The scheduling access identity is used to access tables in the MaxCompute project in the development environment. |
Access tables in the MaxCompute project in the production environment:
| The personal Alibaba Cloud account of a task executor is used to access tables in the MaxCompute project in the production environment. Note Due to security control on data in the production environment, a personal Alibaba Cloud account cannot be used to access tables in the MaxCompute project in the production environment. To use a personal Alibaba Cloud account to access tables in the MaxCompute project in the production environment, go to Security Center to request the permissions. DataWorks provides a default request processing procedure. DataWorks also allows users that are granted management permissions to customize request processing procedures. | The scheduling access identity is used to access tables in the MaxCompute project in the production environment. |
Execute the following statement in the MaxCompute project in the desired environment such as the development environment to access tables in the MaxCompute project:
| If the statement is executed in the MaxCompute project in the development environment, you can use the personal Alibaba Cloud account of a node executor to access tables in the MaxCompute project in the development environment. | If the statement is executed in the MaxCompute project in the production environment, you can use the scheduling access identity to access tables in the MaxCompute project in the production environment. |
View MaxCompute data assets
This section describes how to view MaxCompute data assets.
View tables in the production environment within a tenant
In the Tenant Tables pane of the DataStudio page in the DataWorks console, you can view all tables in the production environment of your Alibaba Cloud account in the current region.
View metadata
You can go to Data Map to view the details and metadata of a MaxCompute table.
View table lineages
You can view the ancestor and descendant lineages for a table on the details page of the table in Data Map.
To locate the tasks in which the table is used, you can use the code search feature to search for the tasks.
Manage multiple MaxCompute tables at the same time
This section describes how to manage multiple MaxCompute tables at the same time.
Delete multiple MaxCompute tables at the same time
You can go to the My Data page in Data Map and delete multiple MaxCompute tables of which you are the owner at the same time.
Due to security control for data in the production environment, you cannot directly run commands to delete tables in the production environment. If you want to delete a table that belongs to another user, you must go to Security Center to request the required permissions.
Change the owners of multiple tables at the same time
You can go to the My Data page in Data Map and transfer the ownership of multiple tables to another user at the same time.
Change the TTL periods of multiple tables at the same time
You can go to the My Data page in Data Map and change the TTL periods of multiple MaxCompute tables of which you are the owner at the same time.
You can go to the Workspace Tables pane to modify the TTL period of a single table. For more information, see Create a MaxCompute table.
Manage tables in a specific workflow
You can add tables that are created in the Workspace Tables pane to a specific workflow in DataStudio by importing the tables. You can manage the tables by workflow and view the tables related to the business in the workflow. The following figure shows the steps to add tables of a MaxCompute compute engine to a specific workflow.
Referenced: indicates that a specific table is queried by using a SELECT statement in a specific workflow.
Written: indicates that data is written to a specific table by using an INSERT OVERWRITE statement in a specific workflow.