You can create and manage MaxCompute tables by executing SQL statements in the MaxCompute client or by performing operations in the DataWorks console. It is more convenient to create and manage MaxCompute tables in the DataWorks console. This topic describes how to create and manage MaxCompute tables in the DataWorks console.
Background information
MaxCompute table operation principles
When you create MaxCompute tables or perform operations on MaxCompute tables in DataWorks, you must comply with the basic requirements for operations performed on MaxCompute tables. For example, you cannot delete fields in a MaxCompute table after you create the table. For more information, see Limits on MaxCompute table operations.
Changing of physical properties of MaxCompute tables
You can change the physical properties of MaxCompute tables by executing MaxCompute SQL statements. For more information, see Table operations. Permission management is implemented when you access MaxCompute tables from DataWorks. If you access MaxCompute tables from DataWorks as a RAM user, an error may be reported in some scenarios. For more information, see Manage permissions on data in a MaxCompute compute engine instance.
Metadata update latency
Latency may exist when you perform operations on metadata. After you create or update tables in DataWorks, you may not find the tables on the UI. After you remove tables from DataWorks, you may still find the tables on the UI. In these cases, you can use a manual synchronization tool in Data Map to manually synchronize the operation results. For more information, see Manual synchronization tool.
Go to the DataStudio page
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Create a MaxCompute table
You can create MaxCompute tables in the Workspace Tables pane of the DataStudio page in the DataWorks console. You can also create MaxCompute tables in a specific workflow in DataStudio. DataWorks allows you to separately create tables in the development environment and in the production environment by executing DDL statements or by using the codeless UI.
Entry points of creating tables
Create a MaxCompute table in the Workspace Tables pane
In the left-side navigation pane of the DataStudio page, click Workspace Tables. In the Workspace Tables pane, click the
icon to create a MaxCompute table. For more information about how to manage tables, see Manage tables. If the Workspace Tables module is not displayed in the left-side navigation pane, you can add the module. For more information about how to add a module to the DataStudio page, see Adjust the displayed DataStudio modules.
Create a MaxCompute table in a specific workflow in DataStudio
DataWorks organizes and manages code by workflow. You can create tables related to the business in a specific workflow. If no workflow is available, you can create one. For information about how to create a workflow, see Create a workflow. If workflows are available, right-click the name of the desired workflow and choose . In the dialog box that appears, configure the required information and click Create.
Configure the basic information about the MaxCompute table
After you perform the preceding operations, you can configure the information about the table on the table configuration tab. On the tab, you can configure the information about the table by executing a DDL statement or by using the codeless UI in DataWorks.
Commit and deploy the MaxCompute table
After you define the schema of the table, you must commit the table to the development and production environments.
Note
Items that you need to take note of when you commit and deploy MaxCompute tables in workspaces in basic mode or workspaces in standard mode:
For workspaces in basic mode: You need to only commit MaxCompute tables to the production environment.
For workspaces in standard mode: Only users who are assigned the Workspace Administrator or O&M role can deploy MaxCompute tables to the production environment. If you want to deploy MaxCompute tables to the production environment, you must be granted the required permissions. For information about authorization, see Add a RAM user to a workspace as a member and assign roles to the member.
For more information about workspace modes, see Differences between workspaces in basic mode and workspaces in standard mode.
Operation | Description |
Load from Development Environment | Load the table information from the development environment and present the table information on the current page. Note You can perform this operation only after the table is committed to the development environment. After you perform this operation, the table information in the development environment overwrites the table information on the current page. |
Commit to Development Environment | Commit the table to the development environment of the workspace. This indicates that you must create the current table in the MaxCompute project that is added to DataWorks as a MaxCompute data source in the development environment. After the table is committed, you can perform the following operations: View the schema of the table Develop a task on an ODPS SQL node in DataStudio, and run the desc tablename command to view the schema of the table. View information about the MaxCompute project View information about the MaxCompute project that is added to DataWorks as a MaxCompute data source in the development environment and information about the created table. For more information, see Add a MaxCompute data source.
|
Load from Production Environment | Load the table information from the production environment and present the table information on the current page. Note You can perform this operation only after the table is committed to the production environment. After you perform this operation, the table information in the production environment overwrites the table information on the current page. |
Commit to Production Environment | Commit the table to the production environment of the workspace. This indicates that you must create the current table in the MaxCompute project that is added to DataWorks as a MaxCompute data source in the production environment. After the table is committed, you can perform the following operations: View the schema of the table Develop a task on an ODPS SQL node in DataStudio, and run the desc projectname.tablename command to view the schema of the table. View information about the MaxCompute project View information about the MaxCompute project that is added to DataWorks as a MaxCompute data source in the production environment and information about the created table. For more information, see Add a MaxCompute data source.
|
Write data to and export data from the MaxCompute table
This section describes how to write data to and export data from the MaxCompute table.
Write data to the MaxCompute table
You can use a data synchronization task and a task on a MaxCompute node to write data to the MaxCompute table. DataWorks also allows you to use the data upload feature to import data from your on-premises machine to the MaxCompute table.
Export data from the MaxCompute table
After you query data in the MaxCompute table in DataStudio, you can use the download feature on the query result page to download the data to your on-premises machine. By default, you can download a maximum of 10,000 data records at a time. DataWorks allows tenant administrators and tenant security administrators to configure the maximum number of data records that can be downloaded at a time on the Data query and analysis control page in Security Center.
Note
If the number of data records that you want to export exceeds 10,000, you can run Tunnel commands on the MaxCompute client to export the data.
Query data in the MaxCompute table
You can use an ODPS SQL node or the ad hoc query feature to query data in the MaxCompute table by executing SQL statements.
Preset data access permissions
If you use a workspace in basic mode, fine-grained permission management and isolation of data between development and production environments are not supported. In this example, a workspace in standard mode is used.
The following table describes the preset data access permissions for RAM users that are added to a workspace as members.
Permission type | Description |
Permission type | Description |
Permissions on a MaxCompute project in the development environment | After you assign a RAM user a built-in workspace-level role in your workspace and associate a MaxCompute project with the workspace in the development environment, the RAM user is automatically granted the permissions of the mapped role of the MaxCompute project. By default, the RAM user has the permissions of the MaxCompute project in the development environment. However, the RAM user does not have the permissions of the MaxCompute project that is associated with the workspace in the production environment. |
Permissions on a MaxCompute project in the production environment | The RAM user that is used as a scheduling access identity has high permissions on a MaxCompute project in the production environment. Other RAM users do not have permissions on the MaxCompute project in the production environment. To perform operations on MaxCompute tables in the production environment, you must go to Security Center to request the required permissions. DataWorks provides a default request processing procedure. DataWorks also allows users that are granted administrative permissions to customize request processing procedures. |
For more information about permission management for MaxCompute, see Manage permissions on data in a MaxCompute compute engine instance.
Data access behaviors
MaxCompute allows you to query tables across projects. You can query data in a MaxCompute project that is associated with a workspace in the production environment by specifying the project name on the DataStudio page. The following table describes the methods to query tables across projects and the accounts that can be used to access the tables in different environments.
Note
On the Computing Resource page in DataStudio, you can view the MaxCompute projects that are added to DataWorks as MaxCompute data sources in the development and production environments and the accounts that are used to configure environments for the MaxCompute projects. For more information, see Preparations before data development: Associate a data source or a cluster with DataStudio.
For a workspace in standard mode, the personal identity of a task executor is used to run MaxCompute tasks in the development environment by default, and an Alibaba Cloud account is used as the scheduling access identity to run MaxCompute tasks in the production environment. For more information, see Add a MaxCompute data source.
Sample code | Execution account in the development environment (DataStudio and Operation Center in the development environment) | Execution account in the production environment (Operation Center in the production environment) |
Sample code | Execution account in the development environment (DataStudio and Operation Center in the development environment) | Execution account in the production environment (Operation Center in the production environment) |
Access tables in the MaxCompute project in the development environment:
select col1 from projectname_dev.tablename;
| The personal Alibaba Cloud account of a task executor is used to access tables in the MaxCompute project in the development environment. If a RAM user runs a task, the personal Alibaba Cloud account of the RAM user is used to access tables in the MaxCompute project in the development environment. If an Alibaba Cloud account is used to run a task, the Alibaba Cloud account is used to access tables in the MaxCompute project in the development environment.
| The scheduling access identity is used to access tables in the MaxCompute project in the development environment. |
Access tables in the MaxCompute project in the production environment:
select col1 from projectname.tablename;
| The personal Alibaba Cloud account of a task executor is used to access tables in the MaxCompute project in the production environment. Note Due to security control on data in the production environment, a personal Alibaba Cloud account cannot be used to access tables in the MaxCompute project in the production environment. To use a personal Alibaba Cloud account to access tables in the MaxCompute project in the production environment, go to Security Center to request the permissions. DataWorks provides a default request processing procedure. DataWorks also allows users that are granted management permissions to customize request processing procedures. | The scheduling access identity is used to access tables in the MaxCompute project in the production environment. |
Execute the following statement in the MaxCompute project in the desired environment such as the development environment to access tables in the MaxCompute project:
select col1 from tablename;
| If the statement is executed in the MaxCompute project in the development environment, you can use the personal Alibaba Cloud account of a node executor to access tables in the MaxCompute project in the development environment. | If the statement is executed in the MaxCompute project in the production environment, you can use the scheduling access identity to access tables in the MaxCompute project in the production environment. |
View MaxCompute data assets
This section describes how to view MaxCompute data assets.
View tables in the production environment within a tenant
In the Tenant Tables pane of the DataStudio page in the DataWorks console, you can view all tables in the production environment of your Alibaba Cloud account in the current region.
View metadata
You can go to Data Map to view the details and metadata of a MaxCompute table.
View table lineages
You can view the ancestor and descendant lineages for a table on the details page of the table in Data Map.
Note
To locate the tasks in which the table is used, you can use the code search feature to search for the tasks.
Manage multiple MaxCompute tables at the same time
This section describes how to manage multiple MaxCompute tables at the same time.
Delete multiple MaxCompute tables at the same time
You can go to the My Data page in Data Map and delete multiple MaxCompute tables of which you are the owner at the same time.
Note
Due to security control for data in the production environment, you cannot directly run commands to delete tables in the production environment. If you want to delete a table that belongs to another user, you must go to Security Center to request the required permissions.
Change the owners of multiple tables at the same time
You can go to the My Data page in Data Map and transfer the ownership of multiple tables to another user at the same time.
Change the TTL periods of multiple tables at the same time
You can go to the My Data page in Data Map and change the TTL periods of multiple MaxCompute tables of which you are the owner at the same time.
Note
You can go to the Workspace Tables pane to modify the TTL period of a single table. For more information, see Create a MaxCompute table.
Manage tables in a specific workflow
You can add tables that are created in the Workspace Tables pane to a specific workflow in DataStudio by importing the tables. You can manage the tables by workflow and view the tables related to the business in the workflow. The following figure shows the steps to add tables of a MaxCompute compute engine to a specific workflow.
Referenced: indicates that a specific table is queried by using a SELECT statement in a specific workflow.
Written: indicates that data is written to a specific table by using an INSERT OVERWRITE statement in a specific workflow.