MaxCompute provides the data lakehouse solution 2.0, which is in public preview. The solution allows you to create management objects that define the metadata and data access methods of foreign servers, and to use the external schema mapping mechanism to access all tables within a database or schema of a foreign server. This way, you can build a data management platform that combines data lakes and data warehouses, integrating the flexibility and broad ecosystem compatibility of data lakes with the enterprise-class deployment of data warehouses.
Background information
Comparison between data warehouses and data lakes
Category | Capability |
Data warehouse | Focuses on the management and constraints of structured and semi-structured data that enters data warehouses, and leverages strong centralized management to improve computing performance and standardize management capabilities. |
Data lake | Focuses on open data storage and universal data formats, and allows multiple engines to produce or consume data on demand. A data lake provides only weak centralized management to ensure flexibility, supports unstructured data, and allows you to create schemas after data is stored. A data lake helps you manage data in a more flexible manner. |
MaxCompute data warehouses
MaxCompute is a cloud-native data warehouse service that is built on a serverless architecture. MaxCompute allows you to perform the following operations:
Perform data modeling of data warehouses.
Use the extract, transform, load (ETL) tool to load data to a model table that is defined with a schema for storage.
Use the standard SQL engine to process substantial data in data warehouses and use the online analytical processing (OLAP) engine of Hologres to analyze data.
Data lake and federated computing scenarios supported by MaxCompute
In data lake scenarios, data can be produced or consumed by using multiple engines. MaxCompute is one of the engines that are used to process data. MaxCompute needs to read data in multiple mainstream open source formats from the upstream of a data lake, compute data, and then continuously produce data for the downstream.
MaxCompute provides secure, high-performance, and cost-effective data warehouses to aggregate high-value data. MaxCompute also needs to obtain metadata and data from data lakes, compute external data, and perform federated computing on the data of warehouses and data lakes.
Additionally, MaxCompute needs to obtain data from various foreign servers, such as Hadoop and Hologres, to perform federated computing. In federated computing scenarios, MaxCompute also needs to read metadata and data from external systems.
Data lakehouse solution 2.0 of MaxCompute
The data lakehouse solution 2.0 of MaxCompute allows you to access metadata or storage services of Alibaba Cloud over the cloud product interconnection network, or foreign servers in virtual private clouds (VPCs) over a dedicated connection. The solution allows you to build management objects that define the metadata and data access methods of foreign servers. You can use an external schema to map the database or schema of a foreign server to access all tables within the database or schema.
Network connection
For information about network connections, see the "Access over a VPC (dedicated connection)" section of the Network connection process topic. MaxCompute can use a network connection to access data sources, such as E-MapReduce (EMR) clusters and ApsaraDB RDS instances (available soon), in VPCs. For Data Lake Formation (DLF), Object Storage Service (OSS), and Hologres that are deployed in the cloud product interconnection network, MaxCompute can access data in the services without the need to configure a network connection.
Foreign server
A foreign server contains the metadata and data access information of a data source system, such as identity authentication information, location information, and connection protocol descriptions. A foreign server is a tenant-level management object and is defined by the tenant administrator.
If the project-level tenant resource access control feature is enabled for a project, the tenant administrator mounts a foreign server to the project, and the project administrator uses a policy to grant the permissions on the foreign server to the project members.
External schema
An external schema is a special schema in MaxCompute data warehouse projects. As shown in the preceding figure, an external schema can be mapped to a database or schema of a data source, which allows you to access tables and data within the database or schema of the data source. The tables that are generated from the schema mapping mechanism are called federated external tables.
Federated external tables do not store metadata in MaxCompute. Instead, MaxCompute obtains metadata in real time by using the metadata service in foreign server objects. When you query data, you do not need to execute a DDL statement to create an external table in the data warehouse project. You can query data of a source table by referencing the names of the project, the external schema, and the source table. When the schema or data of the source table changes, the federated external table can immediately reflect the latest status of the source table. The data source hierarchy that can be mapped by an external schema is determined by the system hierarchy between the hierarchy defined by the foreign server and the table hierarchy in the data source. The hierarchy defined by the foreign server is determined by the data source hierarchy that can be accessed by the authenticated identity.
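For example, assuming a project named lakehouse_demo with an external schema es_demo that maps a source database containing a table src_tbl (all names hypothetical), the source table can be queried directly by its three-part name, without a CREATE EXTERNAL TABLE statement:

```sql
-- All names below are hypothetical.
-- The table is resolved from the data source metadata at query time,
-- so schema and data changes in the source are reflected immediately.
SELECT * FROM lakehouse_demo.es_demo.src_tbl;
```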
External project
In the data lakehouse solution 1.0, an external project uses a 2-layer architecture and, like an external schema, can be mapped to a database or schema of a data source. An external project must use a data warehouse project to read and compute external data. However, because external projects sit at a higher hierarchy level, mapping the databases or schemas of a data source can require a large number of external projects, and these projects cannot be integrated with 3-layer data warehouse projects. You can use external schemas instead of external projects in the data lakehouse solution 1.0.
In the data lakehouse solution 2.0, external schemas provide all the capabilities of external projects in the data lakehouse solution 1.0. An external project can be mapped to a catalog or database of a 3-layer data source. This way, you can view databases in a DLF catalog or schemas in a Hologres database and use a federated external table to access the data source. The data lakehouse solution 2.0 will soon add support for external projects. For more information, see the official documentation.
Data source type | Hierarchy of foreign servers | Mapping hierarchy of external schemas | Mapping hierarchy of external projects in the data lakehouse solution 2.0 (available soon) | Mapping hierarchy of external projects in the data lakehouse solution 1.0 (to be discontinued) | Authentication method |
DLF+OSS | Region-level DLF and OSS | Catalog.Database in DLF | Catalog in DLF | Catalog.Database in DLF | RAMRole |
Hive+HDFS | EMR cluster | Database in Hive | Not supported | Database in Hive | Authentication-free |
Hologres | Database in Hologres | Schema | Database | Not supported | RAMRole |
Note: Different authentication methods are available for different data sources. MaxCompute will gradually provide more authentication methods in later versions, such as the current user identity method for accessing Hologres data and the Kerberos authentication method for accessing Hive data.
Limits
The data lakehouse solution 2.0 is supported only in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Singapore, and Germany (Frankfurt).
MaxCompute must be deployed in the same region as DLF and OSS.
External schema operations must be performed in an internal project, so the internal project must have the three-layer model enabled. You can enable it in the console. For more information, see Enable the schema feature.
Precautions
You must enable the schema mode supported by project-level metadata for the MaxCompute project, which is the target project for creating the external schema. For more details, see Schema mode supported by project-level metadata.
You must enable the schema mode supported by SQL syntax before performing schema operations or querying data from an external schema. For more details, see Schema mode supported by SQL syntax.
When you execute the SHOW TABLES IN <external_schema_name>; command to access the data source system:
If you run the command in the DataWorks Data Development IDE and the system fails to resolve the schema, make sure that the MaxCompute client integrated with the DataWorks resource group is V0.46.8 or later. If the client version is earlier than 0.46.8, submit a ticket to contact MaxCompute technical support for an upgrade.
If you run the command in the MaxCompute client (odpscmd), upgrade the client to V0.46.8 or later.
Note: You can run the SHOW VERSION; command to view the current client version.
For an external schema that you create, you can view the list of schemas, view schema information, and delete the schema. The commands and usage are identical to those for internal schemas. For more information, see Schema-related operations.
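As a sketch, the schema commands below behave the same for an external schema as for an internal one (es_demo is a hypothetical external schema name; see Schema-related operations for the authoritative syntax):

```sql
SHOW SCHEMAS;          -- list the schemas in the current project
DESC SCHEMA es_demo;   -- view the information about the schema
DROP SCHEMA es_demo;   -- delete the external schema mapping; the source data is not affected
```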
View the created foreign servers
You can view the foreign servers that have been created on the Foreign Server page. The following table describes the parameters.
Parameter | Description |
Data Source Name | The name of the foreign server. |
Type | The type of the foreign server. Valid values: DLF+OSS, Hive+HDFS, and Hologres. |
Network Connection | The name of the network connection that is used by the foreign server. Only the Hive+HDFS type supports network connections. |
Owner Account | The ID of the account that is used to create the foreign server. External schemas use the foreign server to access the data source system based on the identity information specified by the creator of the foreign server. |
Mounted Projects | The number of projects to which the foreign server is mounted. |
Creation Time | The time when the foreign server was created. |
Update Time | The time when the editable properties of the foreign server were last modified and saved. |
Actions - Mount Project | Configures the mounting relationship between the foreign server and projects. |
Actions - Details | Views the properties of the foreign server. |
Actions - Edit | Edits the properties of the foreign server. Important If you modify information such as permissions, the authorization relationship configured for the project may become invalid due to the change of visible objects. |
Actions - Delete | Deletes the foreign server. Important After you delete a foreign server, all tasks that depend on the foreign server no longer have the permissions to access external systems, and the mounting relationship between the foreign server and all projects is deleted. |
Create and use a DLF+OSS data lakehouse
Data sources of the DLF+OSS type use OSS as the data lake storage service and DLF as the metadata management service. MaxCompute allows you to create foreign servers of the DLF+OSS type. MaxCompute can be used together with DLF and OSS to integrate data warehouses and data lakes to provide more flexible and efficient data management and processing capabilities.
If your account has not enabled the tenant-level schema syntax switch, you must add the statement SET odps.namespace.schema=true; before your SQL statements to ensure that schema-related statements can be executed properly.
Step 1: Authorize MaxCompute to access your cloud resources
If you want to build a data lakehouse by using MaxCompute, DLF, and OSS, you must first complete authorization. Without authorization, the account that is used to create the MaxCompute project cannot access DLF. You can use one of the following methods to authorize MaxCompute to access DLF:
One-click authorization: If you use the same account to create the MaxCompute project and deploy DLF, we recommend that you perform one-click authorization on the Cloud Resource Access Authorization page in the Resource Access Management (RAM) console.
Custom authorization: You can use this method regardless of whether the same account is used to create the MaxCompute project and deploy DLF. For more information, see Authorize a RAM user to access DLF.
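For reference, a custom authorization setup typically involves a RAM role whose trust policy allows MaxCompute to assume it. The following is a minimal sketch of such a trust policy; verify the exact service principal and the required DLF and OSS permissions in the Authorize a RAM user to access DLF topic:

```json
{
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "Service": ["odps.aliyuncs.com"]
      }
    }
  ],
  "Version": "1"
}
```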
Step 2: Create a foreign server of the DLF+OSS type
Log on to the MaxCompute console. In the upper-left corner of the console, select a region.
In the left-side navigation pane, choose Tenants > Foreign Server.
On the Foreign Server page, click Create Foreign Server.
In the Add Foreign Server dialog box, configure the parameters as prompted. The following table describes the parameters.
Parameter
Description
Foreign Server Type
The type of the foreign server. Select DLF+OSS.
Foreign Server Name
The name of the foreign server. The name must meet the following requirements:
The name contains lowercase letters, underscores (_), and digits and starts with a lowercase letter.
The name can be up to 128 characters in length.
Foreign Server Description
The description of the foreign server.
Region
The region in which you want to create the foreign server. This parameter is automatically set to the current region.
DLF Endpoint
The DLF endpoint of the current region.
OSS Endpoint
The OSS endpoint of the current region.
RoleARN
The Alibaba Cloud Resource Name (ARN) of a Resource Access Management (RAM) role. The RAM role must have the permissions to access DLF and OSS.
You can log on to the RAM console, choose Identities > Roles in the left-side navigation pane, and then click the name of the RAM role to obtain the ARN in the Basic Information section.
Example: acs:ram::124****:role/aliyunodpsdefaultrole.
Foreign Server Supplemental Properties
The properties of the foreign server. After you specify this parameter, the tasks that use the foreign server can access the data source system based on the behavior specified by this parameter.
Note: More properties will be supported by this parameter as service capabilities evolve. For more information, see the official documentation.
Click OK.
Step 3: Create an external schema
Execute the following statement to create an external schema that references a foreign server of the DLF+OSS type:
CREATE EXTERNAL SCHEMA IF NOT EXISTS <external_schema_name>
WITH <dlfoss_foreign_server_name>
ON '<dlf_catalog_id>.<database_name>';
Parameters:
external_schema_name: the name of the external schema.
dlfoss_foreign_server_name: the name of the foreign server. The project to which the external schema belongs must be deployed in the same region as the foreign server.
dlf_catalog_id: the ID of the DLF catalog. For information about how to create a catalog, see the "Create a data catalog" section of the Data catalog topic.
database_name: the name of the database in the catalog of DLF. For more information, see the "Create a metadatabase" section of the Metadata management topic.
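Putting the parameters together, a minimal example might look like the following (the schema name, foreign server name, catalog ID, and database name are all hypothetical):

```sql
CREATE EXTERNAL SCHEMA IF NOT EXISTS es_dlf_demo
WITH fs_dlfoss_demo              -- a previously created DLF+OSS foreign server
ON 'catalog_123456.sales_db';    -- <dlf_catalog_id>.<database_name>
```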
Step 4: Use SQL syntax to access the data source system
SQL syntax: Show the names of tables in DLF and OSS schemas.
Method 1:
SHOW tables IN <external_schema_name>; --The <external_schema_name> parameter specifies the name of the external schema.
Method 2:
USE SCHEMA <external_schema_name>; --The <external_schema_name> parameter specifies the name of the external schema.
SHOW TABLES;
Examples:
Query the names of all tables in the external schema named es_dlf.
USE SCHEMA es_dlf;
SHOW TABLES;
Sample result:
ALIYUN$xxx@test.aliyunid.com:hellocsv
ALIYUN$xxx@test.aliyunid.com:t1
ALIYUN$xxx@test.aliyunid.com:t2
ALIYUN$xxx@test.aliyunid.com:t3
Query the data of the hellocsv table in the es_dlf schema in the lakehouse47_3 project.
SELECT * FROM lakehouse47_3.es_dlf.hellocsv;
Sample result:
+------------+------------+------------+------------+
| col1       | col2       | col3       | col4       |
+------------+------------+------------+------------+
| 1          | hello      | test       | world      |
+------------+------------+------------+------------+
Copy the data of a federated external table named hellocsv from the data source to a data warehouse table.
-- Copy the data of a federated external table to a data warehouse table.
CREATE TABLE hellocsv_copy AS SELECT * FROM lakehouse47_3.es_dlf.hellocsv;
-- Query the copied data in the data warehouse table.
SELECT * FROM hellocsv_copy;
Sample result:
+------------+------------+------------+------------+
| col1       | col2       | col3       | col4       |
+------------+------------+------------+------------+
| 1          | hello      | test       | world      |
+------------+------------+------------+------------+
Create and use a Hive+HDFS federation
Hive is a common open source big data warehouse solution. In most cases, metadata is stored in Hive Metastore Service (HMS), and data is stored in Hadoop Distributed File System (HDFS). MaxCompute allows you to create foreign servers of the Hive+HDFS type. MaxCompute can be used together with Hive in a federated manner to implement data warehouse access and data aggregation for open source big data systems.
You are not charged for pay-as-you-go SQL federated computing in Hive+HDFS mode during the public preview.
If your account has not enabled the tenant-level schema syntax switch, you must add the statement SET odps.namespace.schema=true; before your SQL statements to ensure that schema-related statements can be executed properly.
Step 1: Create a foreign server of the Hive+HDFS type
Log on to the MaxCompute console. In the upper-left corner of the console, select a region.
In the left-side navigation pane, choose Tenants > Foreign Server.
On the Foreign Server page, click Create Foreign Server.
In the Add Foreign Server dialog box, configure the parameters as prompted. The following table describes the parameters.
Parameter
Description
Foreign Server Type
The type of the foreign server. Select Hive+HDFS.
Foreign Server Name
The name of the foreign server. The name must meet the following requirements:
The name contains lowercase letters, underscores (_), and digits and starts with a lowercase letter.
The name can be up to 128 characters in length.
Foreign Server Description
The description of the foreign server.
Network Connection Object
The name of the network connection. You can select or establish a connection between MaxCompute and the VPC of an EMR or Hadoop cluster. For more information, see the Establish a network connection between MaxCompute and the destination VPC step in the "Access over a VPC (dedicated connection)" section of the Network connection process topic.
Note: For information about network connections, see the "Network connection" section of the Terms topic.
The VPC must be deployed in the same region as the MaxCompute foreign server and the project to which the foreign server is mounted.
Cluster Name
The name of the cluster. For a high-availability (HA) Hadoop cluster, the value of this parameter is the same as the name of a NameNode process. You can obtain the cluster name from the dfs.nameservices parameter in the hdfs-site.xml file of the Hadoop cluster.
NameNode Address
The IP addresses and port numbers of the active and standby NameNode processes in the Hadoop cluster. In most cases, the port number is 8020. To obtain the addresses, contact the Hadoop cluster administrator.
HMS Service Address
The IP addresses and port numbers of the active and standby HMS processes in the Hadoop cluster. In most cases, the port number is 9083. To obtain the addresses, contact the Hadoop cluster administrator.
Authentication Type
The authentication type. Only No Authentication (authentication-free access) is supported.
Foreign Server Supplemental Properties
The properties of the foreign server. After you specify this parameter, the tasks that use the foreign server can access the data source system based on the behavior specified by this parameter.
Note: More properties will be supported by this parameter as service capabilities evolve. For more information, see the official documentation.
Click OK.
Step 2: Create an external schema
An external schema is an object within a project and can be created by using SQL syntax. If your account has not enabled the tenant-level schema syntax switch, you must add the statement SET odps.namespace.schema=true; before your SQL statements to ensure that schema-related statements can be executed properly.
Execute the following statement to create an external schema that references a foreign server of the Hive+HDFS type:
CREATE EXTERNAL SCHEMA IF NOT EXISTS <external_schema_name>
WITH <hive_foreign_server_name>
ON '<database_name>';
Parameters:
external_schema_name: the name of the external schema.
hive_foreign_server_name: the name of the foreign server.
database_name: the name of the Hive database.
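For example, assuming a foreign server named fs_hive_demo and the Hive database default (both hypothetical), the statement might look like this:

```sql
CREATE EXTERNAL SCHEMA IF NOT EXISTS es_hive_demo
WITH fs_hive_demo    -- a previously created Hive+HDFS foreign server
ON 'default';        -- the Hive database to map
```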
Step 3: Use SQL syntax to access the data source system
SQL syntax: Show the names of tables in Hive schemas.
Method 1:
SHOW tables IN <external_schema_name>; --The <external_schema_name> parameter specifies the name of the external schema.
Method 2:
USE SCHEMA <external_schema_name>; --The <external_schema_name> parameter specifies the name of the external schema.
SHOW TABLES;
Examples:
Query the names of all tables in the external schema named es_hive3.
USE SCHEMA es_hive3;
SHOW TABLES;
Sample result:
ALIYUN$xxx@test.aliyunid.com:t1
Query the data of the t1 table in the es_hive3 schema in the lakehouse47_3 project.
SELECT * FROM lakehouse47_3.es_hive3.t1;
Sample result:
+------------+
| id         |
+------------+
| 1          |
+------------+
Copy the data of a federated external table named t1 from the data source to a data warehouse table.
-- Copy the data of a federated external table to a data warehouse table.
CREATE TABLE t1_copy AS SELECT * FROM lakehouse47_3.es_hive3.t1;
-- Query the copied data in the data warehouse table.
SELECT * FROM t1_copy;
Sample result:
+------------+
| id         |
+------------+
| 1          |
+------------+
Create and use a Hologres federation
Hologres is an end-to-end real-time data warehouse engine that supports real-time writes, updates, and analysis of large amounts of data. Hologres supports standard SQL syntax that is compatible with PostgreSQL, OLAP and ad hoc analysis of petabytes of data, and high concurrency, low-latency online data services. Hologres is deeply integrated with MaxCompute. Hologres allows you to create, analyze, and query OLAP models in MaxCompute data warehouses. MaxCompute allows you to create foreign servers of the Hologres type. You can use a Hologres federation to allow MaxCompute to perform the following operations:
Read and archive data from ultra-large real-time data warehouses.
Read the dimension data or the data of data mart models and perform computing in association with fact tables at the operational data store (ODS), data warehouse detail (DWD), and data warehouse service (DWS) layers.
Read the data of OLAP models to perform high-performance, low-cost batch processing and return the results to real-time data warehouses by using Hologres external tables or remote function calls for analysis.
If your account has not enabled the tenant-level schema syntax switch, you must add the statement SET odps.namespace.schema=true; before your SQL statements to ensure that schema-related statements can be executed properly.
Step 1: Create a foreign server of the Hologres type
Log on to the MaxCompute console. In the upper-left corner of the console, select a region.
In the left-side navigation pane, choose Tenants > Foreign Server.
On the Foreign Server page, click Create Foreign Server.
In the Add Foreign Server dialog box, configure the parameters as prompted. The following table describes the parameters.
Parameter
Description
Foreign Server Type
The type of the foreign server. Select Hologres.
Foreign Server Name
The name of the foreign server. The name must meet the following requirements:
The name contains lowercase letters, underscores (_), and digits and starts with a lowercase letter.
The name can be up to 128 characters in length.
Foreign Server Description
The description of the foreign server.
Connection Method
The connection method of the foreign server. Only the cloud product interconnection network is supported.
Host
The hostname of the Hologres instance.
You can log on to the Hologres console, click Instances in the left-side navigation pane, and then click the ID of the Hologres instance to obtain the hostname in the Network Information section of the Instance Details page.
Example: hgpostcn-cn-3m***-cn-shanghai-internal.hologres.aliyuncs.com.
Port
The port number of the Hologres instance.
You can log on to the Hologres console, click Instances in the left-side navigation pane, and then click the ID of the Hologres instance to obtain the port number in the Network Information section of the Instance Details page. In most cases, the port number is 80.
DBNAME
The name of a database in the Hologres instance.
Authentication Method
RAMRole: assumes a RAM role to perform authentication. This method allows you to access Hologres across Alibaba Cloud accounts. Federated external tables support only the RAMRole method. Example RoleARN: acs:ram::uid:role/aliyunodpsholorole. For more information about the configuration and authentication of the aliyunodpsholorole role, see Create a Hologres external table in STS mode.
ExecuteWithUserAuth: allows you to use the same Alibaba Cloud account or RAM user to access authorized tables and data in MaxCompute and Hologres. You can also execute the CALL EXEC_EXTERNAL_QUERY statement in MaxCompute and Hologres without the need to configure additional authentication information.
Important: The ExecuteWithUserAuth method is not supported for federated external tables.
RoleARN
The ARN of the RAM role. The RAM role must have the permissions to access Hologres. This parameter is required when you set the Authentication Method parameter to RAMRole.
You can log on to the RAM console, choose Identities > Roles in the left-side navigation pane, and then click the name of the RAM role to obtain the ARN in the Basic Information section.
Foreign Server Supplemental Properties
The properties of the foreign server. After you specify this parameter, the tasks that use the foreign server can access the data source system based on the behavior specified by this parameter.
Note: More properties will be supported by this parameter as service capabilities evolve. For more information, see the official documentation.
Click OK.
Step 2: Create an external schema
An external schema is an object within a project and can be created by using SQL syntax. If your account has not enabled the tenant-level schema syntax switch, you must add the statement SET odps.namespace.schema=true; before your SQL statements to ensure that schema-related statements can be executed properly.
Execute the following statement to create an external schema that references a foreign server of the Hologres type:
CREATE EXTERNAL SCHEMA IF NOT EXISTS <external_schema_name>
WITH <holo_foreign_server_name>
ON '<holoschema_name>';
Parameters:
external_schema_name: the name of the external schema.
holo_foreign_server_name: the name of the foreign server.
holoschema_name: the name of the Hologres schema to be mapped.
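For example, assuming a foreign server named fs_holo_demo and the Hologres schema public (both hypothetical), the statement might look like this:

```sql
CREATE EXTERNAL SCHEMA IF NOT EXISTS es_holo_demo
WITH fs_holo_demo    -- a previously created Hologres foreign server
ON 'public';         -- the Hologres schema to map
```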
Step 3: Use SQL syntax to access the data source system
SQL syntax: Show the names of tables in Hologres schemas.
Method 1:
SHOW tables IN <external_schema_name>; --The <external_schema_name> parameter specifies the name of the external schema.
Method 2:
USE SCHEMA <external_schema_name>; --The <external_schema_name> parameter specifies the name of the external schema.
SHOW TABLES;
Examples:
Query the names of all tables in the external schema named es_holo_rolearn_nonl.
SET odps.namespace.schema=true;
USE SCHEMA es_holo_rolearn_nonl;
SHOW TABLES;
Sample result:
ALIYUN$xxx@test.aliyunid.com:mc_holo_external
Query the data of the mc_holo_external table in the es_holo_rolearn_nonl schema in the lakehouse47_3 project.
SELECT * FROM lakehouse47_3.es_holo_rolearn_nonl.mc_holo_external;
Sample result:
+------------+------------+------------+------------+
| col1       | col2       | col3       | col4       |
+------------+------------+------------+------------+
| 1          | hello      | test       | world      |
+------------+------------+------------+------------+
Insert data into the table.
INSERT INTO lakehouse47_3.es_holo_rolearn_nonl.mc_holo_external VALUES(2,'hello','test','holo');
SELECT * FROM lakehouse47_3.es_holo_rolearn_nonl.mc_holo_external;
Sample result:
+------------+------------+------------+------------+
| col1       | col2       | col3       | col4       |
+------------+------------+------------+------------+
| 1          | hello      | test       | world      |
| 2          | hello      | test       | holo       |
+------------+------------+------------+------------+
Copy the data of a federated external table named mc_holo_external from the data source to a data warehouse table.
-- Copy the data of a federated external table to a data warehouse table.
CREATE TABLE mc_holo_external_copy AS SELECT * FROM lakehouse47_3.es_holo_rolearn_nonl.mc_holo_external;
-- Query the copied data in the data warehouse table.
SELECT * FROM mc_holo_external_copy;
Sample result:
+------------+------------+------------+------------+
| col1       | col2       | col3       | col4       |
+------------+------------+------------+------------+
| 1          | hello      | test       | world      |
+------------+------------+------------+------------+
Use the current user identity to submit an executable statement
MaxCompute allows you to use a CALL statement that contains the EXEC_EXTERNAL_QUERY() function to submit an executable statement to Hologres.
Syntax
CALL EXEC_EXTERNAL_QUERY ( '<holo_ExecuteWithUserAuth_foreign_server_name>', r"###( <holo_query>)###");
Parameters:
holo_ExecuteWithUserAuth_foreign_server_name: the name of the foreign server that you created in ExecuteWithUserAuth mode.
r: indicates that the string that follows is an executable statement.
holo_query: the Hologres statement that is entered within the boundary.
Note: Similar to the raw string method in C++, double quotation marks (""), parentheses (), and delimiters form a boundary to prevent conflicts between characters before and after the boundary and special characters in the holo_query parameter. You can configure custom delimiters. Make sure that delimiters are used in pairs in the "[delimiter]( )[delimiter]" format. The boundary cannot contain angle brackets (<>). We recommend that you use "###(<holo_query>)###".
Examples
Example 1: In MaxCompute, submit an INSERT OVERWRITE statement on the Hologres internal table public.current_user_test. The data to be inserted comes from the SELECT current_user; statement, which queries the current UID in Hologres.
Execute the following statement in MaxCompute:
CALL EXEC_EXTERNAL_QUERY ( 'fs_holo_ExecuteWithUserAuth_nonl_y', r"###( CALL hg_insert_overwrite( 'public.current_user_test', $$SELECT current_user$$ );)###");
Query the data of the public.current_user_test table in Hologres.
SELECT * FROM current_user_test;
The same UID that is used to execute the preceding statements is returned. Example: 1117xxxxxx519.
Example 2: In MaxCompute, submit an INSERT OVERWRITE statement on partition 2020 of the Hologres internal table public.hologres_parent_insert1. The data to be inserted comes from the SELECT * FROM mc_external_table WHERE a='2020'; statement, which is executed on a MaxCompute external table in Hologres.
Execute the following statement in MaxCompute:
CALL EXEC_EXTERNAL_QUERY ( 'fs_holo_ExecuteWithUserAuth_nonl_y', r"###( CALL hg_insert_overwrite( 'public.hologres_parent_insert1', '2020', $$SELECT * FROM mc_external_table WHERE a='2020'$$ );)###");
Query the data of the public.hologres_parent_insert1 table in Hologres.
-- A row of data is inserted into the Hologres internal table.
SELECT * FROM hologres_parent_insert1;
Sample result:
a     b     c                            d
2020  1     2024-06-19 10:27:46.201636   a
Configure permissions on foreign servers
A foreign server is a tenant management object in MaxCompute. Whether a RAM user can perform operations on the foreign server is determined by the permissions configured on the foreign server by the tenant administrator in the RAM console. The tenant administrator can choose Permissions > Policies in the left-side navigation pane to create a policy. For more information, see the "Create a custom policy on the JSON tab" section of the Create custom policies topic.
Example: Use an Alibaba Cloud account to create a policy named ForeignServerTest and attach the policy to a RAM user. Sample policy:
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": "odps:CreateForeignServer",
"Resource": "acs:odps:*:12xxxxxxxx07:foreignservers/*"
}
]
}
Add the permissions to query a list of foreign servers and the information about a foreign server.
If you want to specify a network connection when you create a foreign server, you must add the network connection to the Resource parameter.
If you specify a RAM role, you must have the ram:PassRole permission on the RAM role.
{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"odps:CreateForeignServer",
"odps:GetForeignServer",
"odps:ListForeignServers"
],
"Resource": "acs:odps:*:12xxxxxxxx07:foreignservers/*"
}
]
}
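If the foreign server is configured with a RAM role, the ram:PassRole requirement described above can be met with a separate policy statement. The following is a sketch only: the role name is a hypothetical placeholder, and the account ID follows the same masked format as the preceding examples.

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ram:PassRole",
      "Resource": "acs:ram:*:12xxxxxxxx07:role/<role_name>"
    }
  ]
}
```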
Options for the Action parameter:
Option for the Action parameter | Description |
CreateForeignServer | Creates a foreign server. |
UpdateForeignServer | Updates a foreign server. |
DeleteForeignServer | Deletes a foreign server. |
GetForeignServer | Queries the information about a foreign server. |
ListForeignServers | Queries a list of foreign servers. |
Attach the created policy to a RAM user. Then, you can perform operations on the authorized foreign server.
Project-level tenant resource access control
The foreign servers that are introduced in the data lakehouse solution 2.0 are tenant-level resources, and the permissions on foreign servers are managed by using RAM policies. However, the permissions on project objects, such as tables, are managed by the project administrator. Whether a project can use a tenant object, and how permissions on the object are allocated within the project, can be managed in the following modes:
If the project-level tenant resource access control feature is enabled for a project, the creator can configure the mounting relationship between a foreign server and the project to allow the project to use the foreign server. Then, the project administrator uses a policy to grant the permissions on the foreign server to the project members.
If the project-level tenant resource access control feature is disabled for a project, all users that create an external project or schema can use the foreign server to access external systems based on the RAM role permissions specified by the creator of the foreign server. Besides the foreign servers that are introduced in the data lakehouse solution 2.0, network connections, custom images, and quotas are also tenant-level resources and are managed by the project-level tenant resource access control feature.
For more information about the project-level tenant resource access control feature, see Project-level tenant resource access control. This feature is available only for preview and does not support enabling checks.
To enable the project-level tenant resource access control feature and use a foreign server and an external schema to access federated external tables, perform the following steps:
Click a tenant object, such as a foreign server, and select a project to which you want to mount the tenant object to configure the mounting relationship between them. The mounted tenant object is displayed in the mounted tenant object list of the project.
Configure a policy for the tenant object that is mounted to the project. For more information, see Policy-based access control.
Procedure
Log on to the MaxCompute console. In the left-side navigation pane, choose Workspace > Projects. On the Projects page, find the project and click Manage in the Actions column.
Click the Role Permissions tab. Find the role to which you want to attach a policy and click Edit Role in the Actions column.
In the Edit Role dialog box, set the Authorization Method parameter to Policy.
Modify the role policy in the Policy-based Access Control code editor.
Example
The following example shows a policy that allows a user to use the fs_hive foreign server:
{
  "Statement": [
    {
      "Action": [
        "odps:Usage"
      ],
      "Effect": "Allow",
      "Resource": [
        "acs:odps:*:servers/fs_hive"
      ]
    }
  ],
  "Version": "1"
}
After a user has the permissions on tenant resources, user-level or role-level tenant resource access control can be implemented when the project-level tenant resource access control feature is enabled.
Enable the project-level tenant resource access control feature.
Log on to the MaxCompute console. In the upper-left corner of the console, select a region.
In the left-side navigation pane, choose Workspace > Projects. On the Projects page, find the project and click Manage in the Actions column.
In the Permission Properties section of the Parameter Configuration tab, click Edit.
Turn on Enable Project-level Tenant Resource Access Control and click Submit.
Important: After you enable the project-level tenant resource access control feature, the project checks the permissions on the tenant objects that are in use or to be used, including foreign servers, network connections, custom images, and quotas. We recommend that you do not enable the feature until you configure the mounting relationships between tenant objects and projects and attach the required policies. Otherwise, tasks may fail because they lack the required permissions.
Grant permissions to a federated external table in an external schema
After you create an external schema, all tables in the external schema are owned by the account of the external schema. If you want to grant the permissions on the external schema or its tables to other users, perform the following operations.
MaxCompute does not store the metadata of data sources in data lakehouse mode. In this case, permission policies are managed based on the names of data source objects. If the name of a data source object changes, the authorization becomes invalid, and you must execute the REVOKE statement or delete the policy to remove the authorization. If you fail to promptly remove the authorization, a new data source object that has the same name may inherit the permissions defined in the policy. In this case, the user is unintentionally granted permissions on the new data source object in the external project.
After you enable tenant object authentication for a project, you must mount a foreign server to the current project before you create an external schema. If you do not mount a foreign server in advance, an error occurs when you create an external schema.
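For context, an external schema is typically created after the foreign server is mounted to the project. The following statement is a sketch only: the external schema name es_demo, the foreign server name fs_hive, and the source database name hive_db are all hypothetical, and the exact clause syntax may vary by data source.

```sql
-- Hypothetical example: create an external schema that maps to a database
-- on the mounted foreign server fs_hive.
CREATE EXTERNAL SCHEMA IF NOT EXISTS es_demo
WITH fs_hive
ON 'hive_db';
```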
Assume that the test_lakehouse_project project is used to create and manage the external schema. The following table describes the operations.
Operation | Required permission | Sample procedure |
Create an external schema | | |
Query a list of external schemas | The CreateInstance and List permissions on projects. | |
Query the information about an external schema | The Describe permission on schemas. | |
Modify the properties of an external schema | You cannot modify the properties of an external schema. | |
Delete an external schema | The Drop permission on schemas. | |
Specify an external schema | | Specify an external schema. If a project is upgraded from 2-layer mode to 3-layer mode, grant the Describe permission on the specified external schema to a RAM user. |
Query the data of a table in an external schema | The Select permission on tables in schemas. | |
Import the data of a table in an external schema to an internal table | | |
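As an illustration of the permissions described above, table-level permissions in an external schema are granted with the same GRANT syntax that is used for internal objects. The following sketch assumes a hypothetical external schema named es_demo, a hypothetical table named sale_detail, and a hypothetical RAM user RAM$account_name:ram_user:

```sql
-- Hypothetical example: grant the Select permission on a table in the
-- external schema es_demo of the test_lakehouse_project project.
USE test_lakehouse_project;
GRANT Select ON TABLE es_demo.sale_detail TO USER RAM$account_name:ram_user;
```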