
MaxCompute: Use the data lakehouse solution 2.0

Last Updated: Nov 12, 2024

MaxCompute provides the data lakehouse solution 2.0, which allows you to build a data management platform that combines data lakes and data warehouses. You can create foreign server objects that define the metadata and data access methods of external systems, and use the external schema mapping mechanism to access all tables within a database or schema of a foreign server. The solution integrates the flexibility and broad ecosystem compatibility of data lakes with the enterprise-class management capabilities of data warehouses. The solution is in public preview.

Background information

  • Comparison between data warehouses and data lakes

    • Data warehouse: Focuses on the management and constraints of structured and semi-structured data that enters data warehouses, and leverages strong centralized management to improve computing performance and standardize management capabilities.

    • Data lake: Focuses on open data storage and universal data formats, and allows multiple engines to produce or consume data on demand. A data lake provides only weak centralized management to ensure flexibility, supports unstructured data, and allows you to create schemas after data is stored. This helps you manage data in a more flexible manner.

  • MaxCompute data warehouses

    MaxCompute is a cloud-native data warehouse service that is built on a serverless architecture. MaxCompute allows you to perform the following operations:

    • Perform data modeling of data warehouses.

    • Use the extract, transform, load (ETL) tool to load data to a model table that is defined with a schema for storage.

    • Use the standard SQL engine to process large amounts of data in data warehouses and use the online analytical processing (OLAP) engine of Hologres to analyze data.

  • Data lake and federated computing scenarios supported by MaxCompute

    In data lake scenarios, data can be produced or consumed by using multiple engines. MaxCompute is one of the engines that are used to process data. MaxCompute needs to read data in multiple mainstream open source formats from the upstream of a data lake, compute data, and then continuously produce data for the downstream.

    MaxCompute provides secure, high-performance, and cost-effective data warehouses to aggregate high-value data. MaxCompute also needs to obtain metadata and data from data lakes, compute external data, and perform federated computing on the data of warehouses and data lakes.

    Additionally, MaxCompute needs to obtain data from various foreign servers, such as Hadoop and Hologres, to perform federated computing. In federated computing scenarios, MaxCompute also needs to read metadata and data from external systems.

  • Data lakehouse solution 2.0 of MaxCompute

    The data lakehouse solution 2.0 of MaxCompute allows you to access metadata or storage services of Alibaba Cloud over the cloud product interconnection network, or foreign servers in virtual private clouds (VPCs) over a dedicated connection. The solution allows you to build management objects that define the metadata and data access methods of foreign servers. You can use an external schema to map the database or schema of a foreign server to access all tables within the database or schema.

    (Figure: architecture of the data lakehouse solution 2.0)

    • Network connection

      For information about network connections, see the "Access over a VPC (dedicated connection)" section of the Network connection process topic. MaxCompute can use a network connection to access data sources, such as E-MapReduce (EMR) clusters and ApsaraDB RDS instances (available soon), in VPCs. For Data Lake Formation (DLF), Object Storage Service (OSS), and Hologres that are deployed in the cloud product interconnection network, MaxCompute can access data in the services without the need to configure a network connection.

    • Foreign server

      A foreign server contains the information that is used to access the metadata and data of a data source system, such as identity authentication information, location information, and connection protocol descriptions. A foreign server is a tenant-level management object and is defined by the tenant administrator.

      If the project-level tenant resource access control feature is enabled for a project, the tenant administrator mounts a foreign server to the project, and the project administrator uses a policy to grant the permissions on the foreign server to the project members.

    • External schema

      An external schema is a special schema in MaxCompute data warehouse projects. As shown in the preceding figure, an external schema can be mapped to a database or schema of a data source, which allows you to access tables and data within the database or schema of the data source. The tables that are generated from the schema mapping mechanism are called federated external tables.

      Federated external tables do not store metadata in MaxCompute. Instead, MaxCompute obtains metadata in real time by using the metadata service of the foreign server object. When you query data, you do not need to execute a DDL statement to create an external table in the data warehouse project. You can query a source table by referencing the names of the project, the external schema, and the source table. When the schema or data of the source table changes, the federated external table immediately reflects the latest status of the source table. The hierarchy of the data source to which an external schema can be mapped is the level that sits between the hierarchy defined by the foreign server and the table hierarchy in the data source. The hierarchy defined by the foreign server is determined by the data source hierarchy that the authenticated identity can access.
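
      For example, assuming a project named my_project with an external schema my_ext_schema that is mapped to a source database containing a table named orders (all names are hypothetical), the source table can be queried directly, without any DDL statement in the data warehouse project:

      ```sql
      -- The federated external table is resolved at query time through the
      -- external schema. No CREATE EXTERNAL TABLE statement is required.
      SELECT * FROM my_project.my_ext_schema.orders;
      ```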

    • External project

      In the data lakehouse solution 1.0, an external project uses a 2-layer architecture and maps to a database or schema of a data source in the same manner as an external schema. An external project cannot compute data by itself; it must use a data warehouse project to read and compute external data. Because external projects sit at the project level, mapping many databases or schemas of a data source can require a large number of external projects, and external projects cannot be integrated with 3-layer data warehouse projects. In the data lakehouse solution 2.0, use external schemas instead of external projects.

      In the data lakehouse solution 2.0, external schemas provide all the capabilities of external projects in the data lakehouse solution 1.0. In addition, an external project in solution 2.0 can be mapped to a catalog or database of a 3-layer data source. This way, you can view the databases in a DLF catalog or the schemas in a Hologres database and use federated external tables to access the data source. Support for external projects will soon be added to the data lakehouse solution 2.0. For more information, see the official documentation.

    | Data source type | Hierarchy of foreign servers | Mapping hierarchy of external schemas | Mapping hierarchy of external projects in the data lakehouse solution 2.0 (available soon) | Mapping hierarchy of external projects in the data lakehouse solution 1.0 (to be discontinued) | Authentication method |
    | --- | --- | --- | --- | --- | --- |
    | DLF+OSS | Region-level DLF and OSS | Catalog.Database in DLF | Catalog in DLF | Catalog.Database in DLF | RAMRole |
    | Hive+HDFS | EMR cluster | Database in Hive | Not supported | Database in Hive | Authentication-free |
    | Hologres | Database in Hologres | Schema | Database | Not supported | RAMRole |

    Note

    Different types of authentication methods are available for different data sources. MaxCompute will gradually provide multiple authentication methods in later versions, such as the current user identity method for accessing Hologres data and the Kerberos authentication method for accessing Hive data.

Limits

  • The data lakehouse solution 2.0 is supported only in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Shenzhen), China (Hong Kong), Singapore, and Germany (Frankfurt).

  • MaxCompute must be deployed in the same region as DLF and OSS.

  • External schema operations must be performed in an internal project, so the 3-layer model must be enabled for the internal project. You can enable the feature in the console. For more information, see Enable the schema feature.

Precautions

  • You must enable the schema mode supported by project-level metadata for the MaxCompute project in which you want to create the external schema. For more details, see Schema mode supported by project-level metadata.

  • You must enable the schema mode supported by SQL syntax before performing schema operations or querying data from an external schema. For more details, see Schema mode supported by SQL syntax.

  • When you execute the SHOW TABLES IN <external_schema_name> command to access a data source system:

    • If you run this command in the DataWorks Data Development IDE and the system fails to resolve the schema, make sure that the MaxCompute CMD integrated with the DataWorks resource group is version 0.46.8 or later. If the CMD version is earlier than 0.46.8, submit a ticket to contact MaxCompute technical support for an upgrade.

    • If you run this command in the MaxCompute client (odpscmd), upgrade the client to version 0.46.8 or later.

    Note

    You can use the SHOW VERSION; command to view the current version.

  • For an external schema that you create, you can view the schema list, view schema information, and delete the schema. The commands and usage are identical to those for internal schemas. For more details, see Schema-related operations.
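
    For example, the following statements use the same syntax as for internal schemas (my_ext_schema is a hypothetical external schema name):

    ```sql
    -- List all schemas, including external schemas, in the current project.
    SHOW SCHEMAS;
    -- View the information of the external schema.
    DESC SCHEMA my_ext_schema;
    -- Delete the external schema. The source data is not affected.
    DROP SCHEMA my_ext_schema;
    ```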

View the created foreign servers

You can view the foreign servers that you created on the Foreign Server page. The following table describes the parameters.

Parameter

Description

Data Source Name

The name of the foreign server.

Type

The type of the foreign server. Valid values: DLF+OSS, Hive+HDFS, and Hologres.

Network Connection

The name of the network connection that is used by the foreign server. Only the Hive+HDFS type supports network connections.

Owner Account

The ID of the account that is used to create the foreign server. External schemas use the foreign server to access the data source system based on the identity information specified by the creator of the foreign server.

  • If the project-level tenant resource access control feature is enabled for a project, the creator can configure the mounting relationship between the foreign server and the project to allow the project to use the foreign server. Then, the project administrator uses a policy to grant the permissions on the foreign server to the project members.

  • If the project-level tenant resource access control feature is disabled for a project, all users that create an external project or schema can use the foreign server to access external systems based on the RAMRole permissions specified by the creator of the foreign server.

Mounted Projects

The number of projects to which the foreign server is mounted.

Creation Time

The time when the foreign server was created.

Update Time

The time when the editable properties of the foreign server were last modified and saved.

Actions - Mount Project

Configures the mounting relationship between the foreign server and projects.

  • If the project-level tenant resource access control feature is enabled for a project but the tenant administrator does not configure the mounting relationship between the project and the referenced foreign server, the project cannot use the foreign server to access external systems.

  • If the project-level tenant resource access control feature is disabled for a project, MaxCompute does not check the mounting relationship between the project and the referenced foreign server.

Actions - Details

Views the properties of the foreign server.

Actions - Edit

Edits the properties of the foreign server.

Important

If you modify information such as permissions, the set of visible objects may change and the authorization relationships configured for the project may become invalid.

Actions - Delete

Deletes the foreign server.

Important

After you delete a foreign server, all tasks that depend on the foreign server no longer have the permissions to access external systems, and the mounting relationship between the foreign server and all projects is deleted.

Create and use a DLF+OSS data lakehouse

Data sources of the DLF+OSS type use OSS as the data lake storage service and DLF as the metadata management service. MaxCompute allows you to create foreign servers of the DLF+OSS type. MaxCompute can be used together with DLF and OSS to integrate data warehouses and data lakes to provide more flexible and efficient data management and processing capabilities.

Important

If your account has not enabled the tenant-level schema syntax switch, you must add the statement SET odps.namespace.schema=true; before your SQL statements to ensure that schema-related statements can be executed properly.

Step 1: Authorize MaxCompute to access your cloud resources

If you want to build the data lakehouse by using MaxCompute, DLF, and OSS, you must first complete authorization. Without authorization, the account that is used to create the MaxCompute project cannot access DLF. You can use one of the following methods to authorize MaxCompute to access DLF:

  • One-click authorization: If you use the same account to create the MaxCompute project and deploy DLF, we recommend that you perform one-click authorization on the Cloud Resource Access Authorization page in the Resource Access Management (RAM) console.

  • Custom authorization: You can use this method regardless of whether the same account is used to create the MaxCompute project and deploy DLF. For more information, see Authorize a RAM user to access DLF.

Step 2: Create a foreign server of the DLF+OSS type

  1. Log on to the MaxCompute console. In the upper-left corner of the console, select a region.

  2. In the left-side navigation pane, choose Tenants > Foreign Server.

  3. On the Foreign Server page, click Create Foreign Server.

  4. In the Add Foreign Server dialog box, configure the parameters as prompted. The following table describes the parameters.

    Parameter

    Description

    Foreign Server Type

    The type of the foreign server. Select DLF+OSS.

    Foreign Server Name

    The name of the foreign server. The name must meet the following requirements:

    • The name can contain only lowercase letters, digits, and underscores (_), and must start with a lowercase letter.

    • The name can be up to 128 characters in length.

    Foreign Server Description

    The description of the foreign server.

    Region

    The region in which you want to create the foreign server. This parameter is automatically set to the current region.

    DLF Endpoint

    The DLF endpoint of the current region.

    OSS Endpoint

    The OSS endpoint of the current region.

    RoleARN

    The Alibaba Cloud Resource Name (ARN) of a Resource Access Management (RAM) role. The RAM role must have the permissions to access DLF and OSS.

    You can log on to the RAM console, choose Identities > Roles in the left-side navigation pane, and then click the name of the RAM role to obtain the ARN in the Basic Information section.

    For example, acs:ram::124****:role/aliyunodpsdefaultrole.

    Foreign Server Supplemental Properties

    The properties of the foreign server. After you specify this parameter, the tasks that use the foreign server can access the data source system based on the behavior specified by this parameter.

    Note

    More properties will be supported by this parameter as service capabilities evolve. For more information, see the official documentation.

  5. Click OK.

Step 3: Create an external schema

Execute the following statement to create an external schema that references a foreign server of the DLF+OSS type:

CREATE EXTERNAL SCHEMA IF NOT EXISTS <external_schema_name>
WITH <dlfoss_foreign_server_name>
ON '<dlf_catalog_id>.<database_name>';

Parameters:

  • external_schema_name: the name of the external schema.

  • dlfoss_foreign_server_name: the name of the foreign server. The project to which the external schema belongs must be deployed in the same region as the foreign server.

  • dlf_catalog_id: the catalog ID of DLF. For information about how to create a catalog, see the "Create a data catalog" section of the Data catalog topic.

  • database_name: the name of the database in the catalog of DLF. For more information, see the "Create a metadatabase" section of the Metadata management topic.
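
For example, the following statements create an external schema named es_dlf that references a hypothetical foreign server named dlf_oss_server and map it to a database named default in a DLF catalog. The catalog ID shown is a placeholder; replace all three values with your own:

```sql
SET odps.namespace.schema=true;
CREATE EXTERNAL SCHEMA IF NOT EXISTS es_dlf
WITH dlf_oss_server
ON '123456789.default';
```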

Step 4: Use SQL syntax to access the data source system

  • SQL syntax: Show the names of tables in DLF and OSS schemas.

    • Method 1:

      SHOW tables IN <external_schema_name>; --The <external_schema_name> parameter specifies the name of the external schema.

    • Method 2:

      USE SCHEMA <external_schema_name>; --The <external_schema_name> parameter specifies the name of the external schema.
      SHOW tables;
  • Examples:

    1. Query the names of all tables in the external schema whose name is es_dlf.

      USE SCHEMA es_dlf;
      SHOW TABLES;

      Sample result:

      ALIYUN$xxx@test.aliyunid.com:hellocsv
      ALIYUN$xxx@test.aliyunid.com:t1
      ALIYUN$xxx@test.aliyunid.com:t2
      ALIYUN$xxx@test.aliyunid.com:t3
    2. Query the data of the hellocsv table in the es_dlf schema in the lakehouse47_3 project.

      SELECT * FROM lakehouse47_3.es_dlf.hellocsv;

      Sample result:

      +------------+------------+------------+------------+
      | col1       | col2       | col3       | col4       |
      +------------+------------+------------+------------+
      | 1          | hello      | test       | world      |
      +------------+------------+------------+------------+
    3. Copy the data of a federated external table whose name is hellocsv from the data source and paste the data to a data warehouse table.

      -- Copy the data of a federated external table and paste the data to a data warehouse table.
      CREATE TABLE hellocsv_copy AS SELECT * FROM lakehouse47_3.es_dlf.hellocsv;
      
      -- Query the copied data in the data warehouse table.
      SELECT * FROM hellocsv_copy;

      Sample result:

      +------------+------------+------------+------------+
      | col1       | col2       | col3       | col4       |
      +------------+------------+------------+------------+
      | 1          | hello      | test       | world      |
      +------------+------------+------------+------------+

Create and use a Hive+HDFS federation

Hive is a common open source big data warehouse solution. In most cases, metadata is stored in Hive Metastore Service (HMS), and data is stored in Hadoop Distributed File System (HDFS). MaxCompute allows you to create foreign servers of the Hive+HDFS type. MaxCompute can be used together with Hive in a federated manner to implement data warehouse access and data aggregation for open source big data systems.

Important
  • You are not charged for pay-as-you-go SQL federated computing in Hive+HDFS mode during the public preview.

  • If your account has not enabled the tenant-level schema syntax switch, you must add the statement SET odps.namespace.schema=true; before your SQL statements to ensure that schema-related statements can be executed properly.

Step 1: Create a foreign server of the Hive+HDFS type

  1. Log on to the MaxCompute console. In the upper-left corner of the console, select a region.

  2. In the left-side navigation pane, choose Tenants > Foreign Server.

  3. On the Foreign Server page, click Create Foreign Server.

  4. In the Add Foreign Server dialog box, configure the parameters as prompted. The following table describes the parameters.

    Parameter

    Description

    Foreign Server Type

    The type of the foreign server. Select Hive+HDFS.

    Foreign Server Name

    The name of the foreign server. The name must meet the following requirements:

    • The name can contain only lowercase letters, digits, and underscores (_), and must start with a lowercase letter.

    • The name can be up to 128 characters in length.

    Foreign Server Description

    The description of the foreign server.

    Network Connection Object

    The name of the network connection. You can select or establish a connection between MaxCompute and the VPC of an EMR or Hadoop cluster. For more information, see the Establish a network connection between MaxCompute and the destination VPC step in the "Access over a VPC (dedicated connection)" section of the Network connection process topic.

    Note
    • For information about network connections, see the "Network connection" section of the Terms topic.

    • The VPC must be deployed in the same region as the MaxCompute foreign server and the project to which the foreign server is mounted.

    Cluster Name

    The name of the cluster. For a high-availability (HA) Hadoop cluster, the value of this parameter is the same as the name of a NameNode process. When you create a Hadoop cluster, you can obtain the cluster name by using the dfs.nameservices parameter in the hdfs-site.xml file.

    NameNode Address

    The IP addresses and port numbers of the active and standby NameNode processes in the Hadoop cluster. In most cases, the port number is 8020. To obtain the addresses, contact the Hadoop cluster administrator.

    HMS Service Address

    The IP addresses and port numbers of the active and standby HMS processes in the Hadoop cluster. In most cases, the port number is 9083. To obtain the addresses, contact the Hadoop cluster administrator.

    Authentication Type

    The authentication type. Only No Authentication Method is supported.

    Foreign Server Supplemental Properties

    The properties of the foreign server. After you specify this parameter, the tasks that use the foreign server can access the data source system based on the behavior specified by this parameter.

    Note

    More properties will be supported by this parameter as service capabilities evolve. For more information, see the official documentation.

  5. Click OK.

Step 2: Create an external schema

An external schema is an object within a project and can be created by using SQL syntax. If your account has not enabled the tenant-level schema syntax switch, you must add the statement SET odps.namespace.schema=true; before your SQL statements to ensure that schema-related statements can be executed properly.

Execute the following statement to create an external schema that references a foreign server of the Hive+HDFS type:

CREATE EXTERNAL SCHEMA IF NOT EXISTS <external_schema_name>
WITH <hive_foreign_server_name>
ON '<database_name>';

Parameters:

  • external_schema_name: the name of the external schema.

  • hive_foreign_server_name: the name of the foreign server.

  • database_name: the name of the Hive database.
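
For example, the following statements create an external schema named es_hive3 that references a hypothetical foreign server named hive_hdfs_server and map it to a Hive database named default. Replace the values with your own:

```sql
SET odps.namespace.schema=true;
CREATE EXTERNAL SCHEMA IF NOT EXISTS es_hive3
WITH hive_hdfs_server
ON 'default';
```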

Step 3: Use SQL syntax to access the data source system

  • SQL syntax: Show the names of tables in Hive schemas.

    • Method 1:

      SHOW tables IN <external_schema_name>; --The <external_schema_name> parameter specifies the name of the external schema.

    • Method 2:

      USE SCHEMA <external_schema_name>; --The <external_schema_name> parameter specifies the name of the external schema.
      SHOW tables;
  • Examples:

    1. Query the names of all tables in the external schema whose name is es_hive3.

      USE SCHEMA es_hive3;
      SHOW TABLES;

      Sample result:

      ALIYUN$xxx@test.aliyunid.com:t1
    2. Query the data of the t1 table in the es_hive3 schema in the lakehouse47_3 project.

      SELECT * FROM lakehouse47_3.es_hive3.t1;

      Sample result:

      +------------+
      | id         |
      +------------+
      | 1          |
      +------------+
    3. Copy the data of a federated external table whose name is t1 from the data source and paste the data to a data warehouse table.

      -- Copy the data of a federated external table and paste the data to a data warehouse table.
      CREATE TABLE t1_copy AS SELECT * FROM lakehouse47_3.es_hive3.t1;
      
      -- Query the copied data in the data warehouse table.
      SELECT * FROM t1_copy;

      Sample result:

      +------------+
      | id         |
      +------------+
      | 1          |
      +------------+

Create and use a Hologres federation

Hologres is an end-to-end real-time data warehouse engine that supports real-time writes, updates, and analysis of large amounts of data. Hologres supports standard SQL syntax that is compatible with PostgreSQL, OLAP and ad hoc analysis of petabytes of data, and high concurrency, low-latency online data services. Hologres is deeply integrated with MaxCompute. Hologres allows you to create, analyze, and query OLAP models in MaxCompute data warehouses. MaxCompute allows you to create foreign servers of the Hologres type. You can use a Hologres federation to allow MaxCompute to perform the following operations:

  • Read and archive data from ultra-large real-time data warehouses.

  • Read the dimension data or the data of data mart models and perform computing in association with fact tables at the operational data store (ODS), data warehouse detail (DWD), and data warehouse service (DWS) layers.

  • Read the data of OLAP models to perform high-performance, low-cost batch processing and return the results to real-time data warehouses by using Hologres external tables or remote function calls for analysis.

Important

If your account has not enabled the tenant-level schema syntax switch, you must add the statement SET odps.namespace.schema=true; before your SQL statements to ensure that schema-related statements can be executed properly.

Step 1: Create a foreign server of the Hologres type

  1. Log on to the MaxCompute console. In the upper-left corner of the console, select a region.

  2. In the left-side navigation pane, choose Tenants > Foreign Server.

  3. On the Foreign Server page, click Create Foreign Server.

  4. In the Add Foreign Server dialog box, configure the parameters as prompted. The following table describes the parameters.

    Parameter

    Description

    Foreign Server Type

    The type of the foreign server. Select Hologres.

    Foreign Server Name

    The name of the foreign server. The name must meet the following requirements:

    • The name can contain only lowercase letters, digits, and underscores (_), and must start with a lowercase letter.

    • The name can be up to 128 characters in length.

    Foreign Server Description

    The description of the foreign server.

    Connection Method

    The connection method of the foreign server. Only the cloud product interconnection network is supported.

    Host

    The hostname of the Hologres instance.

    You can log on to the Hologres console, click Instances in the left-side navigation pane, and then click the ID of the Hologres instance to obtain the hostname in the Network Information section of the Instance Details page.

    For example, hgpostcn-cn-3m***-cn-shanghai-internal.hologres.aliyuncs.com.

    Port

    The port number of the Hologres instance.

    You can log on to the Hologres console, click Instances in the left-side navigation pane, and then click the ID of the Hologres instance to obtain the port number in the Network Information section of the Instance Details page. In most cases, the port number is 80.

    DBNAME

    The name of a database in the Hologres instance.

    Authentication Method

    • RAMRole: assumes a RAM role to perform authentication. This method allows you to access Hologres across Alibaba Cloud accounts. Federated external tables support only the RAMRole method. Example: RoleARN:acs:ram::uid:role/aliyunodpsholorole. For more information about the configuration and authentication of the aliyunodpsholorole role, see Create a Hologres external table in STS mode.

    • ExecuteWithUserAuth: allows you to use the same Alibaba Cloud account or RAM user to access authorized tables and data in MaxCompute and Hologres. You can also execute the CALL EXEC_EXTERNAL_QUERY statement in MaxCompute and Hologres without the need to configure additional authentication information.

      Important

      The ExecuteWithUserAuth method is not supported for federated external tables.

    RoleARN

    The ARN of the RAM role. The RAM role must have the permissions to access Hologres. This parameter is required when you set the Authentication Method parameter to RAMRole.

    You can log on to the RAM console, choose Identities > Roles in the left-side navigation pane, and then click the name of the RAM role to obtain the ARN in the Basic Information section.

    Foreign Server Supplemental Properties

    The properties of the foreign server. After you specify this parameter, the tasks that use the foreign server can access the data source system based on the behavior specified by this parameter.

    Note

    More properties will be supported by this parameter as service capabilities evolve. For more information, see the official documentation.

  5. Click OK.

Step 2: Create an external schema

An external schema is an object within a project and can be created by using SQL syntax. If your account has not enabled the tenant-level schema syntax switch, you must add the statement SET odps.namespace.schema=true; before your SQL statements to ensure that schema-related statements can be executed properly.

Execute the following statement to create an external schema that references a foreign server of the Hologres type:

CREATE EXTERNAL SCHEMA IF NOT EXISTS <external_schema_name>
WITH <holo_foreign_server_name>
ON '<holoschema_name>';

Parameters:

  • external_schema_name: the name of the external schema.

  • holo_foreign_server_name: the name of the foreign server.

  • holoschema_name: the name of the Hologres schema to be mapped.
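
For example, the following statements create an external schema named es_holo that references a hypothetical foreign server named holo_server and map it to the public schema of the Hologres database that is specified in the foreign server. Replace the values with your own:

```sql
SET odps.namespace.schema=true;
CREATE EXTERNAL SCHEMA IF NOT EXISTS es_holo
WITH holo_server
ON 'public';
```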

Step 3: Use SQL syntax to access the data source system

  • SQL syntax: Show the names of tables in Hologres schemas.

    • Method 1:

      SHOW tables IN <external_schema_name>; --The <external_schema_name> parameter specifies the name of the external schema.

    • Method 2:

      USE SCHEMA <external_schema_name>; --The <external_schema_name> parameter specifies the name of the external schema.
      SHOW tables;
  • Examples:

    1. Query the names of all tables in the external schema whose name is es_holo_rolearn_nonl.

      SET odps.namespace.schema=true;
      USE SCHEMA es_holo_rolearn_nonl;
      SHOW TABLES;

      Sample result:

      ALIYUN$xxx@test.aliyunid.com:mc_holo_external
    2. Query the data of the mc_holo_external table in the es_holo_rolearn_nonl schema in the lakehouse47_3 project.

      SELECT * FROM lakehouse47_3.es_holo_rolearn_nonl.mc_holo_external;

      Sample result:

      +------------+------------+------------+------------+
      | col1       | col2       | col3       | col4       |
      +------------+------------+------------+------------+
      | 1          | hello      | test       | world      |
      +------------+------------+------------+------------+
    3. Insert data into the table.

      INSERT INTO lakehouse47_3.es_holo_rolearn_nonl.mc_holo_external VALUES(2,'hello','test','holo');
      
      SELECT * FROM lakehouse47_3.es_holo_rolearn_nonl.mc_holo_external;

      Sample result:

      +------------+------------+------------+------------+
      | col1       | col2       | col3       | col4       |
      +------------+------------+------------+------------+
      | 1          | hello      | test       | world      |
      | 2          | hello      | test       | holo       |
      +------------+------------+------------+------------+
    4. Copy the data of the federated external table mc_holo_external from the data source to a data warehouse table.

      -- Copy the data of a federated external table and paste the data to a data warehouse table.
      CREATE TABLE mc_holo_external_copy AS SELECT * FROM lakehouse47_3.es_holo_rolearn_nonl.mc_holo_external;
      
      -- Query the copied data in the data warehouse table.
      SELECT * FROM mc_holo_external_copy;

      Sample result:

      +------------+------------+------------+------------+
      | col1       | col2       | col3       | col4       |
      +------------+------------+------------+------------+
      | 1          | hello      | test       | world      |
      +------------+------------+------------+------------+
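The queries in the examples above reference tables by a three-part name in the <project>.<external_schema>.<table> pattern. As a sketch (not a MaxCompute API), the qualified name can be built and lightly validated like this; the validation is a simplified illustration, not MaxCompute's actual identifier rules:

```python
# Sketch only: build the <project>.<external_schema>.<table> reference
# used in the queries above. The validation is a simplified illustration.
def qualified_name(project: str, schema: str, table: str) -> str:
    for part in (project, schema, table):
        # Allow letters, digits, and underscores; reject empty parts.
        if not part or not part.replace("_", "").isalnum():
            raise ValueError(f"invalid identifier: {part!r}")
    return f"{project}.{schema}.{table}"

print(qualified_name("lakehouse47_3", "es_holo_rolearn_nonl", "mc_holo_external"))
```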

Use the current user identity to submit an executable statement

MaxCompute allows you to use a CALL statement that contains the EXEC_EXTERNAL_QUERY() function to submit an executable statement to Hologres.

  • Syntax

    CALL EXEC_EXTERNAL_QUERY (
      '<holo_ExecuteWithUserAuth_foreign_server_name>',
      r"###(
        <holo_query>)###");
    • holo_ExecuteWithUserAuth_foreign_server_name: the name of the foreign server that you created in ExecuteWithUserAuth mode.

    • r: indicates a raw string. The content between the boundary delimiters is passed to Hologres without escaping.

    • holo_query: the Hologres statement that is entered within the boundary.

    Note

    Similar to raw string literals in C++, the double quotation marks (""), parentheses (()), and delimiters form a boundary that prevents conflicts between special characters in the holo_query parameter and the characters before and after the boundary. You can configure custom delimiters, but they must be used in pairs in the "[delimiter]( )[delimiter]" format, and the boundary cannot contain angle brackets (<>). We recommend that you use "###(<holo_query>)###".

  • Examples

    • Example 1: From MaxCompute, submit an INSERT OVERWRITE operation on the Hologres internal table public.current_user_test. The inserted data comes from the SELECT current_user; statement, which queries the current UID in Hologres.

      1. Execute the following statement in MaxCompute:

        CALL EXEC_EXTERNAL_QUERY (
          'fs_holo_ExecuteWithUserAuth_nonl_y',
          r"###(
            CALL hg_insert_overwrite(
              'public.current_user_test',
              $$SELECT current_user$$
        );)###");
      2. Query the data of the public.current_user_test table in Hologres.

        SELECT * FROM current_user_test;

        The same UID that is used to execute the preceding statements is returned. Example: 1117xxxxxx519.

    • Example 2: From MaxCompute, submit an INSERT OVERWRITE operation on partition 2020 of the Hologres internal table public.hologres_parent_insert1. The inserted data comes from the SELECT * FROM mc_external_table WHERE a='2020'; statement, which is executed on a MaxCompute external table in Hologres.

      1. Execute the following statement in MaxCompute:

        CALL EXEC_EXTERNAL_QUERY (
          'fs_holo_ExecuteWithUserAuth_nonl_y',
          r"###(
            CALL hg_insert_overwrite(
              'public.hologres_parent_insert1',
              '2020',
              $$SELECT * FROM mc_external_table WHERE a='2020'$$
        );)###");
      2. Query the data of the public.hologres_parent_insert1 table in Hologres.

        -- A row of data is inserted into the Hologres internal table.
        SELECT * FROM hologres_parent_insert1;

        Sample result:

        +------+---+----------------------------+---+
        | a    | b | c                          | d |
        +------+---+----------------------------+---+
        | 2020 | 1 | 2024-06-19 10:27:46.201636 | a |
        +------+---+----------------------------+---+
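The boundary rule described in the note in the syntax section can be sketched in code. This hypothetical helper (not part of MaxCompute) wraps a Hologres statement in the r"###( ... )###" boundary and grows the delimiter if the statement body would collide with the closing boundary:

```python
# Hypothetical sketch (not a MaxCompute API): wrap a Hologres statement
# in the raw-string boundary used by CALL EXEC_EXTERNAL_QUERY.
def wrap_query(holo_query: str, delimiter: str = "###") -> str:
    # The boundary cannot contain angle brackets.
    if "<" in delimiter or ">" in delimiter:
        raise ValueError("the boundary cannot contain angle brackets")
    # Grow the delimiter until ')<delimiter>' cannot appear in the body,
    # so the closing boundary stays unambiguous.
    while ")" + delimiter in holo_query:
        delimiter += "#"
    return 'r"{d}(\n{q}){d}"'.format(d=delimiter, q=holo_query)

print(wrap_query("SELECT current_user"))
```

With the default delimiter, the output reproduces the r"###( ... )###" form shown in the examples above.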

Configure permissions on foreign servers

A foreign server is a tenant-level management object in MaxCompute. Whether a RAM user can perform operations on a foreign server is determined by the permissions that the tenant administrator configures on the foreign server in the RAM console. To create a policy, the tenant administrator can choose Permissions > Policies in the left-side navigation pane. For more information, see the "Create a custom policy on the JSON tab" section of the Create custom policies topic.

Example: Use an Alibaba Cloud account to create a policy named ForeignServerTest and attach the policy to a RAM user. Sample policy:

{
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "odps:CreateForeignServer",
            "Resource": "acs:odps:*:12xxxxxxxx07:foreignservers/*"
        }
    ]
}

To also allow the RAM user to query a list of foreign servers and the information about a specific foreign server, add the corresponding actions to the policy:

Note
  • If you want to specify a network connection when you create a foreign server, you must add the network connection to the Resource parameter.

  • If you specify a RAM role, you must have the ram:PassRole permission on the RAM role.

{
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "odps:CreateForeignServer",
                "odps:GetForeignServer",
                "odps:ListForeignServers"
            ],
            "Resource": "acs:odps:*:12xxxxxxxx07:foreignservers/*"
        }
    ]
}
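The policy document above can also be generated programmatically, for example when you manage many accounts. The following Python sketch uses only the placeholder account ID from the example; the helper itself is hypothetical and not part of any Alibaba Cloud SDK:

```python
import json

# Hypothetical sketch: build the RAM policy document shown above.
# The account ID is the placeholder from the example, not a real value.
def foreign_server_policy(account_id, actions):
    return json.dumps({
        "Version": "1",
        "Statement": [{
            "Effect": "Allow",
            # Each action is prefixed with the odps: service name.
            "Action": ["odps:" + a for a in actions],
            "Resource": "acs:odps:*:%s:foreignservers/*" % account_id,
        }],
    }, indent=4)

print(foreign_server_policy(
    "12xxxxxxxx07",
    ["CreateForeignServer", "GetForeignServer", "ListForeignServers"],
))
```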

Options for the Action parameter:

  • CreateForeignServer: creates a foreign server.

  • UpdateForeignServer: updates a foreign server.

  • DeleteForeignServer: deletes a foreign server.

  • GetForeignServer: queries the information about a foreign server.

  • ListForeignServers: queries a list of foreign servers.

Attach the created policy to a RAM user. The RAM user can then perform operations on the authorized foreign servers.

Project-level tenant resource access control

The foreign servers that are introduced in the data lakehouse solution 2.0 are tenant-level resources, and the permissions on foreign servers are managed by using RAM policies. However, the permissions on project objects, such as tables, are managed by the project administrator. Whether a project can use tenant objects, and how the related permissions are allocated within the project, depends on the following modes:

  • If the project-level tenant resource access control feature is enabled for a project, the creator can configure the mounting relationship between a foreign server and the project to allow the project to use the foreign server. Then, the project administrator uses a policy to grant the permissions on the foreign server to the project members.

  • If the project-level tenant resource access control feature is disabled for a project, all users that create an external project or schema can use the foreign server to access external systems based on the RAM role permissions specified by the creator of the foreign server. In addition to foreign servers, network connections, custom images, and quotas are also tenant-level resources that are managed by the project-level tenant resource access control feature.

Note

For more information, see Project-level tenant resource access control. This feature is in public preview and does not yet support enabling checks.

To enable the project-level tenant resource access control feature and use a foreign server and an external schema to access federated external tables, perform the following steps:

  1. Click a tenant object, such as a foreign server, and select the project to which you want to mount it. This configures the mounting relationship, and the mounted tenant object is then displayed in the project's list of mounted tenant objects.

  2. Configure a policy for the tenant object that is mounted to the project. For more information, see Policy-based access control.

    • Procedure

      1. Log on to the MaxCompute console. In the left-side navigation pane, choose Workspace > Projects. On the Projects page, find the project and click Manage in the Actions column.

      2. Click the Role Permissions tab. Find the role to which you want to attach a policy and click Edit Role in the Actions column.

      3. In the Edit Role dialog box, set the Authorization Method parameter to Policy.

      4. Modify the role policy in the Policy-based Access Control code editor.

    • Example

      The following example shows a policy that allows a user to use the fs_hive foreign server:

      {
          "Statement":[
              {
                  "Action":[
                      "odps:Usage"
                  ],
                  "Effect":"Allow",
                  "Resource":[
                      "acs:odps:*:servers/fs_hive"
                  ]
              }
          ],
          "Version":"1"
      }

      After a user is granted permissions on tenant resources in this way, user-level or role-level tenant resource access control takes effect when the project-level tenant resource access control feature is enabled.

  3. Enable the project-level tenant resource access control feature.

    1. Log on to the MaxCompute console. In the upper-left corner of the console, select a region.

    2. In the left-side navigation pane, choose Workspace > Projects. On the Projects page, find the project and click Manage in the Actions column.

    3. In the Permission Properties section of the Parameter Configuration tab, click Edit.

    4. Turn on Enable Project-level Tenant Resource Access Control and click Submit.

    Important

    After you enable the project-level tenant resource access control feature, the project checks the permissions on the tenant objects that are in use or to be used, including foreign servers, network connections, custom images, and quotas. We recommend that you enable the feature only after you configure the mounting relationships between tenant objects and projects and attach the required policies. Otherwise, tasks that use these tenant objects may fail because the required permissions are missing.
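Before you turn on the switch, you can sanity-check that a role policy actually grants odps:Usage on the foreign server. The following Python sketch is a hypothetical check, not a MaxCompute tool; the policy text is the fs_hive example from the procedure above:

```python
import json

# Hypothetical helper (not a MaxCompute API): check whether a project-level
# policy grants odps:Usage on a given foreign server.
def allows_server_usage(policy_json: str, server_name: str) -> bool:
    policy = json.loads(policy_json)
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if "odps:Usage" in actions and any(
            r.endswith("servers/" + server_name) or r.endswith("servers/*")
            for r in resources
        ):
            return True
    return False

policy = """{
    "Statement":[{"Action":["odps:Usage"],"Effect":"Allow",
                  "Resource":["acs:odps:*:servers/fs_hive"]}],
    "Version":"1"
}"""
print(allows_server_usage(policy, "fs_hive"))  # True
```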

Grant permissions to a federated external table in an external schema

After you create an external schema, all tables in the external schema are owned by the account that owns the external schema. If you want to grant the permissions on the external schema or its tables to other users, perform the following operations.

Important
  • MaxCompute does not store the metadata of data sources in data lakehouse mode. Permission policies are therefore managed based on the names of data source objects. If the name of a data source object changes, the authorization becomes invalid, and you must execute the REVOKE statement or delete the policy to remove the authorization. If you do not remove the authorization promptly, a new data source object that has the same name may inherit the permissions defined in the policy, and the user is unintentionally granted permissions on the new data source object in the external project.

  • After you enable tenant object authentication for a project, you must mount a foreign server to the current project before you create an external schema. If you do not mount a foreign server in advance, an error occurs when you create an external schema.

Assume that the test_lakehouse_project project is used to create and manage the external schema. The following sections describe each operation, the required permissions, and a sample procedure.

Create an external schema

  • The CreateSchema permission on projects.

  • The Usage permission on foreign servers if the project-level tenant resource access control feature is enabled.

  1. Specify a project and grant the CreateSchema permission on the project to a RAM user.

    -- Specify a project.
    use test_lakehouse_project;
    
    -- Grant the CreateSchema permission on the project to a RAM user.
    GRANT CreateSchema ON project test_lakehouse_project TO USER RAM$xxx@test.aliyunid.com:test_user;
  2. (Optional) If the project-level tenant resource access control feature is enabled for the project and a foreign server is mounted to the project, perform the following operations on the project:

    -- Add a RAM account system for the project.
    ADD accountprovider ram;
    
    -- Add a RAM user.
    ADD USER `RAM$xxx@test.aliyunid.com:test_user`;
    
    -- Create a role in the project.
    CREATE ROLE test_lakehouse_role;
    
    -- Attach a policy to the role to allow it to use the foreign server.
    put policy D:\bin\allow.policy ON role test_lakehouse_role;
    
    -- Grant the role to the RAM user.
    GRANT role test_lakehouse_role TO `RAM$xxx@test.aliyunid.com:test_user`;

    Content of the allow.policy file:

    {
      "Version": "1",
      "Statement": [{
        "Action": "odps:Usage",
        "Effect": "Allow",
        "Resource": ["acs:odps:*:servers/fs_hive"]
      }]
    }
  3. Create an external schema.

    CREATE EXTERNAL SCHEMA IF NOT EXISTS es_hive3
    WITH fs_hive
    ON 'default';

Query a list of external schemas

The CreateInstance and List permissions on projects.

  1. Grant the CreateInstance permission on a project to a RAM user.

    GRANT CreateInstance ON project test_lakehouse_project TO USER RAM$xxx@test.aliyunid.com:test_user;
  2. Grant the List permission on the project to the RAM user.

    GRANT List ON project test_lakehouse_project TO USER RAM$xxx@test.aliyunid.com:test_user;
  3. Query a list of schemas.

    SHOW schemas;

Query the information about an external schema

The Describe permission on schemas.

  1. Grant the Describe permission on an external schema to a RAM user.

    GRANT DESCRIBE ON SCHEMA es_hive3 TO USER RAM$xxx@test.aliyunid.com:test_user;
  2. Query the information about the external schema.

    DESC SCHEMA es_hive3;

Modify the properties of an external schema

You cannot modify the properties of an external schema.

Delete an external schema

The Drop permission on schemas.

  1. Grant the Drop permission on an external schema to a RAM user.

    GRANT DROP ON SCHEMA es_hive3 TO USER RAM$xxx@test.aliyunid.com:test_user;
  2. Delete the external schema.

    DROP SCHEMA es_hive3;

Specify an external schema

  • No permissions are required for projects that were created in 3-layer (schema) mode.

  • The Describe permission on schemas if a project is upgraded from 2-layer mode to 3-layer mode.

Specify an external schema.

USE SCHEMA es_hive3;

If a project is upgraded from 2-layer mode to 3-layer mode, grant the Describe permission on the specified external schema to a RAM user.

GRANT DESCRIBE ON SCHEMA es_hive3 TO USER RAM$xxx@test.aliyunid.com:test_user;

Query the data of a table in an external schema

The Select permission on tables in schemas.

  1. Grant the Select permission on a table in an external schema to a RAM user.

    GRANT SELECT ON TABLE es_hive3.t1 TO USER RAM$xxx@test.aliyunid.com:test_user;
  2. Query the data of the table in the external schema.

    SELECT * FROM es_hive3.t1;

Import the data of a table in an external schema to an internal table

  • The Select permission on tables in schemas.

  • The CreateTable and CreateInstance permissions on projects.

  1. Grant the CreateTable permission on a project to a RAM user.

    GRANT CreateTable ON project test_lakehouse_project TO USER RAM$xxx@test.aliyunid.com:test_user;
  2. Grant the CreateInstance permission on the project to the RAM user.

    GRANT CreateInstance ON project test_lakehouse_project TO USER RAM$xxx@test.aliyunid.com:test_user;
  3. Grant the Select permission on a table in an external schema to the RAM user.

    GRANT SELECT ON TABLE es_hive3.t1 TO USER RAM$xxx@test.aliyunid.com:test_user;
  4. Import the data of the table in the external schema to an internal table.

    CREATE TABLE default.t1_copy_ram3 AS SELECT * FROM es_hive3.t1;
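When you repeat the grants above for many users, generating the statements in code can reduce copy-paste errors. The following Python sketch emits the three GRANT statements for the import operation; the project, table, and user names are the example placeholders from this topic, and the helper itself is hypothetical:

```python
# Hypothetical sketch: emit the GRANT statements required to import data
# from a table in an external schema into an internal table, as listed above.
def grants_for_import(project: str, table: str, user: str) -> list:
    return [
        f"GRANT CreateTable ON project {project} TO USER {user};",
        f"GRANT CreateInstance ON project {project} TO USER {user};",
        f"GRANT SELECT ON TABLE {table} TO USER {user};",
    ]

for stmt in grants_for_import(
    "test_lakehouse_project", "es_hive3.t1",
    "RAM$xxx@test.aliyunid.com:test_user",
):
    print(stmt)
```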