
Hologres: CREATE EXTERNAL DATABASE

Last Updated: Dec 02, 2024

Hologres V3.0 and later support external databases. This topic describes what external databases are, how to use them to manage tables in external data sources, and how to run federated queries across internal and external tables.

Background information

The CREATE EXTERNAL DATABASE statement is used to create an external database in a Hologres instance. You can use an external database to load the metadata of external data sources to Hologres. This allows you to manage internal and external data in Hologres and helps implement centralized metadata management by using the lakehouse architecture.

Internal data refers to the data stored in the standard storage class and Infrequent Access (IA) storage class in Hologres. External data refers to the data stored in MaxCompute and Data Lake Formation (DLF) data lakes. Hologres is seamlessly integrated with MaxCompute. Hologres allows you to directly read data from and write data to MaxCompute to implement high-speed data exchange. Hologres also allows you to use DLF to manage data in Object Storage Service (OSS) data lakes and use Hologres Query Engine (HQE) to achieve high-performance reads and writes of data in data lakes such as Apache Paimon, Hudi, and Delta Lake.

External databases are globally visible. After you connect to a database in Hologres, you can read data from and write data to an external data source by specifying an external table in the ext_db.ext_schema.ext_table format. You can also connect to an external database to execute SQL statements and configure GUC parameters.
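For example, the following sketch shows both access patterns. All database, schema, and table names here are hypothetical placeholders, not names defined elsewhere in this topic.

```sql
-- Connected to an internal Hologres database: query an external table
-- by using the three-part ext_db.ext_schema.ext_table format.
SELECT COUNT(*) FROM my_ext_db.my_schema.my_table;

-- Federated query: join an internal table with an external table.
SELECT i.id, e.payload
FROM internal_table i
JOIN my_ext_db.my_schema.my_table e ON i.id = e.id;
```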

Permissions on external databases are managed by the external data sources, such as MaxCompute and DLF. Hologres only passes identities through to the external data sources. By default, the Hologres service-linked role is used for identity pass-through. Before you access an external data source, you must use your Alibaba Cloud account or a RAM user with the required permissions to grant permissions to the service-linked role. For more information, see Service-linked role for Hologres. Hologres also allows you to use RAM roles to access external data sources, which lets you customize access policies. For more information, see RAM role overview. If you use an account that is neither an Alibaba Cloud account nor a RAM user, such as a BASIC account, execute the CREATE USER MAPPING statement to associate and authorize the account.
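For example, a minimal CREATE USER MAPPING sketch is shown below. The external database name and AccessKey values are placeholders; the DLF 1.0 example later in this topic shows the complete option list.

```sql
-- Associate the current account with credentials for the external
-- data source (values below are placeholders, not real credentials).
CREATE USER MAPPING FOR current_user
EXTERNAL DATABASE my_ext_db
OPTIONS
  (
      dlf_access_id 'LTxxxxxxxxxx',
      dlf_access_key 'y8xxxxxxxxxxxxx'
  );
```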

Prerequisites

  • Data lake acceleration is enabled for your Hologres instance. To enable data lake acceleration, log on to the Hologres console. On the Instances page, find your Hologres instance and click Data Lake Acceleration in the Actions column. Then, click Enable in the message that appears. You can also enable data lake acceleration on the Instance Details page of your Hologres instance.

  • DLF is activated. For more information, see Getting Started. For information about the regions in which DLF is supported, see Supported regions and endpoints.

  • If you use DLF 1.0, make sure that OSS is activated and that you are granted the required permissions on OSS. Without these permissions, queries against OSS data fail even if the foreign tables are created. For more information about how to grant permissions on OSS, see Bucket Policy.

  • Optional. If you want to use OSS-HDFS, make sure that OSS-HDFS is enabled. For more information, see Enable OSS-HDFS and grant access permissions.

Limits

  • You must be granted superuser permissions before you can execute this statement.

  • After you connect to an internal database, you can access tables in external databases by specifying the tables in the ext_db.ext_schema.ext_table format. However, after you connect to an external database, you cannot access tables in the internal database.

  • After you create an external database to map data in an external data source, Hologres caches the metadata of the external data source for five minutes. This helps improve access efficiency. The cache timeout period cannot be changed.

  • If the external data source is a MaxCompute project, only MaxCompute internal projects are supported. External projects are not supported.
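The cross-database access limit above works in one direction only, as the following sketch illustrates. The database and table names are hypothetical.

```sql
-- Connected to an internal database: both internal and external
-- tables are reachable.
SELECT * FROM my_table;                        -- internal table
SELECT * FROM my_ext_db.my_schema.ext_table;   -- external table

-- Connected to the external database my_ext_db: only its own tables
-- are reachable; internal tables cannot be accessed from here.
SELECT * FROM my_schema.ext_table;
```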

Syntax description

MaxCompute data source

Syntax

CREATE EXTERNAL DATABASE <ext_database_name> WITH 
  metastore_type 'maxcompute'
  mc_project 'project_name' 
  [comment 'xxx'];

Parameters

  • metastore_type (required, no default value): The metadata storage type of the external data source. The value for MaxCompute is maxcompute. The value is not case-sensitive.

  • mc_project (required, no default value): The name of the MaxCompute project whose data you want to map. Only internal MaxCompute projects are supported.

  • comment (optional, no default value): The description of the database.

DLF data source

Syntax

Syntax for DLF 2.0
Note
  • DLF 2.0 supports only the managed storage mode. When you create metadata mappings in Hologres by executing the CREATE EXTERNAL DATABASE statement, you only need to specify the DLF-related parameters.

  • DLF 2.0 supports only Apache Paimon catalogs. When you create metadata mappings in Hologres, you must set the metastore_type parameter to dlf-paimon.

  • Based on the managed storage mode of DLF 2.0, Hologres allows you to use the service-linked role or a RAM role to access external data sources. By default, the service-linked role is used for identity pass-through. Before you use the service-linked role, you must grant permissions to the service-linked role AliyunServiceRoleForHologresIdentityMgmt. For more information, see Service-linked role for Hologres. If you want to use a RAM role to define permissions, execute the CREATE USER MAPPING statement.

CREATE EXTERNAL DATABASE <ext_database_name> WITH
  metastore_type 'dlf-paimon'
  catalog_type 'paimon'
  dlf_region ''
  dlf_endpoint ''
  dlf_catalog ''
  [comment 'xxx'];
Syntax for DLF 1.0
Note
  • DLF 1.0 supports OSS storage and multiple lake formats, including Apache Paimon, Hudi, Delta Lake, ORC, Parquet, and CSV. For more information, see Real-time data lake solution.

  • If you want to use DLF 1.0 to create an external database, you must set the metastore_type parameter to dlf and provide the DLF and OSS endpoints as well as the AccessKey pair for authentication.

  • DLF 1.0 does not allow you to use the service-linked role or a RAM role for authentication.

CREATE EXTERNAL DATABASE <ext_database_name> WITH
  metastore_type 'dlf'
  dlf_region ''
  dlf_endpoint ''
  dlf_catalog ''
  oss_endpoint ''
  [comment 'xxx'];

Parameters

  • metastore_type (required, no default value): The metadata storage type of the external data source. For DLF 1.0, the value is dlf. For DLF 2.0, the value is in the dlf-xxx format. Currently, only dlf-paimon is supported.

  • catalog_type (required, no default value): The type of the tables in the external data source whose data you want to map. Valid values for DLF 1.0: paimon, hudi, deltalake, orc, parquet, csv, and sequencefile. Valid value for DLF 2.0: paimon.

  • dlf_region (required, no default value): The ID of the region in which the DLF service resides, in the <nation>-<region> format. Example: cn-beijing. For more information about the regions supported by DLF, see Supported regions and endpoints.

  • dlf_endpoint (required, no default value): The internal endpoint used by MaxCompute and Hologres to access DLF. For DLF 1.0, the value is in the dlf-share.<nation>-<region>.aliyuncs.com format. For more information, see Supported regions and endpoints.

  • dlf_catalog (required, no default value): The catalog to which the DLF data source belongs.

  • dlf_access_id (required for DLF 1.0, no default value): The AccessKey ID of the Alibaba Cloud account used to access DLF.

  • dlf_access_key (required for DLF 1.0, no default value): The AccessKey secret of the Alibaba Cloud account used to access DLF.

  • oss_endpoint (required for DLF 1.0, no default value): The endpoint used to access OSS. If you want to access native OSS, we recommend that you use the internal endpoint of OSS to improve access performance. You can access OSS-HDFS only by using internal endpoints. For more information about how to obtain endpoints, see Use OSS-HDFS in EMR Hive or Spark.

  • oss_access_id (required for DLF 1.0, no default value): The AccessKey ID of the Alibaba Cloud account used to access OSS.

  • oss_access_key (required for DLF 1.0, no default value): The AccessKey secret of the Alibaba Cloud account used to access OSS.

  • comment (optional, no default value): The description of the database.

Related operations

Query all external databases in an instance

SELECT database_name, options FROM hologres.hg_external_databases();

Update metadata of an external database

REFRESH CACHE FOR EXTERNAL DATABASE <ext_db_name> WITH (cache_level = 'metadata');

Examples

MaxCompute data source

Note

If you create an external database in Hologres for a MaxCompute project that uses the two-layer model, tables in the external database are still displayed in the three-layer ext_db.ext_schema.mc_table format, and the schema name is default by default. When you query tables in the external database, you must specify them in the three-layer format. Example: SELECT xxx FROM ext_db.default.mc_table.

-- Create an external database. 
CREATE EXTERNAL DATABASE ext_database_mc WITH 
  metastore_type 'maxcompute'
  mc_project 'mc_3_layer_project' 
  comment 'mc three layer project';
  
-- Query data. 
SELECT * FROM ext_database_mc.mc_schema.mc_table;
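By contrast, for a MaxCompute project that uses the two-layer model, the schema name defaults to default, as described in the Note above. The following sketch uses a hypothetical project name.

```sql
-- Create an external database for a two-layer MaxCompute project.
CREATE EXTERNAL DATABASE ext_database_mc2 WITH
  metastore_type 'maxcompute'
  mc_project 'mc_2_layer_project'
  comment 'mc two layer project';

-- Query data; the schema name is default for two-layer projects.
SELECT * FROM ext_database_mc2.default.mc_table;
```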

DLF data source

DLF 2.0

-- Create an external database. 
CREATE EXTERNAL DATABASE ext_database_dlf2 WITH
  metastore_type 'dlf-paimon'
  catalog_type 'paimon'
  dlf_region 'cn-beijing' 
  dlf_endpoint 'dlfnext-share.cn-beijing.aliyuncs.com'
  dlf_catalog 'clg-paimon-xxxxxxxxxxx' 
  comment 'DLF2.0 paimon catalog'; 

-- Query data. 
SELECT * FROM ext_database_dlf2.dlf_db.paimon_table;

DLF 1.0

-- Create an external database. 
CREATE EXTERNAL DATABASE ext_database_dlf1 WITH
  metastore_type 'dlf'
  dlf_region 'cn-beijing' 
  dlf_endpoint 'dlf-share.cn-beijing.aliyuncs.com'
  dlf_catalog 'hive_catalog' 
  oss_endpoint 'oss-cn-beijing-internal.aliyuncs.com'
  comment 'DLF1.0 hive catalog'; 

-- Create user mappings for the current user.
CREATE USER MAPPING FOR current_user
EXTERNAL DATABASE ext_database_dlf1
OPTIONS
  (
      -- For DLF 1.0
      dlf_access_id 'LTxxxxxxxxxx',
      dlf_access_key 'y8xxxxxxxxxxxxx',
      oss_access_id 'LTxxxxxxxxxx',
      oss_access_key 'y8xxxxxxxxxxxxx'
  );

-- Query data. 
SELECT * FROM ext_database_dlf1.dlf_db.hive_table;