Hologres V3.0 and later support external databases. This topic describes external databases and explains how to use them to manage tables in external data sources and to run federated queries across internal and external tables.
Background information
The CREATE EXTERNAL DATABASE statement creates an external database in a Hologres instance. An external database loads the metadata of an external data source into Hologres. This allows you to manage internal and external data in Hologres and helps implement centralized metadata management in a lakehouse architecture.
Internal data refers to the data stored in the standard storage class and Infrequent Access (IA) storage class in Hologres. External data refers to the data stored in MaxCompute and Data Lake Formation (DLF) data lakes. Hologres is seamlessly integrated with MaxCompute. Hologres allows you to directly read data from and write data to MaxCompute to implement high-speed data exchange. Hologres also allows you to use DLF to manage data in Object Storage Service (OSS) data lakes and use Hologres Query Engine (HQE) to achieve high-performance reads and writes of data in data lakes such as Apache Paimon, Hudi, and Delta Lake.
External databases are globally visible. After you connect to a database in Hologres, you can read data from and write data to an external data source by specifying an external table in the ext_db.ext_schema.ext_table
format. You can also connect to an external database to execute SQL statements and configure GUC parameters.
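For example, the three-part name lets a single query join an internal Hologres table with a mapped external table. The following sketch assumes a hypothetical internal table holo_orders and a hypothetical external table ext_db.ext_schema.mc_users:

```sql
-- Federated query: join an internal Hologres table with an external
-- table by referencing the external table in the
-- ext_db.ext_schema.ext_table format. All names here are hypothetical.
SELECT o.order_id,
       o.amount,
       u.user_name
FROM   holo_orders o
JOIN   ext_db.ext_schema.mc_users u
ON     o.user_id = u.user_id;
```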
Permissions on external databases are managed by the external data sources, such as MaxCompute and DLF. Hologres only passes identities through to the external data sources. By default, the Hologres service-linked role is used for identity pass-through. Before you access an external data source, you must use your Alibaba Cloud account or a RAM user that has the required permissions to grant permissions to the service-linked role. For more information, see Service-linked role for Hologres. Hologres also allows you to use RAM roles to access external data sources, which lets you customize policies for accessing external data sources. For more information, see RAM role overview. If you use an account that is not an Alibaba Cloud account or a RAM user, such as a BASIC account, execute the CREATE USER MAPPING statement to associate and authorize the account.
Prerequisites
Data lake acceleration is enabled for your Hologres instance. To enable data lake acceleration, log on to the Hologres console. On the Instances page, find your Hologres instance and click Data Lake Acceleration in the Actions column. Then, click Enable in the message that appears. You can also enable data lake acceleration on the Instance Details page of your Hologres instance.
DLF is activated. For more information, see Getting Started. For information about the regions in which DLF is supported, see Supported regions and endpoints.
If you use DLF 1.0, make sure that OSS is activated and required permissions on OSS are granted. If you access data in OSS by using foreign tables, you must be granted access permissions on OSS. Otherwise, you are not allowed to query data in OSS even if foreign tables are created. For more information about how to grant permissions on OSS, see Bucket Policy.
Optional. If you want to use OSS-HDFS, make sure that OSS-HDFS is enabled. For more information, see Enable OSS-HDFS and grant access permissions.
Limits
You must have superuser permissions to execute the CREATE EXTERNAL DATABASE statement.
After you connect to an internal database, you can access tables in external databases by specifying the tables in the ext_db.ext_schema.ext_table format. However, after you connect to an external database, you cannot access tables in internal databases.
After you create an external database to map data in an external data source, Hologres caches the metadata of the external data source for five minutes. This helps improve access efficiency. The cache timeout period cannot be changed.
If the external data source is a MaxCompute project, only MaxCompute internal projects are supported. External projects are not supported.
Syntax description
MaxCompute data source
Syntax
CREATE EXTERNAL DATABASE <ext_database_name> WITH
metastore_type 'maxcompute'
mc_project 'project_name'
[comment 'xxx'];
Parameters
Parameter | Description | Required | Default value |
metastore_type | The metadata storage type of the external data source. For MaxCompute, set the value to maxcompute. | Yes | No default value |
mc_project | The name of the MaxCompute project whose data you want to map. Only internal MaxCompute projects are supported. | Yes | No default value |
comment | The description of the database. | No | No default value |
DLF data source
Syntax
Syntax for DLF 2.0
DLF 2.0 supports only the managed storage mode. When you execute the CREATE EXTERNAL DATABASE statement to create metadata mappings in Hologres, you only need to specify the DLF-related parameters.
DLF 2.0 supports only Apache Paimon catalogs. When you create metadata mappings in Hologres, you must set the metastore_type parameter to dlf-paimon.
Based on the managed storage mode of DLF 2.0, Hologres allows you to use the service-linked role or a RAM role to access external data sources. By default, the service-linked role is used for identity pass-through. Before you use the service-linked role, you must grant permissions to the service-linked role AliyunServiceRoleForHologresIdentityMgmt. For more information, see Service-linked role for Hologres. If you want to use a RAM role to define permissions, execute the CREATE USER MAPPING statement.
CREATE EXTERNAL DATABASE <ext_database_name> WITH
metastore_type 'dlf-paimon'
catalog_type 'paimon'
dlf_region ''
dlf_endpoint ''
dlf_catalog ''
[comment 'xxx'];
Syntax for DLF 1.0
DLF 1.0 supports OSS storage and multiple lake formats, including Apache Paimon, Hudi, Delta Lake, ORC, Parquet, and CSV. For more information, see Real-time data lake solution.
If you want to use DLF 1.0 to create an external database, you must set the metastore_type parameter to dlf and provide the DLF and OSS endpoints as well as the AccessKey pair for authentication.
DLF 1.0 does not allow you to use the service-linked role or a RAM role for authentication.
CREATE EXTERNAL DATABASE <ext_database_name> WITH
metastore_type 'dlf'
dlf_region ''
dlf_endpoint ''
dlf_catalog ''
oss_endpoint ''
[comment 'xxx'];
Parameters
Parameter | Description | Required | Default value |
metastore_type | The metadata storage type of the external data source. For DLF 1.0, the value is dlf. For DLF 2.0, the value is in the dlf-xxx format. Currently, only the value dlf-paimon is supported. | Yes | No default value |
catalog_type | The type of the table in the external data source whose data you want to map. Valid values for DLF 1.0: paimon, hudi, deltalake, orc, parquet, csv, and sequencefile. Valid values for DLF 2.0: paimon. | Yes | No default value |
dlf_region | The ID of the region in which the DLF service resides, in the <nation>-<region> format. Example: cn-beijing. For more information about the regions supported by DLF, see Supported regions and endpoints. | Yes | No default value |
dlf_endpoint | The internal endpoint used by MaxCompute and Hologres to access DLF. The value is in the dlf-share.<nation>-<region>.aliyuncs.com format. For more information, see Supported regions and endpoints. | Yes | No default value |
dlf_catalog | The catalog to which the DLF data source belongs. | Yes | No default value |
dlf_access_id | The AccessKey ID of the Alibaba Cloud account used to access DLF. | Yes | No default value |
dlf_access_key | The AccessKey secret of the Alibaba Cloud account used to access DLF. | Yes | No default value |
oss_endpoint | The endpoint used to access OSS. | Yes | No default value |
oss_access_id | The AccessKey ID of the Alibaba Cloud account used to access OSS. | Yes | No default value |
oss_access_key | The AccessKey secret of the Alibaba Cloud account used to access OSS. | Yes | No default value |
comment | The description of the database. | No | No default value |
Related operations
Query all external databases in an instance
SELECT database_name, options FROM hologres.hg_external_databases();
Update metadata of an external database
REFRESH CACHE FOR EXTERNAL DATABASE <ext_db_name> WITH (cache_level = 'metadata');
Examples
MaxCompute data source
For a MaxCompute project that uses the two-layer model, if you create an external database for the MaxCompute project in Hologres, tables in the external database are displayed in the ext_db.ext_schema.mc_table format by default. When you query tables in the external database, you must specify the tables in the three-layer format, in which the schema name defaults to default. Example: SELECT xxx FROM ext_db.default.mc_table.
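As a sketch of the two-layer case described above, the following example maps a hypothetical two-layer MaxCompute project and queries a table through the default schema (the project and table names are assumptions):

```sql
-- Create an external database for a two-layer MaxCompute project.
-- The project name mc_2_layer_project is hypothetical.
CREATE EXTERNAL DATABASE ext_database_mc2 WITH
metastore_type 'maxcompute'
mc_project 'mc_2_layer_project'
comment 'mc two layer project';
-- Query data. Because the project has no schemas, use the default schema.
SELECT * FROM ext_database_mc2.default.mc_table;
```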
-- Create an external database.
CREATE EXTERNAL DATABASE ext_database_mc WITH
metastore_type 'maxcompute'
mc_project 'mc_3_layer_project'
comment 'mc three layer project';
-- Query data.
SELECT * FROM ext_database_mc.mc_schema.mc_table;
DLF data source
DLF 2.0
-- Create an external database.
CREATE EXTERNAL DATABASE ext_database_dlf2 WITH
metastore_type 'dlf-paimon'
catalog_type 'paimon'
dlf_region 'cn-beijing'
dlf_endpoint 'dlfnext-share.cn-beijing.aliyuncs.com'
dlf_catalog 'clg-paimon-xxxxxxxxxxx'
comment 'DLF2.0 paimon catalog';
-- Query data.
SELECT * FROM ext_database_dlf2.dlf_db.paimon_table;
DLF 1.0
-- Create an external database.
CREATE EXTERNAL DATABASE ext_database_dlf1 WITH
metastore_type 'dlf'
dlf_region 'cn-beijing'
dlf_endpoint 'dlf-share.cn-beijing.aliyuncs.com'
dlf_catalog 'hive_catalog'
oss_endpoint 'oss-cn-beijing-internal.aliyuncs.com'
comment 'DLF1.0 hive catalog';
-- Create user mappings for the current user.
CREATE USER MAPPING FOR current_user
EXTERNAL DATABASE ext_database_dlf1
OPTIONS
(
-- For DLF 1.0
dlf_access_id 'LTxxxxxxxxxx',
dlf_access_key 'y8xxxxxxxxxxxxx',
oss_access_id 'LTxxxxxxxxxx',
oss_access_key 'y8xxxxxxxxxxxxx'
);
-- Query data.
SELECT * FROM ext_database_dlf1.dlf_db.hive_table;