E-MapReduce (EMR) clusters of V5.8.0 or a later minor version (StarRocks 2.3 or later) can query external tables whose metadata is stored in Data Lake Formation (DLF). This topic describes how to create an external catalog that uses DLF as its metastore and how to query the tables in that catalog.
Prerequisites
An online analytical processing (OLAP) or custom cluster of EMR V5.8.0 or a later minor version is created, and the StarRocks service is selected for the cluster. For more information, see Create a cluster.
Precautions
This topic applies only to Hive, Hudi, Iceberg, and Delta Lake data sources.
Procedure
Log on to the StarRocks cluster in SSH mode. For more information, see Log on to a cluster.
Run the following command to connect to the StarRocks cluster:
mysql -h127.0.0.1 -P 9030 -uroot
Execute the following statement to create an external catalog.
In this example, a Hive catalog is created.
CREATE EXTERNAL CATALOG hive_catalog
PROPERTIES (
    "type" = "hive",
    "hive.metastore.type" = "DLF"
);
The following table describes the parameters.
Parameter | Required | Description
type | Yes | The type of the data source. Valid values: hive, hudi, iceberg, and deltalake. In this example, set the value to hive.
dlf.catalog.id | No | The ID of the DLF catalog from which you want to read data. If you do not configure this parameter, the ID of the default DLF catalog is used.
hive.metastore.type | Yes | The type of the metastore. Set the value to DLF.
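If you want to read from a DLF catalog other than the default one, the statement might look like the following sketch. It uses only the parameters described above. The catalog name hive_catalog_dlf and the placeholder <your_dlf_catalog_id> are assumptions for illustration; replace them with your own values. The SHOW CATALOGS statement is assumed to be available in your StarRocks version and is included only as a quick check that the catalog was created.

```sql
-- Sketch only: hive_catalog_dlf is a hypothetical catalog name, and
-- <your_dlf_catalog_id> is a placeholder for the ID of your DLF catalog.
CREATE EXTERNAL CATALOG hive_catalog_dlf
PROPERTIES (
    "type" = "hive",
    "hive.metastore.type" = "DLF",
    "dlf.catalog.id" = "<your_dlf_catalog_id>"
);

-- List the catalogs in the cluster to confirm that the new catalog exists.
SHOW CATALOGS;
```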
Query data.
Execute the following statement to list the databases in the catalog:
SHOW DATABASES FROM hive_catalog;
Execute the following statement to switch the current session to a database in the catalog. In this example, the database named default is used.
USE hive_catalog.default;
Execute the following statement to query data in a specified table:
SELECT * FROM <table_name>;
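As an alternative to switching databases with USE, you can usually reference a table by its fully qualified name in the form catalog.database.table. The following is a minimal sketch; <db_name> and <table_name> are placeholders, and the LIMIT clause is added only to keep the result set small while you test.

```sql
-- Sketch only: replace <db_name> and <table_name> with a database and a
-- table that exist in the DLF catalog mapped by hive_catalog.
SELECT * FROM hive_catalog.<db_name>.<table_name> LIMIT 10;
```

The fully qualified form is convenient when a single session reads from more than one catalog, because it does not depend on the database that was last selected.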
References
For more information about how to query data from each type of data source, see Hive data source, Iceberg data source, Hudi data source, and Delta Lake data source.