E-MapReduce: Access external tables whose metadata is stored in DLF

Last Updated: Mar 12, 2024

E-MapReduce (EMR) clusters of V5.8.0 or a later minor version (StarRocks 2.3 or later) can query external tables whose metadata is stored in Data Lake Formation (DLF). This topic describes how to create an external catalog that uses DLF as its metastore and how to query tables through that catalog.

Prerequisites

An online analytical processing (OLAP) or custom cluster of EMR V5.8.0 or a later minor version is created, and the StarRocks service is selected for the cluster. For more information, see Create a cluster.

Precautions

This topic applies only to Hive, Hudi, Iceberg, and Delta Lake data sources.

Procedure

  1. Log on to the StarRocks cluster in SSH mode. For more information, see Log on to a cluster.

  2. Run the following command to connect to the StarRocks cluster:

    mysql -h127.0.0.1 -P 9030 -uroot
  3. Execute the following statement to create an external catalog.

    In this example, a Hive catalog is created.

    CREATE EXTERNAL CATALOG hive_catalog
    PROPERTIES
    (
        "type" = "hive",
        "hive.metastore.type" = "DLF"
    );

    | Parameter | Required | Description |
    | --- | --- | --- |
    | type | Yes | The type of the data source. Valid values: hive, hudi, iceberg, and deltalake. In this example, set the value to hive. |
    | dlf.catalog.id | No | The ID of the DLF catalog from which you want to read data. If you do not configure this parameter, the ID of the default DLF catalog is used. |
    | hive.metastore.type | Yes | The type of the metastore. Set the value to DLF. |
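
    The optional dlf.catalog.id parameter described above can be added to the same statement. The following is a minimal sketch, not a verified configuration: the catalog name hive_catalog_custom and the placeholder <your_dlf_catalog_id> are hypothetical and must be replaced with values from your own environment.

    ```sql
    -- Sketch: a Hive catalog that reads from a specific DLF catalog
    -- instead of the default one.
    -- hive_catalog_custom is a hypothetical name chosen for illustration.
    CREATE EXTERNAL CATALOG hive_catalog_custom
    PROPERTIES
    (
        "type" = "hive",
        "hive.metastore.type" = "DLF",
        -- Replace the placeholder with the ID of your DLF catalog.
        "dlf.catalog.id" = "<your_dlf_catalog_id>"
    );
    ```

    If dlf.catalog.id is omitted, as in the first example, the default DLF catalog is used.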

  4. Query data.

    • Execute the following statement to list the databases in the catalog:

      SHOW DATABASES FROM hive_catalog;
    • Execute the following statement to switch the current session to a database in the catalog:

      USE hive_catalog.default;
    • Execute the following statement to query data in a specified table:

      SELECT * FROM <table_name>;
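
As an alternative to switching databases with USE, a table can usually be referenced by its fully qualified name in the form catalog.database.table. The following sketch assumes the hive_catalog created in step 3. The placeholders <database_name> and <table_name> stand for objects in your own DLF catalog, and the LIMIT clause is added here only to keep the result small.

```sql
-- Sketch: query a table without running USE first by qualifying
-- the table name with the catalog and database.
-- <database_name> and <table_name> are placeholders, not real objects.
SELECT *
FROM hive_catalog.<database_name>.<table_name>
LIMIT 10;
```

This form can be convenient when a session works with tables from more than one catalog or database.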

References

For more information about how to query data from each type of data source, see Hive data source, Iceberg data source, Hudi data source, and Delta Lake data source.