E-MapReduce: Delta Lake data source

Last Updated: Feb 19, 2024

A Delta Lake catalog is an external catalog that lets you query data in Delta Lake from StarRocks. This topic describes how to create a Delta Lake catalog in an E-MapReduce (EMR) StarRocks cluster and how to use it to query data in Delta Lake.

Prerequisites

  • A cluster that contains the Delta Lake service, such as a DataLake cluster or a custom cluster, is created. For more information, see Create a cluster.

  • A StarRocks cluster is created. For more information, see Create a StarRocks cluster.

Create a Delta Lake catalog

Syntax

CREATE EXTERNAL CATALOG <catalog_name>
PROPERTIES
( 
  "key"="value", 
  ...
);

Parameter description

  • catalog_name: the name of the Delta Lake catalog. This parameter is required. The name must meet the following requirements:

    • The name can contain letters, digits, and underscores (_). It must start with a letter.

    • The name must be 1 to 64 characters in length.

  • PROPERTIES: the properties of the Delta Lake catalog. This parameter is required. The configurations of this parameter vary based on the metadata service that is used by the Delta Lake data source. The following information describes the properties that you can configure for different metadata services:

    • Hive Metastore

      Property             | Required | Description
      type                 | Yes      | The type of the data source. Set the value to deltalake.
      hive.metastore.uris  | Yes      | The URI of the Hive metastore. Specify the value in the following format: thrift://<IP address of the Hive metastore>:<Port number>. The default port number is 9083.

    • DLF

      For more information, see Access external tables whose metadata is stored in DLF.

Example

Run the following statement to create a Delta Lake catalog named delta_catalog. Replace xx.xx.xx.xx with the IP address of your Hive metastore:

CREATE EXTERNAL CATALOG delta_catalog
PROPERTIES
(
    "type" = "deltalake",
    "hive.metastore.uris" = "thrift://xx.xx.xx.xx:9083"
);
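
After the catalog is created, you may want to confirm that StarRocks has registered it and can reach the metastore. The following is a minimal sketch, assuming the catalog name delta_catalog from the example above and a StarRocks version that supports these statements:

```sql
-- List all catalogs in the cluster. delta_catalog should appear
-- next to the built-in default_catalog.
SHOW CATALOGS;

-- List the databases that the Hive metastore exposes through the catalog.
-- An error here usually points to an unreachable or incorrect hive.metastore.uris value.
SHOW DATABASES FROM delta_catalog;
```

If the second statement returns the databases that you expect, the catalog is ready for queries.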

Query data in a Delta Lake table

You can execute the following statement to query data in a specific table of a database:

SELECT * FROM <catalog_name>.<database_name>.<table_name>;
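
As an illustration, the statement above can be filled in as follows. This is a sketch only: delta_db and orders are hypothetical database and table names, and SET CATALOG is assumed to be available in your StarRocks version.

```sql
-- Option 1: use the fully qualified name.
-- delta_db and orders are hypothetical names; replace them with your own.
SELECT * FROM delta_catalog.delta_db.orders LIMIT 10;

-- Option 2: switch to the catalog and database first,
-- then refer to the table by its bare name.
SET CATALOG delta_catalog;
USE delta_db;
SELECT * FROM orders LIMIT 10;
```

The LIMIT clause keeps the first test query small. If SET CATALOG is not supported in your version, USE delta_catalog.delta_db; is an alternative way to switch to the catalog and database in one step.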

References

For more information about Delta Lake, see Overview.