E-MapReduce: Integrate Paimon with Hive

Last Updated: Jul 17, 2024

E-MapReduce (EMR) allows you to query Paimon data in Hive. This topic describes how to run such queries.

Prerequisites

A DataLake cluster or a custom cluster that contains the Hive and Paimon services is created. For more information about how to create a cluster, see Create a cluster.

Limits

You can query Paimon data in Hive only in clusters of EMR V3.46.0 or a later minor version, or clusters of EMR V5.12.0 or a later minor version.

Procedure

  1. Query data from tables that are created in a Hive catalog or a Data Lake Formation (DLF) catalog.

    After you use a Hive catalog to synchronize metadata from other services to Hive Metastore, you can query the tables in that catalog from Hive. If you selected DLF Unified Metadata for Metadata when you created the EMR cluster, you can use a DLF catalog to synchronize metadata from other services to DLF and then query the tables in Hive.

    The following example shows how to use Spark to write data to a Hive catalog and query data of Paimon in Hive.

    1. Run the following command to start Spark SQL:

      spark-sql \
        --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
        --conf spark.sql.catalog.paimon.metastore=hive \
        --conf spark.sql.catalog.paimon.uri=thrift://master-1-1:9083 \
        --conf spark.sql.catalog.paimon.warehouse=oss://<yourBucketName>/warehouse
      Note
      • spark.sql.catalog.paimon: defines a catalog named paimon.

      • spark.sql.catalog.paimon.metastore: specifies the metadata storage type used by the catalog. If you set this parameter to hive, metadata is synchronized to Hive Metastore.

      • spark.sql.catalog.paimon.uri: specifies the address and port number of Hive Metastore. If you set this parameter to thrift://master-1-1:9083, the Spark SQL client connects to the Hive Metastore that runs on the master-1-1 node and listens on port 9083 to obtain metadata.

      • spark.sql.catalog.paimon.warehouse: specifies the actual location of the data warehouse. Configure this parameter based on your business requirements.
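
      If you do not want to pass these settings on every invocation, the same keys can also be stored as Spark defaults, for example in spark-defaults.conf or in the corresponding Spark service configuration of your cluster. The following is a minimal sketch that reuses the placeholder values from the preceding command; adjust them to your environment.

      # Sketch of spark-defaults.conf entries for the Paimon catalog.
      spark.sql.catalog.paimon             org.apache.paimon.spark.SparkCatalog
      spark.sql.catalog.paimon.metastore   hive
      spark.sql.catalog.paimon.uri         thrift://master-1-1:9083
      spark.sql.catalog.paimon.warehouse   oss://<yourBucketName>/warehouse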

    2. Execute the following Spark SQL statements to create a Paimon table in the created catalog and write data to the table:

      -- Switch to the paimon catalog.
      USE paimon;
      
      -- Create a test database in the created catalog and use the database. 
      CREATE DATABASE test_db;
      USE test_db;
      
      -- Create a Paimon table. 
      CREATE TABLE test_tbl (
          uuid int,
          name string,
          price double
      ) TBLPROPERTIES (
          'primary-key' = 'uuid'
      );
      
      -- Write data to the Paimon table. 
      INSERT INTO test_tbl VALUES (1, 'apple', 3.5), (2, 'banana', 4.0), (3, 'cherry', 20.5);
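
      Because the table defines 'primary-key' = 'uuid', writing another row with an existing key replaces the previous value instead of adding a duplicate row. The following optional Spark SQL sketch reuses the table from the preceding statements to illustrate this upsert behavior; skip it if you only want to write the initial data.

      -- Optional: write a row whose primary key already exists. Paimon keeps the latest value.
      INSERT INTO test_tbl VALUES (1, 'apple', 4.5);
      
      -- Verify from Spark before querying in Hive: uuid 1 now shows price 4.5,
      -- and the table still contains three rows.
      SELECT * FROM test_tbl ORDER BY uuid;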
    3. Run the following command to start the Hive CLI:

      hive
    4. Execute the following Hive SQL statement to query the data that is written to the Paimon table:

      SELECT * FROM test_db.test_tbl;
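
      The query returns the rows that were written from Spark SQL, including the row updated in the optional upsert step if you ran it. Regular Hive SQL works against the Paimon table as well; the following is a minimal sketch that filters and aggregates the sample data, using only the columns defined above.

      -- Filter rows of the Paimon table.
      SELECT name, price FROM test_db.test_tbl WHERE price > 4;
      
      -- Aggregate over the Paimon table.
      SELECT COUNT(*), AVG(price) FROM test_db.test_tbl;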
  2. Create an external table and query data from the external table.

    Hive allows you to create an external table that maps to the Paimon table in a specific path and query data from the external table. Sample code:

    CREATE EXTERNAL TABLE test_ext_tbl
    STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
    LOCATION 'oss://<yourBucketName>/warehouse/test_db.db/test_tbl';
    
    SELECT * FROM test_ext_tbl;
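
    If you no longer need the mapping, you can drop the external table. Because it is an external table, dropping it removes only the Hive metadata; the Paimon files in the OSS path are not deleted. A minimal sketch:

    -- Drop only the Hive-side mapping. The Paimon data in OSS stays in place.
    DROP TABLE test_ext_tbl;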