E-MapReduce (EMR) allows you to query Paimon data in Hive. This topic describes how to do so.
Prerequisites
A DataLake or custom cluster that contains the Hive and Paimon services is created. For more information about how to create a cluster, see Create a cluster.
Limits
You can query Paimon data in Hive only in clusters of EMR V3.46.0 or a later minor version, or EMR V5.12.0 or a later minor version.
Procedure
Query data from tables that are created in a Hive catalog and a Data Lake Formation (DLF) catalog.
After you use a Hive catalog to synchronize metadata from other services to Hive Metastore, you can query the tables in that catalog from Hive. If you selected DLF Unified Metadata for Metadata when you created the EMR cluster, you can use a DLF catalog to synchronize metadata to DLF and then query the tables in Hive.
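If you use a DLF catalog, the catalog configuration differs mainly in the metastore type. The following command is a minimal sketch, assuming that the Paimon integration in your EMR version accepts dlf as the metastore type; verify the exact parameters against your cluster's documentation:
spark-sql \
--conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
--conf spark.sql.catalog.paimon.metastore=dlf \
--conf spark.sql.catalog.paimon.warehouse=oss://<yourBucketName>/warehouse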
The following example shows how to use Spark to write data to a table in a Hive catalog and then query the data in Hive.
Run the following command to start Spark SQL:
spark-sql \
--conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
--conf spark.sql.catalog.paimon.metastore=hive \
--conf spark.sql.catalog.paimon.uri=thrift://master-1-1:9083 \
--conf spark.sql.catalog.paimon.warehouse=oss://<yourBucketName>/warehouse
Note
- spark.sql.catalog.paimon: defines a catalog named paimon.
- spark.sql.catalog.paimon.metastore: specifies the metadata storage type used by the catalog. If you set this parameter to hive, metadata is synchronized to Hive Metastore.
- spark.sql.catalog.paimon.uri: specifies the address and port number of Hive Metastore. If you set this parameter to thrift://master-1-1:9083, the Spark SQL client connects to the Hive Metastore that runs on the master-1-1 node and listens on port 9083 to obtain metadata.
- spark.sql.catalog.paimon.warehouse: specifies the actual location of the data warehouse. Configure this parameter based on your business requirements.
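To confirm that the catalog is registered, you can list its databases from the Spark SQL session before you create tables:
-- Switch to the paimon catalog and list the databases that it contains.
USE paimon;
SHOW DATABASES;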
Execute the following Spark SQL statements to create a Paimon table in the created catalog and write data to the table:
-- Switch to the paimon catalog.
USE paimon;

-- Create a test database in the created catalog and use the database.
CREATE DATABASE test_db;
USE test_db;

-- Create a Paimon table.
CREATE TABLE test_tbl (
    uuid int,
    name string,
    price double
) TBLPROPERTIES (
    'primary-key' = 'uuid'
);

-- Write data to the Paimon table.
INSERT INTO test_tbl VALUES (1, 'apple', 3.5), (2, 'banana', 4.0), (3, 'cherry', 20.5);
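Optionally, verify the write from the same Spark SQL session before you switch to Hive:
-- Confirm that the three rows were written to the Paimon table.
SELECT * FROM test_tbl;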
Run the following command to start the Hive CLI:
hive
Execute the following Hive SQL statement to query the data that is written to the Paimon table:
SELECT * FROM test_db.test_tbl;
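If the metadata was synchronized correctly, the query returns the three rows that were written from Spark, similar to the following output:
1	apple	3.5
2	banana	4.0
3	cherry	20.5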
Create an external table and query data from the external table.
Hive allows you to create an external table that maps to the Paimon table in a specific path and query data from the external table. Sample code:
CREATE EXTERNAL TABLE test_ext_tbl
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
LOCATION 'oss://<yourBucketName>/warehouse/test_db.db/test_tbl';

SELECT * FROM test_ext_tbl;
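Because test_ext_tbl is an external table, dropping it in Hive removes only the table definition from Hive Metastore; the underlying Paimon files in OSS remain intact. For example:
-- Removes only the Hive metadata, not the Paimon data files in OSS.
DROP TABLE test_ext_tbl;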