E-MapReduce (EMR) allows you to use Trino to query Paimon data. This topic describes how to configure the required parameters and run queries.
Prerequisites
A DataLake or custom cluster that contains the Trino and Paimon services is created. For more information about how to create a cluster, see Create a cluster.
Limits
Only clusters of EMR V3.46.0 or a later minor version, or EMR V5.12.0 or a later minor version, support querying Paimon data in Trino.
Procedure
Modify the warehouse parameter.
Paimon stores data and metadata in a file system such as Hadoop Distributed File System (HDFS) or an object storage system such as OSS-HDFS. The root path for storage is specified by the warehouse parameter.
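For example, if the data is stored in OSS-HDFS, the value of the warehouse parameter may look like the following sketch. The path is a placeholder; replace it with the domain name of your own OSS-HDFS bucket:
warehouse=oss://<Domain name of OSS-HDFS>/warehouse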
Go to the Configure tab of the Trino service page.
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.
On the EMR on ECS page, find the desired cluster and click Services in the Actions column.
On the Services tab, find the Trino service and click Configure.
On the Configure tab, click the paimon.properties tab.
Change the value of the warehouse parameter to the root path for storage.
Save the configuration.
Click Save.
In the dialog box that appears, configure the Execution Reason parameter and click Save.
Optional: Modify the metastore parameter.
The type of metastore that Trino uses is automatically determined based on the services that you selected when you created the cluster. If you want to change the metastore type, modify the metastore parameter on the paimon.properties tab of the Configure tab on the Trino service page.
Valid values of the metastore parameter for Paimon:
filesystem: Metadata is stored in a file system or an object storage system.
hive: Metadata is synchronized to the specified Hive Metastore.
dlf: Metadata is synchronized to Data Lake Formation (DLF).
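For example, if you want metadata to be stored only in the file system, the related settings on the paimon.properties tab may look like the following sketch. The keys shown here follow the parameter names described above; the exact names displayed on the Configure tab prevail:
metastore=filesystem
warehouse=oss://<Domain name of OSS-HDFS>/warehouse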
Restart the Trino service.
In the upper-right corner of the Configure tab on the Trino service page, choose More > Restart.
In the dialog box that appears, configure the Execution Reason parameter and click OK.
In the Confirm message, click OK.
Query data of Paimon.
The following example shows how to use Spark to write data to a file system catalog and query data of Paimon in Trino.
Run the following command to start Spark SQL:
spark-sql --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
          --conf spark.sql.catalog.paimon.metastore=filesystem \
          --conf spark.sql.catalog.paimon.warehouse=oss://<Domain name of OSS-HDFS>/warehouse
Note
spark.sql.catalog.paimon: defines a catalog named paimon.
spark.sql.catalog.paimon.metastore: specifies the metadata storage type that is used by the catalog. If you set this parameter to filesystem, metadata is stored in the file system or object storage system that is specified by the warehouse path.
spark.sql.catalog.paimon.warehouse: specifies the actual location of the data warehouse. Configure this parameter based on your business requirements.
Execute the following Spark SQL statements to create a Paimon table in the created catalog and write data to the table:
-- Switch to the paimon catalog.
USE paimon;
-- Create a test database in the created catalog and use the database.
CREATE DATABASE test_db;
USE test_db;
-- Create a Paimon table.
CREATE TABLE test_tbl (
    uuid int,
    name string,
    price double
) TBLPROPERTIES (
    'primary-key' = 'uuid'
);
-- Write data to the Paimon table.
INSERT INTO test_tbl VALUES (1, 'apple', 3.5), (2, 'banana', 4.0), (3, 'cherry', 20.5);
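Optionally, before you switch to Trino, you can run a query in the same Spark SQL session to confirm that the data was written. This check is not part of the required procedure:
-- Optional check: the query returns the three rows that were inserted.
SELECT * FROM test_tbl ORDER BY uuid;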
Run the following command to start Trino:
trino --server master-1-1:9090 --catalog paimon --schema default --user hadoop
Execute the following statements to query the data that is written to the Paimon table:
USE test_db;
SELECT * FROM test_tbl;
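If the configuration is correct, the query returns the three rows that were written from Spark. You can run additional queries against the table in the same way. The following statements are optional examples:
-- List the tables in the test_db schema.
SHOW TABLES;
-- Run an aggregate query on the Paimon table.
SELECT COUNT(*) AS cnt, AVG(price) AS avg_price FROM test_tbl;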