This topic describes how to use Trino on an E-MapReduce (EMR) cluster to query data stored in OSS-HDFS.
Prerequisites
A cluster of EMR V3.42.0 or later, or EMR V5.8.0 or later is created, with the Trino service selected. For more information, see Create a cluster.
OSS-HDFS is enabled for a bucket and access permissions on OSS-HDFS are granted. For more information about how to enable OSS-HDFS, see Enable OSS-HDFS and grant access permissions.
Procedure
Log on to the E-MapReduce console. In the left-side navigation pane, click EMR on ECS and create an EMR cluster.
When you create the EMR cluster, set Product Version to EMR-3.46.2 or later, or EMR-5.12.2 or later, and set Root Storage Directory of Cluster to a bucket for which OSS-HDFS is enabled. Use the default values for the other parameters. For more information, see Create a cluster.
Query data stored in OSS-HDFS.
Log on to the Trino server.
To obtain the address and port of the Trino server, go to the EMR on ECS page, click the name of the cluster, and then choose Services > Trino > Configure.
trino --server <Trino_server_address>:<Trino_server_port> --catalog <catalog_name>
Create a schema whose location is a directory in the bucket for which OSS-HDFS is enabled.
create schema testDB with (location='oss://<Bucket>.<Endpoint>/<schema_dir>');
Use the schema.
use testDB;
Create a table.
create table tbl (key int, val int);
Insert data into the table.
insert into tbl values (1,666);
Query data in the table.
select * from tbl;
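The statements above can also be run in a single non-interactive call by using the `--execute` option of the Trino CLI. The following script is a minimal sketch, not a tested deployment script. The function names and the environment variables `TRINO_SERVER`, `TRINO_CATALOG`, `OSS_BUCKET`, `OSS_ENDPOINT`, and `SCHEMA_DIR` are hypothetical and are introduced only for this example. The sketch also assumes that fully qualified table names and `if not exists` clauses are acceptable substitutes for the `use testDB;` step, so that the script can be rerun.

```shell
#!/usr/bin/env bash
# Sketch: run the walkthrough statements non-interactively with the Trino CLI.
# All function and variable names below are illustrative assumptions.
set -euo pipefail

# Build the schema location in the form used in this topic:
# oss://<Bucket>.<Endpoint>/<schema_dir>
schema_location() {
  local bucket="$1" endpoint="$2" dir="$3"
  printf 'oss://%s.%s/%s' "$bucket" "$endpoint" "$dir"
}

# Emit the SQL statements from the procedure, one per line.
# Table names are schema-qualified instead of relying on "use testDB;".
build_statements() {
  local location="$1"
  printf "create schema if not exists testDB with (location='%s');\n" "$location"
  printf 'create table if not exists testDB.tbl (key int, val int);\n'
  printf 'insert into testDB.tbl values (1,666);\n'
  printf 'select * from testDB.tbl;\n'
}

# Send all statements to the Trino server in one CLI invocation.
run_walkthrough() {
  local server="$1" catalog="$2" location="$3"
  trino --server "$server" --catalog "$catalog" \
    --execute "$(build_statements "$location")"
}

# Contact the server only when TRINO_SERVER is set, so that the helper
# functions can be sourced and inspected without a running cluster.
if [ -n "${TRINO_SERVER:-}" ]; then
  run_walkthrough "$TRINO_SERVER" "${TRINO_CATALOG:?set TRINO_CATALOG}" \
    "$(schema_location "$OSS_BUCKET" "$OSS_ENDPOINT" "$SCHEMA_DIR")"
fi
```

To use the sketch, set `TRINO_SERVER` to `<Trino_server_address>:<Trino_server_port>` and set the other variables to the values for your cluster and bucket before you run the script. Each run inserts the row again, so repeated runs return duplicate rows.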