Object Storage Service: Use Trino on an EMR cluster to query data stored in OSS-HDFS

Last Updated: Sep 23, 2024

This topic describes how to use Trino on an E-MapReduce (EMR) cluster to query data stored in OSS-HDFS.

Prerequisites

  • An EMR cluster of V3.42.0 or later, or V5.8.0 or later, is created with the Trino service selected. For more information, see Create a cluster.

  • OSS-HDFS is enabled for a bucket and access permissions on OSS-HDFS are granted. For more information about how to enable OSS-HDFS, see Enable OSS-HDFS and grant access permissions.

Procedure

  1. Log on to the E-MapReduce console. In the left-side navigation pane, click EMR on ECS and create an EMR cluster.

    When you create the cluster, set Product Version to EMR-3.46.2 or later, or EMR-5.12.2 or later. Set Root Storage Directory of Cluster to a bucket for which OSS-HDFS is enabled. Keep the default values for the other parameters. For more information, see Create a cluster.

  2. Query data stored in OSS-HDFS.

    1. Log on to the Trino server.

      To obtain the address and port of the Trino server, go to the EMR on ECS page, click the name of the cluster, and then choose Services > Trino > Configure. Then run the following command:

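      trino --server <Trino_server_address>:<Trino_server_port> --catalog <Catalog_name>

      Replace <Catalog_name> with the name of the Trino catalog that you want to use as the session default.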
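    2. Create a schema whose location is a directory in the bucket for which OSS-HDFS is enabled. Replace <Bucket>, <Endpoint>, and <schema_dir> with the bucket name, the endpoint of the bucket, and the directory that stores the schema.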

      create schema testDB with (location='oss://<Bucket>.<Endpoint>/<schema_dir>');
    3. Use the schema.

      use testDB;
    4. Create a table.

      create table tbl (key int, val int);
    5. Insert data into the table.

      insert into tbl values (1,666);
    6. Query data in the table.

      select * from tbl;
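The trino command in the logon step sets a default catalog with --catalog. If you are not sure which catalogs the cluster provides, you can list them without setting a default catalog. The following shell function is a minimal sketch. It assumes that the trino CLI is on the PATH of the node you logged on to, and the function name list_catalogs is hypothetical. It uses the CLI's --execute option to run SHOW CATALOGS and then exit.

```shell
#!/usr/bin/env bash
# Sketch: list the catalogs on a Trino server so that you can choose the
# value for --catalog. Assumes the trino CLI is installed on this node.
# The function name list_catalogs is hypothetical.
list_catalogs() {
  local server="$1"   # <Trino_server_address>:<Trino_server_port>
  # Reject arguments that do not look like host:port before calling the CLI.
  case "$server" in
    ?*:[0-9]*) ;;
    *)
      echo "usage: list_catalogs <Trino_server_address>:<Trino_server_port>" >&2
      return 1
      ;;
  esac
  # --execute runs the statement, prints the result, and exits.
  trino --server "$server" --execute "show catalogs"
}

# Example (replace the placeholders with the values from the console):
# list_catalogs "<Trino_server_address>:<Trino_server_port>"
```

The host:port check only catches obvious typos. It does not confirm that the server is reachable.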
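The location in the Create a schema step has the form oss://<Bucket>.<Endpoint>/<schema_dir>. The following sketch assembles that URI and the CREATE SCHEMA statement from the three placeholder values. It drops a leading slash from the directory so that the URI does not contain a doubled slash. The function names and the example values examplebucket and example-endpoint are hypothetical; substitute your own bucket name, endpoint, and directory.

```shell
#!/usr/bin/env bash
# Sketch: build the LOCATION value for CREATE SCHEMA.
# Function names and example values are hypothetical.
oss_location() {
  local bucket="$1" endpoint="$2" dir="$3"
  dir="${dir#/}"   # drop one leading slash so the URI has no "//"
  printf 'oss://%s.%s/%s\n' "$bucket" "$endpoint" "$dir"
}

# Print the CREATE SCHEMA statement for a schema name and a location.
create_schema_sql() {
  local schema="$1" location="$2"
  printf "create schema %s with (location='%s');\n" "$schema" "$location"
}

# Example:
# create_schema_sql testDB "$(oss_location examplebucket example-endpoint testdb_dir)"
```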
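To rerun the query steps without typing each statement, you can write them to a file and pass the file to the CLI with --file. This sketch writes such a file under several assumptions:

  • The file name demo.sql and the function name write_demo_sql are hypothetical.
  • The table is qualified as testDB.tbl, so the script does not rely on USE.
  • IF NOT EXISTS lets the script run more than once. Each run still inserts another row.

The last query reads the $path hidden column. Connectors such as the Hive connector expose this column. If your catalog provides it, the query should list the files that store the table's data.

```shell
#!/usr/bin/env bash
# Sketch: write the statements from the query steps to a SQL file.
# File and function names are hypothetical.
write_demo_sql() {
  local location="$1" out="$2"
  {
    printf "create schema if not exists testDB with (location='%s');\n" "$location"
    # Qualify the table with the schema instead of relying on USE.
    # Single quotes keep "$path" literal; it is a Trino hidden column.
    printf '%s\n' \
      'create table if not exists testDB.tbl (key int, val int);' \
      'insert into testDB.tbl values (1, 666);' \
      'select * from testDB.tbl;' \
      'select "$path" from testDB.tbl;'
  } > "$out"
}

# Example (replace the placeholders):
# write_demo_sql "oss://<Bucket>.<Endpoint>/<schema_dir>" demo.sql
# trino --server <Trino_server_address>:<Trino_server_port> \
#       --catalog <Catalog_name> --file demo.sql
```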