This topic describes how to use Impala on an E-MapReduce (EMR) cluster to query data stored in OSS-HDFS.
Prerequisites
A cluster of EMR V3.42.0 or later, or EMR V5.8.0 or later is created, with the Impala service selected. For more information, see Create a cluster.
OSS-HDFS is enabled for a bucket and access permissions on OSS-HDFS are granted. For more information about how to enable OSS-HDFS, see Enable OSS-HDFS and grant access permissions.
Procedure
Log on to the EMR cluster.
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
Click the EMR cluster that you created.
Click the Nodes tab, and then click on the left side of the node group.
Click the ID of the ECS instance. On the Instances page, click Connect next to the instance ID.
For more information about how to log on to a cluster in Windows or Linux by using an SSH key pair or SSH password, see Log on to a cluster.
Run the following command on the terminal to connect to Impala.
For more information, see Connect to Impala.
impala-shell -i core-1-1
Use Impala to query data stored in OSS-HDFS.
Create a database.
CREATE DATABASE store LOCATION 'oss://<bucket-name>.<endpoint>/impala';
Use the database.
USE store;
Create a table.
CREATE EXTERNAL TABLE customer_demographics ( `cd_demo_sk` INT, `cd_gender` STRING, `cd_marital_status` STRING, `cd_education_status` STRING, `cd_purchase_estimate` INT, `cd_credit_rating` STRING, `cd_dep_count` INT, `cd_dep_employed_count` INT, `cd_dep_college_count` INT) STORED AS PARQUET;
Insert data into the table.
INSERT INTO customer_demographics VALUES (1, 'Male', 'Single', 'Graduate', 1000, 'AAA', 2, 1, 1), (2, 'Female', 'Married', 'Undergraduate', 2000, 'BBB', 3, 2, 2);
Query data in the table.
select * from customer_demographics;