Enable Hive to Write and Read Data from Alibaba Cloud Elasticsearch using ES-Hadoop

This guide shows how to use ES-Hadoop so that Hive can write data to and read data from Alibaba Cloud Elasticsearch.

Elasticsearch and Hadoop are widely used for data storage, processing, and analytics. Running them together on Alibaba Cloud lets you keep data indexed in Elasticsearch while loading and querying it with familiar Hadoop tools such as Hive.

Integrating Hive with Alibaba Cloud Elasticsearch

Elasticsearch-Hadoop (ES-Hadoop) is an open-source connector that bridges Elasticsearch and the Hadoop ecosystem. With it, Hadoop components such as Hive can use an Elasticsearch index both as a data source and as a write target.

Before you start, make sure you have an Alibaba Cloud account and are familiar with the Alibaba Cloud Elasticsearch service. The steps below walk through the setup.

Prerequisites

  • A running Alibaba Cloud Elasticsearch cluster.
  • An E-MapReduce (EMR) cluster within the same VPC.

Procedure

Step 1: Prepare Your Environment

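In your Elasticsearch cluster, disable Auto Indexing so that an index is created only with the mappings you define, not with mappings inferred from the first document written. Then create the target index with explicit mappings. The following example creates an index named company. The _doc level under "mappings" is the mapping type used by Elasticsearch 6.x; Elasticsearch 7.x and later no longer accept a type name there by default.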

PUT company
{
  "mappings": {
    "_doc": {
      "properties": {
        "id": {"type": "long"},
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "birth": {"type": "text"},
        "addr": {"type": "text"}
      }
    }
  },
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  }
}
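
If you would rather send this request from a shell than from a console, the sketch below shows one way to do it with curl. It assumes the JSON body above (everything after the PUT company line) has been saved to a file. The host, user name, password, and the file name company-index.json are placeholders for illustration, not values from this guide.

```shell
#!/usr/bin/env bash
# Sketch only. ES_HOST, ES_USER, and ES_PASS are placeholders (assumptions);
# substitute the endpoint and credentials of your own cluster.
ES_HOST="${ES_HOST:-es-cn-xxxxxx.elasticsearch.aliyuncs.com}"
ES_USER="${ES_USER:-elastic}"
ES_PASS="${ES_PASS:-your-password}"

# Print the URL of an index on the cluster (port 9200, as used in this guide).
index_url() {
  printf 'http://%s:9200/%s' "$ES_HOST" "$1"
}

# Create an index from a JSON body stored in a file.
# $1 = index name, $2 = path to the file holding the request body.
create_index() {
  curl -sS -u "${ES_USER}:${ES_PASS}" \
    -X PUT "$(index_url "$1")" \
    -H 'Content-Type: application/json' \
    --data-binary "@$2"
}
```

Calling create_index company company-index.json sends the request. On success, the response from Elasticsearch contains "acknowledged": true.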

Create the EMR cluster in the same VPC as your Elasticsearch cluster so that the EMR nodes can reach the Elasticsearch endpoint over the internal network.

Step 2: Upload the ES-Hadoop JAR

Obtain the compatible ES-Hadoop package and upload it to HDFS:

hadoop fs -mkdir /tmp/hadoop-es
hadoop fs -put elasticsearch-hadoop-hive-x.x.x.jar /tmp/hadoop-es

Replace x.x.x with the correct version number corresponding to your Elasticsearch version.
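
One way to avoid a version mismatch is to read the version from the cluster itself and derive the JAR name from it. The sketch below is only an illustration: the helper names es_version_from_json and es_hadoop_hive_jar are hypothetical, and the host and credentials in the usage comment are placeholders. It relies on the fact that the root endpoint of an Elasticsearch cluster returns a JSON document whose version object has a "number" field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the JSON returned by the cluster root endpoint on
# stdin and print the value of the "number" field (the Elasticsearch version).
es_version_from_json() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: build the ES-Hadoop Hive JAR file name for a version.
es_hadoop_hive_jar() {
  printf 'elasticsearch-hadoop-hive-%s.jar' "$1"
}

# Usage (needs network access to the cluster; host and credentials are placeholders):
#   version=$(curl -sS -u elastic:your-password \
#     http://es-cn-xxxxxx.elasticsearch.aliyuncs.com:9200 | es_version_from_json)
#   hadoop fs -put "$(es_hadoop_hive_jar "$version")" /tmp/hadoop-es
```

The parsing is deliberately simple and uses sed only, so it works on a node without extra JSON tooling. If jq is available, it is a more robust choice.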

Step 3: Create a Hive External Table

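Create a Hive external table and map its columns to the fields of the Elasticsearch index. In the table properties, es.nodes and es.port give the cluster endpoint, and es.nodes.wan.only makes the connector talk only to that endpoint without trying to discover individual nodes, which suits a cloud-hosted cluster. The ... marks properties omitted from this listing; replace it with the remaining settings for your cluster before running the statement.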

add jar hdfs:///tmp/hadoop-es/elasticsearch-hadoop-hive-x.x.x.jar;

CREATE EXTERNAL TABLE IF NOT EXISTS company(
   id BIGINT,
   name STRING,
   birth STRING,
   addr STRING 
)  
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' 
TBLPROPERTIES(  
    'es.nodes' = 'http://es-cn-xxxxxx.elasticsearch.aliyuncs.com',
    'es.port' = '9200',
    'es.net.ssl' = 'true', 
    'es.nodes.wan.only' = 'true', 
    ...
);

Step 4: Write and Read Data

Write data to the index with a HiveQL INSERT statement:

INSERT INTO TABLE company VALUES (1, "zhangsan", "1990-01-01","No.969, WenyiXi Rd, Yuhang, Hangzhou");

Read data from the index:

SELECT * FROM company;
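
To confirm that a row written through Hive actually landed in the index, you can also query Elasticsearch directly. The following is a minimal sketch using the URI search form of the _search endpoint. The host, user name, and password are placeholders, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only. ES_HOST, ES_USER, and ES_PASS are placeholders (assumptions).
ES_HOST="${ES_HOST:-es-cn-xxxxxx.elasticsearch.aliyuncs.com}"
ES_USER="${ES_USER:-elastic}"
ES_PASS="${ES_PASS:-your-password}"

# Build a URI search URL. $1 = index, $2 = field, $3 = value to match.
search_url() {
  printf 'http://%s:9200/%s/_search?q=%s:%s&pretty' "$ES_HOST" "$1" "$2" "$3"
}

# Run the search and print the JSON response.
search_index() {
  curl -sS -u "${ES_USER}:${ES_PASS}" "$(search_url "$1" "$2" "$3")"
}
```

For the row inserted above, search_index company name zhangsan should return a hit whose _source holds the same id, name, birth, and addr values.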

With this setup, Hive reads from and writes to the Elasticsearch index through the external table, so data indexed in Elasticsearch can be loaded and queried with ordinary HiveQL statements.


Conclusion

Integrating Hive with Alibaba Cloud Elasticsearch through ES-Hadoop gives you a straightforward path between batch processing in Hadoop and search-oriented analytics in Elasticsearch. Once the index, the ES-Hadoop JAR, and the external table are in place, large datasets can move between the two systems with plain INSERT and SELECT statements.

Data Geek
